check GPU on an integrated server

wy

Using nvidia-smi on Integrated Servers

If you log into an integrated server and run the nvidia-smi command directly on the command line, you may see the following error:

“bash: nvidia-smi: command not found”

This error occurs because logging into an integrated server does not allocate a GPU to you, and the node you land on does not provide the nvidia-smi command. To access GPUs, you must submit a job that requests one and run nvidia-smi inside that job. The GPU information then appears in the job’s output file.
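Before writing a job script, it can help to confirm that the command really is missing on the current node. The following is a minimal sketch; the function name check_nvidia_smi is made up for illustration, and it only inspects the current shell environment, not the scheduler.

```shell
#!/bin/bash
# Report whether nvidia-smi can be found on the node this shell runs on.
# check_nvidia_smi is a hypothetical helper name, not part of any tool.
check_nvidia_smi() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi found at: $(command -v nvidia-smi)"
    else
        echo "nvidia-smi not found on ${HOSTNAME:-this node}; submit a GPU job instead"
    fi
}

check_nvidia_smi
```

If the second message is printed, continue with the job-based approach described in the steps that follow.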

Specific Steps to Fetch GPU Information

To fetch GPU information via nvidia-smi on an integrated server, follow these steps:

  1. Create a Shell Script: On the server, create a shell script file named check_gpu_status.sh. This script will request a GPU and execute the nvidia-smi command. Here’s a sample script:
#!/bin/bash
# Job name is check_gpu_status
#BSUB -J check_gpu_status
# Standard output redirected to gpu_status_output.txt
#BSUB -o gpu_status_output.txt
# Standard error redirected to gpu_status_error.txt
#BSUB -e gpu_status_error.txt
# Submit to GPU queue
#BSUB -q gpu
# Request 1 GPU in exclusive process mode
#BSUB -gpu "num=1:mode=exclusive_process"

# Execute the nvidia-smi command
nvidia-smi

This script configures the job to run on a GPU queue with one GPU allocated in exclusive mode. It captures the output and errors in respective files, facilitating easy review and troubleshooting.

  2. Submit the Script: To submit this job to the server’s job scheduler, use the bsub command in the terminal:
bsub < check_gpu_status.sh

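The input redirection (<) lets bsub read the #BSUB directives embedded in the script. The job is placed in the queue named in the script (gpu), and the job scheduler allocates the GPU and runs the script. Once the job has finished, the nvidia-smi output is available in gpu_status_output.txt and any errors in gpu_status_error.txt.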
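After submitting, the output file does not appear immediately, because the job may sit in the queue for a while. The sketch below polls for the file and prints it once it has content. The helper name wait_for_output and the default timeout of 300 seconds are assumptions chosen for illustration; the file name comes from the script above.

```shell
#!/bin/bash
# Wait until FILE exists and is non-empty (polling once per second),
# then print it. Gives up after TIMEOUT_SECONDS (default: 300).
# Usage: wait_for_output FILE [TIMEOUT_SECONDS]
# wait_for_output is a hypothetical helper, not an LSF command.
wait_for_output() {
    local file="$1" timeout="${2:-300}" waited=0
    while [ ! -s "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out waiting for $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    cat "$file"
}

# Example call after submitting the job:
#   wait_for_output gpu_status_output.txt
```

While waiting, bjobs -J check_gpu_status shows whether the job is still pending or already running. If the function times out, gpu_status_error.txt is the next place to look.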

However, if you already have direct access to a GPU node, you can simply run nvidia-smi there.
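Whether it runs inside the job script or directly on a GPU node, nvidia-smi can also print just a few selected fields as CSV, which is easier to read in an output file than the full status table. The following is a sketch under the assumption that these fields are the ones of interest; gpu_summary is a hypothetical function name, and the call could stand in for the bare nvidia-smi line in check_gpu_status.sh.

```shell
#!/bin/bash
# Print one CSV row per visible GPU: index, model name, memory use, utilization.
# gpu_summary is a hypothetical helper name chosen for this example.
gpu_summary() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi is not available on this node" >&2
        return 1
    fi
    nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv
}

if ! gpu_summary; then
    echo "No GPU summary available; run this inside a GPU job or on a GPU node" >&2
fi
```

The full list of queryable fields can be printed with nvidia-smi --help-query-gpu on a node where the command is installed.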

  • Title: check GPU on an integrated server
  • Author: wy
  • Created at : 2024-07-14 00:51:17
  • Updated at : 2024-07-19 18:39:16
  • Link: https://yuuee-www.github.io/blog/2024/07/14/check-GPU-on-an-integrated-server/
  • License: This work is licensed under CC BY-NC-SA 4.0.