check GPU on an integrated server
Using nvidia-smi on Integrated Servers
When logged into an integrated server and attempting to use the nvidia-smi command directly in the command line, you might encounter the following error:
“bash: nvidia-smi: command not found”
This error occurs because, although you are logged into the server, you have not been allocated a GPU. On an integrated server, simply logging in does not grant access to GPU resources. To access and utilize a GPU, you must submit a job that requests one and execute the nvidia-smi command within the context of that job. The GPU information can then be read from the job’s output file.
Specific Steps to Fetch GPU Information
To fetch GPU information via nvidia-smi on an integrated server, follow these steps:
- Create a Shell Script: On the server, create a shell script file named check_gpu_status.sh. This script will request a GPU and execute the nvidia-smi command. Here is a sample script:
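The exact directives depend on your cluster; the sketch below assumes an LSF scheduler with a queue named gpu and LSF’s -gpu option (the queue name and GPU request syntax may differ on your system):

```bash
#!/bin/bash
#BSUB -J check_gpu_status                   # job name
#BSUB -q gpu                                # GPU queue (queue name is cluster-specific)
#BSUB -gpu "num=1:mode=exclusive_process"   # request one GPU in exclusive mode
#BSUB -o check_gpu_status.%J.out            # standard output file (%J expands to the job ID)
#BSUB -e check_gpu_status.%J.err            # standard error file

nvidia-smi
```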
This script configures the job to run on a GPU queue with one GPU allocated in exclusive mode. It captures standard output and standard error in separate files, making review and troubleshooting easier.
- Submit the Script: To submit this job to the server’s job scheduler, use the bsub command in the terminal:
```bash
bsub < check_gpu_status.sh
```
This command submits the script to the job queue specified in the script (the gpu queue); the job scheduler then handles the allocation of GPU resources and executes the script. Once the job completes, the nvidia-smi output can be found in the job’s output file.
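For example, assuming LSF’s standard job-management commands and the output file naming from the script above (the job ID 12345 is only a placeholder):

```bash
# Check the status of the submitted job
bjobs

# After the job finishes, read the captured nvidia-smi output
cat check_gpu_status.12345.out
```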
However, if you already have direct access to a GPU node, you can simply run nvidia-smi on it.
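One common way to get such direct access on an LSF cluster is an interactive job; the command below is a sketch, and the queue name and GPU request syntax are assumptions that may not match your cluster:

```bash
# Request an interactive shell on a GPU node (options are cluster-specific)
bsub -q gpu -gpu "num=1" -Is /bin/bash

# Once the interactive session starts on the GPU node:
nvidia-smi
```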
- Title: check GPU on an integrated server
- Author: wy
- Created at: 2024-07-14 00:51:17
- Updated at: 2024-07-19 18:39:16
- Link: https://yuuee-www.github.io/blog/2024/07/14/check-GPU-on-an-integrated-server/
- License: This work is licensed under CC BY-NC-SA 4.0.