
gpustats

A Python script to check the current usage status of multiple GPU machines.

Preliminaries

When running tasks on multiple GPU machines, it is hard to remember which machines you are already using and which you are not. This script connects asynchronously to multiple GPU machines and reports the number of GPU cards in use on each machine.
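
Conceptually, the per-host check works like the sketch below. This is a minimal illustration using paramiko, not the actual contents of gpustats.py; the function name and the nvidia-smi query are assumptions for the example, and the real script runs such a check concurrently across hosts:

    import paramiko

    def count_used_gpus(host: str, username: str, key_path: str) -> int:
        """Count the GPUs on `host` that have at least one running compute process."""
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=username, key_filename=key_path)
        # Query the UUID of the GPU behind every running compute process;
        # deduplicating the UUIDs gives the number of cards currently in use.
        _, stdout, _ = client.exec_command(
            "nvidia-smi --query-compute-apps=gpu_uuid --format=csv,noheader"
        )
        used = {line.strip() for line in stdout if line.strip()}
        client.close()
        return len(used)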

To use this script, you need to have:

  • Access to (multiple) GPU machines on which the nvidia-smi command can be run
  • All machines accessible via the ssh command with public key authentication from the client
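
You can verify both prerequisites at once from the client: if the following command prints the list of GPU cards without prompting for a password, you are all set (host1 is a placeholder hostname):

    ssh host1 nvidia-smi -L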

If you do not have a key pair to connect to the machines, you can create one as follows:

  1. Create an ssh key pair on the client machine

    cd ~/.ssh
    ssh-keygen -t ed25519
    
  2. Copy the created public key to the host machine (e.g. using rsync), and append it to ~/.ssh/authorized_keys on the host:

    cat id_ed25519.pub >> authorized_keys
    
  3. Check the permissions of the files inside the ~/.ssh directory:

    • .ssh: 700
    • private key: 600
    • public key: 644
    • authorized_keys: 600
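
If any of these differ, you can fix them with chmod, for example:

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/id_ed25519 ~/.ssh/authorized_keys
    chmod 644 ~/.ssh/id_ed25519.pub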

Requirements

  • python >= 3.6
  • paramiko

Usage

  1. Change the username (and ssh_private_key, if necessary) at line 11 of gpustats.py (see the configuration sketch after this list)

  2. Define the dictionary mapping hostnames to the number of GPU cards on each host machine at line 38 of gpustats.py (also shown in the sketch below)

  3. Execute the script

    python3 gpustats.py
    

    The output will be something like this:

    host1   :  4 /  8 GPU cards are used   # in yellow
    host2   :  0 /  4 GPU cards are used   # in green
    host3   :  4 /  4 GPU cards are used   # in red
    
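As an illustration, the edited part of gpustats.py might look like the following. The variable names come from the steps above, but the values and the dictionary name are placeholders; adapt them to your environment:

    username = "your_username"             # login name on the GPU hosts
    ssh_private_key = "~/.ssh/id_ed25519"  # private key created in Preliminaries

    # hostname -> number of GPU cards installed on that machine (hypothetical values)
    hosts = {
        "host1": 8,
        "host2": 4,
        "host3": 4,
    }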
