Test SLURM

Q: Where is slurm.conf?

A: In /etc/slurm-llnl/slurm.conf (the path used by Debian/Ubuntu's slurm-llnl packages; other distributions typically use /etc/slurm/slurm.conf).
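To confirm which configuration file a running cluster actually loaded (assuming the Slurm client tools are installed and a controller is reachable), scontrol can report it:

```shell
# Print the configuration file path known to the Slurm controller.
scontrol show config | grep SLURM_CONF
```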

Q: Why can't I run 2 "srun" on the same node at the same time?

A: Use "--mem-per-cpu=<size in MB>". By default, a job that does not request memory explicitly can be allocated all of the node's memory, so a second "srun" has nothing left to run on until the first finishes.
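A sketch of what that looks like (node and partition names are taken from the example below; adjust them for your cluster, and note the exact behavior depends on how memory is configured as a consumable resource):

```shell
# With an explicit --mem-per-cpu, the first job no longer claims all of
# the node's memory, so the second srun can start on the same node.
srun -w minion01 -p minion_superfast --ntasks=1 --mem-per-cpu=10 sleep 60 &
srun -w minion01 -p minion_superfast --ntasks=1 --mem-per-cpu=10 hostname
```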

Q: How to get resource usage for a job?

A: Let me give some examples. First run the job:

srun -w minion01 -p minion_superfast --ntasks=1 --nodes=1 --cpus-per-task=1 --mem-per-cpu=10 ping www.google.com

Note: --mem-per-cpu=10 means 10 MB per CPU (MB is the default unit).

Tasks and Nodes allocated to the job:

pduan@gru ~ % squeue -n ping  --Format=numnodes
NODES               
1                   
pduan@gru ~ % squeue -n ping  --Format=numtasks
TASKS               
1  

Number of CPUs used by the job:

Number of CPUs requested by the job or allocated to it if already running. As a job is completing this number will reflect the current number of CPUs allocated. (Valid for jobs only)

pduan@gru ~ % squeue -n ping  --format="%C"
CPUS
2

or

pduan@gru ~ % squeue -n ping  --Format=numcpus
CPUS                
2  

Minimum memory requested by the job:

Minimum size of memory (in MB) requested by the job. (Valid for jobs only)

pduan@gru ~ % squeue -n ping  --format="%m"
MIN_MEMORY
10M

or

pduan@gru ~ % squeue -n ping  --Format=minmemory
MIN_MEMORY          
10M 

Trackable resource usage (TRES):

Print the trackable resources allocated to the job.

pduan@gru ~ % squeue -n ping  --Format=tres   
TRES                
cpu=2,mem=20M,node=1

Note: "--cpus-per-task=<n>" makes no visible difference here: removing "--cpus-per-task=1" from the job above leaves the resource usage unchanged, simply because 1 CPU per task is already the default. Also note that the job is charged 2 CPUs even though only 1 was requested; this is most likely because Slurm allocates whole cores, and minion01 has 2 threads per core (see the S:C:T output below), so a 1-CPU request is rounded up to one full core.
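Why does the job show CPUS=2 for a 1-CPU request? Assuming Slurm is configured to allocate whole cores (the common CR_Core setting), the request is rounded up to a multiple of the node's threads per core, which the sinfo S:C:T output below shows is 2 on minion01:

```shell
# Whole-core rounding of a CPU request (values from this document):
requested=1         # CPUs asked for via --cpus-per-task
threads_per_core=2  # from sinfo's S:C:T = 2:10:2
# Round the request up to the next multiple of threads_per_core:
allocated=$(( (requested + threads_per_core - 1) / threads_per_core * threads_per_core ))
echo "$allocated"   # -> 2, matching squeue's CPUS column
```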

Q: How to get resource usage for a node?

A: When the above srun job is running, let's use sinfo to get such statistics.

Number of CPUs a node has:

pduan@gru ~ % sinfo -n minion01 --format=%c
CPUS
40

CPU state of a node, in the format "allocated/idle/other/total":

pduan@gru ~ % sinfo -n minion01 --format=%C
CPUS(A/I/O/T)
2/38/0/40

Size of temporary disk space per node in megabytes:

pduan@gru ~ % sinfo -n minion01 --format=%d
TMP_DISK
0

Free memory of a node (in MB):

pduan@gru ~ % sinfo -n minion01 --format=%e
FREE_MEM
1112

Size of memory per node in megabytes:

This is the amount of memory Slurm is configured to schedule on the node (the RealMemory value in slurm.conf), not necessarily the node's physical RAM. Here it has been left at a deliberately small 128 MB, which caps the total memory jobs on the node can be allocated.

pduan@gru ~ % sinfo -n minion01 --format=%m
MEMORY
128

How much memory has been allocated (in MB)?

pduan@gru ~ % sinfo -n minion01 --Format=allocmem
ALLOCMEM            
20

Why 20? Because the job was allocated 2 CPUs and we set --mem-per-cpu=10 (MB).
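The arithmetic behind that number: allocated memory is the number of allocated CPUs times the per-CPU memory request.

```shell
# ALLOCMEM = allocated CPUs * --mem-per-cpu (values from the job above)
cpus=2
mem_per_cpu=10  # MB
echo $(( cpus * mem_per_cpu ))  # -> 20 MB, matching ALLOCMEM
</imports>```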

Sockets, cores per socket, and threads per core:

pduan@gru ~ % sinfo -n minion01 --format=%X
SOCKETS
2
pduan@gru ~ % sinfo -n minion01 --format=%Y
CORES
10
pduan@gru ~ % sinfo -n minion01 --format=%Z
THREADS
2

or

pduan@gru ~ % sinfo -n minion01 --format=%z
S:C:T
2:10:2
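These three numbers multiply out to the node's total CPU count reported by %c above:

```shell
# Total CPUs = sockets * cores-per-socket * threads-per-core
sockets=2
cores=10
threads=2
echo $(( sockets * cores * threads ))  # -> 40, matching sinfo's %c output
```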
Original post: https://www.cnblogs.com/chaseblack/p/10274868.html