smap - Online in the Cloud

This is the command smap that can be run in the OnWorks free hosting provider using one of our multiple free online workstations such as Ubuntu Online, Fedora Online, Windows online emulator or MAC OS online emulator

PROGRAM:

NAME


smap - graphically view information about Slurm jobs, partitions, and set configurations
parameters.

SYNOPSIS


smap [OPTIONS...]

DESCRIPTION


smap is used to graphically view job, partition and node information for a system running
Slurm. Note that information about nodes and partitions to which you lack access will
always be displayed to avoid obvious gaps in the output. This is equivalent to the --all
option of the sinfo and squeue commands.

OPTIONS


-c, --commandline
Print output to the commandline, no curses.

-D <option>, --display=<option>
sets the display mode for smap, showing relevant information about the selected
view and displaying a corresponding node chart. Note that unallocated nodes are
indicated by a '.' and nodes in the DOWN, DRAINED or FAIL state by a '#'. When the
--iterate=<seconds> option is also selected, you can switch displays by typing a
different letter from the list below (except 'c').

b Displays information about BlueGene partitions on the system

c Displays current BlueGene node states and allows users to configure
the system. Type 'quit' to end the configure mode. Type 'exit' to
end the configuration mode and exit smap.

j Displays information about jobs running on system.

r Display information about advanced reservations. While all current
and future reservations will be listed, only currently active
reservations will appear on the node map.

s Displays information about slurm partitions on the system

-h, --noheader
Do not print a header on the output.

-H, --show_hidden
Display hidden partitions and their jobs.

--help,
Print a message describing all smap options.

-i <seconds> , --iterate=<seconds>
Print the state on a periodic basis. Sleep for the indicated number of seconds
between reports. User can exit at anytime by typing 'q' or hitting the return key.
If user is in configure mode type 'exit' to exit program, 'quit' to exit configure
mode.

-I, --ionodes
Only show objects with these ionodes this support is only for bluegene systems.
This should be used inconjuction with the '-n' option. Only specify the ionode
number range here. Specify the node name with the '-n' option.

-M, --clusters=<string>
Clusters to issue commands to.

-n, --nodes
Only show objects with these nodes. If querying to the ionode level use the option
'-I' in conjunction with this option.

-Q, --quiet
Avoid printing error messages.

-R <RACK_MIDPLANE_ID/XYZ>, --resolve=<RACK_MIDPLANE_ID/XYZ>
Returns the XYZ coords for a Rack/Midplane id or vice-versa.

To get the XYZ coord for a Rack/Midplane id input -R R101 where 10 is the rack and
1 is the midplane.

To get the Rack/Midplane id from a XYZ coord input -R 101 where X=1 Y=1 Z=1 with no
leading 'R'.

--usage
Print a brief message listing the smap options.

-V , --version
Print version information and exit.

INTERACTIVE OPTIONS


When using smap in curses mode and when the --iterate=<seconds> option is also selected,
you can scroll through the different windows using the arrow keys. The up and down arrow
keys scroll the window containing the grid, and the left and right arrow keys scroll the
window containing the text information.

With the iterate option selected, you can use any of the options available to the -D
option listed above (except 'c') to change screens. You can also hide or make visible
hidden partitions by pressing 'h' at any moment.

OUTPUT FIELD DESCRIPTIONS


ACCESS_CONTROL
Identifies the users or bank accounts which can use this advanced reservation. A
prefix of "A:" indicates that the following account names may use this reservation.
A prefix of "U:" indicates that the following user names may use this reservation.

AVAIL Partition state: up or down.

BG_BLOCK
BlueGene Block Name.

CONN Connection Type: TORUS or MESH or SMALL (for small blocks).

END_TIME
The time when an advanced reservation ended.

ID Key to identify the nodes associated with this entity in the node chart.

MODE Mode Type: COPROCESS or VIRTUAL.

NAME Name of the job or advanced reservation.

NODELIST or BP_LIST
Names of nodes or base partitions associated with this configuration, partition or
reservation.

NODES Count of nodes or base partitions with this particular configuration.

PARTITION
Name of a partition. Note that the suffix "*" identifies the default partition.

ST State of a job in compact form. Possible states include: PD (pending), R (running),
S (suspended), CD (completed), CF (configuring), CG (completing), F (failed), TO
(timeout), and NF (node failure). See JOB STATE CODES section below for more
information.

START_TIME
The time when an advanced reservation started.

STATE State of the nodes. Possible states include: allocated, completing, down, drained,
draining, fail, failing, idle, and unknown plus their abbreviated forms: alloc,
comp, donw, drain, drng, fail, failg, idle, and unk respectively. Note that the
suffix "*" identifies nodes that are presently not responding. See NODE STATE
CODES section below for more information.

TIMELIMIT
Maximum time limit for any user job in days-hours:minutes:seconds. infinite is
used to identify jobs or partitions without a job time limit.

TOPOGRAPHY INFORMATION

The node chart is designed to indicate relative locations of the nodes. On most Linux
clusters this will represent a one-dimensional array of nodes. Larger clusters will
utilize multiple as needed with right side of one line being logically followed by the
left side of the next line.

On BlueGene systems, the node chart will indicate the three
dimensional topography of the system.
The X dimension will increase from left to right on a given line.
The Y dimension will increase in planes from bottom to top.
The Z dimension will increase within a plane from the back
line to the front line of a plane.
Note the example below:

a a a a b b d d
a a a a b b d d
a a a a b b c c
a a a a b b c c

a a a a b b d d
a a a a b b d d
a a a a b b c c
a a a a b b c c

a a a a . . d d
a a a a . . d d
a a a a . . e e Y
a a a a . . e e |
|
a a a a . . d d 0----X
a a a a . . d d /
a a a a . . . . /
a a a a . . . # Z

ID JOBID PARTITION BG_BLOCK USER NAME ST TIME NODES BP_LIST
a 12345 batch RMP0 joseph tst1 R 43:12 32k bgl[000x333]
b 12346 debug RMP1 chris sim3 R 12:34 8k bgl[420x533]
c 12350 debug RMP2 danny job3 R 0:12 4k bgl[622x733]
d 12356 debug RMP3 dan colu R 18:05 8k bgl[600x731]
e 12378 debug RMP4 joseph asx4 R 0:34 2k bgl[612x713]

CONFIGURATION INSTRUCTIONS


For Admin use. From this screen one can create a configuration file that is used to
partition and wire the system into usable blocks.

OUTPUT

BG_BLOCK
BlueGene Block Name.

CONN Connection Type: TORUS or MESH or SMALL (for small blocks).

ID Key to identify the nodes associated with this entity in the node chart.

MODE Mode Type: COPROCESS or VIRTUAL.

INPUT COMMANDS

resolve <RACK_MIDPLANE_ID/XYZ>
Returns the XYZ coords for a Rack/Midplane id or vice-versa.

To get the XYZ coord for a Rack/Midplane id input -R R101 where 10 is the
rack and 1 is the midplane.

To get the Rack/Midplane id from a XYZ coord input -R 101 where X=1 Y=1 Z=1
with no leading 'R'.

load <bluegene.conf file>
Load an already existent bluegene.conf file. This will verify and mapout a
bluegene.conf file. After loaded the configuration may be edited and saved
as a new file.

create <size> <options>
Submit request for partition creation. The size may be specified either as a
count of base partitions or specific dimensions in the X, Y and Z directions
separated by "x", for example "2x3x4". A variety of options may be
specified. Valid options are listed below. Note that the option and their
values are case insensitive (e.g. "MESH" and "mesh" are equivalent).

Start = XxYxZ
Identify where to start the partition. This is primarily for testing
purposes. For convenience one can only put the X coord or XxY will also
work. The default value is 0x0x0.

Connection = MESH | TORUS | SMALL
Identify how the nodes should be connected in network. The default value is
TORUS.

Small Equivalent to "Connection=Small". If a small connection is specified
the base partition chosen will create smaller partitions based on
options 32CNBlocks and 128CNBlocks respectively for a Bluegene L
system. 16CNBlocks, 64CNBlocks, and 256CNBlocks are also available
for Bluegene P systems. Keep in mind you must have enough ionodes to
make all these configurations possible.
These number will be altered to take up the entire base partition.
Size does not need to be specified with a small request, we will
always default to 1 base partition for allocation.

Mesh Equivalent to "Connection=Mesh".

Torus Equivalent to "Connection=Torus".

Rotation = TRUE | FALSE
Specifies that the geometry specified in the size parameter may be rotated
in space (e.g. the Y and Z dimensions may be switched). The default value
is FALSE.

Rotate Equivalent to "Rotation=true".

Elongation = TRUE | FALSE
If TRUE, permit the geometry specified in the size parameter to be altered
as needed to fit available resources. For example, an allocation of "4x2x1"
might be used to satisfy a size specification of "2x2x2". The default value
is FALSE.

Elongate
Equivalent to "Elongation=true".

copy <id> <count>
Submit request for partition to be copied. You may copy a specific
partition by specifying its id, by default the last configured partition is
copied. You may also specify a number of copies to be made. By default,
one copy is made.

delete <id>
Delete the specified block.

down <node_range>
Down a specific node or range of nodes. i.e. 000, 000-111 [000x111]

up <node_range>
Bring a specific node or range of nodes up. i.e. 000, 000-111 [000x111]

alldown
Set all nodes to down state.

allup Set all nodes to up state.

save <file_name>
Save the current configuration to a file. If no file_name is specified, the
configuration is written to a file named "bluegene.conf" in the current
working directory.

clear Clear all partitions created.

NODE STATE CODES


Node state codes are shortened as required for the field size. These node states may be
followed by a special character to identify state flags associated with the node. The
following node sufficies and states are used:

* The node is presently not responding and will not be allocated any new work. If the
node remains non-responsive, it will be placed in the DOWN state (except in the case
of COMPLETING, DRAINED, DRAINING, FAIL, FAILING nodes).

~ The node is presently in a power saving mode (typically running at reduced frequency).

# The node is presently being powered up or configured.

$ The node is currently in a reservation with a flag value of "maintenance" or is
scheduled to be rebooted.

ALLOCATED The node has been allocated to one or more jobs.

ALLOCATED+ The node is allocated to one or more active jobs plus one or more jobs are in
the process of COMPLETING.

COMPLETING All jobs associated with this node are in the process of COMPLETING. This
node state will be removed when all of the job's processes have terminated and
the Slurm epilog program (if any) has terminated. See the Epilog parameter
description in the slurm.conf man page for more information.

DOWN The node is unavailable for use. Slurm can automatically place nodes in this
state if some failure occurs. System administrators may also explicitly place
nodes in this state. If a node resumes normal operation, Slurm can
automatically return it to service. See the ReturnToService and SlurmdTimeout
parameter descriptions in the slurm.conf(5) man page for more information.

DRAINED The node is unavailable for use per system administrator request. See the
update node command in the scontrol(1) man page or the slurm.conf(5) man page
for more information.

DRAINING The node is currently executing a job, but will not be allocated to additional
jobs. The node state will be changed to state DRAINED when the last job on it
completes. Nodes enter this state per system administrator request. See the
update node command in the scontrol(1) man page or the slurm.conf(5) man page
for more information.

FAIL The node is expected to fail soon and is unavailable for use per system
administrator request. See the update node command in the scontrol(1) man
page or the slurm.conf(5) man page for more information.

FAILING The node is currently executing a job, but is expected to fail soon and is
unavailable for use per system administrator request. See the update node
command in the scontrol(1) man page or the slurm.conf(5) man page for more
information.

IDLE The node is not allocated to any jobs and is available for use.

MAINT The node is currently in a reservation with a flag value of "maintainence".

UNKNOWN The Slurm controller has just started and the node's state has not yet been
determined.

JOB STATE CODES


Jobs typically pass through several states in the course of their execution. The typical
states are PENDING, RUNNING, SUSPENDED, COMPLETING, and COMPLETED. An explanation of each
state follows.

BF BOOT_FAIL Job terminated due to launch failure, typically due to a hardware
failure (e.g. unable to boot the node or block and the job can not be
requeued).

CA CANCELLED Job was explicitly cancelled by the user or system administrator. The
job may or may not have been initiated.

CD COMPLETED Job has terminated all processes on all nodes with an exit code of
zero.

CG COMPLETING Job is in the process of completing. Some processes on some nodes may
still be active.

CF CONFIGURING Job has been allocated resources, but are waiting for them to become
ready for use (e.g. booting).

F FAILED Job terminated with non-zero exit code or other failure condition.

NF NODE_FAIL Job terminated due to failure of one or more allocated nodes.

PD PENDING Job is awaiting resource allocation.

PR PREEMPTED Job terminated due to preemption.

R RUNNING Job currently has an allocation.

SE SPECIAL_EXIT The job was requeued in a special state. This state can be set by
users, typically in EpilogSlurmctld, if the job has terminated with a
particular exit value.

ST STOPPED Job has an allocation, but execution has been stopped with SIGSTOP
signal. CPUS have been retained by this job.

S SUSPENDED Job has an allocation, but execution has been suspended and CPUs have
been released for other jobs.

TO TIMEOUT Job terminated upon reaching its time limit.

ENVIRONMENT VARIABLES


The following environment variables can be used to override settings compiled into smap.

SLURM_CONF The location of the Slurm configuration file.

COPYING


Copyright (C) 2004-2007 The Regents of the University of California. Produced at Lawrence
Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2008-2009 Lawrence Livermore National Security.
Copyright (C) 2010-2013 SchedMD LLC.

This file is part of Slurm, a resource management program. For details, see
<http://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

Use smap online using onworks.net services



Latest Linux & Windows online programs