likwid-pin - Online in the Cloud

PROGRAM:

NAME

likwid-pin - pin a sequential or threaded application to dedicated processors

SYNOPSIS

likwid-pin [-vhqipS] [-c <core_list>] [-s <skip_mask>] [-d <delimiter>]

DESCRIPTION

likwid-pin is a command line application to pin a sequential or multithreaded application
to dedicated processors. It can be used as a replacement for taskset(1). Unlike taskset,
it takes a list of individual processors rather than an affinity mask. For multithreaded
applications based on the pthread library, the pthread_create library call is overloaded
through LD_PRELOAD and each created thread is pinned to a dedicated processor as specified
in core_list.

By default, each created thread is pinned to the next core in the list, in the order of
the calls to pthread_create. Individual threads can be skipped with the -s command line
option.

For OpenMP, the gcc and icc compilers are explicitly supported. Others may also
work. likwid-pin sets the environment variable OMP_NUM_THREADS for you if it is not
already set, using as many threads as there are processors in the pin expression. Be
aware that with pthreads the parent thread is always pinned. If, for example, you create
4 threads with pthread_create and do not use the parent process as a worker, you still
have to provide num_threads+1 processor IDs.

likwid-pin supports different numberings for pinning. By default the physical numbering of
the cores is used. This is the numbering that likwid-topology(1) reports. Logical
numbering inside the node or the sockets can also be used. With the N prefix (e.g. -c
N:0-6) the cores are numbered logically over the whole node, physical cores first. If
a system has, for example, 8 cores with 16 SMT threads, -c N:0-7 gives you all physical
cores and -c N:0-15 gives you all physical cores and all SMT threads. With S you can
specify logical numberings inside sockets; again, physical cores come first. You can mix
different domains by separating them with @. For example, -c S0:0-3@S2:2-3 pins threads
0-3 to logical cores 0-3 on socket 0 and threads 4-5 to logical cores 2-3 on socket 2.
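
The "physical cores first" ordering can be illustrated with a small model. This is only a sketch of the idea described above, not likwid's implementation: the HwThread type, the logical_order function and the 4-core, 2-way SMT machine with adjacent sibling IDs are all hypothetical.

```python
from typing import NamedTuple


class HwThread(NamedTuple):
    """One hardware thread of a hypothetical machine (illustrative only)."""
    processor_id: int  # physical numbering
    core: int          # index of the physical core
    smt: int           # index of the SMT thread within that core


def logical_order(hw_threads):
    """Return processor IDs with the first SMT thread of every core first."""
    ordered = sorted(hw_threads, key=lambda t: (t.smt, t.core))
    return [t.processor_id for t in ordered]


# Assumed topology: 4 cores, 2 SMT threads each, sibling IDs adjacent (0/1, 2/3, ...).
machine = [HwThread(core * 2 + smt, core, smt)
           for core in range(4) for smt in range(2)]

node = logical_order(machine)
print(node[0:4])  # [0, 2, 4, 6]  -> logical 0-3: one hardware thread per core
print(node)       # [0, 2, 4, 6, 1, 3, 5, 7]
```

On this assumed machine, the logical range 0-3 selects one hardware thread on each physical core, and the range 0-7 adds the SMT siblings afterwards.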

For applications where a first touch policy on NUMA systems cannot be employed, likwid-pin
can turn on interleaved memory placement. This can significantly improve the
performance of memory bound multithreaded codes. All NUMA nodes the user pinned threads to
are used for interleaving.

OPTIONS

-v prints version information to standard output, then exits.

-h prints a help message to standard output, then exits.

-c <processor_list> OR <thread_expression> OR <scatter policy>
Specify a numerical list of processors. The list may contain multiple items,
separated by commas, and ranges. For example 0,3,9-11. You can also use logical
numberings, either within a node (N), a socket (S<id>) or a NUMA domain (M<id>).
likwid-pin also supports logical pinning within a cpuset with an L prefix. If you
omit this option, likwid-pin will pin the threads to the processors on the node
with physical cores first. See below for details on using a thread expression or
scatter policy.
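
To make the list syntax concrete, here is a minimal sketch that expands a plain numerical list such as 0,3,9-11 into individual processor IDs. The function name expand_processor_list is hypothetical, and rejecting descending ranges is a choice of this sketch, not documented likwid-pin behaviour.

```python
def expand_processor_list(spec: str) -> list[int]:
    """Expand a list such as "0,3,9-11" into individual processor IDs."""
    processors = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            if last < first:
                # Assumption of this sketch: descending ranges are an error.
                raise ValueError(f"descending range: {item!r}")
            processors.extend(range(first, last + 1))
        else:
            processors.append(int(item))
    return processors


print(expand_processor_list("0,3,9-11"))  # [0, 3, 9, 10, 11]
```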

-s <skip_mask>
Specify skip mask as HEX number. For each set bit the corresponding thread is
skipped.
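
A hexadecimal mask can be hard to read at a glance, so the following sketch lists which bit positions are set in a given mask. The helper name set_bits is hypothetical. How a bit position maps to a thread number (for example, whether counting starts at 0 or 1) is not stated here and is left open by the sketch.

```python
def set_bits(skip_mask: str) -> list[int]:
    """Return the positions of the set bits in a hexadecimal mask string."""
    value = int(skip_mask, 16)  # accepts "0x5" as well as "5"
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(set_bits("0x1"))  # [0]
print(set_bits("0x5"))  # [0, 2]
```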

-S All ccNUMA memory domains belonging to the specified threadlist will be cleaned
before the run. Can solve file buffer cache problems on Linux.

-p prints the available thread domains for logical pinning. If used in combination
with -c, the physical processor IDs are printed to stdout.

-i set NUMA memory policy to interleave, spanning all NUMA nodes involved in pinning

-q silent execution without output

-d <delimiter>
set delimiter used to output the physical processor list (-p & -c)

EXAMPLE

1. For a standard pthread application:

likwid-pin -c 0,2,4-6 ./myApp

The parent process is pinned to processor 0, thread 0 to processor 2, thread 1 to
processor 4, thread 2 to processor 5 and thread 3 to processor 6. If more threads are
created than specified in the processor list, these threads are pinned to processor 0 as
fallback.
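
The assignment described in example 1 can be modelled in a few lines. This is a sketch of the described behaviour, not likwid's code. The function assign_processors is hypothetical, and it assumes that the fallback processor is the first entry of the list, which is processor 0 in the example.

```python
def assign_processors(processors, num_created_threads):
    """Model the pinning order: parent first, then threads in creation order."""
    parent = processors[0]
    remaining = processors[1:]
    assignment = {"parent": parent}
    for thread in range(num_created_threads):
        if thread < len(remaining):
            assignment[f"thread {thread}"] = remaining[thread]
        else:
            # Assumption: surplus threads fall back to the first list entry.
            assignment[f"thread {thread}"] = parent
    return assignment


# Processor list 0,2,4-6 written out, with 4 threads created by the parent.
print(assign_processors([0, 2, 4, 5, 6], 4))
# {'parent': 0, 'thread 0': 2, 'thread 1': 4, 'thread 2': 5, 'thread 3': 6}
```

The model also shows why num_threads+1 processor IDs are needed when the parent is not a worker: the first ID is always consumed by the parent.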

2. For gcc OpenMP, as many IDs must be specified in the processor list as there are threads:

OMP_NUM_THREADS=4 likwid-pin -c 0,2,1,3 ./myApp

3. Full control over the pinning can be achieved by specifying a skip mask. For example
the following command skips the pinning of thread 1:

OMP_NUM_THREADS=4 likwid-pin -s 0x1 -c 0,2,1,3 ./myApp

4. The -c switch supports the definition of threads in a specific affinity domain such as
a NUMA node or cache group. The available affinity domains can be retrieved with the -p
switch and no further option on the command line. The common affinity domains are N
(whole node), SX (socket X), CX (cache group X) and MX (memory group X). Multiple
affinity domains can be combined, separated by @. To pin 2 threads on each socket
of a 2-socket system:

OMP_NUM_THREADS=4 likwid-pin -c S0:0-1@S1:0-1 ./myApp
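
How such a combined expression breaks down into per-thread targets can be sketched as follows. The function parse_domains is hypothetical and only splits the text. It does not resolve logical indices to physical processor IDs, which depends on the machine topology.

```python
def parse_domains(expression: str) -> list[tuple[str, int]]:
    """Split "S0:0-1@S1:0-1" into (domain, logical index) pairs in thread order."""
    targets = []
    for part in expression.split("@"):
        domain, _, indices = part.partition(":")
        for item in indices.split(","):
            if "-" in item:
                first, last = (int(x) for x in item.split("-", 1))
                ids = range(first, last + 1)
            else:
                ids = [int(item)]
            targets.extend((domain, index) for index in ids)
    return targets


# Thread n is pinned to the n-th pair in the returned list.
print(parse_domains("S0:0-1@S1:0-1"))
# [('S0', 0), ('S0', 1), ('S1', 0), ('S1', 1)]
```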

5. Another form of the -c argument pins the threads according to an expression such as
E:N:4:1:2. The syntax is E:<thread domain>:<number of
threads>(:<chunk size>:<stride>). The following example pins 8 threads with 2 SMT
threads per core on an SMT-4 machine:

OMP_NUM_THREADS=8 likwid-pin -c E:N:8:2:4 ./myApp
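
The chunk and stride fields can be read as "take <chunk size> consecutive entries, then jump ahead by <stride>". The sketch below computes the resulting positions within the domain's processor list under that reading. The function expression_indices is hypothetical. The defaults of 1 for an omitted chunk size and stride, and the rejection of a chunk larger than the stride, are assumptions of this sketch. The order of the domain's list itself is determined by likwid and is not modelled.

```python
def expression_indices(expression: str) -> tuple[str, list[int]]:
    """Return the domain and the selected list positions for E:<dom>:<n>[:<chunk>:<stride>]."""
    fields = expression.split(":")
    if fields[0] != "E" or len(fields) not in (3, 5):
        raise ValueError(f"not a thread expression: {expression!r}")
    domain, count = fields[1], int(fields[2])
    # Assumption: chunk size and stride default to 1 when omitted.
    chunk, stride = (int(fields[3]), int(fields[4])) if len(fields) == 5 else (1, 1)
    if chunk < 1 or stride < 1 or chunk > stride:
        raise ValueError("expected 1 <= chunk size <= stride")
    indices = []
    base = 0
    while len(indices) < count:
        take = min(chunk, count - len(indices))
        indices.extend(range(base, base + take))
        base += stride
    return domain, indices


print(expression_indices("E:N:8:2:4"))  # ('N', [0, 1, 4, 5, 8, 9, 12, 13])
print(expression_indices("E:N:4:1:2"))  # ('N', [0, 2, 4, 6])
```

If the domain's list keeps the 4 SMT threads of each core adjacent, the first result selects the first 2 SMT threads on each of 4 cores, which matches the description of the example.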

6. The last alternative for the -c switch is the automatic scattering of threads over
affinity domains. For example, to scatter the threads over all memory domains in a
system:

OMP_NUM_THREADS=4 likwid-pin -c M:scatter ./myApp
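
One plausible reading of scattering is a round-robin walk over the domains, taking the next free processor from each in turn. The sketch below models only that reading. The function scatter and the two example memory domains are hypothetical, and the order likwid-pin actually produces may differ.

```python
from itertools import zip_longest


def scatter(domains: list[list[int]]) -> list[int]:
    """Interleave the processor lists of several domains round-robin."""
    order = []
    for row in zip_longest(*domains):
        # Domains of unequal size are padded with None; skip the padding.
        order.extend(p for p in row if p is not None)
    return order


# Assumed machine: two memory domains with four processors each.
print(scatter([[0, 1, 2, 3], [4, 5, 6, 7]]))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

With 4 threads on this assumed machine, the first four entries would place two threads in each memory domain.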
