





likwid-bench - low-level benchmark suite and microbenchmarking framework


likwid-bench [-hap] [-l <testname>] [-i <iterations>] [-g <number_of_workgroups>] [-t
<testname>] [-w <workgroup_expression>]


likwid-bench is a benchmark suite for low-level (assembly) benchmarks to measure
bandwidths and instruction throughput for specific instruction code on x86 systems. The
currently included benchmark codes cover common data access patterns like load and store
as well as calculations like vector triad and sum. likwid-bench includes
architecture-specific benchmarks for x86, x86_64 and x86 for Intel Xeon Phi coprocessors.
The performance values can either be calculated by likwid-bench or measured using
performance counters by using likwid-perfctr as a wrapper to likwid-bench. This requires
building likwid-bench with instrumentation, which can be enabled in config.mk.


-h prints a help message to standard output, then exits.

-a list available benchmark codes for the current system.

-p list available thread domains.

-l <testname>
list properties of a benchmark code.

-i <iterations>
number of iterations to perform inside the benchmark code.

-t <testname>
name of the benchmark code to run (mandatory).

-g <number_of_workgroups>
specify the number of workgroups to perform the benchmark code on (mandatory).

-w <workgroup_expression>
specify the affinity domain, thread count and data set size for the current
benchmarking run (mandatory).
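
As a minimal sketch of how the informational options might be combined before a run, the
following shell function calls -a, -p and -l in sequence. The function name
inspect_benchmark and the LIKWID_BENCH variable are assumptions made for this illustration
and are not part of LIKWID; only the three options come from the list above.

```shell
# Binary to invoke. LIKWID_BENCH is a hypothetical override for this sketch;
# by default the likwid-bench found in PATH is used.
LIKWID_BENCH=${LIKWID_BENCH:-likwid-bench}

# inspect_benchmark <testname>  (hypothetical helper, not part of LIKWID)
inspect_benchmark() {
    "$LIKWID_BENCH" -a          # list available benchmark codes
    "$LIKWID_BENCH" -p          # list available thread domains
    "$LIKWID_BENCH" -l "$1"     # list properties of the given benchmark code
}
```

Calling inspect_benchmark copy would then show the available codes and thread domains
followed by the properties of the copy benchmark.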


<thread_domain>:<size> [:<num_threads>[:<chunk_size>:<stride>]] [-<streamId>:<domain_id>]
with size in kB, MB or GB. The thread domain is where the threads are placed. Size is the
total data set size for the benchmark. num_threads specifies how many threads are used.
Threads are always placed using a compact policy in likwid-bench. This means that by
default all SMT threads are used. Optionally, similar to the expression-based syntax in
likwid-pin, a chunk size and stride can be provided. Optionally, the placement of every
stream (that is, every array) can be controlled. By default all arrays are placed in the
same thread domain the threads are running in. To place the data in a different domain,
specify for every stream of a benchmark case the domain to place its data in (the total
number of streams can be obtained with the -l option). Multiple streams are comma
separated. Either no placement is provided or all streams have to be explicitly placed.
Please refer to the Wiki pages on http://code.google.com/p/likwid/wiki/LikwidBench for
further details and examples on usage.
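
The structure of a workgroup expression can be illustrated with a small shell helper that
assembles the string from its parts. This is only a sketch: build_workgroup is a
hypothetical function, not something shipped with LIKWID, and it does not validate the
domain names or sizes it is given.

```shell
# build_workgroup <thread_domain> <size> [num_threads] [chunk_size] [stride] [streams]
# Hypothetical helper: prints a workgroup expression for the -w option.
build_workgroup() {
    wg="$1:$2"                          # thread domain and data set size
    if [ -n "$3" ]; then
        wg="$wg:$3"                     # optional thread count
        # chunk size and stride are only appended as a pair
        if [ -n "$4" ] && [ -n "$5" ]; then
            wg="$wg:$4:$5"
        fi
    fi
    if [ -n "$6" ]; then
        wg="$wg-$6"                     # optional stream placement list
    fi
    printf '%s\n' "$wg"
}

build_workgroup S0 100kB                    # prints S0:100kB
build_workgroup S0 1GB 2 1 2                # prints S0:1GB:2:1:2
build_workgroup S0 1GB 10 1 2 0:S1,1:S1     # prints S0:1GB:10:1:2-0:S1,1:S1
```

The printed string can be passed to likwid-bench after -w.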


1. Run the copy benchmark with 1000 iterations on socket 0 with a total data set size of
100kB.

likwid-bench -t copy -i 1000 -g 1 -w S0:100kB

Since no num_threads is given in the workgroup expression, each core of socket 0 gets one
thread. The workload is split up between all threads.

2. Run the triad benchmark code with 100 iterations with 2 threads on socket 0 and a
data set size of 1GB.

likwid-bench -t triad -i 100 -g 1 -w S0:1GB:2:1:2

Assuming socket 0 has 4 SMT threads, one thread is assigned to each physical core of
socket 0.

3. Run the update benchmark with 1000 iterations on socket 0 with a workload of 100kB and
on socket 1 with the same workload.

likwid-bench -t update -i 1000 -g 2 -w S0:100kB -w S1:100kB

The results of both workgroups are combined for the output. Hence the workload in each
workgroup expression should have the same size.
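
Because the value given to -g has to match the number of -w expressions, and every
workgroup should use the same size, the command line can be generated rather than typed.
The following is a minimal sketch under those assumptions; bench_command is a hypothetical
helper that only prints the command and does not run it.

```shell
# bench_command <testname> <iterations> <size> <domain>...
# Hypothetical helper: prints a likwid-bench command with one workgroup per
# thread domain, all using the same data set size.
bench_command() {
    test_name=$1
    iterations=$2
    size=$3
    shift 3                             # remaining arguments are thread domains
    cmd="likwid-bench -t $test_name -i $iterations -g $#"
    for domain in "$@"; do
        cmd="$cmd -w $domain:$size"
    done
    printf '%s\n' "$cmd"
}

bench_command update 1000 100kB S0 S1
# prints: likwid-bench -t update -i 1000 -g 2 -w S0:100kB -w S1:100kB
```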

4. Run the update benchmark but measure the memory traffic with likwid-perfctr. The option
INSTRUMENT_BENCH in config.mk needs to be true at compile time to use that feature.

likwid-perfctr -C E:S0:4 -g MEM -m likwid-bench -t update -i 1000 -g 1 -w S0:100kB

likwid-perfctr will configure and start the performance counters on socket 0 with 4
threads prior to the execution of likwid-bench. The performance counters are read right
before and after running the benchmarking code to minimize the interference of the
setup with the measurements.

5. Run the copy benchmark and place the data on another socket.

likwid-bench -t copy -i 50 -g 1 -w S0:1GB:10:1:2-0:S1,1:S1

Stream ids 0 and 1 are placed in thread domain S1, which is socket 1. This can be verified
because the initialization threads output where they are running.
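
Since every stream has to be placed explicitly once any placement is given, the
comma-separated list can be generated from the stream count. The sketch below assumes
that stream ids are numbered from 0 and that all streams go to the same domain;
place_streams is a hypothetical helper, and the stream count itself would come from
likwid-bench -l <testname>.

```shell
# place_streams <num_streams> <domain_id>
# Hypothetical helper: prints "<streamId>:<domain_id>" for ids 0..num_streams-1,
# joined by commas.
place_streams() {
    n=$1
    domain=$2
    i=0
    placement=""
    while [ "$i" -lt "$n" ]; do
        # prepend a comma only when the list is not empty
        placement="${placement:+$placement,}$i:$domain"
        i=$((i + 1))
    done
    printf '%s\n' "$placement"
}

place_streams 2 S1      # prints 0:S1,1:S1
```

The result is appended to the workgroup expression after a dash, as in
S0:1GB:10:1:2-0:S1,1:S1.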

