EnglishFrenchSpanish

OnWorks favicon

paexec - Online in the Cloud

Run paexec in OnWorks free hosting provider over Ubuntu Online, Fedora Online, Windows online emulator or MAC OS online emulator

This is the command paexec that can be run in the OnWorks free hosting provider using one of our multiple free online workstations such as Ubuntu Online, Fedora Online, Windows online emulator or MAC OS online emulator

PROGRAM:

NAME


paexec - parallel executor, distribute tasks over network or CPUs

SYNOPSIS


paexec [options]

paexec -C [options] command [args...]

DESCRIPTION


Suppose you have a long list of AUTONOMOUS tasks that need to be done, for example, you
want to convert thousands of .wav audio files to .ogg format. Also suppose that multiple
CPUs are available, e.g. multi-CPU SMP system (or modern multikernel CPU) or a cluster
consisting of individual computers connected to the network or internet. paexec can
efficiently do this work, that is paexec efficiently distributes different tasks to
different processors (or computers), receives the results of processing from them and
sends these results to stdout.

There are several notions that should be defined: task, command, transport, node.

Tasks are read by paexec from stdin and are represented as one line of text, i.e. one
input line - one task.

node identifier - remote computer or CPU identifier, for example CPU ordinal number or
computer's DNS name like node12.cluster.company.com.

Command - user's program that reads one-line task from stdin and sends multiline result to
stdout where an empty line means JOB_IS_DONE__I_AM_READY_FOR_THE_NEXT_ONE. After sending
the empty line to stdout, stdout MUST BE FLUSHED. Remember that empty line MUST NOT
appears in general result lines. Otherwise paexec may hang because of deadlock.

Transport - special program that helps to run command on the node. It takes the node
identifier as its first argument and command with its arguments as the rest. For example,
is '/usr/bin/ssh'. Both transport and command may be specified with their arguments, i.e.
'/usr/bin/ssh -x' is allowed as a transport program.

Algorithm. Commands are run on each node with a help of transport program. Then, tasks are
read from stdin line-by-line (one task per line) and are sent to free node (exactly one
task per node at a time). At the same time result lines are read from command's stdout and
are output to paexec's stdout. When an empty line is obtained from the node (this means
that node finished its job) it is marked as free and becomes ready for the next job. These
steps repeat until the end of stdin is reached and all nodes finish their job.

More formally (to better understand how paexec works):

run_command_on_each_node
mark_all_nodes_as_free
while not(end_of_stdin) or not(all_nodes_are_free)
while there_is_free_node/i and not(end_of_stdin)
task = read_task_from_stdin
send_task_to_node(task, i)
mark_node_as_busy(i)
end
while result_line_from_node_is_available/i
result = read_result_line_from_node(i)
send_line_to_stdout(result)
if is_empty_line(result)
# end of job
mark_node_as_free(i)
end
end
end
close_command_on_each_node

Note that command that does your actual task is run once (per node), it is not restarted
for every task.

Also note that output contains result lines (obtained from different nodes) in the mixed
order. That is, the first line of the output may contain a result line obtain from the
first node, the second line of output - from the second node, but the third output line
may contain result line from the first node again. It is also not guaranteed that the
first line of output will be from the first node or from the first task. All result lines
are output as soon as they are read by paexec, i.e as soon as they are ready to be output.
paexec works this way for the efficiency reasons. You can play with -l, -r and -p options
to see what happens.

OPTIONS


-h Display help information.

-V Display version information.

-c command
Command with its arguments.

-C Command with its arguments are specified following the options.

-t transport
Transport command

-n +number
A number of commands to run in parallel.

-n nodes
List of nodes separated by space character. The first character must be
alphanumeric, `_' or `/'. All other characters are reserved for future extensions.

-n :filename
Filename containing list of nodes, one per line.

-x Run command specificed by -c for each task. Its stdout is passed to paexec. If both
"-x" and "-g" are specified, task is considered failed if command's exit status is
non-zero.

-X Implies -x and ignore calculator's stdout.

-r Include node identifier or node number (0-based) to the output, i.e. id/number of
node that produces this particular output line. This identifier or number appears
before line number if -l is also applied. Space character is used as a separator.

-l Include a 0-based task number (input line number) to the output, i.e. line number
from which this particular output line was produced. It appears before pid if -p is
also applied. Space character is used as a separator.

-p Include pid of paexec's subprocess that communicates with node+command to the
output. Pid prepends the actual result line. Space character is used as a separator.

-e When end-of-task marker is obtained from node, an empty line is printed to stdout.
This option may be useful together with -l and/or -r.

-E Imply -e and flushes stdout.

-d Turn on a debugging mode (for debugging purposes only)

-i Copy input lines (i.e. tasks) to stdout.

-I Imply -i and flushes stdout.

-s|-g Orgraph of tasks (partially ordered set) is read from stdin.

Instead of autonomous tasks, graph of the tasks is read from stdin. In this mode
every task can either FAIL or SUCCEED. As always an empty line output by command
means end of task. The line before it shows an EXIT STATUS of the task. The word
"failure" means failure, "success" - success and "fatal" means that the current task
is reassigned to another node (and restarted, of course) (see option -z). See
examples/1_div_x/cmd for the sample. An input line (paexec's stdin) should contain
either single task without spaces inside or two tasks separated by single space
character, e.g. task1<SPC>task2. task1<SPC>task2 line means that task1 must be done
before task2 and it is mandatory, that is if task1 fail all dependent tasks
(including task2) are also failed recursively. Tasks having dependencies are
started only after all dependencies are succeeded. When a task succeeds paexec
outputs "success" word just before end_of_task marker (see -e or -E), otherwise
"failure" word is output followed by a list of tasks failed because of it.

Samples:

tasks (examples/make_package/tasks file)

textproc/dictem
devel/autoconf wip/libmaa
devel/gmake wip/libmaa
wip/libmaa wip/dict-server
wip/libmaa wip/dict-client
devel/m4 wip/dict-server
devel/byacc wip/dict-server
devel/byacc wip/dict-client
devel/flex wip/dict-server
devel/flex wip/dict-client
devel/glib2
devel/libjudy

command (examples/make_package/cmd__flex)

#!/usr/bin/awk -f
{
print $0
if ($0 == "devel/flex")
print "failure"
else
print "success"

print "" # end of task marker
fflush()
}

output of "paexec -s -l -c cmd__flex -n +10 \
< tasks"

3 devel/autoconf
3 success
4 devel/gmake
4 success
7 devel/m4
7 success
8 devel/byacc
8 success
9 devel/flex
9 failure
9 devel/flex wip/dict-server wip/dict-client
10 devel/glib2
10 success
11 devel/libjudy
11 success
1 textproc/dictem
1 success
2 wip/libmaa
2 success

-z If applied, read/write(2) operations from/to nodes becomes not critical. In case
paexec has lost connection to the node, it will reassign failed task to another node
and, if -s applied, will output "fatal" string to stdout ("success" + "failure" +
"fatal"). This makes paexec resistant to the I/O errors, as a result you can create
paexec clusters even over network consisting of unreliable hosts (Internet?). Failed
hosts are marked as such and will not be used during the current run of paexec.

-Z timeout
When -z applied, if a command fails, appropriate node is marked as broken and is
excluded from the following task distribution. But if -Z applied, every timeout
seconds an attempt to rerun a comand on a failed node is made. -Z implies -z. This
option makes possible to organize clusters over unreliable networks/hardware.

-w If -Z option were applied, paexec exits with error if ALL nodes failed. With -w it
will not exit and will wait for restoring nodes.

-m s=success
-m f=failure
-m F=fatal
-m t=eot
-m d=delimiter
Set alternative string for 'success', 'failure', 'fatal', '' (end of task) and ' '
(task delimiter character). An empty string for 'fatal' means it will not be output
to stdout in case of fatal error.

-W num
When multiple machines or CPUs are used for tasks processing, it makes sense to
start "heavier" tasks as soon as possible in order to minimize total calculatiion
time. If -W is specified, special weight is assigned to each tasks which is used
for reordering tasks. If num is 0, weights themselves are used for reordering
tasks. The bigger weight is, the more priority of the task is. If num is 1, the
total weight of task is a sum of its own weight (specified on input) and weights of
all tasks depending on it directly or indirectly. If num is 2, the total weight of
task is a maximum value of task's own weight and weights of all tasks depending on
it directly or indirectly. Weights are specified with a help of "weight:" keyword.
If weight is not specified, it defaults to 1. The following is the example for
input graph of tasks with weights.

weight: gtk2 30
weight: glib2 20
gtk2 firefox
weight: firefox 200
glib2 gtk2
weight: qt4 200
weight: kcachegrind 2
qt4 kcachegrind
qt4 djview4
tiff djview4
png djview4
weight: twm 1
weight: gqview 4

-y If applied, the magic string is used as an end-of-task marker instead of empty line.
It is unlikely that this line appears on calculator's output. This option has
higher priority than PAEXEC_EOT environment variable.

EXAMPLES


1.
paexec -t '/usr/bin/ssh -x' -n 'host1 host2 host3' \
-le -g -c calculate-me < tasks.txt |
paexec_reorder -Mf -Sl

2.
ls -1 *.wav | paexec -x -n +4 -c 'oggenc -Q'

3.
ls -1 *.wav | paexec -xCil -n+4 flac -f --silent

4.
{ uname -s; uname -r; uname -m; } |
paexec -x -lp -n+2 -c banner |
paexec_reorder -l

For more examples see paexec.pdf and examples/ subdirectory in the distribution.

NOTES


select(2) system call and non-blocking read(2) are used to read result lines from nodes.

At the moment blocking write(2) is used to send task to the node. This may slow down an
entire processing if tasks are too big. So, it is recommended to use shorter tasks, for
example, filename or URI (several tens of bytes in size) instead of multi-megabyte
content. Though this may be fixed in the future.

Original paexec tarball contains a number of sample of use in presentation/paexec.pdf
file. After installation you can find this file under share/doc/paexec/paexec.pdf or
nearby.

ENVIRONMENT


PAEXEC_BUFSIZE
Overrides the compile time initial size for internal buffers used to store tasks and
the result lines. Versions of paexec prior to 0.9.0 used this value as a maximum
buffer size. Now internal buffers are resized automatically. If unsure, do not set
PAEXEC_BUFSIZE variable. See the default value in Makefile.

PAEXEC_ENV
A list of variables passed to calculator.

PAEXEC_EOT
This variable sets the end-of-task marker which is an empty line by default. Also,
through this variable an end-of-task marker is passed to all calculators.

PAEXEC_NODES
Unless option -n was applied, this variables specifies the nodes.

PAEXEC_TRANSPORT
Unless option -t was applied, this variables specifies the transport.

Use paexec online using onworks.net services


Free Servers & Workstations

Download Windows & Linux apps

Linux commands

Ad