NAME

makepp_perl_performance -- How to make Perl faster

DESCRIPTION

The biggest tuning gains will usually come from algorithmic improvements. But while these
can be hard to find, there is also a lot you can do mechanically.

Makepp is a big heavy-duty program, where speed is a must. A lot of effort has been put
into optimizing it. This documents some general things we have found. Currently the
concrete tests leading to these results have mostly been discarded, but I plan to
gradually add them.

If you are looking at how to speed up makepp (beyond the Perl programming you put into your
makefiles), look at makepp_speedup. This page is completely independent of makepp, only
intended to make our results available to the Perl community. Some of these measures are
common sense, but you sometimes forget them. Others need measuring to believe them, so:

Measure, don't guess
Profile your program
Makepp comes with a module profiler.pm in its cvs repository. This is first run as a
program on a copy(!) of your code, which it instruments. Then you run your copy and
get configurable statistics per interval and a final total on the most frequently
called functions and on the most time spent in functions (minus subcalls). Both are
provided absolutely and in caller-callee pairs. (Documentation within.)

This tells you which functions are the most promising candidates for tuning. It also
gives you a hint where your algorithm might be wrong, either within surprisingly
expensive functions, or through surprisingly frequent calls.

Time your solution
Either one of

perl -Mstrict -MBenchmark -we 'my <initialization>; timethis -10, sub { <code> }'
time perl -Mstrict -we 'my <initialization>; for( 0..999_999 ) { <code> }'

when run on different variants of code you can think of, can give surprising results.
Even small modifications can matter a lot. Be careful not to "measure" code that can
get optimized away, because you discard the result, or because it depends on
constants.
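
As a script rather than a one-liner, such a comparison could look like the following sketch. The two variants and the names @words and $sink are made up for illustration. Storing each result in a package variable is one way to keep the measured code from being optimized away. A count of -1 (one CPU second per variant) keeps the run short; larger values such as the -10 used above give steadier numbers.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data and a package variable that keeps every result "used".
my @words = map "word$_", 1 .. 100;
our $sink;

# Run each variant for at least 1 CPU second and print a comparison chart.
my $rows = cmpthese( -1, {
    join_builtin => sub { $sink = join ',', @words },
    concat_loop  => sub { my $s = ''; $s .= "$_," for @words; $sink = $s },
} );
```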

Depending on your system, this will tell you in kb how fat Perl got:

perl -Mstrict -we '<build huge data>; system "ps -ovsz $$"'

Below we only show the code within the "-e" option as one liners.

Regexps
Use simple regexps
Several matches combined with "||" are faster than a big one with "|".

Use precompiled regexps
Instead of interpolating strings into regexps (except if the string will never change
and you use the "o" modifier), precompile the regexp with "qr//" and interpolate that.
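
A minimal sketch of this, with invented prefixes and input words: each pattern is built once with "qr//" and then reused for every match.

```perl
use strict;
use warnings;

# Hypothetical prefixes; each one is compiled into a regexp object exactly once.
my @prefixes = qw(foo bar);
my @patterns = map { qr/^\Q$_\E\d+/ } @prefixes;

my @input = qw(foo1 bar22 baz3 foo);
my @hits;
for my $word (@input) {
    # Matching against the qr// objects, not against freshly interpolated strings.
    push @hits, $word if grep { $word =~ $_ } @patterns;
}
print "@hits\n";    # foo1 bar22
```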

Use (?:...)
If you don't use what the grouping matches, don't make Perl save it with "(...)".
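
For illustration, assuming a made-up log line: the date part has to be grouped for the repetition, but only the level is wanted, so only that gets a capturing group.

```perl
use strict;
use warnings;

my $line = '2024-01-15 ERROR disk full';    # hypothetical input

# (?:...) groups the repeated "digits-" part without saving it;
# the only capture is the word after the date.
my ($level) = $line =~ /^(?:\d+-){2}\d+ (\w+)/;
print "$level\n";    # ERROR
```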

Anchor at beginning of string
Don't make Perl look through your whole string, if you want a match only at the
beginning.

Don't anchor at end after greedy
If you have a "*" or "+" that will match till the end of string, don't put a "$" after
it.

Use tr///
This is twice as fast as s/// when it is applicable.
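
Two typical cases where "tr///" is applicable, as a small sketch with an invented string: mapping one character range to another, and counting characters without changing anything.

```perl
use strict;
use warnings;

my $text = 'hello, world';    # hypothetical input

# Transliterate a copy, leaving $text untouched.
( my $upper = $text ) =~ tr/a-z/A-Z/;

# With an empty replacement list and no /d, tr only counts the matches.
my $commas = ( $text =~ tr/,// );

print "$upper $commas\n";    # HELLO, WORLD 1
```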

Functions
Avoid object orientation
Dynamic method lookup is slower in any language, and Perl, being loosely typed, can
never do it at compile time. Don't use it, unless you need the benefit of
polymorphism through inheritance. The following call methods are ordered from slowest
to fastest:

$o->method( ... ); # searched in class of $o and its @ISA
Class::method( $o, ... ); # static function, new stack
Class::method $o, ...; # static function, new stack, checked at compile time
&Class::method; # static function, reuse stack

This last form is always possible if the method (or normal function) takes no arguments. If
it does take arguments, watch out that you don't inadvertently supply any optional
ones! If you use this form a lot, it is best to keep track of the minimum and maximum
number of arguments each function can take. Reusing a stack with extra arguments is
no problem, they'll get ignored.
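
The following sketch shows three of these forms giving the same result. The class Counter and its methods are invented for the example. In "describe" the call "&value" passes on the current @_, which already holds the object.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class

sub new      { my ( $class, $start ) = @_; return bless { n => $start }, $class }
sub value    { return $_[0]{n} }
sub describe { return 'count=' . &value }    # reuses the caller's @_, i.e. ($self)

package main;

my $c = Counter->new(5);
my @results = (
    $c->value,              # dynamic lookup via the class of $c
    Counter::value($c),     # plain function call with a new argument list
    $c->describe,           # internally uses the &value form
);
print "@results\n";         # 5 5 count=5
```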

Don't modify stack
The following sin is frequently found even in the Perl doc:

my $self = shift;

Unless you have a pertinent reason for this, use this:

my( $self, $x, $y, @z ) = @_;

Use few functions and modules
Every function (and that alas includes constants) takes up over 1kb for its mere
existence. With each module requiring other ones, most of which you never need, that
can add up. Don't pull in a big module, just to replace two lines of Perl code with a
single more elegant looking function call.

If you have a function only called in one place, and the two combined would still be
reasonably short, merge them with due comments.

Don't have one function only call another with the same arguments. Alias it instead:

*alias = \&function;
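
Spelled out with invented function names, the alias makes both names refer to the very same code instead of adding a wrapper call:

```perl
use strict;
use warnings;

# Hypothetical helper that collapses whitespace and lowercases.
sub normalize { my ($s) = @_; $s =~ s/\s+/ /g; return lc $s }

# Instead of: sub canonical { return normalize(@_) }
*canonical = \&normalize;

my $out  = canonical('Hello   WORLD');
my $same = \&canonical == \&normalize;    # true: one function, two names
print "$out\n";                           # hello world
```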

Group calls to print
Individual calls to print, or print with separate arguments are very expensive. Build
up the string in memory and print it in one go. If you can accumulate over 3kb,
syswrite is more efficient.

perl -MBenchmark -we 'timethis -10, sub { print STDERR $_ for 1..5 }' 2>/dev/null
perl -MBenchmark -we 'timethis -10, sub { print STDERR 1..5 }' 2>/dev/null
perl -MBenchmark -we 'timethis -10, sub { my $str = ""; $str .= $_ for 1..5; print STDERR $str }' 2>/dev/null

Miscellaneous
Avoid hashes
Perl becomes quite slow with many small hashes. If you don't need them, use something
else. Object orientation works just as well on an array, except that the members
can't be accessed by name. But you can use numeric constants to name the members.
For the sake of comparability we use plain numeric keys here:

my $i = 0; our %a = map +($i++, $_), "a".."j"; timethis -10, sub { $b = $a{int rand 10} }
our @a = "a".."j"; timethis -10, sub { $b = $a[rand 10] }

my $i = 0; my %a = map +($i++, $_), "a".."j"; timethis -10, sub { $b = $a{int rand 10} }
my @a = "a".."j"; timethis -10, sub { $b = $a[rand 10] }
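
An array based object with named members might look like the following sketch. The class Point and the constants X, Y and LABEL are assumptions made up for the example, not taken from makepp.

```perl
use strict;
use warnings;

package Point;    # hypothetical class stored in an array, not a hash

# Numeric constants name the array slots.
use constant { X => 0, Y => 1, LABEL => 2 };

sub new   { my ( $class, $x, $y, $label ) = @_; return bless [ $x, $y, $label ], $class }
sub norm2 { my ($self) = @_; return $self->[X]**2 + $self->[Y]**2 }

package main;

my $p     = Point->new( 3, 4, 'corner' );
my $d     = $p->norm2;              # 3*3 + 4*4
my $label = $p->[Point::LABEL];     # members by constant instead of by hash key
print "$d $label\n";                # 25 corner
```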

Use int keys for ref sets
When you need a unique reference representation, e.g. for set ops with hashes, using
the integer form of refs is three times as fast as using the pretty printed default
string representation. Caveat: the HP/UX 64bitall variant of Perl, at least up to
5.8.8 has a buggy "int" function, where this doesn't work reliably. There a hex form
is still a fair bit faster than default strings. Actually this can even be faster
than stringified int, depending on the version or maybe configuration of perl. As of
5.8.1 there is also the equivalent but hopefully reliable Scalar::Util::refaddr.

my @list = map { bless { $_ => 1 }, "someclass" } 0..9; my( %a, %b );
timethis -10, sub { $a{$_} = 1 for @list };
timethis -10, sub { $b{int()} = 1 for @list };
timethis -10, sub { $b{sprintf '%x', $_} = 1 for @list };
timethis -10, sub { $b{refaddr $_} = 1 for @list }; # needs -MScalar::Util=refaddr

There is also sprintf '%p' which supposedly outputs a pointer, but depending on which
expression leads to the same ref, you get different values, so it's useless.
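
As a sketch of such a set operation, with an invented list of hash refs: the "seen" set is keyed by refaddr, and the difference is computed with a simple grep.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical objects; only their identity matters here.
my @all = map { +{ id => $_ } } 1 .. 5;

# Build a set keyed by the integer address of each reference.
my %seen;
$seen{ refaddr($_) } = 1 for @all[ 0, 2 ];

# Set difference: everything not yet in %seen.
my @unseen = grep { !$seen{ refaddr($_) } } @all;
my $ids    = join ',', map { $_->{id} } @unseen;
print "$ids\n";    # 2,4,5
```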

Beware of strings
Perl is awful for always copying strings around, even if you're never going to modify
them. This wastes CPU and memory. Try to avoid that wherever reasonably possible.
If the string is a function parameter and the function has a modest length, don't copy
the string into a "my" variable, access it with $_[0] and document the function well.
Elsewhere, the aliasing feature of "for(each)" can help. Or just use references to
strings, which are fast to copy. If you somehow ensure that same strings get stored
only once, you can do numerical comparison for equality.
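
The three techniques could be sketched like this. All function and variable names are made up for the example.

```perl
use strict;
use warnings;

# Short function: reads the caller's string through $_[0], no copy into a "my" variable.
sub is_comment { return $_[0] =~ /^\s*#/ ? 1 : 0 }

# A reference is cheap to pass around, however long the string is.
sub first_word { my ($text_ref) = @_; return $$text_ref =~ /(\w+)/ ? $1 : undef }

my $big = '# header ' . ( 'x' x 100_000 );    # hypothetical large string

my $comment = is_comment($big);
my $word    = first_word( \$big );

# "for" aliases each element instead of copying it.
my $count = 0;
for my $line ( $big, 'plain text' ) { $count++ if $line =~ /^#/ }

print "$comment $word $count\n";    # 1 header 1
```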

Avoid bit operations
If you have disjoint bit patterns you can add them instead of or'ing them. Shifting
can be performed by multiplication or integer division. Retaining only the lowest
bits can be achieved with modulo.

Separate boolean hash members are faster than stuffing everything into an integer with
bit operations or into a string with "vec".
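
The arithmetic replacements, shown with invented flag names and small numbers. This sketch assumes non-negative integers. Whether it is actually faster on your Perl is something to measure.

```perl
use strict;
use warnings;

# Hypothetical disjoint flags.
use constant { FLAG_READ => 1, FLAG_WRITE => 2, FLAG_EXEC => 4 };

my $flags = FLAG_READ + FLAG_EXEC;    # same value as FLAG_READ | FLAG_EXEC
my $shl   = 5 * 8;                    # same value as 5 << 3
my $shr   = int( 45 / 4 );            # same value as 45 >> 2
my $low   = 45 % 8;                   # same value as 45 & 7

print "$flags $shl $shr $low\n";      # 5 40 11 5
```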

Use order of boolean operations
If you only care whether an expression is true or false, check the cheap things, like
boolean variables, first, and call functions last.
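
A small sketch of the effect, with a made-up function and flag: because "&&" stops at the first false operand, the function call never happens when the cheap variable is already false.

```perl
use strict;
use warnings;

my $calls = 0;
sub expensive_check { $calls++; return 1 }    # hypothetical costly test

my $verbose = 0;                              # cheap boolean variable

# The variable is tested first, so expensive_check() is skipped.
my $hit = $verbose && expensive_check();

print "calls: $calls\n";                      # calls: 0
```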

Use undef instead of 0
It takes up a few percent less memory, at least as hash or list values. You can still
query it as a boolean.

my %x; $x{$_} = 0 for 0..999_999; system "ps -ovsz $$"
my %x; undef $x{$_} for 0..999_999; system "ps -ovsz $$"

my @x = (0) x 999_999; system "ps -ovsz $$"
my @x = (undef) x 999_999; system "ps -ovsz $$"

Choose for or map
These are definitely not equivalent. Depending on your use (i.e. the list and the
complexity of your code), one or the other may be faster.

my @l = 0..99;
for( 0..99_999 ) { map $a = " $_ ", @l }
for( 0..99_999 ) { map $a = " $_ ", 0..99 }
for( 0..99_999 ) { $a = " $_ " for @l }
for( 0..99_999 ) { $a = " $_ " for 0..99 }

Don't alias $_
While it is convenient, it is rather expensive; even copying reasonably sized strings is
faster. The last example is twice as fast as the first "for".

my $x = "abcdefg"; my $b = 0;
for( "$x" ) { $b = 1 - $b if /g/ } # Copy needed only if modifying.
for( $x ) { $b = 1 - $b if /g/ }
local *_ = \$x; $b = 1 - $b if /g/;
local $_ = $x; $b = 1 - $b if /g/; # Copy cheaper than alias.
my $y = $x; $b = 1 - $b if $y =~ /g/;
