NAME

mu index - index e-mail messages stored in Maildirs

SYNOPSIS

mu index [options]

DESCRIPTION

mu index is the mu command for scanning the contents of Maildir directories and storing
the results in a Xapian database. The data can then be queried using mu-find(1).

mu index understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition, it
understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It can also deal
with VFAT-based Maildirs, as used by Tinymail/Modest and some other e-mail programs, which
use '!' instead of ':' as the separator in message file names (VFAT does not allow ':' in
file names).
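To illustrate the two separator styles, the following POSIX sh sketch extracts the flag letters (for example S for 'seen' and R for 'replied') from a message file name written in either form. The function name maildir_flags is hypothetical and is not part of mu. It assumes only the '2,' info prefix from the Maildir specification.

```sh
# maildir_flags FILE
# Print the flag letters of a Maildir message file name.
#   standard form: <unique-name>:2,<flags>
#   VFAT form:     <unique-name>!2,<flags>   (':' is not allowed on VFAT)
maildir_flags() {
    name=${1##*/}                     # strip any leading directories
    case $name in
        *:2,*)   printf '%s\n' "${name##*:2,}" ;;
        *'!2,'*) printf '%s\n' "${name##*!2,}" ;;
        *)       printf '\n' ;;       # no info part (usual for files in new/)
    esac
}
```

For example, `maildir_flags 'cur/1234.host!2,S'` prints S.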

E-mail messages which are not stored in something resembling a maildir leaf directory (a
directory with cur and new subdirectories) are ignored, as are the cache directories for
notmuch and gnus.

Symlinks are not followed.
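The following find(1) sketch roughly approximates which directories count as maildir leaves. The function name list_maildirs is hypothetical, and the sketch is not mu's implementation. It relies on find's default (-P) behaviour, which never follows symbolic links.

```sh
# list_maildirs ROOT
# Print every directory under ROOT that has both a cur/ and a new/
# subdirectory. find's default -P mode does not follow symbolic links,
# so symlinked directories are neither matched nor descended into.
list_maildirs() {
    find "$1" -type d -name cur | while IFS= read -r cur; do
        dir=${cur%/cur}
        if [ -d "$dir/new" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

A typical call would be `list_maildirs "${MAILDIR:-$HOME/Maildir}"`.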

If there is a file called .noindex in a directory, the contents of that directory and all
of its subdirectories will be ignored. This is useful for excluding certain directories
from the indexing process, for example, directories containing spam.
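For example, to exclude a (hypothetical) spam folder, create the marker with touch "$HOME/Maildir/spam/.noindex". The sketch below shows how such a marker prunes a whole subtree during a find(1) walk. The function name list_indexable_maildirs is hypothetical. It illustrates the rule above and is not mu's code.

```sh
# list_indexable_maildirs ROOT
# Print maildir leaves (directories with cur/ and new/) under ROOT,
# skipping any directory that contains .noindex and everything below it.
list_indexable_maildirs() {
    # -prune stops find from descending into a directory holding .noindex;
    # -print fires only for other directories named cur.
    find "$1" -type d -exec sh -c 'test -e "$1/.noindex"' sh '{}' ';' -prune \
        -o -type d -name cur -print |
    while IFS= read -r cur; do
        dir=${cur%/cur}
        if [ -d "$dir/new" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

Spawning sh once per directory keeps the sketch portable, although it is slow on very large trees.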

If there is a file called .noupdate in a directory, the contents of that directory and all
of its subdirectories will be ignored, unless you do a full rebuild (with --rebuild). This
can speed things up if you have some maildirs that never change. Note that you can still
search for these messages; this only affects updating the database.
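The interplay between .noindex, .noupdate and --rebuild can be summarized as a per-directory decision. This is a sketch only. should_scan is a hypothetical helper that walks from a directory up to the maildir root, looking for either marker.

```sh
# should_scan ROOT DIR REBUILD
# Succeed if DIR (somewhere under ROOT) would be scanned; REBUILD is
# "yes" for a full rebuild. Hypothetical helper illustrating the rules.
should_scan() {
    root=$1 dir=$2 rebuild=$3
    while :; do
        if [ -e "$dir/.noindex" ]; then
            return 1                  # never indexed
        fi
        if [ -e "$dir/.noupdate" ] && [ "$rebuild" != yes ]; then
            return 1                  # only scanned during a rebuild
        fi
        if [ "$dir" = "$root" ] || [ "$dir" = / ]; then
            return 0                  # reached the top without a marker
        fi
        dir=$(dirname "$dir")
    done
}
```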

The first run of mu index may take a few minutes if you have a lot of mail (tens of
thousands of messages). Fortunately, such a full scan needs to be done only once; after
that it suffices to index the changes, which goes much faster. See the 'Note on
performance' below for more information.
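A common way to find changes cheaply is to compare directory modification times with the time of the previous run. Adding, removing or renaming a file in a directory updates that directory's mtime. The sketch below shows this idea with find(1). It is an assumption for illustration, not necessarily how mu detects changes, and the names changed_dirs and the stamp file are hypothetical.

```sh
# changed_dirs ROOT STAMP
# Print directories under ROOT modified more recently than the file STAMP,
# e.g. a file touched at the end of the previous indexing run.
changed_dirs() {
    find "$1" -type d -newer "$2"
}
```

After each run, `touch "$stamp"` would record the new reference time.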

The optional 'phase two' of the indexing process is the removal of messages from the
database for which there is no longer a corresponding file in the Maildir. If you do not
want this, you can use -n, --nocleanup.
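Conceptually, the cleanup phase compares the paths known to the database with what is on disk. The sketch below assumes, hypothetically, that those paths are available as a plain list on standard input. mu itself keeps them in its Xapian database, so this shows the idea rather than mu's implementation. The name stale_paths is hypothetical.

```sh
# stale_paths
# Read message paths on stdin (one per line) and print those whose
# file no longer exists on disk: the entries a cleanup would remove.
stale_paths() {
    while IFS= read -r path; do
        if [ ! -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```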

When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM (e.g., when you press
Ctrl-C during the indexing process), it tries to shut down gracefully: it tries to save and
commit data, close the database, and so on. If it receives another signal (e.g., when you
press Ctrl-C once more), mu index terminates immediately.
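The two-stage shutdown described above is a common pattern. A minimal sh sketch of it might look like the following. The function names are hypothetical, the echo lines stand in for real work, and this is not mu's code.

```sh
interrupted=0

# First signal: remember it and let the current batch finish.
# Second signal: exit at once (130 = 128 + SIGINT, a common convention).
on_signal() {
    if [ "$interrupted" -eq 1 ]; then
        echo "second signal: terminating immediately" >&2
        exit 130
    fi
    interrupted=1
    echo "signal caught: finishing current batch, then shutting down" >&2
}

# Hypothetical indexing loop; each echo stands in for real work.
index_batches() {
    trap on_signal INT HUP TERM
    for batch in "$@"; do
        if [ "$interrupted" -eq 1 ]; then
            break
        fi
        echo "indexing batch $batch"
    done
    echo "committing data and closing the database"
}
```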

OPTIONS

Note that some of the general options are described in the mu(1) man-page and not here, as
they apply to multiple mu commands.

-m, --maildir=<maildir>
starts searching at <maildir>. By default, mu uses whatever the MAILDIR environment
variable is set to; if it is not set, it tries ~/Maildir. See the note on mixing
sub-maildirs below.
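The default lookup can be expressed in one line of shell. default_maildir is a hypothetical helper. Treating an empty MAILDIR like an unset one (the :- expansion) is an assumption of this sketch.

```sh
# default_maildir
# Print $MAILDIR if it is set and non-empty, otherwise ~/Maildir.
default_maildir() {
    printf '%s\n' "${MAILDIR:-$HOME/Maildir}"
}
```

To index somewhere else, pass the directory explicitly, e.g. mu index --maildir="$HOME/work-mail" (a hypothetical path).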

--my-address=<my-email-address>

specifies that some e-mail address is 'my-address' (--my-address can be used
multiple times). This is used by mu cfind: any e-mail address found in the
address fields of a message which also has <my-email-address> in one of its address
fields is considered a personal e-mail address. This allows you, for example, to
filter out (mu cfind --personal) addresses which were merely seen in mailing list
messages.
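The rule for personal addresses can be sketched with awk(1). The input format (one message per line, with its addresses separated by spaces) and the function name personal_contacts are hypothetical simplifications. mu extracts the addresses from the actual message headers.

```sh
# personal_contacts MY_ADDRESS...
# stdin: one message per line, listing the addresses from its address
# fields separated by spaces (a simplified, hypothetical format).
# Print the other addresses of every message that includes one of
# MY_ADDRESS, i.e. the contacts that count as personal.
personal_contacts() {
    awk -v mine="$*" '
        BEGIN { n = split(mine, m, " "); for (i = 1; i <= n; i++) me[m[i]] = 1 }
        {
            personal = 0
            for (i = 1; i <= NF; i++) if ($i in me) personal = 1
            if (personal)
                for (i = 1; i <= NF; i++) if (!($i in me)) print $i
        }' | sort -u
}
```

Addresses seen only in mailing-list traffic never appear next to one of your own addresses, so they are left out.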

--nocleanup
disables the database cleanup that mu does by default after indexing.

--rebuild
clear all messages from the database before indexing. --rebuild guarantees that
after the indexing has finished, there are no 'old' messages in the database
anymore, which is not true with --reindex when indexing only part of the messages
(using --maildir). For this reason, it is necessary to run mu index --rebuild when
the database format is upgraded; mu index will issue a warning about this.

--autoupgrade
automatically use -y, --empty when mu notices that the database version is not up-
to-date. This option is for use in cron scripts and the like, so they won't require
any user interaction, even when mu introduces a new database version.

--xbatchsize=<batch size>
set the maximum number of messages to process in a single Xapian transaction. In
practice, this option is only useful if you find that mu is running out of memory
while indexing; in that case, you can set the batch size to (for example) 1000,
which will reduce memory consumption, but also substantially reduce the indexing
performance.
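A smaller batch size costs performance because the number of transactions grows as the batch size shrinks. The hypothetical helper below computes that number by ceiling division. For example, indexing 27273 messages with --xbatchsize=1000 needs 28 transactions, compared with 273 for a batch size of 100.

```sh
# transactions TOTAL BATCH_SIZE
# Number of Xapian transactions needed for TOTAL messages when each
# transaction holds at most BATCH_SIZE messages (ceiling division).
transactions() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```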

--max-msg-size=<max msg size>
set the maximum size (in bytes) for messages. The default maximum (currently 50 MB)
should be enough in most cases, but if you encounter warnings from mu about ignoring
messages because they are too big, you may want to increase this. Note that the reason
for having a maximum size is that big messages require big memory allocations, which may
lead to problems.
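Because the limit is given in bytes, sizes need converting. For instance, 100 MiB is $((100 * 1024 * 1024)) = 104857600 bytes, so you would pass --max-msg-size=104857600. Before raising the limit, it may help to see which files exceed it. The hypothetical helper below uses find(1)'s -size +Nc test, which matches files larger than N bytes.

```sh
# oversized_messages DIR LIMIT
# Print regular files under DIR that are larger than LIMIT bytes.
oversized_messages() {
    find "$1" -type f -size +"$2"c
}
```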

NOTE: It is not recommended to mix maildirs and sub-maildirs within the hierarchy
in the same database; for example, it's better not to index both with
--maildir=~/MyMaildir and --maildir=~/MyMaildir/foo, as this may lead to unexpected
results when searching with the 'maildir:' search parameter (see mu-find(1)).

A note on performance (i)
As a non-scientific benchmark, a simple test on the author's machine (a Thinkpad X61s
laptop using Linux 2.6.35 and an ext3 file system) with no existing database, and a
maildir with 27273 messages:

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
(about 103 messages per second)

A second run, which is the more typical use case when there is a database already, goes
much faster:

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
(more than 56818 messages per second, counting user CPU time only)

Note that each test flushes the caches first; a more common use case is to run mu
index when new mail has arrived, in which case the cache may stay quite 'warm':

$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
(more than 30000 messages per second)

A note on performance (ii)
As of June 2012, we did the same non-scientific benchmark, this time with an Intel
i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589 messages.

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
(about 813 messages per second, counting user CPU time only)

A second run, which is the more typical use case when there is a database already, goes
much faster:

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
(more than 173000 messages per second, counting user CPU time only)
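To check the arithmetic, the per-second figures can be recomputed by dividing the message count by one of the reported times. The hypothetical helper rate below uses awk for the floating-point division. The time output above uses decimal commas, which must be written with a point here.

```sh
# rate MESSAGES SECONDS
# Print messages per second, rounded to a whole number.
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}
```

For example, rate 27273 264.20 (the 4:24,20 total of the first run in (i)) gives 103, while rate 22589 61.47 gives 367. That second result is the 2012 first-run rate measured against total (wall-clock) time rather than user CPU time.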

In general, mu has been getting faster with each release, even with relatively expensive
new features such as text normalization (for case-insensitive/accent-insensitive matching).
Profiles are now dominated by operations in the Xapian database.
