qsf - Online in the Cloud

NAME

qsf - quick spam filter

SYNOPSIS

Filtering:        qsf [-snrAtav] [-d DB] [-g DB]
                      [-L LVL] [-S SUBJ] [-H MARK] [-Q NUM]
                      [-X NUM]
Training:         qsf -T SPAM NONSPAM [MAXROUNDS] [-d DB]
Retraining:       qsf -[m|M] [-d DB] [-w WEIGHT] [-ayN]
Database:         qsf -[p|D|R|O] [-d DB]
Database merge:   qsf -E OTHERDB [-d DB]
Allowlist query:  qsf -e EMAIL [-m|-M|-t] [-d DB] [-g DB]
Denylist query:   qsf -y -e EMAIL [-m -m|-M -M|-t] [-d DB] [-g DB]
Help:             qsf -[h|V]

DESCRIPTION

qsf reads a single email on standard input, and by default outputs it on standard output.
If the email is determined to be spam, an additional header ("X-Spam: YES") will be added,
and optionally the subject line can have "[SPAM]" prepended to it.

qsf is intended to be used in a procmail(1) recipe, in a ruleset such as this:

:0 wf
| qsf -ra

:0 H:
* X-Spam: YES
$HOME/mail/spam

For more examples, including sample procmail(1) recipes, see the EXAMPLES section below.

TRAINING

Before qsf can be used properly, it needs to be trained. A good way to train qsf is to
collect a copy of all your email into two folders - one for spam, and one for non-spam.
Once you have done this, you can use the training function, like this:

qsf -aT spam-folder non-spam-folder

This will generate a database that can be used by qsf to guess whether email received in
the future is spam or not. Note that this initial training run may take a long time, but
you should only need to do it once.

To mark a single message as spam, pipe it to qsf with the --mark-spam or -m ("mark as
spam") option. This will update the database accordingly and discard the email.

To mark a single message as non-spam, pipe it to qsf with the --mark-nonspam or -M ("mark
as non-spam") option. Again, this will discard the email.

If a message has been mis-tagged, simply send it to qsf as the opposite type, i.e. if it
has been mistakenly tagged as spam, pipe it into qsf --mark-nonspam --weight=2 to add it
to the non-spam side of the database with double the usual weighting.
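
As a rough sketch of the correction step above, the following shell function re-files a mis-tagged message with double weighting. The function name retrain and the example path are hypothetical; only the --mark-spam, --mark-nonspam and --weight options come from this manual, and qsf is assumed to be on the PATH.

```shell
# retrain FILE KIND: re-file a mis-tagged message with double weighting.
# KIND is the class the message should have had: "spam" or "nonspam".
# (Hypothetical helper; it only wraps options documented in this manual.)
retrain() {
    case "$2" in
        spam)    qsf --mark-spam --weight=2 < "$1" ;;
        nonspam) qsf --mark-nonspam --weight=2 < "$1" ;;
        *)       echo "usage: retrain FILE spam|nonspam" >&2; return 2 ;;
    esac
}

# Example (hypothetical path):
#   retrain "$HOME/mail/false-positive.eml" nonspam
```

Keeping the weight in one place makes it easy to adjust if a weighting of 2 turns out to over-correct or under-correct for your mail.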

OPTIONS

The qsf options are listed below.

-d, --database [TYPE:]FILE
Use FILE as the spam/non-spam database. The default is to use /var/lib/qsfdb and,
if that is not available or is read-only, $HOME/.qsfdb. This option can also be
useful if there is a system-wide database but you do not want to use it -
specifying your own here will override the default.

If you prefix the filename with a TYPE, of the form btree:$HOME/.qsfdb, then this
will specify what kind of database FILE is, such as list, btree, gdbm, sqlite and
so on. Check the output of qsf -V to see which database backends are available.
The default is to auto-detect the type, or, if the file does not already exist, use
list. Note that TYPE is not case-sensitive.

-g, --global [TYPE:]FILE
Use FILE as the default global database, instead of /var/lib/qsfdb. If you also
specify a database with -d, then this "global" database will be used in read-only
mode in conjunction with the read-write database specified with -d. The -g option
can be used a second time to specify a third database, which will also be used in
read-only mode. Again, the filename can optionally be prefixed with a TYPE which
specifies the database type.

-P, --plain-map FILE
Maintain a mapping of all database tokens to their non-hashed counterparts in FILE,
one token per line. This can be useful if you want to be able to list the contents
of your database at a later date, for instance to get a list of email addresses in
your allow-list. Note that using this option may slow qsf down, and only entries
written to the database while this option is active will be stored in FILE.

-s, --subject
Rewrite the Subject line of any email that turns out to be spam, adding "[SPAM]" to
the start of the line.

-S, --subject-marker SUBJECT
Instead of adding "[SPAM]", add SUBJECT to the Subject line of any email that turns
out to be spam. Implies -s.

-H, --header-marker MARK
Instead of setting the X-Spam header to "YES", set it to MARK if email turns out to
be spam. This can be useful if your email client can only search all headers for a
string, rather than one particular header (so searching for "YES" might match more
than just the output of qsf).

-n, --no-header
Do not add an X-Spam header to messages.

-r, --add-rating
Insert an additional header X-Spam-Rating which is a rating of the "spamminess" of
a message from 0 to 100; with the default threshold, 90 and above are counted as
spam and anything under 90 is not (see -L). If combined with -t, then the rating
(0-100) will be output, on its own, on standard output.

-A, --asterisk
Insert an additional header X-Spam-Level which will contain between 0 and 20
asterisks (*), depending on the spam rating.

-t, --test
Instead of passing the message out on standard output, output nothing, and exit 0
if the message is not spam, or exit 1 if the message is spam. If combined with -r,
then the spam rating will be output on standard output.
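
A minimal sketch of using the -t exit status from a script, assuming qsf is on the PATH. The function name classify is hypothetical, and treating any exit status other than 0 or 1 as an error is an assumption, since this manual only documents those two values.

```shell
# classify FILE: print "spam" or "nonspam" based on the exit status of qsf -t.
# (Hypothetical helper; statuses other than 0 and 1 are assumed to be errors.)
classify() {
    rc=0
    qsf -t < "$1" || rc=$?
    case "$rc" in
        0) echo nonspam ;;
        1) echo spam ;;
        *) echo "qsf -t gave unexpected exit status $rc" >&2; return 2 ;;
    esac
}

# Example (hypothetical path):
#   classify "$HOME/mail/incoming.eml"
```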

-a, --allowlist
Enable the allow-list. This causes the email addresses given in the message's
"From:" and "Return-Path:" headers to be checked against a list; if either one
matches, then the message is always treated as non-spam, regardless of what the
token database says. When specified with a retraining flag, -a -m (mark as spam)
will remove that address from the allow-list as well as marking the message as
spam, and -a -M (mark as non-spam) will add that address to the allow-list as well
as marking the message as non-spam. The idea is that you add all of your friends
to the allow-list, and then none of their messages ever get marked as spam.

-y, --denylist
Enable the deny-list. This causes the email addresses given in the message's
"From:" and "Return-Path:" headers to be checked against a second list; if either
one matches, then the message is always treated as spam. Training works in the
same way as with -a, except that you must specify -m or -M twice to modify the
deny-list instead of the allow-list, and with the reverse syntax: -y -m -m (mark as
spam) will add that address to the deny-list, whereas -y -M -M (mark as non-spam)
will remove that address from the deny-list. This double specification is so that
the usual retraining process never touches the deny-list; the deny-list should be
carefully maintained rather than automatically generated.

Normally you would not need to use the deny-list.

-L, --level, --threshold LEVEL
Change the spam scoring threshold level which must be reached before an email is
classified as spam. The default is 90.

-Q, --min-tokens NUM
Only give a score if more than NUM tokens are found in the message - otherwise the
message is assumed to be non-spam, and it is not modified in any way. The default
is 0. This option might be useful if you find that very short messages are being
frequently miscategorised.

-e, --email, --email-only EMAIL
Query or update the allow-list entry for the email address EMAIL. With no other
options, this will simply output "YES" if EMAIL is in the allow-list, or "NO" if it
is not. With -t, it will not output anything, but will exit 0 (success) if EMAIL is
in the allow-list, or 1 (failure) if it is not. With the -m (mark-spam) option, any
previous allow-list entry for EMAIL will be removed. Finally, with the -M
(mark-nonspam) option, EMAIL will be added to the allow-list if it is not already on it.

If EMAIL is just the word MSG on its own, then an email will be read from standard
input, and the email addresses given in the "From:" and "Return-Path:" headers will
be used.

Using -e automatically switches on -a.

If you also specify -y, then the deny-list will be operated on. Remember that with
the deny-list -m and -M are reversed and must each be given twice: -m -m adds EMAIL
to the deny-list and -M -M removes it.

If you specify an email address of the form @domain (nothing before the @), then
the whole domain will be allow-listed or deny-listed.
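
The allow-list query and update forms of -e can be combined as in the sketch below. The function name allow_sender and the example address are hypothetical, and qsf is assumed to be on the PATH. The check with -t is not strictly needed, since -M only adds an address that is not already listed, but it shows how the exit status can drive a script.

```shell
# allow_sender EMAIL: add EMAIL to the allow-list unless it is already there.
# (Hypothetical helper built on the documented -e, -t and -M options.)
allow_sender() {
    if qsf -t -e "$1"; then
        echo "$1 is already allow-listed"
    else
        qsf -e "$1" -M && echo "$1 added to the allow-list"
    fi
}

# Examples (hypothetical addresses):
#   allow_sender friend@example.com
#   allow_sender @example.org        # the whole domain
```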

-v, --verbose
Add extra X-QSF-Info headers to any filtered email, containing error messages and
so on if applicable. Specify -v more than once to increase verbosity.

-T, --train SPAM NONSPAM [MAXROUNDS]
Train the database using the two mbox folders SPAM and NONSPAM, by testing each
message in each folder and updating the database each time a message is
miscategorised. This is done several times, and may take a while to run. Specify
the -a (allow-list) flag to add every sender in the NONSPAM folder to your
allow-list as a side-effect of the training process. If MAXROUNDS is specified, training
will end after this number of rounds if the results are still not good enough. The
default is a maximum of 200 rounds.

-m, --mark-spam
Instead of passing the message out on standard output, mark its contents as spam
and update the database accordingly. If the allow-list (-a) is enabled, the
message's "From:" and "Return-Path:" addresses are removed from the allow-list. If
the deny-list (-y) is enabled and you specify -m twice, the message's addresses are
added to the deny-list instead.

-M, --mark-nonspam
Instead of passing the message out on standard output, mark its contents as
non-spam and update the database accordingly. If the allow-list (-a) is enabled, the
message's "From:" and "Return-Path:" addresses are added to the allow-list (see the
-a option above). If the deny-list (-y) is enabled and you specify -M twice, the
message's addresses are removed from the deny-list instead.

-w, --weight WEIGHT
When marking as spam or non-spam, update the database with a weighting of WEIGHT
per token instead of the default of 1. Useful when correcting mistakes, e.g. a
message that has been mistakenly detected as spam should be marked as non-spam
using a weighting of 2, i.e. double the usual weighting, to counteract the error.

-D, --dump [FILE]
Dump the contents of the database as a platform-independent text file, suitable for
archival, transfer to another machine, and so on. The data is output on stdout or
into the given FILE.

-R, --restore [FILE]
Rebuild the database from scratch from the text file on stdin. If a FILE is given,
data is read from there instead of from stdin.
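
One way to use -D and -R together is a text backup followed by a rebuild, sketched below with the stdout and stdin forms. The function names backup_db and restore_db and the example paths are hypothetical, and qsf is assumed to be on the PATH.

```shell
# backup_db DB OUT: dump database DB as text into the file OUT.
backup_db() {
    qsf -d "$1" -D > "$2"
}

# restore_db DB IN: rebuild database DB from the text dump in the file IN.
restore_db() {
    qsf -d "$1" -R < "$2"
}

# Example (hypothetical paths): copy a database via a text dump.
#   backup_db  "$HOME/.qsfdb" /tmp/qsfdb.txt
#   restore_db "$HOME/.qsfdb.new" /tmp/qsfdb.txt
```

Since the dump is described as platform-independent text, the same pair of calls could be used to move a database to another machine.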

-O, --tokens
Instead of filtering, output a list of the tokens found in the message read from
standard input, along with the number of times each token was found. This is only
useful if you want to use qsf as a general tokeniser for use with another filtering
package.

-E, --merge OTHERDB
Merge the OTHERDB database into the current database. This can be useful if you
want to merge one user's database into the system-wide one, for instance: as root,
run qsf -d /var/lib/qsfdb -E /home/user/.qsfdb and then remove
/home/user/.qsfdb.

-B, --benchmark SPAM NONSPAM [MAXROUNDS]
Benchmark the training process using the two mbox folders SPAM and NONSPAM. A
temporary database is created and trained using the first 75% of the messages in
each folder, and then the entire contents of each folder is tested to see how many
false positives and false negatives occur. Some timing information is also
displayed.

This can be used to decide which backend is best on your system. Use -d to select
a backend, e.g. qsf -B spam nonspam -d GDBM - this will create a temporary database
which is removed afterwards.

The exception to this is the MySQL backend, where a full database specification
must be given (-d MySQL:database=db;host=localhost;...) and the database table
given will not be wiped beforehand or dropped afterwards.

As with -T, if MAXROUNDS is specified, training will never be done for more than
this number of rounds; the default is 200.
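
To compare several backends in one go, the benchmark can be wrapped in a loop, as in this sketch. The function name benchmark_backends and the folder names are hypothetical, qsf is assumed to be on the PATH, and the backend names passed in should be ones that qsf -V actually reports on your system.

```shell
# benchmark_backends SPAM NONSPAM BACKEND...: run qsf -B once per backend.
# (Hypothetical helper; it does not handle the MySQL backend, which needs a
# full database specification.)
benchmark_backends() {
    bb_spam=$1
    bb_nonspam=$2
    shift 2
    for bb_backend in "$@"; do
        echo "== $bb_backend =="
        qsf -B "$bb_spam" "$bb_nonspam" -d "$bb_backend" ||
            echo "benchmark failed for $bb_backend" >&2
    done
}

# Example (hypothetical folder names):
#   benchmark_backends spam nonspam list btree gdbm
```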

-h, --help
Print a usage message on standard output and exit successfully.

-V, --version
Print version information, including a list of available database backends, on
standard output and exit successfully.

DEPRECATED OPTIONS

The following options are only for use with the old binary tree database backend or old
databases that haven't been upgraded to the new format that came in with version 1.1.0.

-N, --no-autoprune
When marking as spam or nonspam, never automatically prune the database. Usually
the database is pruned after every 500 marks; if you would rather --prune manually,
use -N to disable automatic pruning.

-p, --prune
Remove redundant entries from the database and clean it up a little. This is
automatically done after several calls to --mark-spam or --mark-nonspam, and during
training with --train if the training takes a large number of rounds, so it should
rarely be necessary to use --prune manually unless you are using -N /
--no-autoprune.

-X, --prune-max NUM
When the database is being pruned, no more than NUM entries will be considered for
removal. This is to prevent CPU and memory resources being taken over. The
default is 100,000 but in some circumstances (if you find that pruning takes too
long) this option may be used to reduce it to a more manageable number.
