NAME
webalizer - A web server log file analysis tool.
SYNOPSIS
webalizer [ option ... ] [ log-file ]
webazolver [ option ... ] [ log-file ]
DESCRIPTION
The Webalizer is a web server log file analysis program which produces usage statistics in
HTML format for viewing with a browser. The results are presented in both columnar and
graphical format, which facilitates interpretation. Yearly, monthly, daily and hourly
usage statistics are presented, along with the ability to display usage by site, URL,
referrer, user agent (browser), username, search strings, entry/exit pages, and country
(some information may not be available if not present in the log file being processed).
The Webalizer supports CLF (common log format) log files, as well as Combined log formats
as defined by NCSA and others, and variations of these which it attempts to handle
intelligently. In addition, the Webalizer supports xferlog formatted (FTP) log files,
squid proxy logs and W3C extended format logs. Logs may also be compressed, via gzip
(.gz) or, if enabled at compile time, bzip2 (.bz2). If a compressed log file is detected,
it will be automatically uncompressed while it is read. Compressed logs must have the
standard gzip extension of .gz or bzip2 extension of .bz2.
webazolver is normally just a symbolic link to the Webalizer. When run as webazolver,
only DNS file creation/updates are performed, and the program will exit once complete.
All normal options and configuration directives are available, however many will not be
used. In addition, a DNS cache file must be specified. If the number of DNS children
processes to use is not specified, webazolver will default to 5.
This documentation applies to The Webalizer Version 2.23
RUNNING THE WEBALIZER
The Webalizer was designed to be run from a Unix command line prompt or as a crond(8) job.
Once executed, the general flow of the program is:
o A default configuration file is scanned for. A file named webalizer.conf is
searched for in the current directory, and if found, its configuration data is
parsed. If the file is not present in the current directory, the file
/etc/webalizer.conf is searched for and, if found, is used instead.
o Any command line arguments given to the program are parsed. This may include the
specification of a configuration file, which is processed at the time it is
encountered.
o If a log file was specified, it is opened and made ready for processing. If no
log file was given, STDIN is used for input. If the log filename '-' is
specified, STDIN will be forced.
o If an output directory was specified, the program does a chdir(2) to that
directory in preparation for generating output. If no output directory was given,
the current directory is used.
o If a non-zero number of DNS Children processes were specified, they will be
started, and the specified log file will be processed, creating or updating the
specified DNS cache file.
o If no hostname was given, the program attempts to get the hostname using a
uname(2) system call. If that fails, localhost is used.
o A history file is searched for in the current directory (output directory) and
read if found. This file keeps totals for previous months, which are used in the
main index.html HTML document. Note: The file location can now be specified with
the HistoryName configuration option.
o If incremental processing was specified, a data file is searched for and loaded if
found, containing the 'internal state' data of the program at the end of a
previous run. Note: The file location can now be specified with the
IncrementalName configuration option.
o Main processing begins on the log file. If the log spans multiple months, a
separate HTML document is created for each month.
o After main processing, the main index.html page is created, which has totals by
month and links to each month's HTML document.
o A new history file is saved to disk, which includes totals generated by The
Webalizer during the current run.
o If incremental processing was specified, a data file is written that contains the
'internal state' data at the end of this run.
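The default configuration file search in the first step above can be sketched as follows. This is an illustration of the documented lookup order only, not The Webalizer's own code; the function name find_default_config and its exists parameter are hypothetical, added so the order can be exercised without touching the filesystem.

```python
import os


def find_default_config(cwd, exists=os.path.isfile):
    """Return the default configuration file path, or None if none is found.

    Follows the documented order: webalizer.conf in the current
    directory first, then /etc/webalizer.conf.
    """
    candidates = [
        os.path.join(cwd, "webalizer.conf"),  # current directory
        "/etc/webalizer.conf",                # system-wide fallback
    ]
    for path in candidates:
        if exists(path):
            return path
    return None
```

A configuration file named on the command line with -c is a separate step and is not covered by this sketch.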
The Webalizer supports incremental run capability. Simply put, this allows processing
large log files by breaking them up into smaller pieces, and processing these pieces
instead. What this means in real terms is that you can now rotate your log files as often
as you want, and still be able to produce monthly usage statistics without the loss of any
detail. Basically, The Webalizer saves and restores all internal data in a file named
webalizer.current. This allows the program to 'start where it left off' so to speak, and
allows the preservation of detail from one run to the next. The data file is placed in
the current output directory, and is a plain ASCII text file that can be viewed with any
standard text editor. Its location and name may be changed using the IncrementalName
configuration option.
Some special precautions need to be taken when using the incremental run capability of The
Webalizer. Configuration options should not be changed between runs, as that could cause
corruption of the internal data stored. For example, changing the MangleAgents level will
cause different representations of user agents to be stored, producing invalid results in
the user agents section of the report. If you need to change configuration options, do it
at the end of the month after normal processing of the previous month and before
processing the current month. You may also want to delete the webalizer.current file as
well.
The Webalizer also attempts to prevent data duplication by keeping track of the timestamp
of the last record processed. This timestamp is then compared to current records being
processed, and any records that were logged before that timestamp are ignored. This,
in theory, should allow you to re-process logs that have already been processed, or
process logs that contain a mix of processed/not yet processed records, and not produce
duplication of statistics. The only time this may break is if you have duplicate
timestamps in two separate log files: any records in the second log file that have the
same timestamp as the last record in the previously processed log file will be discarded
as if they had already been processed. There are many ways to prevent this; for example,
stopping the web server before rotating logs avoids the situation. This setup also
requires that you always process logs in chronological order, otherwise data loss will
occur as a result of the timestamp comparison.
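The timestamp comparison described above can be sketched as follows. This is a simplified model under stated assumptions, not the program's actual logic: records are represented as hypothetical (timestamp, line) tuples, and the function name filter_new_records is invented for illustration.

```python
from datetime import datetime


def filter_new_records(records, last_timestamp):
    """Drop records at or before last_timestamp.

    records: iterable of (timestamp, line) tuples.
    last_timestamp: timestamp of the last record handled by the previous
    run, or None on a first run.
    Returns (kept_records, newest_timestamp_seen).
    """
    kept = []
    newest = last_timestamp
    for stamp, line in records:
        # Records equal to the saved timestamp are dropped too, which is
        # why duplicate timestamps across two log files lose data.
        if last_timestamp is not None and stamp <= last_timestamp:
            continue
        kept.append((stamp, line))
        if newest is None or stamp > newest:
            newest = stamp
    return kept, newest


# Hypothetical records: the first two were already seen in a previous run.
previous_last = datetime(2024, 5, 1, 0, 0, 5)
log = [
    (datetime(2024, 5, 1, 0, 0, 4), "GET /a"),
    (datetime(2024, 5, 1, 0, 0, 5), "GET /b"),
    (datetime(2024, 5, 1, 0, 0, 6), "GET /c"),
]
kept, newest = filter_new_records(log, previous_last)
```

The sketch also shows why logs must be processed in chronological order: feeding an older log after a newer one leaves every record at or before the saved timestamp, so all of them are skipped.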
REVERSE DNS LOOKUPS
The Webalizer fully supports IPv4 and IPv6 DNS lookups, and maintains a cache of those
lookups to reduce processing the same addresses in subsequent runs. The cache file can be
created at run-time, or may be created before running the webalizer using either the
standalone 'webazolver' program, or The Webalizer (DNS) Cache file manager program 'wcmgr'. In
order to perform reverse lookups, a DNSCache file must be specified, either on the command
line or in a configuration file. In order to create/update the cache file at run-time,
the number of DNSChildren must also be specified, and can be anything between 1 and 100.
This specifies the number of child processes to be forked, each of which will perform
network DNS queries in order to look up the addresses and update the cache. Cached
entries that are older than a specified TTL (time to live) will be expired, and if
encountered again in a log, will be looked up at that time in order to 'freshen' them
(verify the name is still the same and update its timestamp). The default TTL is 7 days,
but it may be set to anything between 1 and 100 days. Using the 'wcmgr' program, entries
may also be marked as 'permanent', in which case they will persist (with an infinite TTL)
in the cache until manually removed. See the file DNS.README for additional information.
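The TTL rule above can be illustrated with a small sketch. It models only the expiry decision, assuming each cache entry carries the time it was last refreshed and a 'permanent' flag; the function name needs_refresh and this representation are hypothetical and say nothing about the real cache file format.

```python
from datetime import datetime, timedelta


def needs_refresh(entry_time, now, ttl_days=7, permanent=False):
    """Return True if a cached entry is older than the TTL.

    ttl_days defaults to the documented 7 days and must lie in the
    documented range of 1 to 100 days. Permanent entries never expire.
    """
    if not 1 <= ttl_days <= 100:
        raise ValueError("TTL must be between 1 and 100 days")
    if permanent:
        return False
    return now - entry_time > timedelta(days=ttl_days)


# Hypothetical timestamps for illustration.
now = datetime(2024, 5, 20)
stale = needs_refresh(datetime(2024, 5, 1), now)               # 19 days old
fresh = needs_refresh(datetime(2024, 5, 18), now)              # 2 days old
pinned = needs_refresh(datetime(2020, 1, 1), now, permanent=True)
```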
The Webalizer has the ability to perform geolocation lookups on IP addresses using either
its own internal GeoDB database, or optionally the GeoIP database from MaxMind, Inc.
(www.maxmind.com). If used, unresolved addresses will be searched for in the database and
their country of origin will be returned if found. This actually produces more accurate
Country information than DNS lookups, since the DNS address space has additional generic
TLDs (gTLDs) that do not necessarily map to a specific country (such as .net and .com). It
is possible
to use both DNS lookups and geolocation lookups at the same time, which will cause any
addresses that could not be resolved using DNS lookups to then be looked up in the
database, greatly reducing the number of Unknown/Unresolved entries in the generated
reports. The native GeoDB geolocation database provided by The Webalizer fully supports
both IPv4 and IPv6 lookups, is updated regularly and is the preferred geolocation method
for use with The Webalizer. The most current version of the database can be obtained from
our ftp site (ftp://ftp.mrunix.net/).
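The combined use of DNS and geolocation lookups described above amounts to a fallback chain, sketched below. The two dictionaries are hypothetical stand-ins for the DNS cache and the geolocation database, and describe_address is an invented name; the real lookups work against their own file formats.

```python
def describe_address(addr, dns_names, geo_countries):
    """Resolve an address via DNS first, then fall back to geolocation.

    dns_names: dict mapping address -> hostname (stand-in for the DNS cache).
    geo_countries: dict mapping address -> country (stand-in for GeoDB/GeoIP).
    Returns a (kind, value) tuple.
    """
    name = dns_names.get(addr)
    if name is not None:
        return ("hostname", name)
    # Only addresses that DNS could not resolve reach the database.
    country = geo_countries.get(addr)
    if country is not None:
        return ("country", country)
    return ("unresolved", addr)


# Documentation-range addresses used purely as sample data.
dns_names = {"192.0.2.1": "host.example.net"}
geo_countries = {"198.51.100.7": "Germany", "192.0.2.1": "France"}
```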
COMMAND LINE OPTIONS
The Webalizer supports many different configuration options that will alter the way the
program behaves and generates output. Most of these can be specified on the command line,
while some can only be specified in a configuration file. The command line options are
listed below, with references to the corresponding configuration file keywords.
-h Display all available command line options and exit program.
-v Be verbose. Will cause the program to output informational and Debug messages at
run time.
-V Display the program version and exit. Additional program specific information
will be displayed if verbose mode is also used (e.g. '-vV'), which can be useful
when submitting bug reports.
-d Debug. Display debugging information for errors and warnings.
-i IgnoreHist. Ignore history. USE WITH CAUTION. This will cause The Webalizer to
ignore any previous monthly history file only. Incremental data (if present) is
still processed.
-b IgnoreState. Ignore incremental data file. USE WITH CAUTION. This will cause
The Webalizer to ignore any existing incremental data file. By ignoring the
incremental data file, all previous processing for the current month will be lost
and those logs must be re-processed.
-p Incremental. Preserve internal data between runs.
-q Quiet. Suppress informational messages. Does not suppress warnings or errors.
-Q ReallyQuiet. Suppress all messages including warnings and errors.
-T TimeMe. Force display of timing information at end of processing.
-c file Use configuration file file.
-n name HostName. Use the hostname name.
-o dir OutputDir. Use output directory dir.
-t name ReportTitle. Use name for report title.
-F ( clf | ftp | squid | w3c )
LogType. Specify log type to be processed. Value can be either clf, ftp, squid
or w3c format. If not specified, will default to CLF format. FTP logs must be in
standard wu-ftpd xferlog format.
-f FoldSeqErr. Fold out of sequence log records back into analysis, by treating as
if they were the same date/time as the last good record. Normally, out of
sequence log records are simply ignored.
-Y CountryGraph. Suppress country graph.
-G HourlyGraph. Suppress hourly graph.
-x name HTMLExtension. Defines HTML file extension to use. If not specified, defaults to
html. Do not include the leading period.
-H HourlyStats. Suppress hourly statistics.
-K num IndexMonths. Specify how many months should be displayed in the main index
(yearly summary) table. Default is 12 months. Can be set to anything between 12
and 120 months (1 to 10 years).
-k num GraphMonths. Specify how many months should be displayed in the main index
(yearly summary) graph. Default is 12 months. Can be set to anything between 12
and 72 months (1 to 6 years).
-L GraphLegend. Suppress color coded graph legends.
-l num GraphLines. Specify number of background lines. Default is 2. Use zero ('0') to
disable the lines.
-P name PageType. Specify file extensions that are considered pages. Sometimes referred
to as pageviews.
-O name OmitPage. Specify URLs to exclude from being counted as pages.
-m num VisitTimeout. Specify the Visit timeout period. Specified in number of seconds.
Default is 1800 seconds (30 minutes).
-I name IndexAlias. Use the filename name as an additional alias for index.*.
-M num MangleAgents. Mangle user agent names according to the mangle level specified by
num. Mangle levels are:
5 Browser name and major version.
4 Browser name, major and minor version.
3 Browser name, major version, minor version to two decimal places.
2 Browser name, major and minor versions and sub-version.
1 Browser name, version and machine type if possible.
0 All information (left unchanged).
-g num GroupDomains. Automatically group sites by domain. The grouping level specified
by num can be thought of as 'the number of dots' to display in the grouping. The
default value of 0 disables any domain grouping.
-D name DNSCache. Use the DNS cache file name.
-N num DNSChildren. Use num DNS children processes to perform DNS lookups, either
creating or updating the DNS cache file. Specify zero (0) to disable cache file
creation/updates. If given, a DNS cache filename must be specified.
-j Enable GeoDB. This enables the internal GeoDB geolocation services provided by
The Webalizer.
-J name GeoDBDatabase. Use the alternate GeoDB database name.
-w Enable GeoIP. Enables GeoIP (by MaxMind Inc.) geolocation services. If native
GeoDB services are also enabled, then this option will have no effect.
-W name GeoIPDatabase. Use the alternate GeoIP database name.
-z name FlagDir. Specify location of the country flag graphics and enable their display
in the top country table. The directory name is relative to the output directory
being used unless an absolute path is given (ie: starts with a leading '/').
-a name HideAgent. Hide user agents matching name.
-r name HideReferrer. Hide referrer matching name.
-s name HideSite. Hide site matching name.
-X HideAllSites. Hide all individual sites (only display groups).
-u name HideURL. Hide URL matching name.
Table size options
-A num TopAgents. Display the top num user agents table.
-R num TopReferrers. Display the top num referrers table.
-S num TopSites. Display the top num sites table.
-U num TopURLs. Display the top num URLs table.
-C num TopCountries. Display the top num countries table.
-e num TopEntry. Display the top num entry pages table.
-E num TopExit. Display the top num exit pages table.
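The 'number of dots' wording used for the -g (GroupDomains) option above can be read as in the sketch below. This is one interpretation of that description, handling plain hostnames only; the function name group_domain is hypothetical and numeric addresses are not treated specially.

```python
def group_domain(hostname, level):
    """Return the grouping name for hostname at the given grouping level.

    level is the number of dots to keep, counted from the right.
    A level of 0 disables grouping, in which case None is returned.
    """
    if level <= 0:
        return None
    labels = hostname.split(".")
    keep = level + 1  # N dots separate N + 1 labels
    if len(labels) <= keep:
        return hostname  # already short enough
    return ".".join(labels[-keep:])
```

Under this reading, a level of 1 groups www.example.com under example.com, while a level of 2 keeps www.example.com as is.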
Configuration files are standard ASCII(7) text files that may be created or edited using
any standard editor. Blank lines and lines that begin with a pound sign ('#') are
ignored. Any other lines are considered to be configuration lines, and have the form
"Keyword Value", where 'Keyword' is one of the currently available configuration
keywords defined below, and 'Value' is the value to assign to that particular option. Any
text found after the keyword up to the end of the line is considered the keyword's value,
so you should not include anything after the actual value on the line that is not actually
part of the value being assigned. The file sample.conf provided with the distribution
contains lots of useful documentation and examples as well.
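The line format described above can be illustrated with a minimal parser sketch. It is not The Webalizer's parser: the function name parse_config is hypothetical, and stripping surrounding whitespace before testing for a blank or '#' line is an assumption the text does not spell out.

```python
def parse_config(text):
    """Parse configuration text into a list of (keyword, value) pairs.

    Blank lines and lines beginning with '#' are skipped. Everything
    after the keyword, up to the end of the line, is taken as the value.
    A list is returned because a keyword may appear more than once.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries


# Hypothetical configuration text using keywords named in this document.
sample = """\
# usage statistics for the main site
OutputDir /var/www/usage

Incremental yes
HostName example.org # not a comment
"""
entries = parse_config(sample)
```

The last sample line shows why nothing should follow the value: the trailing text is not treated as a comment and becomes part of the HostName value.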
General Configuration Keywords
Use log file named name. If none specified, STDIN will be used.
Specify log file type as name. Values can be either clf, squid, ftp or w3c, with
the default being clf.
Create output in the directory dir. If none specified, the current directory will
be used.
Filename to use for history file. Relative to output directory unless absolute
name is given (ie: starts with '/'). Defaults to 'webalizer.hist' in the standard
output directory.
Use the title string name for the report title. If none specified, use the
default of (in English) "Usage Statistics for ".
Set the hostname for the report as name. If none specified, an attempt will be
made to gather the hostname via a uname(2) system call. If that fails, localhost
will be used.
UseHTTPS ( yes | no )
Use https:// on links to URLs, instead of the default http://, in the 'Top URLs'
table.
HTAccess ( yes | no )
Enables the creation of a default .htaccess file in the output directory.
Quiet ( yes | no )
Suppress informational messages. Warning and Error messages will not be
suppressed.
ReallyQuiet ( yes | no )
Suppress all messages, including Warning and Error messages.
Debug ( yes | no )
Print extra debugging information on Warnings and Errors.
TimeMe ( yes | no )
Force timing information at end of processing.
GMTTime ( yes | no )
Use GMT (UTC) time instead of local timezone for reports.
IgnoreHist ( yes | no )
Ignore previous monthly history file. USE WITH CAUTION. Does not prevent
Incremental file processing.
IgnoreState ( yes | no )
Ignore incremental data file. USE WITH CAUTION. By ignoring the incremental data
file, all previous processing for the current month will be lost and those logs
must be re-processed.
FoldSeqErr ( yes | no )
Fold out of sequence log records back into analysis by treating them as if they
had the same date/time as the last good record. Normally, out of sequence log
records are ignored.
CountryGraph ( yes | no )
Display Country Usage Graph in output report.
CountryFlags ( yes | no )
Enable or disable the display of flags in the top country table.
Specifies the directory name where the flag graphics are located. If not
specified, the default is in the flags directory directly under the output
directory being used. If specified, the display of country flags will be enabled
by default. Using 'FlagDir flags' is identical to using 'CountryFlags yes'.
DailyGraph ( yes | no )
Display Daily Graph in output report.
DailyStats ( yes | no )
Display Daily Statistics in output report.
HourlyGraph ( yes | no )
Display Hourly Graph in output report.
HourlyStats ( yes | no )
Display Hourly Statistics in output report.
Define the file extensions to consider as a page. If a file is found to have the
same extension as name, it will be counted as a page (sometimes called a
pageview).
Allows URLs with the prefix name to be counted as a page type regardless of actual
file type. This allows you to treat contents under specified directories as pages
no matter what their extension is.
Specifies URLs which should not be counted as pages, regardless of their extension
(or lack thereof).
GraphLegend ( yes | no )
Allows the color coded graph legends to be enabled/disabled.
Specify the number of background reference lines displayed on the graphs produced.
Disable by using zero ('0'), default is 2.
Specify the number of months to display in the main index (yearly summary) table.
Default is 12 months. Can be set to anything between 12 and 120 months (1 to 10
years).
YearHeaders ( yes | no )
Enable/disable the display of year headers in the main index (yearly summary)
table. If enabled, year headers will be shown when the table is displaying more
than 16 months worth of data. Values can be 'yes' or 'no'. Default is 'yes'.
YearTotals ( yes | no )
Enable/disable the display of year totals in the main index (yearly summary)
table. If enabled, year totals will be shown when the table is displaying more
than 16 months worth of data. Values can be 'yes' or 'no'. Default is 'yes'.
Specify the number of months to display in the main index (yearly summary) graph.
Default is 12 months. Can be set to anything between 12 and 72 months (1 to 6
years).
Specifies the visit timeout value. Default is 1800 seconds (30 minutes). A visit
is determined by looking at the difference in time between the current and last
request from a specific site. If the difference is greater than or equal to the
timeout value, the request is counted as a new visit. Specified in seconds.
Use name as an additional alias for index.*.
DefaultIndex ( yes | no )
Enables or disables the use of 'index.' as a default index name to be stripped
from the end of URLs. This does not affect any index names that may be defined
with the IndexAlias option.
Mangle user agent names based on mangle level num. See the -M command line switch
for mangle levels and their meaning. The default is 0, which doesn't mangle user
agents at all.
StripCGI ( yes | no )
Determines if URL CGI variables should be stripped from the end of URLs. Values
may be 'yes' or 'no', with the default being 'yes'.
Allows squid log URLs to be reduced in granularity by truncating them after num
slashes ('/') after the http:// prefix. A setting of one (1) will cause all URLs
to be summarized by domain only. The default value is zero (0), which will
disable any URL modifications and leave them exactly as found in the log file.
SearchEngine name variable
Allows the specification of search engines and their query strings. The name is
the name to match against the referrer string for a given search engine. The
variable is the cgi variable that the search engine uses for queries. See the
sample.conf file for example usage with common search engines.
SearchCaseI ( yes | no )
Determines if search strings should be treated as case insensitive or not. The
default is 'yes', which lowercases all search strings (treat as case insensitive).
Incremental ( yes | no )
Enable Incremental mode processing.
Filename to use for incremental data. Relative to output directory unless an
absolute name is given (ie: starts with '/'). Defaults to 'webalizer.current' in
the standard output directory.
Filename to use for the DNS cache. Relative to output directory unless an
absolute name is given (ie: starts with '/').
Number of children DNS processes to run in order to create/update the DNS cache
file. Specify zero (0) to disable.
CacheIPs ( yes | no )
Cache unresolved IP addresses in the DNS database. Default is 'no'.
DNS cache entry time to live (TTL) in days. Default is 7 days. May be any value
between 1 and 100.
GeoDB ( yes | no )
Allows native GeoDB geolocation services to be enabled or disabled. Default value
Allows the use of an alternate GeoDB database name. If not specified, the default
database will be used.
GeoIP ( yes | no )
Allows GeoIP (by MaxMind Inc.) geolocation services to be enabled or disabled.
Default is 'no'. If native GeoDB geolocation services are also enabled, then this
option will have no effect (and the native GeoDB services will be used).
Allows the use of an alternate GeoIP database name. If not specified, the default
database will be used.
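The visit timeout rule described in this section (a request counts as a new visit when the gap since the previous request from the same site is greater than or equal to the timeout) can be sketched as follows. The function name count_visits and the use of plain second counts as timestamps are assumptions made for illustration.

```python
def count_visits(request_times, timeout=1800):
    """Count visits for one site from its request times.

    request_times: request times in seconds, in ascending order.
    timeout: visit timeout in seconds (documented default is 1800).
    """
    visits = 0
    last = None
    for current in request_times:
        # The first request always opens a visit; later ones do so only
        # when the gap reaches the timeout.
        if last is None or current - last >= timeout:
            visits += 1
        last = current
    return visits
```

Note that the gap is measured from the most recent request, not from the start of the visit, so a site that keeps requesting pages every few minutes stays within a single visit.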
Top Table Keywords
Display the top num User Agents table. Use zero to disable.
AllAgents ( yes | no )
Create separate HTML page with All User Agents.
Display the top num Referrers table. Use zero to disable.
AllReferrers ( yes | no )
Create separate HTML page with All Referrers.
Display the top num Sites table. Use zero to disable.
Display the top num Sites (by KByte) table. Use zero to disable.
AllSites ( yes | no )
Create separate HTML page with All Sites.
Display the top num URLs table. Use zero to disable.
Display the top num URLs (by KByte) table. Use zero to disable.
AllURLs ( yes | no )
Create separate HTML page with All URLs.
Display the top num Countries in the table. Use zero to disable.
Display the top num Entry Pages in the table. Use zero to disable.
Display the top num Exit Pages in the table. Use zero to disable.
Display the top num Search Strings in the table. Use zero to disable.
AllSearchStr ( yes | no )
Create separate HTML page with All Search Strings.
Display the top num Usernames in the table. Use zero to disable. Usernames are
only available if using HTTP-based authentication.
AllUsers ( yes | no )
Create separate HTML page with All Usernames.
Hide User Agents that match name.
Hide Referrers that match name.
Hide Sites that match name.
HideAllSites ( yes | no )
Hide all individual sites. This causes only grouped sites to be displayed.
Hide URLs that match name.
Hide Usernames that match name.
Ignore User Agents that match name.
Ignore Referrers that match name.
Ignore Sites that match name.
Ignore URLs that match name.
Ignore Usernames that match name.
GroupAgent name [Label]
Group User Agents that match name. Display Label in 'Top Agent' table if given
(instead of name). name may be enclosed in quotes.
GroupReferrer name [Label]
Group Referrers that match name. Display Label in 'Top Referrer' table if given
(instead of name). name may be enclosed in quotes.
GroupSite name [Label]
Group Sites that match name. Display Label in 'Top Site' table if given (instead
of name). name may be enclosed in quotes.
Automatically group sites by domain. The value num specifies the level of
grouping, and can be thought of as the 'number of dots' to be displayed. The
default value of 0 disables domain grouping.
GroupURL name [Label]
Group URLs that match name. Display Label in 'Top URL' table if given (instead of
name). name may be enclosed in quotes.
GroupUser name [Label]
Group Usernames that match name. Display Label in 'Top Usernames' table if given
(instead of name). name may be enclosed in quotes.
Force inclusion of sites that match name. Takes precedence over Ignore* keywords.
Force inclusion of URLs that match name. Takes precedence over Ignore* keywords.
Force inclusion of Referrers that match name. Takes precedence over Ignore*
keywords.
Force inclusion of User Agents that match name. Takes precedence over Ignore*
keywords.
Force inclusion of Usernames that match name. Takes precedence over Ignore*
keywords.
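The precedence of forced inclusion over the Ignore* keywords can be sketched as below. Only the precedence is taken from the text; plain substring matching is an assumption used to keep the example short, and the function name is_processed is hypothetical.

```python
def is_processed(value, ignore, include):
    """Decide whether a record field (site, URL, referrer, ...) is processed.

    ignore: list of names from Ignore* keywords.
    include: list of names from the corresponding forced-inclusion keywords.
    Matching here is a simple substring test (an assumption).
    """
    def matches(names):
        return any(name in value for name in names)

    if matches(include):
        return True  # inclusion wins over any Ignore* match
    return not matches(ignore)
```

For example, ignoring everything under /cgi-bin/ while force-including /cgi-bin/report keeps that one URL in the statistics.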
HTML Generation Keywords
Defines the HTML file extension to use. Default is html. Do not include the
leading period.
Insert text at the very beginning of the generated HTML file. Defaults to a
standard html 3.2 DOCTYPE record.
Insert text within the <HEAD></HEAD> block of the HTML file.
Insert text in HTML page, starting with the <BODY> tag. If used, the first line
must be a <BODY ...> tag. Multiple lines may be specified.
Insert text at top (before horiz. rule) of HTML pages. Multiple lines may be
specified.
Insert text at bottom of the HTML page. The text is top and right aligned within
a table column at the end of the report.
Insert text at the very end of the HTML page. If not specified, the default is to
insert the ending </BODY> and </HTML> tags. If used, you must supply these tags
yourself.
LinkReferrer ( yes | no )
Determines if the referrers listed in the top referrers table should be displayed
as plain text, or as a link to the referrer URL.
ColorHit ( rrggbb | 00805c )
Sets the graph's hit-color to the specified html color (no '#').
ColorFile ( rrggbb | 0040ff )
Sets the graph's file-color to the specified html color (no '#').
ColorSite ( rrggbb | ff8000 )
Sets the graph's site-color to the specified html color (no '#').
ColorKbyte ( rrggbb | ff0000 )
Sets the graph's kilobyte-color to the specified html color (no '#').
ColorPage ( rrggbb | 00e0ff )
Sets the graph's page-color to the specified html color (no '#').
ColorVisit ( rrggbb | ffff00 )
Sets the graph's visit-color to the specified html color (no '#').
ColorMisc ( rrggbb | 00e0ff )
Sets the 'miscellaneous' color for table headers (not graphs) to the specified
html color (no '#').
PieColor1 ( rrggbb | 800080 )
Sets the pie's first optional color to the specified html color (no '#').
PieColor2 ( rrggbb | 80ffc0 )
Sets the pie's second optional color to the specified html color (no '#').
PieColor3 ( rrggbb | ff00ff )
Sets the pie's third optional color to the specified html color (no '#').
PieColor4 ( rrggbb | ffc480 )
Sets the pie's fourth optional color to the specified html color (no '#').
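Because the color keywords above take a six-digit rrggbb value without the leading '#', a value copied from a stylesheet needs a small conversion first. The helper below is a hypothetical convenience for preparing such values, not part of The Webalizer.

```python
import re

_RRGGBB = re.compile(r"[0-9A-Fa-f]{6}")


def strip_hash(color):
    """Return color as a bare rrggbb string, dropping one leading '#'.

    Raises ValueError if the result is not exactly six hex digits.
    """
    value = color[1:] if color.startswith("#") else color
    if _RRGGBB.fullmatch(value) is None:
        raise ValueError(f"not an rrggbb color: {color!r}")
    return value
```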
Dump Object Keywords
The Webalizer allows you to export processed data to other programs by using tab delimited
text files. The Dump* commands specify which files are to be written, and where.
Save dump files in directory name. If not specified, the default output directory
will be used. Do not specify a trailing slash ('/').
Use name as the filename extension for dump files. If not given, the default of
tab will be used.
DumpHeader ( yes | no )
Print a column header as the first record of the file.
DumpSites ( yes | no )
Dump the sites data to a tab delimited file.
DumpURLs ( yes | no )
Dump the url data to a tab delimited file.
DumpReferrers ( yes | no )
Dump the referrer data to a tab delimited file. This data is only available if
using a log that contains referrer information (ie: a combined format web log).
DumpAgents ( yes | no )
Dump the user agent data to a tab delimited file. This data is only available if
using a log that contains user agent information (ie: a combined format web log).
DumpUsers ( yes | no )
Dump the username data to a tab delimited file. This data is only available if
processing a wu-ftpd xferlog or a web log that contains http authentication
information.
DumpSearchStr ( yes | no )
Dump the search string data to a tab delimited file. This data is only available
if processing a web log that contains referrer information and has search string
information present.
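A tab delimited dump file written with the column header enabled can be read by another program along these lines. This is a sketch under assumptions: the column names in the sample are invented for illustration and may not match what the dump files actually contain, and read_dump is a hypothetical helper name.

```python
import csv
import io


def read_dump(text):
    """Parse tab delimited dump text whose first record is a column header.

    Returns a list of dicts keyed by the header names.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical dump content; real column names may differ.
sample = "Hits\tURL\n42\t/index.html\n7\t/about.html\n"
rows = read_dump(sample)
total_hits = sum(int(row["Hits"]) for row in rows)
```

All values come back as strings, so numeric columns need an explicit conversion as shown for the total.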