NAME


webcheck - website link checker

SYNOPSIS


webcheck [OPTION]... URL

DESCRIPTION


webcheck will check the document at the specified URL for links to other documents, follow
these links recursively and generate an HTML report.

OPTIONS

-i, --internal=PATTERN
Mark URLs matching the PATTERN (perl-type regular expression) as an internal link.
Can be used multiple times. Note that the PATTERN is matched against the full URL.
URLs matching this PATTERN will be considered internal, even if they match one of
the --external PATTERNs.

-x, --external=PATTERN
Mark URLs matching the PATTERN (perl-type regular expression) as an external link.
Can be used multiple times. Note that the PATTERN is matched against the full URL.
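       For example, the following run (patterns and hostnames purely illustrative)
       marks everything under /partners/ as external while keeping /partners/docs/
       internal, since --internal takes precedence over --external:
       webcheck -x /partners/ -i /partners/docs/ http://www.example.com/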

-y, --yank=PATTERN
       Do not check URLs matching the PATTERN (perl-type regular expression). This is
       similar to the -x flag, except that a yanked URL is not checked at all, whereas
       -x checks the URL itself but does not follow its children. Can be used multiple
       times. Note that the PATTERN is matched against the full URL.
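       For example (paths illustrative), the following run skips anything under
       /private entirely, while links into /archive are still checked as external
       documents but not followed any further:
       webcheck -y /private -x /archive http://www.example.com/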

-b, --base-only
Consider any URL not starting with the base URL to be external. For example, if
you run
webcheck -b http://www.example.com/foo
then http://www.example.com/foo/bar will be considered internal whereas
http://www.example.com/ will be considered external. By default all the pages on
the site will be considered internal.

-a, --avoid-external
Avoid external links. Normally if webcheck is examining an HTML page and it finds
a link that points to an external document, it will check to see if that external
document exists. This flag disables that action.

--ignore-robots
Do not retrieve and parse robots.txt files. By default robots.txt files are
       retrieved and honored. Use this option only if you are sure you want to
       override the webmaster's decision.
For more information on robots.txt handling see the NOTES section below.

-q, --quiet, --silent
Do not print out progress as webcheck traverses a site.

-d, --debug
Print debugging information while crawling the site. This option is mainly useful
for developers.

-o, --output=DIRECTORY
       Output directory. Use this option to specify the directory where webcheck will
       write its reports. The default is the current directory or the directory
       specified in config.py. If this directory does not exist it will be created for
       you (if possible).
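       For example, to write the report into a directory named report (the name is
       illustrative):
       webcheck -o report http://www.example.com/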

-c, --continue
Try to continue from a previous run. When using this option webcheck will look for
a webcheck.dat in the output directory. This file is read to restore the state
from the previous run. This allows webcheck to continue a previously interrupted
run. When this option is used, the --internal, --external and --yank options will
be ignored as well as any URL arguments. The --base-only and --avoid-external
options should be the same as the previous run.
       Note that this option is experimental and its semantics may change in coming
       releases (especially in relation to other options). Also note that the stored
files are not guaranteed to be compatible between releases.
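       For example, to resume an interrupted crawl whose state was saved in the report
       directory, overwriting the existing report without prompting (directory name
       illustrative; the URL argument is ignored when continuing):
       webcheck -c -f -o report http://www.example.com/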

-f, --force
Overwrite files without asking. This option is required for running webcheck non-
interactively.

-r, --redirects=N
       Redirect depth: the number of redirects webcheck should follow when following a
       link. A value of 0 means that all redirects are followed.
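       For example, to follow at most five consecutive redirects per link (the limit
       is illustrative):
       webcheck -r 5 http://www.example.com/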

-u, --userpass=URL
Specify a URL with username and password information to use for basic
authentication when visiting the site.
e.g. http://test:secret@example.com/
This option may be specified multiple times.
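       For example, to supply credentials for a password-protected site (credentials
       illustrative):
       webcheck -u http://test:secret@www.example.com/ http://www.example.com/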

-w, --wait=SECONDS
       Wait SECONDS between document retrievals. Usually webcheck will process a URL
       and immediately move on to the next. However, on loaded systems it may be
       desirable to have webcheck pause between requests. This option can be set to any non-
negative number.
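       For example, to pause two seconds between requests (the delay is illustrative):
       webcheck -w 2 http://www.example.com/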

-v, --version
Show version of program.

-h, --help
Show short summary of options.

URL CLASSES


URLs are divided into two classes:

Internal URLs are retrieved and the retrieved item is checked for syntax. Also, the
retrieved item is searched for links to other items (of any class) and these links are
followed.

External URLs are only retrieved to test whether they are valid and to gather some basic
information from them (title, size, content-type, etc). The retrieved items are not
inspected for links to other items.

Apart from their class, URLs can also be considered yanked (as specified with the --yank
or --avoid-external options). Yanked URLs can be either internal or external and will not
be retrieved or checked at all. URLs of unsupported schemes are also considered yanked.

EXAMPLES


Check the site www.example.com but consider any path with "/webcheck" in it to be
external.
webcheck http://www.example.com/ -x /webcheck
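
Check only the pages below http://www.example.com/manual/, treating everything else as
external, and write the report to the manual-report directory, overwriting existing files
without asking (directory name illustrative).
webcheck -b -f -o manual-report http://www.example.com/manual/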

NOTES


When checking internal URLs webcheck honors the robots.txt file, identifying itself as
user-agent webcheck. Disallowed links will not be checked at all, as if the -y option had
been specified for that URL. To allow webcheck to crawl parts of a site that are
disallowed to other robots, use something like:
User-agent: *
Disallow: /foo

User-agent: webcheck
Allow: /foo

ENVIRONMENT


<scheme>_proxy
       Proxy URL for <scheme>.
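
For example, one might route plain HTTP retrievals through a proxy (proxy host and port
illustrative) with:
http_proxy=http://proxy.example.com:8080/ webcheck http://www.example.com/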

REPORTING BUGS


Bug reports should be sent to the mailing list <webcheck-users@lists.arthurdejong.org>.
More information on reporting bugs can be found on the webcheck homepage:
http://arthurdejong.org/webcheck/

COPYRIGHT


Copyright © 1998, 1999 Albert Hopkins (marduk)
Copyright © 2002 Mike W. Meyer
Copyright © 2005, 2006, 2007, 2008, 2009, 2010 Arthur de Jong
webcheck is free software; see the source for copying conditions. There is NO warranty;
not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The files produced as output from the software do not automatically fall under the
copyright of the software, unless explicitly stated otherwise.
