
NAME


pdfgrep - search PDF files for a regular expression

SYNOPSIS


pdfgrep [OPTION...] PATTERN [FILE...]

DESCRIPTION


Search for PATTERN in each FILE. PATTERN is an extended regular expression.

pdfgrep works much like grep, with one distinction: it operates on pages and not on lines.
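
Because matching is page-based, positions in the output are page numbers rather than line numbers. The following is a minimal sketch, assuming pdfgrep is installed and on the PATH. The helper name page_summary and its arguments are hypothetical; it only strings together the -n, -c and -p options described under OPTIONS.

```shell
# Hypothetical helper: summarize where a pattern occurs in a single PDF.
# Usage: page_summary PATTERN FILE
page_summary() {
    echo "Matches, each prefixed with its page number:"
    pdfgrep -n "$1" "$2"
    echo "Total number of matches in the file:"
    pdfgrep -c "$1" "$2"
    echo "Number of matches per page:"
    pdfgrep -p "$1" "$2"
}
```

A call such as `page_summary 'pattern' foo.pdf` runs the three searches one after another on the same file.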

OPTIONS


-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files.

-F, --fixed-strings
Interpret PATTERN as a list of fixed strings separated by newlines, any of which is to
be matched.

-P, --perl-regexp
Interpret PATTERN as a Perl compatible regular expression (PCRE). See pcresyntax(3)
for a quick overview.

-H, --with-filename
Print the file name for each match. This is the default setting when there is more
than one file to search.

-h, --no-filename
Suppress the prefixing of file name on output. This is the default setting when there
is only one file to search.

-n, --page-number
Prefix each match with the number of the page where it was found.

-c, --count
Suppress normal output. Instead print the number of matches for each input file. Note
that unlike grep, multiple matches on the same page will be counted individually.

-p, --page-count
Like -c, but prints the number of matches per page.

-C, --context NUM
Print at most NUM characters of context around each match. The exact number will
vary, because pdfgrep tries to respect word boundaries. If NUM is "line", the whole
line will be printed. If this option is not set, pdfgrep tries to print lines that are
not longer than the terminal width.

--color WHEN
Surround file names, page numbers and matched text with escape sequences to display
them in color on the terminal. (The default setting is auto). WHEN can be:

always
Always use colors, even when stdout is not a terminal.

never
Do not use colors.

auto
Use colors only when stdout is a terminal.

-o, --only-matching
Print only the matched part of a line without any surrounding context.

-r, --recursive
Recursively search all files (restricted by --include and --exclude) under each
directory, following symlinks only if they are on the command line.

-R, --dereference-recursive
Same as -r, but follows all symlinks.

--exclude=GLOB
Skip files whose base name matches GLOB. See glob(7) for wildcards you can use. You
can use this option multiple times to exclude more patterns. It takes precedence over
--include. Note that includes and excludes apply only to files found via --recursive,
not to files named in the argument list.

--include=GLOB
Only search files whose base name matches GLOB. See --exclude for details. The default
is *.pdf.

--password=PASSWORD
Use PASSWORD to decrypt the PDF files. Can be specified multiple times; all passwords
will be tried on all PDFs. Note that this password will show up in your command
history and in the output of ps(1), so do not use this option if the security of
PASSWORD is important.

-m, --max-count NUM
Stop reading a file after NUM matches. When the -c or --count option is also used,
pdfgrep does not output a count greater than NUM.

-Z, --null
Output a null byte (called NUL in ASCII and '\0' in C) instead of the colon that
usually separates a filename from the rest of the line. This option makes the output
unambiguous in the presence of colons, spaces or newlines in the filename. It can be
used in conjunction with commands such as xargs -0 or perl -0.

--match-prefix-separator SEP
Changes the colon used to separate filename, page number and text in the output to
SEP, which can be an arbitrary string. This is useful when filenames contain colons,
but only for interactive usage. For scripting, --null should be used.

--debug
Enable debug output. Note: Due to limitations of poppler before version 0.30.0, some
debug output is also printed without --debug when using such a poppler version.

--warn-empty
Print a warning to stderr if a PDF contains no searchable text. This is the case for
PDFs that consist only of images, for example scanned documents.

--unac
Remove accents and ligatures from both the search pattern and the PDF documents. This
is useful if you want to search for a word containing "ae", but the PDF uses the
single character "æ" instead. See unac(3) and unaccent(1) for details.

This option is experimental and only available if pdfgrep is compiled with unac
support.

-q, --quiet
Suppress all normal output to stdout. Errors will be printed and the exit codes will
be returned (see below).

--help
Print a short summary of the options.

-V, --version
Show version information.
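
The options above can be combined on one command line. The sketch below wraps a few combinations in shell functions. The function names, the date pattern and the glob patterns are illustrative assumptions and not part of pdfgrep; only the options themselves come from the list above.

```shell
# Find ISO-style dates such as 2024-01-31 and print their page numbers.
# \d is Perl-style syntax rather than POSIX extended regex syntax, hence -P.
# Usage: find_dates FILE...
find_dates() {
    pdfgrep -n -P '\b\d{4}-\d{2}-\d{2}\b' "$@"
}

# Case-insensitive search that prints file name, page number and at most
# 30 characters of context around each match.
# Usage: show_context PATTERN FILE...
show_context() {
    pdfgrep -i -H -n -C 30 "$@"
}

# Recursively search a directory (default: the current one), restricted to
# files named report*.pdf and skipping base names that contain "draft".
# Usage: search_reports PATTERN [DIRECTORY]
search_reports() {
    pdfgrep -r -n --include 'report*.pdf' --exclude '*draft*' "$1" "${2:-.}"
}
```

Since --exclude takes precedence over --include, a file named report-draft.pdf would be skipped by search_reports even though it matches the include glob.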

EXIT STATUS


Normally, the exit status is 0 if at least one match is found, 1 if no match is found and
2 if an error occurred. But if the --quiet or -q option is used and a match was found,
pdfgrep will return 0 regardless of errors.
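
In a script, the three exit statuses can be told apart as follows. This is a minimal sketch, assuming pdfgrep is on the PATH; the helper name pdf_contains and its messages are hypothetical.

```shell
# Hypothetical helper: report whether FILE contains PATTERN.
# Usage: pdf_contains PATTERN FILE
pdf_contains() {
    status=0
    # -q suppresses normal output; only the exit status is of interest here.
    pdfgrep -q "$1" "$2" || status=$?
    case $status in
        0) echo "match found" ;;
        1) echo "no match" ;;
        *) echo "pdfgrep failed with exit status $status" >&2 ;;
    esac
    return "$status"
}
```

Because the helper passes -q, a status of 0 only says that a match was found; as noted above, errors that occurred along the way do not change it.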

ENVIRONMENT VARIABLES


The behavior of pdfgrep is affected by the following environment variable.

GREP_COLORS
Specifies the colors and other attributes used to highlight various parts of the
output. The syntax and values are like GREP_COLORS of grep. See grep(1) for more
details. Currently only the capabilities mt, ms, mc, fn, ln and se are used by
pdfgrep, where mt, ms and mc have the same effect.
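
As an illustration, the variable can be set for a single invocation. This is a sketch only: the wrapper name pdfgrep_colored and the chosen color values are assumptions, and the values follow the SGR color syntax documented in grep(1).

```shell
# Hypothetical wrapper: force colored output with custom colors.
#   mt=01;32  matched text in bold green
#   fn=35     file names in magenta
#   ln=36     the "ln" capability (line numbers in grep) in cyan
#   se=33     separators in yellow
# Usage: pdfgrep_colored PATTERN FILE...
pdfgrep_colored() {
    GREP_COLORS='mt=01;32:fn=35:ln=36:se=33' pdfgrep --color always -H -n "$@"
}
```

With --color always the escape sequences are emitted even when stdout is not a terminal, so the output can be piped into a pager that understands them, for example `pdfgrep_colored pattern foo.pdf | less -R`.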

EXAMPLES


Print the first ten matches of pattern, each prefixed with its page number

pdfgrep -n --max-count 10 pattern foo.pdf

Search all .pdf files whose names begin with foo recursively in the current directory

pdfgrep -r --include "foo*.pdf" pattern

Search all .pdf files that are smaller than 12M recursively in the current directory

find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern

Note that in contrast to the previous examples, this task could not be solved with
pdfgrep alone, but the Unix tools find(1) and xargs(1) had to be used. That’s because
pdfgrep itself doesn’t include options to exclude files by their size. But as you see,
it doesn’t have to!
