gocr - Online in the Cloud

Run the gocr command on the OnWorks free hosting provider using one of its free online workstations, such as Ubuntu Online, Fedora Online, the Windows online emulator, or the Mac OS online emulator.

PROGRAM:

NAME


gocr - command line text recognition tool

SYNOPSIS


gocr [OPTION] [-i] pnm-file

DESCRIPTION


gocr is an optical character recognition (OCR) program that can be used from the command
line. It takes input in PNM, PGM, PBM, PPM, or PCX format and writes the recognized text
to stdout. If the pnm-file argument is a single dash, PNM data is read from stdin. If gzip,
bzip2, and netpbm-progs are installed and your system supports popen(3), then pnm.gz,
pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (single pages only), and eps are also supported
as input files (but not as an input stream), where pnm can be replaced by ppm, pgm, or pbm.
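
As an illustration of the stdin path described above, the following Python sketch builds a tiny plain-text PBM image in memory and pipes it to gocr using a single dash as the input file. The helper names make_pbm and recognize_from_stdin are hypothetical, and the 3x3 test pattern is only a placeholder that is not expected to contain readable text.

```python
import shutil
import subprocess


def make_pbm(rows):
    """Encode rows of '0'/'1' characters as a plain (ASCII) PBM image.

    In the PBM format, 1 is a black pixel and 0 is a white pixel.
    """
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    header = f"P1\n{width} {len(rows)}\n"
    body = "".join(" ".join(row) + "\n" for row in rows)
    return (header + body).encode("ascii")


def recognize_from_stdin(pnm_data):
    """Pipe PNM data to `gocr -i -` and return whatever it writes to stdout."""
    result = subprocess.run(["gocr", "-i", "-"], input=pnm_data,
                            capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    image = make_pbm(["010", "111", "010"])  # placeholder pattern, not real text
    if shutil.which("gocr"):
        print(recognize_from_stdin(image))
    else:
        print("gocr not found on PATH; nothing was recognized")
```

Compressed or converted formats such as png or pnm.gz are documented as file inputs only, so this stdin approach applies to raw PNM data.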

OPTIONS


-h show usage information

-i file
read input from file (or stdin if file is a single dash)

-o file
send output to file instead of stdout

-e file
send errors to file instead of stderr, or to stdout if file is a single dash

-x file
write progress output to file (file can be a file name, a FIFO name, or a file descriptor
1...255); this is useful for GUI developers who want to show the OCR progress. The file
descriptor argument is only available if gocr was compiled with __USE_POSIX defined

-p path
database path; a trailing slash must be included (default: ./db/). This path will be
populated with images of learned characters

-f format
output format of the recognized text (ISO8859_1, TeX, HTML, XML, UTF8, or ASCII); XML
output also includes position and probability data

-l level
set grey level to level (0<160<=255, default: 0 for autodetect); darker pixels
belong to characters, and brighter pixels are interpreted as the background of the
input image

-d size
set dust size in pixels (clusters smaller than this are removed); 0 means no
clusters are removed, and the default is -1 for autodetect

-s num
set space width between words in units of dots (default: 0 for autodetect); larger
widths are interpreted as word spaces, smaller ones as character spaces

-v verbosity
be verbose to stderr; verbosity is a bitfield

-c string
produce verbose output to stderr only for characters from string; more output is
generated for all characters within the string, and the underscore stands for unknown
characters. This function is useful for limiting debug information to what is necessary

-C string
only recognise characters from string; this is a filter for cases where only part of
the character set is of interest. You can use 0-9 or a-z to specify ranges; use -- to
detect the minus sign

-a certainty
set the certainty threshold for recognition (0..100; default: 95); characters with a
higher certainty are accepted, and characters with a lower certainty are treated as
unknown (not recognized). Set a higher value if you want only characters that are
recognized with greater certainty

-u string
output this string for every unrecognized character (default is "_")

-m mode
set operational mode; mode is a bitfield (default: 0)

-n bool
if bool is non-zero, only recognise numbers (this is now obsolete, use -C
"0123456789")
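
To show how several of the options above fit together, here is a minimal Python sketch that assembles a gocr argument list. The function build_gocr_args, its parameter names, and the file name page.pbm are hypothetical; only the flags themselves (-f, -C, -a, -u, -o, -i) and their documented value ranges come from the option list.

```python
# Output formats listed for the -f option.
FORMATS = {"ISO8859_1", "TeX", "HTML", "XML", "UTF8", "ASCII"}


def build_gocr_args(input_file, output_file=None, fmt=None,
                    charset=None, certainty=None, unknown=None):
    """Return an argv list for gocr built from the documented options."""
    args = ["gocr"]
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unsupported output format: {fmt}")
        args += ["-f", fmt]
    if charset is not None:
        # -C restricts recognition to these characters, e.g. "0-9" or "a-z".
        args += ["-C", charset]
    if certainty is not None:
        if not 0 <= certainty <= 100:
            raise ValueError("certainty must be in the range 0..100")
        args += ["-a", str(certainty)]
    if unknown is not None:
        # -u sets the string printed for every unrecognized character.
        args += ["-u", unknown]
    if output_file is not None:
        args += ["-o", output_file]
    args += ["-i", input_file]
    return args


if __name__ == "__main__":
    # Digits only, stricter certainty, "?" for unknown characters.
    print(build_gocr_args("page.pbm", fmt="UTF8", charset="0-9",
                          certainty=98, unknown="?"))
```

The resulting list can be passed to subprocess.run on a system where gocr is installed. Using -C "0-9" here follows the note that -n is obsolete.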

The verbosity is specified as a bitfield:

1 print more info

2 list shapes of boxes (see -c) to stderr

4 list pattern of boxes (see -c) to stderr

8 print pattern after recognition for debugging

16 print debug information about recognition of lines to stderr

32 create outXX.png with boxes and lines marked at each general OCR step
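
Because the verbosity is a bitfield, several of the values above can be combined by adding (or OR-ing) them. The Python sketch below models this with an IntFlag; the class and member names are made up for illustration, and only the numeric values come from the list above. The file name page.pbm is a placeholder.

```python
from enum import IntFlag


class Verbosity(IntFlag):
    """Hypothetical names for the documented -v bit values."""
    MORE_INFO = 1
    BOX_SHAPES = 2
    BOX_PATTERNS = 4
    PATTERN_AFTER_RECOGNITION = 8
    LINE_DEBUG = 16
    STEP_IMAGES = 32


# More info, box shapes, and the outXX.png step images: 1 + 2 + 32.
level = Verbosity.MORE_INFO | Verbosity.BOX_SHAPES | Verbosity.STEP_IMAGES

# The combined integer is what would be passed to -v.
args = ["gocr", "-v", str(int(level)), "-i", "page.pbm"]
print(args)  # ['gocr', '-v', '35', '-i', 'page.pbm']
```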

The operation modes are:

2 use database to recognize characters which are not recognized by other
algorithms (early development)

4 switch on layout analysis or zoning (development)

8 don't compare unrecognized characters to recognized ones

16 don't try to divide overlapping characters into two or three single characters

32 don't do context correction

64 character packing: before recognition starts, similar characters are searched
for and only one of these characters is sent to the recognition engine
(development)

130 extend database: prompts the user for unidentified characters and extends the
database with the user's answer (128+2, early development)

256 switch off the recognition engine (makes sense together with -m 2)
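
The mode values above are bits that can be combined in the same way as the verbosity, which is why the database-extension mode is listed as 130 (128+2). The Python sketch below composes and decodes a -m value; the helper names and labels are hypothetical, and treating 128 as a separate bit is an assumption based only on the "128+2" note, since the list does not document 128 on its own.

```python
# Labels paraphrase the documented operation modes.
MODE_BITS = {
    2: "use database for characters other algorithms miss",
    4: "layout analysis or zoning",
    8: "no comparison of unrecognized to recognized characters",
    16: "no splitting of overlapping characters",
    32: "no context correction",
    64: "character packing",
    128: "extend database (assumed bit, documented only as 130 = 128 + 2)",
    256: "recognition engine switched off",
}


def combine_modes(*bits):
    """OR the given mode bits together into a single -m value."""
    value = 0
    for bit in bits:
        if bit not in MODE_BITS:
            raise ValueError(f"unknown mode bit: {bit}")
        value |= bit
    return value


def describe_mode(mode):
    """List the labels of all bits set in a -m value."""
    return [label for bit, label in MODE_BITS.items() if mode & bit]


if __name__ == "__main__":
    # Database lookup only, with the recognition engine off (-m 2 plus 256).
    print(combine_modes(2, 256))  # 258
    for label in describe_mode(130):
        print(label)
```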
