NAME

hocr2djvused - hOCR to djvused script converter

SYNOPSIS

hocr2djvused [option...] [hocr-file...]

DESCRIPTION

hocr2djvused reads one or more hOCR[1] files (as produced by OCRopus[2], Cuneiform[3] or
Tesseract[4]) and converts them to a djvused script.

Unless a filename is explicitly provided on the command line, hOCR is read from the
standard input.

OPTIONS

Text segmentation options
-t lines, --details=lines
Record location of every line. Don't record locations of particular words or
characters.

-t words, --details=words
Record location of every line and every word. Don't record locations of particular
characters.

This is the default.

-t chars, --details=chars
Record location of every line, every word and every character.

--word-segmentation=simple
Consider each non-empty sequence of non-whitespace characters a single word.

This is the default, despite being linguistically incorrect.

--word-segmentation=uax29
Use the Unicode Text Segmentation[5] algorithm to break lines into words.

This option breaks the assumption of some DjVu tools that words are separated by spaces,
and therefore it is not recommended.
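
To make the "simple" rule concrete, here is a minimal Python sketch of the behaviour as described above: every non-empty run of non-whitespace characters counts as one word. The function name simple_words is hypothetical and this is not the tool's own code; it only illustrates why the rule is called linguistically incorrect (punctuation stays attached to the neighbouring word).

```python
def simple_words(line):
    """Split a line the way the 'simple' rule is described:
    each maximal run of non-whitespace characters is one word.
    (Hypothetical helper for illustration, not taken from hocr2djvused.)"""
    # str.split() with no argument splits on runs of whitespace
    # and never returns empty strings.
    return line.split()


if __name__ == "__main__":
    # Punctuation remains glued to the words, e.g. "Hello," and "(approx.)".
    print(simple_words("Hello, world! It's 3.14 (approx.)"))
```

A UAX #29 segmenter would instead report punctuation marks as separate segments, which is where the assumption about space-separated words no longer holds.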

Other options
--rotation=n
Assume that DjVu pages are rotated by n degrees.

--page-size=widthxheight
Specify that the page size is width pixels × height pixels.

This option is required for hOCR generated by Cuneiform (< 0.8) and superfluous
otherwise.

--html5
Use an HTML5 parser[6], which is more robust but slower than the default parser.

--fix-utf8
Attempt to fix UTF-8 encoding issues and eliminate unwanted control characters.

This option might be needed for hOCR generated by Cuneiform[7] or Tesseract[8].

--version
Output version information and exit.

-h, --help
Display help and exit.
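
EXAMPLE

The following Python sketch shows one way the options above could be combined when calling hocr2djvused from a script. The helper names build_command and convert are hypothetical, and the sketch assumes that the generated djvused script is written to standard output; only options documented on this page are used.

```python
import subprocess


def build_command(hocr_files=(), details="words", segmentation="simple"):
    """Build an argument list for hocr2djvused (hypothetical helper).

    Only option values documented on this page are accepted.
    """
    if details not in ("lines", "words", "chars"):
        raise ValueError(f"unknown --details value: {details!r}")
    if segmentation not in ("simple", "uax29"):
        raise ValueError(f"unknown --word-segmentation value: {segmentation!r}")
    return [
        "hocr2djvused",
        f"--details={details}",
        f"--word-segmentation={segmentation}",
        *hocr_files,
    ]


def convert(hocr_text, **options):
    """Feed hOCR text on standard input and return the resulting script.

    Assumption: the djvused script is emitted on standard output.
    Requires hocr2djvused to be installed and on PATH.
    """
    result = subprocess.run(
        build_command(**options),
        input=hocr_text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Show the command line that would be run for one input file.
    print(" ".join(build_command(["page1.html"], details="lines")))
```

Because no file names are passed in convert, the tool reads hOCR from the standard input, as described in DESCRIPTION. The returned script can then be saved to a file and applied to a DjVu document with djvused -f.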
