hocr2djvused - hOCR to djvused script converter
hocr2djvused [option...] [hocr-file...]
hocr2djvused reads one or more hOCR files (as produced by OCRopus, Cuneiform, or
Tesseract) and converts them to a djvused script.
Unless a filename is explicitly provided on the command line, hOCR is read from the
standard input.
Text segmentation options
-t lines, --details=lines
Record location of every line. Don't record locations of particular words or
characters.
-t words, --details=words
Record location of every line and every word. Don't record locations of particular
characters.
This is the default.
-t chars, --details=chars
Record location of every line, every word and every character.
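
The three detail levels can be selected from a script as well as from a shell. The following is a minimal sketch, assuming hocr2djvused is installed and on the PATH when the command is actually run; the helper name build_argv and the file name page.html are hypothetical and used only for illustration. Only the -t option documented above is used.

```python
import subprocess
from typing import List

# Detail levels documented for -t / --details.
DETAIL_LEVELS = ("lines", "words", "chars")


def build_argv(details: str, hocr_files: List[str]) -> List[str]:
    """Build an argument list for hocr2djvused (hypothetical helper)."""
    if details not in DETAIL_LEVELS:
        raise ValueError("details must be one of: " + ", ".join(DETAIL_LEVELS))
    return ["hocr2djvused", "-t", details] + list(hocr_files)


def convert(details: str, hocr_files: List[str]) -> str:
    """Run hocr2djvused and return the djvused script it prints.

    Assumes the hocr2djvused executable is available on the PATH.
    """
    result = subprocess.run(
        build_argv(details, hocr_files),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # page.html is a placeholder hOCR file name.
    print(build_argv("lines", ["page.html"]))
```

Building the argument list separately from running it keeps the level check testable without the tool installed.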
Consider each non-empty sequence of non-whitespace characters a single word.
This is the default, despite being linguistically incorrect.
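
The rule above can be illustrated with a short Python sketch. This is not the tool's own code, only a demonstration of the stated rule under the assumption that "whitespace" means what Python's str.split() treats as whitespace; the function name simple_words is hypothetical.

```python
from typing import List


def simple_words(line: str) -> List[str]:
    """Split a line into maximal runs of non-whitespace characters."""
    # str.split() with no arguments splits on runs of whitespace
    # and never returns empty strings.
    return line.split()


if __name__ == "__main__":
    print(simple_words("Don't stop, e.g. here."))
```

Punctuation stays attached to the neighbouring word ("stop," and "here." each count as one word), which is why this rule is linguistically incorrect yet matches tools that expect words to be separated by spaces.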
Use the Unicode Text Segmentation algorithm to break lines into words.
This option breaks the assumption made by some DjVu tools that words are separated by
spaces, and is therefore not recommended.
Assume that DjVu pages are rotated by n degrees.
Specifies that page size is width pixels × height pixels.
This option is required for hOCR generated by Cuneiform (< 0.8) and superfluous
otherwise.
Use an HTML5 parser, which is more robust but slower than the default parser.
Attempt to fix UTF-8 encoding issues and eliminate unwanted control characters.
This option might be needed for hOCR generated by Cuneiform or Tesseract.
Output version information and exit.
Display help and exit.