slmbuild - Online in the Cloud

NAME


slmbuild - generate language model from idngram file

SYNOPSIS


slmbuild [option]... idngram_file...

DESCRIPTION


slmbuild generates a back-off smoothed language model from a given idngram file.
The idngram_file is generally created by ids2ngram.

OPTIONS


All the following options are mandatory.


-n, --NMax N
1 for unigram, 2 for bigram, 3 for trigram. Any number outside the range 1..3 is
invalid.

-o, --out output-file
Specify the output file name.

-l, --log
Use -log(pr) in the model. By default, pr is used directly.

-w, --wordcount N
Lexicon size, i.e. the number of different words.

-b, --brk id...
Set the ids which should be treated as breakers.

-e, --e id...
Set the ids which should not be put into LM.

-c, --cut c...
k-grams whose freq <= c[k] are dropped.

-d, --discount method, param...
The k-th -d option specifies the discount method for k-grams.

For a k-gram, possible values for method/param are:

GT,R,dis : Good-Turing (GT) discount for r <= R, where r is the freq of an n-gram.
Linear discount for those r > R, i.e. r'=r*dis,
0 << dis < 1.0, for example 0.999
ABS,[dis] : Absolute discount r'=r-dis. dis is optional,
0 << dis < cut[k]+1.0, normally dis < 1.0.
LIN,[dis] : Linear discount r'=r*dis. dis is optional,
0 < dis < 1.0
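
The cut-off rule and the two closed-form discounts above can be illustrated with a minimal
Python sketch. It is not slmbuild's implementation: the function names and the sample counts
are hypothetical, and the Good-Turing case is omitted because its formula is not given here.

```python
def linear_discount(r, dis):
    """LIN: r' = r * dis, with 0 < dis < 1.0."""
    if not 0.0 < dis < 1.0:
        raise ValueError("LIN expects 0 < dis < 1.0")
    return r * dis


def absolute_discount(r, dis, cut):
    """ABS: r' = r - dis, with 0 < dis < cut[k] + 1.0."""
    if not 0.0 < dis < cut + 1.0:
        raise ValueError("ABS expects 0 < dis < cut[k] + 1.0")
    return r - dis


def apply_cutoff(counts, cut):
    """Drop k-grams whose freq <= cut, keep the rest."""
    return {ngram: r for ngram, r in counts.items() if r > cut}


# Hypothetical bigram counts keyed by word-id tuples (assumed data).
bigrams = {(5, 7): 2, (5, 8): 4, (6, 7): 10}

# Cut-off 3 for bigrams, then absolute discount with dis = 0.5.
kept = apply_cutoff(bigrams, cut=3)
discounted = {ng: absolute_discount(r, dis=0.5, cut=3) for ng, r in kept.items()}
print(discounted)
```

Applying the cut-off first mirrors the order in which the options are described; whether
slmbuild itself proceeds in that order is an assumption of this sketch.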

NOTE


-n must be given before -c and -b. -c must give the right number of cut-offs, and -d must
appear exactly N times, specifying the discounts for 1-gram, 2-gram, ..., respectively.
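
These ordering and counting rules can be checked before invoking the tool. The sketch below is
a hypothetical helper, not part of slmbuild. It assumes that every option is followed by its
value and that "the right number of cut-offs" means N comma-separated values, as in the
example in this page.

```python
def check_slmbuild_args(argv):
    """Return a list of rule violations found in an slmbuild argument list."""
    problems = []
    n = None
    cuts = None
    discounts = 0
    # Pair every argument with the one that follows it (assumed to be its value).
    for arg, value in zip(argv, argv[1:] + [None]):
        if arg in ("-n", "--NMax"):
            n = int(value)
        elif arg in ("-c", "--cut", "-b", "--brk"):
            if n is None:
                problems.append(f"{arg} given before -n")
            if arg in ("-c", "--cut"):
                cuts = value.split(",")
        elif arg in ("-d", "--discount"):
            discounts += 1
    if n is None:
        problems.append("-n is missing")
        return problems
    if not 1 <= n <= 3:
        problems.append("N must be in the range 1..3")
    if cuts is None or len(cuts) != n:
        problems.append(f"-c must list {n} cut-offs")
    if discounts != n:
        problems.append(f"-d must appear {n} times, found {discounts}")
    return problems


args = "-n 3 -c 0,3,2 -d GT,8,0.9995 -d ABS -d ABS -b 10,11,12".split()
print(check_slmbuild_args(args))
```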

BREAKER-IDs could be SentenceTokens or ParagraphTokens. Conceptually, these ids have no
meaning when they appear in the middle of an n-gram.

EXCLUDE-IDs could be ambiguous ids. Conceptually, n-grams which contain those ids are
meaningless.

We cannot erase n-grams according to BREAKER-IDs and EXCLUDE-IDs directly from the IDNGRAM
file, because some low-level information in it is still useful.

EXAMPLE


The following example reads 'all.id3gram' and writes the trigram model 'all.slm'.

At the 1-gram level, use Good-Turing discount with cut-off 0, R=8, dis=0.9995. At the 2-gram
level, use Absolute discount with cut-off 3, dis auto-calculated. At the 3-gram level, use
Absolute discount with cut-off 2, dis auto-calculated. Word ids 10, 11 and 12 are breakers
(sentence/paragraph/paper breakers, etc). The Exclude-ID is 9. The lexicon contains 200000
words. The resulting language model uses -log(pr).

slmbuild -l -n 3 -o all.slm -w 200000 -c 0,3,2 -d GT,8,0.9995 -d ABS -d ABS -b 10,11,12 -e 9 all.id3gram
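
When the settings come from a script, the same command line can be assembled
programmatically. The following Python sketch is only an illustration: the helper name and its
parameters are hypothetical, and it uses only the options documented above. It derives N from
the number of discount specifications so that -c and -d stay consistent with -n.

```python
import shlex


def build_slmbuild_command(idngram_file, out_file, word_count,
                           cuts, discounts, breakers, excludes, use_log=True):
    """Assemble an slmbuild argument list from structured settings."""
    n = len(discounts)  # one -d per n-gram level
    if len(cuts) != n:
        raise ValueError("need one cut-off per n-gram level")
    args = ["slmbuild"]
    if use_log:
        args.append("-l")
    # -n is placed before -c and -b, as the NOTE section requires.
    args += ["-n", str(n), "-o", out_file, "-w", str(word_count)]
    args += ["-c", ",".join(str(c) for c in cuts)]
    for spec in discounts:
        args += ["-d", spec]
    args += ["-b", ",".join(str(i) for i in breakers)]
    args += ["-e", ",".join(str(i) for i in excludes)]
    args.append(idngram_file)
    return args


command = build_slmbuild_command(
    "all.id3gram", "all.slm", 200000,
    cuts=[0, 3, 2],
    discounts=["GT,8,0.9995", "ABS", "ABS"],
    breakers=[10, 11, 12],
    excludes=[9],
)
print(shlex.join(command))  # shlex.join requires Python 3.8 or later
```

The resulting list can be passed to subprocess.run on a machine where slmbuild is installed.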
