NAME


gt-hop - Cognate sequence-based homopolymer error correction.

SYNOPSIS


gt hop -<mode> -c <encseq> -map <sam/bam> -reads <fastq> [options...]

DESCRIPTION


-c [string]
cognate sequence (encoded using gt encseq encode)

-map [string]
mapping of reads to the cognate sequence; it must be in SAM/BAM format and sorted by
coordinate (it can be prepared e.g. using: samtools sort)

-sam [yes|no]
mapping file is SAM (default: BAM)

-aggressive [yes|no]
correct as much as possible

-moderate [yes|no]
mediate between sensitivity and precision

-conservative [yes|no]
correct only most likely errors

-expert [yes|no]
manually select correction criteria

-reads
uncorrected read file(s) in FastQ format; the corrected reads are output in the
current working directory, in files named like the input files, each prepended
by a prefix (see the -outprefix option). -reads allows one to output the reads in the
same order as in the input and is mandatory if the SAM contains more than a single
primary alignment for each read (e.g. output of bwasw). See also the -o option as an
alternative.

-outprefix [string]
prefix for output filenames (corrected reads); when -reads is specified, the prefix is
prepended to each input filename (default: hop_)
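
The naming rule described for -reads and -outprefix can be sketched as follows. This
is only an illustration, not code from gt hop: the function name corrected_read_names
is hypothetical, and it assumes that any directory part of an input path is dropped,
since the corrected reads are written to the current working directory.

```python
from pathlib import Path

def corrected_read_names(read_files, outprefix="hop_"):
    """Return the expected output file names for the given input read files.

    Assumption: only the file name of each input is kept, because the
    corrected reads are written to the current working directory.
    """
    return [outprefix + Path(name).name for name in read_files]

names = corrected_read_names(["run1/reads_1.fastq", "reads_2.fastq"])
custom = corrected_read_names(["reads_1.fastq"], outprefix="fixed_")
```

With the default prefix, an input file reads_1.fastq would therefore be expected to
yield hop_reads_1.fastq.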

-o [string]
output file for corrected reads (see also -reads/-outprefix); if -o is used, reads are
output in a single file in the order they are found in the SAM file (which usually
differs from the original order). This only works if the reads were aligned with
software that includes only one alignment for each read (e.g. bwa) (default: undefined)

-hmin [value]
minimal homopolymer length in cognate sequence (default: 3)

-read-hmin [value]
minimal homopolymer length in reads (default: 2)
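
To illustrate what the two minimal-length thresholds select, the following sketch
lists homopolymer runs in a sequence. It is not taken from gt hop: the function name
homopolymer_runs and the 0-based start positions are assumptions made for the example.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len):
    """Return (start, length, base) for each run of identical bases
    whose length is at least min_len. Positions are 0-based (an
    assumption of this sketch)."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, length, base))
        pos += length
    return runs

# Threshold 3 for the cognate sequence (-hmin), 2 for a read (-read-hmin).
cognate_runs = homopolymer_runs("ACGTTTTGAAAC", 3)
read_runs = homopolymer_runs("ACGTTTGAAC", 2)
```

In this example the read has one T and one A fewer than the cognate sequence in the
two runs, which is the kind of length discrepancy the tool is meant to evaluate.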

-qmax [value]
maximal average quality of homopolymer in a read (default: 120)

-altmax [value]
max support of alternate homopol. length; e.g. 0.8 means: do not correct any read if
the homop. length in more than 80% of the reads has the same value, different from the
cognate; if altmax is set to 1.0, reads are always corrected (default: 0.800000)

-cogmin [value]
min support of cognate sequence homopol. length; e.g. 0.1 means: do not correct any
read if the cognate homop. length is not present in at least 10% of the reads; if
cogmin is set to 0.0, reads are always corrected

-mapqmin [value]
minimal mapping quality (default: 21)

-covmin [value]
minimal coverage; e.g. 5 means: do not correct any read if the coverage (number of
reads mapped over the whole homopolymer) is less than 5; if covmin is set to 1, reads
are always corrected (default: 1)
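
One possible reading of the -altmax, -cogmin and -covmin criteria is sketched below.
It is an interpretation of the option descriptions above, not the implementation used
by gt hop; the function name passes_support_criteria and its argument layout are
hypothetical. The input is the homopolymer length observed in each read that spans the
whole homopolymer.

```python
from collections import Counter

def passes_support_criteria(read_lengths, cognate_length, altmax, cogmin, covmin):
    """Return True if correction would be allowed under this reading of
    the -altmax, -cogmin and -covmin descriptions (an assumption)."""
    coverage = len(read_lengths)
    # -covmin: too few reads span the whole homopolymer.
    if coverage == 0 or coverage < covmin:
        return False
    counts = Counter(read_lengths)
    # -cogmin: fraction of reads that show the cognate length.
    if counts[cognate_length] / coverage < cogmin:
        return False
    # -altmax: largest fraction of reads agreeing on one alternate length.
    alt = max((n for length, n in counts.items() if length != cognate_length),
              default=0)
    if alt / coverage > altmax:
        return False
    return True

# Nine of ten reads agree on length 4, while the cognate length is 5.
lengths = [4] * 9 + [5]
blocked = passes_support_criteria(lengths, 5, altmax=0.8, cogmin=0.1, covmin=1)
allowed = passes_support_criteria(lengths, 5, altmax=1.0, cogmin=0.1, covmin=1)
```

Under this reading, a consistent alternate length in 90% of the reads blocks
correction at altmax 0.8, since it more likely reflects a real difference than a
sequencing error; setting altmax to 1.0 disables that check.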

-allow-muliple [yes|no]
allow multiple corrections in a read (default: no)

-clenmax [value]
maximal correction length (default: unlimited)

-ann [string]
annotation of cognate sequence; it must be sorted by coordinates on the cognate
sequence (this can be done e.g. using: gt gff3 -sort). If -ann is used, corrections
will be limited to homopolymers starting or ending inside the feature type indicated
by the -ft option. Format: sorted GFF3 (default: undefined)

-ft [string]
feature type to use when -ann option is specified (default: CDS)
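
The restriction imposed by -ann and -ft can be pictured with the sketch below. It is
an illustration under stated assumptions, not the code of gt hop: the function names
are hypothetical, the annotation is assumed to describe a single cognate sequence, and
homopolymer coordinates are taken to be 1-based and inclusive like GFF3 coordinates.

```python
def features_of_type(gff3_lines, ftype="CDS"):
    """Collect (start, end) of all features of the given type.
    GFF3 columns: seqid, source, type, start, end, score, strand,
    phase, attributes."""
    features = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and cols[2] == ftype:
            features.append((int(cols[3]), int(cols[4])))
    return features

def homopolymer_allowed(hp_start, hp_end, features):
    """True if the homopolymer starts or ends inside any feature."""
    def inside(pos):
        return any(start <= pos <= end for start, end in features)
    return inside(hp_start) or inside(hp_end)

annotation = [
    "##gff-version 3",
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\t.\tCDS\t200\t300\t.\t+\t0\tID=c1",
]
cds = features_of_type(annotation)
```

A linear scan is used for brevity; with a coordinate-sorted annotation a single
forward pass over features and homopolymers would be sufficient.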

-v [yes|no]
be verbose (default: no)

-help
display help for basic options and exit

-help+
display help for all options and exit

-version
display version information and exit

Correction mode:

One of the options -aggressive, -moderate, -conservative or -expert must be selected.
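
When gt hop is called from a script, the rule that a correction mode must be chosen
can be checked before the command is started. The following sketch only assembles
an argument list following the synopsis and does not run anything; the function
name build_hop_command and the example file names are hypothetical.

```python
MODES = ("aggressive", "moderate", "conservative", "expert")

def build_hop_command(mode, encseq, mapping, reads, extra=()):
    """Assemble an argument list following the synopsis
    gt hop -<mode> -c <encseq> -map <sam/bam> -reads <fastq> [options...]"""
    if mode not in MODES:
        raise ValueError("mode must be one of: " + ", ".join(MODES))
    cmd = ["gt", "hop", "-" + mode,
           "-c", encseq, "-map", mapping, "-reads", reads]
    cmd.extend(extra)  # further options, passed through unchanged
    return cmd

# Placeholder file names, for illustration only.
cmd = build_hop_command("conservative", "genome", "mapping.bam", "reads.fastq",
                        extra=["-outprefix", "fixed_"])
```

The resulting list could be handed to subprocess.run once the encoded sequence, the
sorted mapping and the read file exist.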

The -aggressive, -moderate and -conservative modes are presets of the criteria by which it
is decided if an observed discrepancy in homopolymer length between cognate sequence and a
read shall be corrected or not. A description of the individual criteria is provided
by using the -help+ option. The presets are equivalent to the following settings:

                   -aggressive   -moderate   -conservative
-hmin              3             3           3
-read-hmin         1             1           2
-altmax            1.00          0.99        0.80
-cogmin            0.00          0.00        0.10
-mapqmin           0             10          21
-covmin            1             1           1
-clenmax           unlimited     unlimited   unlimited
-allow-multiple    yes           yes         no

The aggressive mode tries to maximize the sensitivity, the conservative mode to minimize
the false positives. An even more conservative set of corrections can be achieved using
the -ann option (see -help+).

The -expert mode allows one to manually set each parameter; the default values are the
same as in the -conservative mode.
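
For scripts that need to reason about the presets, the table above can be written
down as data. The sketch below is a transcription of the table only; the dictionary
layout and the function name preset_differences are hypothetical, the key cogmin
stands for the row giving the minimal support of the cognate length, and None is
used for "unlimited".

```python
PRESETS = {
    "aggressive": {"hmin": 3, "read_hmin": 1, "altmax": 1.00, "cogmin": 0.00,
                   "mapqmin": 0, "covmin": 1, "clenmax": None,
                   "allow_multiple": True},
    "moderate": {"hmin": 3, "read_hmin": 1, "altmax": 0.99, "cogmin": 0.00,
                 "mapqmin": 10, "covmin": 1, "clenmax": None,
                 "allow_multiple": True},
    "conservative": {"hmin": 3, "read_hmin": 2, "altmax": 0.80, "cogmin": 0.10,
                     "mapqmin": 21, "covmin": 1, "clenmax": None,
                     "allow_multiple": False},
}
# The expert mode starts from the conservative values.
PRESETS["expert"] = dict(PRESETS["conservative"])

def preset_differences(a, b):
    """Return {criterion: (value in a, value in b)} where the presets differ."""
    return {key: (PRESETS[a][key], PRESETS[b][key])
            for key in PRESETS[a] if PRESETS[a][key] != PRESETS[b][key]}

diff = preset_differences("moderate", "conservative")
```

Comparing two presets this way shows which criteria change when moving from one
mode to the next.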

(Finally, for evaluation purposes only, the -state-of-truth mode can be used: this mode
assumes that the sequenced genome has been specified as cognate sequence and outputs an
ideal list of corrections.)

REPORTING BUGS


Report bugs to <gt-users@genometools.org>.
