
**PROGRAM:**

**NAME**

emma - Multiple sequence alignment (ClustalW wrapper)

**SYNOPSIS**

**emma** **-sequence** __seqall__ [**-onlydend** __toggle__] **-dend** __toggle__ **-dendfile** __infile__ [**-slow** __toggle__] **-pwmatrix** __list__ **-pwdnamatrix** __list__ **-usermatrix** __variable__ **-pairwisedatafile** __infile__ **-matrix** __list__ **-usermamatrix** __variable__ **-dnamatrix** __list__ **-umamatrix** __variable__ **-mamatrixfile** __infile__ **-pwgapopen** __float__ **-pwgapextend** __float__ **-ktup** __integer__ **-gapw** __integer__ **-topdiags** __integer__ **-window** __integer__ **-nopercent** __boolean__ [**-gapopen** __float__] [**-gapextend** __float__] [**-endgaps** __boolean__] [**-gapdist** __integer__] **-norgap** __boolean__ **-hgapres** __string__ **-nohgap** __boolean__ [**-maxdiv** __integer__] **-outseq** __seqoutset__ **-dendoutfile** __outfile__

**emma** **-help**

**DESCRIPTION**

**emma** is a command line program from EMBOSS (“the European Molecular Biology Open Software Suite”). It is part of the "Alignment:Multiple" command group(s).

**OPTIONS**

**Input section**

**-sequence** __seqall__

**-onlydend** __toggle__

Default value: N

**-dend** __toggle__

Default value: N

**-dendfile** __infile__

**-slow** __toggle__

A distance is calculated between every pair of sequences, and these distances are used to construct the dendrogram which guides the final multiple alignment. The scores are calculated from separate pairwise alignments. These can be calculated using two methods: dynamic programming (slow but accurate) or the method of Wilbur and Lipman (extremely fast but approximate). The slow-accurate method is fine for short sequences but will be VERY SLOW for many (e.g. >100) long (e.g. >1000 residue) sequences.

Default value: Y
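
To get a feel for why the slow method scales badly, the following rough sketch counts the pairwise alignments and the dynamic programming cells involved. It assumes each pair fills a full length-by-length matrix, which is a simplification for illustration and not a statement about emma's or ClustalW's internals; the function name is hypothetical.

```python
from math import comb


def pairwise_workload(n_sequences: int, mean_length: int) -> tuple[int, int]:
    """Estimate the work for all-against-all pairwise alignment.

    Assumption: one dynamic programming matrix of mean_length x mean_length
    cells is filled for every pair of sequences.
    """
    pairs = comb(n_sequences, 2)           # number of sequence pairs
    cells = pairs * mean_length * mean_length
    return pairs, cells


pairs, cells = pairwise_workload(100, 1000)
print(pairs, cells)  # 4950 4950000000
```

With 100 sequences of about 1000 residues, the estimate is 4950 pairwise alignments and roughly five billion matrix cells, which is the regime the warning above refers to.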

**Pairwise align options**

**-pwmatrix** __list__

The scoring table which describes the similarity of each amino acid to each other. There are three 'in-built' series of weight matrices offered. Each consists of several matrices which work differently at different evolutionary distances. To see the exact details, read the documentation. Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones). For very similar sequences, it is best to use a strict weight matrix which only gives a high score to identities and the most favoured conservative substitutions. For more divergent sequences, it is appropriate to use 'softer' matrices which give a high score to many other frequent substitutions.

1) BLOSUM (Henikoff). These matrices appear to be the best available for carrying out database similarity (homology) searches. The matrices used are: Blosum 80, 62, 45 and 30.

2) PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices.

3) GONNET. These matrices were derived using almost the same procedure as the Dayhoff one (above) but are much more up to date and are based on a far larger data set. They appear to be more sensitive than the Dayhoff series. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices.

We also supply an identity matrix which gives a score of 1.0 to two identical amino acids and a score of zero otherwise. This matrix is not very useful. Default value: b
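
The identity matrix described above is simple enough to sketch directly. The snippet below is only an illustration of that scoring rule (1.0 for identical residues, zero otherwise) applied to two equal-length, ungapped sequences; the function names and the example sequences are made up for this sketch and are not part of emma.

```python
def identity_score(a: str, b: str) -> float:
    """Score 1.0 for two identical residues and 0.0 otherwise."""
    return 1.0 if a.upper() == b.upper() else 0.0


def score_ungapped(seq1: str, seq2: str) -> float:
    """Sum identity scores column by column (no gaps, equal lengths assumed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(identity_score(a, b) for a, b in zip(seq1, seq2))


print(score_ungapped("HEAGAW", "HEAGCW"))  # 5.0
```

Because every mismatch scores zero regardless of how chemically similar the residues are, such a matrix cannot reward conservative substitutions, which is one way to see why it is described as not very useful.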

**-pwdnamatrix** __list__

The scoring table which describes the scores assigned to matches and mismatches (including IUB ambiguity codes). Default value: i

**-usermatrix** __variable__

**-pairwisedatafile** __infile__

**Matrix options**

**-matrix** __list__

This gives a menu where you are offered a choice of weight matrices. The default for proteins is the PAM series derived by Gonnet and colleagues. Note, a series is used! The actual matrix that is used depends on how similar the sequences being aligned at this alignment step are. Different matrices work differently at each evolutionary distance. There are three 'in-built' series of weight matrices offered. Each consists of several matrices which work differently at different evolutionary distances. To see the exact details, read the documentation. Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones). For very similar sequences, it is best to use a strict weight matrix which only gives a high score to identities and the most favoured conservative substitutions. For more divergent sequences, it is appropriate to use 'softer' matrices which give a high score to many other frequent substitutions.

1) BLOSUM (Henikoff). These matrices appear to be the best available for carrying out database similarity (homology) searches. The matrices used are: Blosum 80, 62, 45 and 30.

2) PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices.

3) GONNET. These matrices were derived using almost the same procedure as the Dayhoff one (above) but are much more up to date and are based on a far larger data set. They appear to be more sensitive than the Dayhoff series. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices.

We also supply an identity matrix which gives a score of 1.0 to two identical amino acids and a score of zero otherwise. This matrix is not very useful. Alternatively, you can read in your own (just one matrix, not a series). Default value: b

**-usermamatrix** __variable__

**-dnamatrix** __list__

This gives a menu where a single matrix (not a series) can be selected. Default value: i

**-umamatrix** __variable__

**-mamatrixfile** __infile__

**Additional section**

**Slow align options**

**-pwgapopen** __float__

The penalty for opening a gap in the pairwise alignments. Default value: 10.0

**-pwgapextend** __float__

The penalty for extending a gap by 1 residue in the pairwise alignments. Default value: 0.1
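
As a rough illustration of how an opening and an extension penalty combine, the sketch below uses one common affine convention: the first gapped position costs the opening penalty and each further position costs the extension penalty. That convention is an assumption made here for illustration; the exact formula used internally by ClustalW may differ, and the function name is hypothetical.

```python
def affine_gap_cost(length: int, gap_open: float = 10.0, gap_extend: float = 0.1) -> float:
    """Cost of a single gap of `length` residues under an assumed affine model.

    Assumption: cost = gap_open + gap_extend * (length - 1).
    The defaults mirror the -pwgapopen and -pwgapextend defaults above.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


one_long_gap = affine_gap_cost(3)          # a single gap spanning 3 residues
three_short_gaps = 3 * affine_gap_cost(1)  # three separate 1-residue gaps
print(one_long_gap < three_short_gaps)     # True
```

Under this model one gap of three residues is much cheaper than three separate single-residue gaps, so a high opening penalty with a low extension penalty favours fewer, longer gaps.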

**Fast align options**

**-ktup** __integer__

This is the size of exactly matching fragment that is used. INCREASE for speed (max = 2 for proteins; 4 for DNA), DECREASE for sensitivity. For longer sequences (e.g. >1000 residues) you may need to increase the default. Default value: @($(acdprotein)?1:2), i.e. 1 for protein and 2 for nucleotide sequences
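
The effect of the fragment size can be seen with a small sketch that lists the exactly matching k-tuples shared by two sequences. This is a plain illustration of k-tuple matching, not the Wilbur and Lipman implementation used by the program; the function names and the toy sequences are assumptions made for the example.

```python
def ktuples(seq: str, k: int) -> list[str]:
    """Return every overlapping fragment of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_ktuple_positions(seq1: str, seq2: str, k: int) -> list[tuple[int, int]]:
    """List (i, j) pairs where the k-tuple at seq1[i] equals the one at seq2[j]."""
    index: dict[str, list[int]] = {}
    for j, word in enumerate(ktuples(seq2, k)):
        index.setdefault(word, []).append(j)
    return [(i, j)
            for i, word in enumerate(ktuples(seq1, k))
            for j in index.get(word, [])]


seq1, seq2 = "ACGTACGT", "ACGTTCGT"
print(len(shared_ktuple_positions(seq1, seq2, 2)))  # 10
print(len(shared_ktuple_positions(seq1, seq2, 4)))  # 2
```

A larger k produces far fewer candidate matches to process, which is faster but can miss weaker similarities; a smaller k does the opposite.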

**-gapw** __integer__

This is a penalty for each gap in the fast alignments. It has little effect on the speed or sensitivity except for extreme values. Default value: @($(acdprotein)?3:5), i.e. 3 for protein and 5 for nucleotide sequences

**-topdiags** __integer__

The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best ones (with most matches) are used in the alignment. This parameter specifies how many. Decrease for speed; increase for sensitivity. Default value: @($(acdprotein)?5:4), i.e. 5 for protein and 4 for nucleotide sequences
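
The idea of counting k-tuple matches per diagonal and keeping only the best few can be sketched as follows. A match between position i of the first sequence and position j of the second lies on diagonal i - j of the imaginary dot-matrix plot. The function names and the toy sequences are assumptions for this illustration, and the sketch does not reproduce the program's actual scoring.

```python
from collections import Counter


def diagonal_counts(seq1: str, seq2: str, k: int) -> Counter:
    """Count exact k-tuple matches on each diagonal (i - j) of the dot matrix."""
    positions: dict[str, list[int]] = {}
    for j in range(len(seq2) - k + 1):
        positions.setdefault(seq2[j:j + k], []).append(j)
    counts: Counter = Counter()
    for i in range(len(seq1) - k + 1):
        for j in positions.get(seq1[i:i + k], []):
            counts[i - j] += 1
    return counts


def top_diagonals(counts: Counter, topdiags: int) -> list[int]:
    """Keep the `topdiags` diagonals with the most matches."""
    return [diag for diag, _ in counts.most_common(topdiags)]


counts = diagonal_counts("ACGTACGT", "ACGTTCGT", 2)
print(counts[0], counts[4], counts[-4])  # 5 3 2
print(top_diagonals(counts, 2))          # [0, 4]
```

Keeping fewer diagonals means less work in later steps, at the risk of discarding a diagonal that carried a genuine but weaker similarity.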

**-window** __integer__

This is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity. Default value: @($(acdprotein)?5:4), i.e. 5 for protein and 4 for nucleotide sequences
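
One way to picture the window is as a band of neighbouring diagonals on either side of each 'best' diagonal. The sketch below assumes the band is symmetric (the same number of diagonals on each side), which is an interpretation made for illustration; the function name is hypothetical.

```python
def diagonals_in_window(best_diagonals: list[int], window: int) -> list[int]:
    """Expand each best diagonal into the band [d - window, d + window]."""
    selected: set[int] = set()
    for diag in best_diagonals:
        selected.update(range(diag - window, diag + window + 1))
    return sorted(selected)


print(diagonals_in_window([0, 4], 1))  # [-1, 0, 1, 3, 4, 5]
```

A wider window covers more of the dot matrix around each best diagonal, which costs time but tolerates small insertions and deletions that shift a similarity onto a nearby diagonal.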

**-nopercent** __boolean__

Default value: N

**Gap options**

**-gapopen** __float__

The penalty for opening a gap in the alignment. Increasing the gap opening penalty will make gaps less frequent. Default value: 10.0

**-gapextend** __float__

The penalty for extending a gap by 1 residue. Increasing the gap extension penalty will make gaps shorter. Terminal gaps are not penalised. Default value: 5.0

**-endgaps** __boolean__

End gap separation: treats end gaps just like internal gaps for the purposes of avoiding gaps that are too close (set by 'gap separation distance'). If you turn this off, end gaps will be ignored for this purpose. This is useful when you wish to align fragments where the end gaps are not biologically meaningful. Default value: Y

**-gapdist** __integer__

Gap separation distance: tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalised more than other gaps. This does not prevent close gaps; it makes them less frequent, promoting a block-like appearance of the alignment. Default value: 8
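
The notion of gaps being 'too close' can be illustrated by flagging neighbouring gaps whose positions are less than the separation distance apart. This sketch only identifies such pairs; how the distance is measured and how much extra penalty is applied are not stated here, so both are left out. The function name and the example positions are made up for illustration.

```python
def close_gap_pairs(gap_positions: list[int], gapdist: int = 8) -> list[tuple[int, int]]:
    """Return neighbouring gap positions that are fewer than gapdist columns apart."""
    ordered = sorted(gap_positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < gapdist]


# Hypothetical alignment columns at which gaps open
print(close_gap_pairs([3, 7, 30, 36, 60]))  # [(3, 7), (30, 36)]
```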

**-norgap** __boolean__

Residue specific penalties: amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence. As an example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine. Default value: N

**-hgapres** __string__

This is a set of the residues 'considered' to be hydrophilic. It is used when introducing Hydrophilic gap penalties. Default value: GPSNDQEKR

**-nohgap** __boolean__

Hydrophilic gap penalties: used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common. The residues that are 'considered' to be hydrophilic are set by '-hgapres'. Default value: N
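
A run of hydrophilic residues as described above can be located with a short sketch. It uses the default '-hgapres' residue set and the run length of 5 given in the text; the function name and the example sequence are hypothetical, and the sketch only finds the runs without applying any penalty adjustment.

```python
import re


def hydrophilic_runs(seq: str, hgapres: str = "GPSNDQEKR", min_run: int = 5) -> list[tuple[int, int]]:
    """Return (start, end) spans of runs of at least min_run hydrophilic residues.

    Spans are 0-based and end-exclusive, like Python slices.
    """
    pattern = re.compile("[" + re.escape(hgapres) + "]{" + str(min_run) + ",}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


# Made-up protein fragment: GSPNDKE is a run of 7, GGKR is only 4
print(hydrophilic_runs("MVLAGSPNDKEILVVAGGKR"))  # [(4, 11)]
```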

**-maxdiv** __integer__

This switch delays the alignment of the most distantly related sequences until after the most closely related sequences have been aligned. The setting shows the percent identity level required to delay the addition of a sequence; sequences that are less identical than this level to any other sequences will be aligned later. Default value: 30
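
The selection rule can be sketched under one reading of the description: a sequence is delayed when its best percent identity to every other sequence is below the threshold. That reading, the function name, and the identity values below are all assumptions made for illustration.

```python
def delayed_sequences(identity: dict[str, dict[str, float]], maxdiv: float = 30) -> list[str]:
    """Return names whose highest percent identity to any other sequence is < maxdiv."""
    delayed = []
    for name, row in identity.items():
        best = max(value for other, value in row.items() if other != name)
        if best < maxdiv:
            delayed.append(name)
    return delayed


# Hypothetical pairwise percent identities
identity = {
    "seqA": {"seqB": 85, "seqC": 22},
    "seqB": {"seqA": 85, "seqC": 25},
    "seqC": {"seqA": 22, "seqB": 25},
}
print(delayed_sequences(identity))  # ['seqC']
```

Here seqA and seqB are close to each other and would be aligned first, while seqC, which is below 30% identity to both, would be added afterwards.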

**Output section**

**-outseq** __seqoutset__

**-dendoutfile** __outfile__
