sumaclust - Online in the Cloud

PROGRAM:

NAME

sumaclust - star clustering of genetic sequences

SYNOPSIS

sumaclust [options] <dataset>
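
As an illustration of the synopsis, the sketch below assembles a sumaclust command line in Python using only options listed under OPTIONS. The helper name build_command and the file names are hypothetical, and the command is only built, not executed.

```python
import shlex
from typing import List, Optional


def build_command(dataset: str,
                  threshold: float = 0.97,
                  exact: bool = False,
                  threads: Optional[int] = None,
                  fasta_out: Optional[str] = None) -> List[str]:
    """Assemble an argument list of the form: sumaclust [options] <dataset>."""
    cmd = ["sumaclust", "-t", str(threshold)]  # -t: score threshold
    if exact:
        cmd.append("-e")                       # -e: exact assignment
    if threads is not None:
        cmd += ["-p", str(threads)]            # -p: number of threads
    if fasta_out is not None:
        cmd += ["-F", fasta_out]               # -F: FASTA output file
    cmd.append(dataset)                        # the dataset comes last
    return cmd


# Hypothetical file names, for illustration only.
example = build_command("reads.fasta", exact=True, threads=4,
                        fasta_out="clusters.fasta")
print(shlex.join(example))
```

Assuming sumaclust is installed and on the PATH, the resulting list could be handed to subprocess.run(example, check=True).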

DESCRIPTION

With the development of next-generation sequencing, efficient tools are needed to handle
millions of sequences in a reasonable amount of time. Sumaclust, a program developed by
the LECA, aims to cluster sequences in a way that is both fast and exact. It is designed
for the type of data generated by DNA metabarcoding, i.e. short markers that are entirely
sequenced. Sumaclust clusters sequences using the same clustering algorithm as UCLUST and
CD-HIT. This algorithm is mainly useful for detecting the 'erroneous' sequences that are
created during amplification and sequencing and that derive from 'true' sequences.
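
The greedy, center-based clustering described above can be sketched in a few lines of Python. This is a minimal illustration, not Sumaclust's implementation: the function names are hypothetical, the similarity is assumed to be the Longest Common Subsequence (LCS) length divided by the length of the longer sequence, and sequences are visited by decreasing count. The exact flag mimics the difference between the default 'fast' assignment and the -e option.

```python
from typing import List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed score: LCS length normalized by the longer sequence."""
    return lcs_length(a, b) / max(len(a), len(b))


def star_cluster(records: List[Tuple[str, int]],
                 threshold: float = 0.97,
                 exact: bool = False) -> List[list]:
    """Cluster (sequence, count) records; returns [center, [members]] lists."""
    ordered = sorted(records, key=lambda r: r[1], reverse=True)
    clusters: List[list] = []
    for seq, _count in ordered:
        best, best_score = None, threshold
        for cluster in clusters:
            score = similarity(seq, cluster[0])
            if score > best_score:
                best, best_score = cluster, score
                if not exact:
                    break  # fast mode: stop at the first center above threshold
        if best is None:
            clusters.append([seq, [seq]])  # the sequence becomes a new center
        else:
            best[1].append(seq)
    return clusters


# Toy data: the third sequence passes the threshold for both centers.
A, B, C = "AAAAAAAAAA", "AAAAAATTTT", "AAAAAAATTT"
records = [(A, 10), (B, 8), (C, 1)]
fast = star_cluster(records, threshold=0.65)
exact = star_cluster(records, threshold=0.65, exact=True)
```

In this toy data set the fast run places C in the cluster centered on A, the first center above the threshold, while the exact run places it with B, the most similar center.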

OPTIONS

-h : [H]elp - print this help.

-l : Reference sequence length is the shortest.

-L : Reference sequence length is the largest.

-a : Reference sequence length is the alignment length (default).

-n : Score is normalized by reference sequence length (default).

-r : Raw score, not normalized.

-d : Score is expressed in distance (default: score is expressed in similarity).

-t ##.## : Score threshold for clustering. If the score is normalized and expressed in
similarity (default), it is an identity, e.g. 0.95 for an identity of 95%. If the score
is normalized and expressed in distance, it is (1.0 - identity), e.g. 0.05 for an
identity of 95%. If the score is not normalized and expressed in similarity, it is the
length of the Longest Common Subsequence (LCS). If the score is not normalized and
expressed in distance, it is (reference length - LCS length). Only sequences with a
similarity above ##.## with the center sequence of a cluster are assigned to that
cluster. Default: 0.97.

-e : Exact option: a sequence is assigned to the cluster whose center sequence presents
the highest similarity score > threshold, as opposed to the default 'fast' option, where
a sequence is assigned to the first cluster found whose center sequence presents a
score > threshold.

-R ## : Maximum ratio between the counts of two sequences so that the less abundant one
can be considered as a variant of the more abundant one. Default: 1.0.

-p ## : Multithreading with ## threads using OpenMP.

-s #### : Sorting by ####. Must be 'None' for no sorting, or a key in the FASTA header of
each sequence, except for the count, which can be computed (default: sorting by count).

-o : Sorting is in ascending order (default: descending).

-g : n's are replaced with a's (default: sequences with n's are discarded).

-B ### : Output of the OTU table in BIOM format is activated, and written to file ###.

-O ### : Output of the OTU map (observation map) is activated, and written to file ###.

-F ### : Output in FASTA format is written to file ### instead of standard output.

-f : Output in FASTA format is deactivated.

Argument: the nucleotide dataset to cluster.
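
The way the scoring options (-l, -L, -a, -n, -r, -d) change the meaning of the -t threshold can be summarised in a short Python sketch. It is based only on the option descriptions above: the function names are hypothetical, identity is assumed to be the LCS length divided by the reference length, and the alignment length is passed in rather than computed.

```python
def reference_length(len_a: int, len_b: int, alignment_len: int,
                     mode: str = "a") -> int:
    """Pick the reference length according to -l, -L or -a (default)."""
    if mode == "l":
        return min(len_a, len_b)   # -l: shortest sequence
    if mode == "L":
        return max(len_a, len_b)   # -L: largest sequence
    if mode == "a":
        return alignment_len       # -a: alignment length
    raise ValueError("mode must be 'l', 'L' or 'a'")


def score(lcs_len: int, ref_len: int,
          normalized: bool = True, distance: bool = False) -> float:
    """Express an LCS length under the four score conventions of -n/-r and -d."""
    if normalized:
        identity = lcs_len / ref_len                      # -n (default)
        return 1.0 - identity if distance else identity   # -d: 1.0 - identity
    # -r: raw score; with -d it is (reference length - LCS length)
    return ref_len - lcs_len if distance else lcs_len


# An LCS of 95 over a reference length of 100, under each convention.
norm_sim = score(95, 100)                                   # identity
norm_dist = score(95, 100, distance=True)                   # 1.0 - identity
raw_sim = score(95, 100, normalized=False)                  # LCS length
raw_dist = score(95, 100, normalized=False, distance=True)  # ref - LCS
```

For this pair, a threshold of 0.95 in the default mode corresponds to roughly 0.05 with -d, to 95 with -r, and to 5 with both -r and -d.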
