mlpack_kmeans - k-means clustering


mlpack_kmeans [-h] [-v] -c int -i string [-a string] [-e] [-C string] [-P] [-I string] [-l] [-m int] [-o string] [-p double] [-r] [-S int] [-s int] [-V]


This program performs K-Means clustering on the given dataset, storing the learned cluster
assignments either as a column of labels in the file containing the input dataset or in a
separate file. Empty clusters are not allowed by default; when a cluster becomes empty,
the point furthest from the centroid of the cluster with maximum variance is taken to fill
that cluster.
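
The empty-cluster policy above can be illustrated with a minimal Python sketch. It is not mlpack's implementation: the function name refill_empty_cluster is hypothetical, "variance" is assumed to mean the mean squared distance of a cluster's points to its centroid, and only clusters with at least two points are considered as donors so that the donor does not become empty itself.

```python
def refill_empty_cluster(points, assignments, centroids, empty):
    """Move one point into cluster `empty` (hypothetical helper, not mlpack code)."""
    k = len(centroids)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Assumed variance definition: mean squared distance to the cluster centroid.
    totals = [0.0] * k
    counts = [0] * k
    for p, c in zip(points, assignments):
        totals[c] += sq_dist(p, centroids[c])
        counts[c] += 1

    # Donor: the cluster with maximum variance among clusters with >= 2 points.
    donors = [c for c in range(k) if counts[c] > 1]
    donor = max(donors, key=lambda c: totals[c] / counts[c])

    # Take the donor's point that lies furthest from the donor's centroid.
    members = [i for i, c in enumerate(assignments) if c == donor]
    furthest = max(members, key=lambda i: sq_dist(points[i], centroids[donor]))

    new_assignments = list(assignments)
    new_assignments[furthest] = empty
    new_centroids = [list(c) for c in centroids]
    new_centroids[empty] = list(points[furthest])
    return new_assignments, new_centroids
```

The donor's centroid is left unchanged here; in this sketch it would be recomputed by the next centroid update.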

Optionally, the Bradley and Fayyad approach ("Refining initial points for k-means
clustering", 1998) can be used to select initial points by specifying the --refined_start
(-r) option. This approach works by taking random samples of the dataset; the number of
samples is set with the --samplings (-S) parameter, and the fraction of the dataset used
in each sample is set with the --percentage (-p) parameter (a value between 0.0 and 1.0).
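
To make the roles of the two sampling parameters concrete, here is a small Python sketch of the subsampling step only. The function name draw_samplings is hypothetical, and sampling without replacement is an assumption; the remaining steps of the Bradley and Fayyad approach are not shown. The defaults mirror the documented defaults for --samplings and --percentage.

```python
import random


def draw_samplings(points, samplings=100, percentage=0.02, seed=1):
    """Draw `samplings` random subsets, each holding `percentage` of the points."""
    rng = random.Random(seed)
    # Keep at least one point per subset (an assumption of this sketch).
    size = max(1, int(percentage * len(points)))
    # Assumption: each subset is drawn without replacement.
    return [rng.sample(points, size) for _ in range(samplings)]
```

With 1000 input points and the defaults, this yields 100 subsets of 20 points each.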

There are several options available for the algorithm used for each Lloyd iteration,
specified with the --algorithm (-a) option. The standard O(kN) approach can be used
('naive'). Other options include the Pelleg-Moore tree-based algorithm ('pelleg-moore'),
Elkan's triangle-inequality based algorithm ('elkan'), Hamerly's modification to Elkan's
algorithm ('hamerly'), the dual-tree k-means algorithm ('dualtree'), and the dual-tree
k-means algorithm using the cover tree ('dualtree-covertree').
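
For reference, one iteration of the standard O(kN) approach might look like the following Python sketch. It is an illustration rather than mlpack's code: the function name lloyd_iteration is hypothetical, squared Euclidean distance is assumed, and an empty cluster simply keeps its previous centroid.

```python
def lloyd_iteration(points, centroids):
    """One naive Lloyd iteration: assign every point, then recompute centroids."""
    k = len(centroids)
    dim = len(centroids[0])

    # Assignment step: k distance evaluations for each of the N points, hence O(kN).
    assignments = []
    for p in points:
        dists = [sum((x - y) ** 2 for x, y in zip(p, c)) for c in centroids]
        assignments.append(dists.index(min(dists)))

    # Update step: each centroid becomes the mean of the points assigned to it.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, a in zip(points, assignments):
        counts[a] += 1
        for d in range(dim):
            sums[a][d] += p[d]

    # Assumption of this sketch: an empty cluster keeps its old centroid.
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return assignments, new_centroids
```

Repeating this step until the assignments stop changing, or until a maximum iteration count is reached, gives the basic k-means loop. The other algorithms listed above aim to avoid some of the distance evaluations in the assignment step.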

As of October 2014, the --overclustering option has been removed. If you want this support
back, let us know -- file a bug at https://github.com/mlpack/mlpack/ or get in touch
through another means.


--clusters (-c) [int]
Number of clusters to find (0 autodetects from initial centroids).

--input_file (-i) [string]
Input dataset to perform clustering on.


--algorithm (-a) [string]
Algorithm to use for the Lloyd iteration ('naive', 'pelleg-moore', 'elkan',
'hamerly', 'dualtree', or 'dualtree-covertree'). Default value 'naive'.

--allow_empty_clusters (-e)
Allow empty clusters to be created.

--centroid_file (-C) [string]
If specified, the centroids of each cluster will be written to the given file.
Default value ''.

--help (-h)
Default help info.

--in_place (-P)
If specified, a column containing the learned cluster assignments will be added to
the input dataset file. In this case, --output_file is overridden.

--info [string]
Get help on a specific module or option. Default value ''.

--initial_centroids (-I) [string]
Start with the specified initial centroids. Default value ''.

--labels_only (-l)
Only output labels into output file.

--max_iterations (-m) [int]
Maximum number of iterations before K-Means terminates. Default value 1000.

--output_file (-o) [string]
File to write output labels or labeled data to. Default value ''.

--percentage (-p) [double]
Percentage of dataset to use for each refined start sampling (use when
--refined_start is specified). Default value 0.02.

--refined_start (-r)
Use the refined initial point strategy by Bradley and Fayyad to choose initial
points.

--samplings (-S) [int]
Number of samplings to perform for refined start (use when --refined_start is
specified). Default value 100.

--seed (-s) [int]
Random seed. If 0, 'std::time(NULL)' is used. Default value 0.

--verbose (-v)
Display informational messages and the full list of parameters and timers at the
end of execution.

--version (-V)
Display the version of mlpack.
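
As a usage illustration, the following Python sketch assembles an argument list from a subset of the options listed above. The helper name kmeans_command and the file names in the usage note are hypothetical; only flags documented on this page are emitted, and the sketch does not run the program itself.

```python
def kmeans_command(input_file, clusters, output_file=None, centroid_file=None,
                   algorithm=None, max_iterations=None, seed=None,
                   refined_start=False):
    """Assemble an argument list for mlpack_kmeans (hypothetical helper)."""
    # The two required options.
    argv = ["mlpack_kmeans", "--input_file", input_file, "--clusters", str(clusters)]

    # Options that take a value; any left as None keep their documented defaults.
    valued = [
        ("--output_file", output_file),
        ("--centroid_file", centroid_file),
        ("--algorithm", algorithm),
        ("--max_iterations", max_iterations),
        ("--seed", seed),
    ]
    for flag, value in valued:
        if value is not None:
            argv += [flag, str(value)]

    # Boolean switch: present or absent, with no value.
    if refined_start:
        argv.append("--refined_start")
    return argv
```

For example, kmeans_command("data.csv", 3, output_file="labels.csv", algorithm="elkan") builds a list that could be passed to subprocess.run on a machine where mlpack_kmeans is installed.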


For further information, including relevant papers, citations, and theory, consult the
documentation found at http://www.mlpack.org or included with your distribution of mlpack.

