mlpack_nca - Online in the Cloud


PROGRAM:

NAME


mlpack_nca - neighborhood components analysis (nca)

SYNOPSIS


mlpack_nca [-h] [-v] -i string -o string [-A double] [-l string] [-L] [-n int] [-T int] [-M double] [-m double] [-N] [-B int] [-O string] [-s int] [-a double] [-t double] [-V] [-w double]

DESCRIPTION


This program implements Neighborhood Components Analysis, both a linear dimensionality
reduction technique and a distance learning technique. The method seeks to improve
k-nearest-neighbor classification on a dataset by scaling the dimensions. The method is
nonparametric, and does not require a value of k. It works by using stochastic ("soft")
neighbor assignments and applying gradient-based optimization to the expected accuracy
of those assignments.
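
The soft neighbor assignment can be made concrete with a small sketch. It follows the
formulation in the published NCA literature (point i picks neighbor j with probability
proportional to exp(-||A x_i - A x_j||^2)) and is not taken from the mlpack source. The
function name, the toy data and the two transformations are assumptions chosen for
illustration.

```python
import math

def soft_neighbor_objective(points, labels, transform):
    """Return (objective, p), where p[i] is the probability that point i picks
    a neighbor of its own class under the linear transformation `transform`."""
    # Project every point with the matrix A, given as a list of rows.
    projected = [[sum(a * x for a, x in zip(row, point)) for row in transform]
                 for point in points]
    p = []
    for i, xi in enumerate(projected):
        # Unnormalized weight of choosing j as the neighbor of i (never i itself).
        weights = [0.0 if j == i else
                   math.exp(-sum((u - v) ** 2 for u, v in zip(xi, xj)))
                   for j, xj in enumerate(projected)]
        denominator = sum(weights)
        if denominator == 0.0:
            # Every other point is so far away that all weights underflow to 0.
            p.append(0.0)
            continue
        same_class = sum(w for j, w in enumerate(weights) if labels[j] == labels[i])
        p.append(same_class / denominator)
    return sum(p), p

# Toy data: the class depends only on the second coordinate.
points = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [0, 1, 0, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]
rescaled = [[0.1, 0.0], [0.0, 3.0]]   # shrink the first dimension, stretch the second

print(soft_neighbor_objective(points, labels, identity)[0])
print(soft_neighbor_objective(points, labels, rescaled)[0])
```

On this toy data the rescaled transformation gives a much larger objective than the
identity, which is the sense in which scaling the dimensions improves neighbor-based
classification. The zero-denominator branch shows how points that are very far apart
can leave a point with no usable neighbor weights.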

To work, this algorithm needs labeled data. It can be given as the last row of the input
dataset (--input_file), or alternatively in a separate file (--labels_file).

This implementation of NCA uses either stochastic gradient descent or the L-BFGS
optimizer. Neither of these optimizers guarantees global convergence for a nonconvex
objective function (NCA's objective function is nonconvex), so the final results could
depend on the random seed or other optimizer parameters.

Stochastic gradient descent, specified by --optimizer "sgd", depends primarily on two
parameters: the step size (--step_size) and the maximum number of iterations
(--max_iterations). In addition, a normalized starting point can be used (--normalize),
which is necessary if many warnings of the form 'Denominator of p_i is 0!' are given.
Tuning the step size can be tedious. In general, the step size is too large if the
objective is not decreasing fairly steadily, or if zero-valued denominator warnings
are being issued. The step size is too small if the objective is changing very slowly.
Once a good step size is found, setting the termination condition is easy: either
increase the maximum iterations to a large number and allow SGD to find a minimum, or
set the maximum iterations to 0 (allowing infinite iterations) and set the tolerance
(--tolerance) to define the maximum allowed difference between objectives for SGD to
terminate. Be careful: relying on the tolerance instead of the maximum iterations can
take a very long time, and SGD may never converge because of the properties of the
optimizer.
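
The step-size symptoms and the two termination rules can be seen on a toy problem. The
sketch below is plain (non-stochastic) gradient descent on f(x) = x^2, not mlpack's SGD.
The function name `descend` and its parameters are hypothetical, chosen only to mirror
the --step_size, --max_iterations and --tolerance options.

```python
def descend(step_size, max_iterations=0, tolerance=1e-7, x=1.0):
    """Minimize f(x) = x**2; return (x, iterations, list of objective values)."""
    objective = x * x
    history = [objective]
    iteration = 0
    # max_iterations == 0 means "no limit", as with the option described above.
    while max_iterations == 0 or iteration < max_iterations:
        x -= step_size * 2.0 * x            # the gradient of x**2 is 2x
        new_objective = x * x
        history.append(new_objective)
        iteration += 1
        # Stop once the objective changes by less than the tolerance.
        if abs(objective - new_objective) < tolerance:
            break
        objective = new_objective
    return x, iteration, history

# A reasonable step: the objective falls steadily and the tolerance ends the run.
x, iterations, history = descend(0.1)
print(iterations, history[-1])

# Too large: the objective grows, so only the iteration cap stops the run.
x, iterations, history = descend(1.1, max_iterations=20)
print(iterations, history[-1])

# Too small: after 20 iterations the objective has barely moved.
x, iterations, history = descend(1e-6, max_iterations=20)
print(iterations, history[-1])
```

With the oversized step and no iteration cap this loop would never stop, which is the
hazard of relying on the tolerance alone.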

The L-BFGS optimizer, specified by --optimizer "lbfgs", uses a back-tracking line search
algorithm to minimize a function. The following parameters are used by L-BFGS: --num_basis
(specifies the number of memory points used by L-BFGS), --max_iterations,
--armijo_constant, --wolfe, --tolerance (the optimization is terminated when the gradient
norm is below this value), --max_line_search_trials, --min_step and --max_step (which both
refer to the line search routine). For more details on the L-BFGS optimizer, consult
either the mlpack L-BFGS documentation (in lbfgs.hpp) or the vast set of published
literature on L-BFGS.
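
As an illustration of how these options combine on the command line, the sketch below
assembles an argument list for an L-BFGS run. It only builds and prints the command; it
does not execute mlpack_nca. The helper `nca_command` and the file names `data.csv`,
`labels.csv` and `distance.csv` are hypothetical. The option names are the ones listed
in this page.

```python
import shlex

def nca_command(input_file, output_file, **options):
    """Build an argument list for mlpack_nca from keyword options."""
    argv = ["mlpack_nca", "--input_file", input_file, "--output_file", output_file]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            argv.append(flag)                # switches such as --normalize take no value
        elif value is not False and value is not None:
            argv += [flag, str(value)]       # valued options such as --num_basis 5
    return argv

argv = nca_command(
    "data.csv", "distance.csv",
    labels_file="labels.csv",
    optimizer="lbfgs",
    num_basis=5,
    max_iterations=1000,
    wolfe=0.9,
)
print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing the resulting list to `subprocess.run` would start the program, assuming
mlpack_nca is installed and on the PATH.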

By default, the SGD optimizer is used.

REQUIRED OPTIONS


--input_file (-i) [string]
Input dataset to run NCA on.

--output_file (-o) [string]
Output file for learned distance matrix.

OPTIONS


--armijo_constant (-A) [double]
Armijo constant for L-BFGS. Default value 0.0001.

--help (-h)
Default help info.

--info [string]
Get help on a specific module or option. Default value ''.

--labels_file (-l) [string]
File of labels for input dataset. Default value ''.

--linear_scan (-L)
Don't shuffle the order in which data points are visited for SGD.

--max_iterations (-n) [int]
Maximum number of iterations for SGD or L-BFGS (0 indicates no limit). Default
value 500000.

--max_line_search_trials (-T) [int]
Maximum number of line search trials for L-BFGS. Default value 50.

--max_step (-M) [double]
Maximum step of line search for L-BFGS. Default value 1e+20.

--min_step (-m) [double]
Minimum step of line search for L-BFGS. Default value 1e-20.

--normalize (-N)
Use a normalized starting point for optimization. This is useful for when points
are far apart, or when SGD is returning NaN.

--num_basis (-B) [int]
Number of memory points to be stored for L-BFGS. Default value 5.

--optimizer (-O) [string]
Optimizer to use; "sgd" or "lbfgs". Default value 'sgd'.

--seed (-s) [int]
Random seed. If 0, 'std::time(NULL)' is used. Default value 0.

--step_size (-a) [double]
Step size for stochastic gradient descent (alpha). Default value 0.01.

--tolerance (-t) [double]
Maximum tolerance for termination of SGD or L-BFGS. Default value 1e-07.

--verbose (-v)
Display informational messages and the full list of parameters and timers at the
end of execution.

--version (-V)
Display the version of mlpack.

--wolfe (-w) [double]
Wolfe condition parameter for L-BFGS. Default value 0.9.

ADDITIONAL INFORMATION


For further information, including relevant papers, citations, and theory, consult the
documentation found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack_nca(1)
