
**PROGRAM:**

**NAME**

gmtregress - Linear regression of 1-D data sets

**SYNOPSIS**

**gmtregress** [ __table__ ] [ **-A**__min__/__max__/__inc__ ] [ **-C**__level__ ] [ **-Ex**|**y**|**o**|**r** ] [ **-F**__flags__ ] [ **-N1**|**2**|**r**|**w** ] [ **-S**[**r**] ] [ **-T**__min__/__max__/__inc__ | **-T**__n__ ] [ **-W**[**w**][**x**][**y**][**r**] ] [ **-V**[__level__] ] [ **-a**<flags> ] [ **-b**<binary> ] [ **-g**<gaps> ] [ **-h**<headers> ] [ **-i**<flags> ] [ **-o**<flags> ]

**Note:** No space is allowed between the option flag and the associated arguments.

**DESCRIPTION**

**gmtregress** reads one or more data tables [or __stdin__] and determines the best linear regression model __y__ = __a__ + __b__ * __x__ for each segment using the chosen parameters. The user may specify which data and model components should be reported. By default, the model is evaluated at the input points, but you can instead specify an equidistant range over which to evaluate the model, or turn off evaluation completely. Instead of determining the best fit, we can scan all possible regression lines (for a range of slope angles) and examine how the chosen misfit measure varies with slope. This is particularly useful when analyzing data with many outliers. Note: If you need to work with log10 of __x__ or __y__, you can apply that transformation during read by using the **-i** option.

**REQUIRED** **ARGUMENTS**

None

**OPTIONAL** **ARGUMENTS**

__table__

One or more ASCII (or binary, see **-bi**[__ncols__][__type__]) data table file(s) holding a number of data columns. If no tables are given then we read from standard input. The first two columns are expected to contain the required __x__ and __y__ data. Depending on your **-W** and **-E** settings we may expect an additional 1-3 columns with error estimates of one or both of the data coordinates, and even their correlation.

**-A**__min__/__max__/__inc__

Instead of determining a best-fit regression we explore the full range of regressions. Examine all possible regression lines with slope angles between __min__ and __max__, using steps of __inc__ degrees [-90/+90/1]. For each slope the optimum intercept is determined based on your regression type (**-E**) and misfit norm (**-N**) settings. For each segment we report the four columns __angle__, __E__, __slope__, __intercept__ for the range of specified angles. The best model parameters within this range are written into the segment header and reported in verbose mode (**-V**).
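The scan described above can be mimicked outside GMT as a rough cross-check. The sketch below is not GMT's implementation: it assumes vertical misfits with a least-squares norm (the **-Ey** and **-N2** defaults), for which the optimum intercept at a fixed slope __b__ is __ymean__ - __b__ * __xmean__, and it reports the mean squared residual as __E__. The function name `scan_angles` and the file `scan_demo.txt` are hypothetical, and the angle range should stay away from +/-90 degrees, where the slope is unbounded.

```shell
# scan_angles FILE MIN MAX INC
# Prints "angle E slope intercept" for each trial slope angle (degrees).
# Assumption: vertical misfit, least-squares norm, E = mean squared residual.
scan_angles() {
  awk -v amin="$2" -v amax="$3" -v ainc="$4" '
    { x[NR] = $1; y[NR] = $2; sx += $1; sy += $2 }
    END {
      n = NR; xm = sx / n; ym = sy / n
      d2r = atan2(0, -1) / 180                 # degrees to radians
      steps = int((amax - amin) / ainc + 0.5)
      for (k = 0; k <= steps; k++) {
        ang = amin + k * ainc
        b = sin(ang * d2r) / cos(ang * d2r)    # slope = tan(angle)
        a = ym - b * xm                        # best intercept for this slope
        E = 0
        for (i = 1; i <= n; i++) { r = y[i] - (a + b * x[i]); E += r * r }
        printf "%g %.6f %.6f %.6f\n", ang, E / n, b, a
      }
    }' "$1"
}

# Points on the line y = 1 + x; list the trial angle with the smallest misfit.
printf '0 1\n1 2\n2 3\n3 4\n' > scan_demo.txt
scan_angles scan_demo.txt 0 80 5 | sort -k2,2n | head -n 1
```

Sorting on the second column picks out the best angle in the scanned range, which is the quantity the option writes to the segment header.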

**-C**__level__

Set the confidence level (in %) to use for the optional calculation of confidence bands on the regression [95]. This is only used if **-F** includes the output column **c**.

**-Ex|y|o|r**

Type of linear regression, i.e., select the type of misfit we should calculate. Choose from **x** (regress __x__ on __y__; i.e., the misfit is measured horizontally from data point to regression line), **y** (regress __y__ on __x__; i.e., the misfit is measured vertically [Default]), **o** (orthogonal regression; i.e., the misfit is measured from data point orthogonally to nearest point on the line), or **r** (Reduced Major Axis regression; i.e., the misfit is the product of both vertical and horizontal misfits) [**y**].
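To make the four misfit definitions concrete, the following sketch evaluates each of them for a single point and a given line __y__ = __a__ + __b__ * __x__. It only illustrates the geometry described above and is not taken from the GMT source; the function name `misfits` is hypothetical, magnitudes are printed rather than signed values, and the slope must be nonzero for the horizontal misfit to exist.

```shell
# misfits A B X0 Y0
# For the line y = A + B*x and the point (X0, Y0), print the magnitudes of the
# vertical, horizontal, orthogonal, and RMA (vertical times horizontal) misfits.
misfits() {
  awk -v a="$1" -v b="$2" -v x0="$3" -v y0="$4" '
    function abs(v) { return (v < 0) ? -v : v }
    BEGIN {
      dy = y0 - (a + b * x0)        # vertical distance to the line
      dx = x0 - (y0 - a) / b        # horizontal distance to the line (B != 0)
      d  = dy / sqrt(1 + b * b)     # orthogonal distance to the line
      printf "%.4f %.4f %.4f %.4f\n", abs(dy), abs(dx), abs(d), abs(dy * dx)
    }'
}

# Line y = x and the point (0, 2)
misfits 0 1 0 2
```

For a line with unit slope the vertical and horizontal misfits are equal, and the orthogonal misfit is smaller by a factor of the square root of two.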

**-F**__flags__

Append a combination of the columns you wish returned; the output order will match the order specified. Choose from **x** (observed __x__), **y** (observed __y__), **m** (model prediction), **r** (residual = data minus model), **c** (symmetrical confidence interval on the regression; see **-C** for specifying the level), **z** (standardized residuals or so-called __z-scores__) and **w** (outlier weights 0 or 1; for **-Nw** these are the Reweighted Least Squares weights) [**xymrczw**]. As an alternative to evaluating the model, just give **-Fp** and we instead write a single record with the model parameters __npoints__ __xmean__ __ymean__ __angle__ __misfit__ __slope__ __intercept__ __sigma_slope__ __sigma_intercept__.

**-N1|2|r|w**

Selects the norm to use for the misfit calculation. Choose among **1** (L-1 measure; the mean of the absolute residuals), **2** (Least-squares; the mean of the squared residuals), **r** (LMS; the least median of the squared residuals), or **w** (RLS; Reweighted Least Squares: the mean of the squared residuals after outliers identified via LMS have been removed) [Default is **2**]. Traditional regression uses L-2, while L-1 and in particular LMS are more robust in how they handle outliers. As alluded to, RLS implies an initial LMS regression, which is then used to identify outliers in the data, assign these a zero weight, and then redo the regression using an L-2 norm.
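The difference in robustness between the norms is easy to see numerically. The sketch below evaluates the three basic measures (mean absolute residual, mean squared residual, and median squared residual) for a fixed set of residuals read from standard input. It only evaluates the measures and does not perform the minimization that a regression would; the function name `norms` is hypothetical, and averaging the two middle values for an even count is an assumption rather than a statement about GMT's convention.

```shell
# norms: read residuals (one per line) from stdin and print three misfit
# measures: mean |r| (L-1), mean r^2 (L-2), and median r^2 (as used by LMS).
norms() {
  res=$(cat)
  l1=$(printf '%s\n' "$res" | awk '{ s += ($1 < 0 ? -$1 : $1) } END { printf "%.4f", s / NR }')
  l2=$(printf '%s\n' "$res" | awk '{ s += $1 * $1 } END { printf "%.4f", s / NR }')
  med=$(printf '%s\n' "$res" | awk '{ printf "%.10f\n", $1 * $1 }' | sort -n |
    awk '{ v[NR] = $1 }
         END { m = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
               printf "%.4f", m }')
  echo "$l1 $l2 $med"
}

# Four small residuals and one gross outlier
printf '0.1\n-0.2\n0.1\n-0.1\n5\n' | norms
```

The single outlier dominates the mean of the squared residuals, inflates the mean absolute residual less, and leaves the median of the squared residuals untouched.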

**-S[r]**

Restricts which records will be output. By default all data records will be output in the format specified by **-F**. Use **-S** to exclude data points identified as outliers by the regression. Alternatively, use **-Sr** to reverse this and only output the outlier records.

**-T**__min__/__max__/__inc__ | **-T**__n__

Evaluate the best-fit regression model at the equidistant points implied by the arguments. If **-T**__n__ is given instead we will reset __min__ and __max__ to the extreme __x__-values for each segment and determine __inc__ so that there are exactly __n__ output values for each segment. To skip the model evaluation entirely, simply provide **-T**0.
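As a small illustration of how __n__ translates into a spacing, the sketch below assumes the natural choice __inc__ = (__max__ - __min__) / (__n__ - 1), so that both extremes are included in the output. That formula is an inference from the description above rather than a quote from the GMT source, and the function name `equidistant` is hypothetical.

```shell
# equidistant MIN MAX N: print N evenly spaced x-values from MIN to MAX.
# Assumption: both end points are included, so inc = (MAX - MIN) / (N - 1).
equidistant() {
  awk -v lo="$1" -v hi="$2" -v n="$3" 'BEGIN {
    inc = (n > 1) ? (hi - lo) / (n - 1) : 0
    for (k = 0; k < n; k++) printf "%g\n", lo + k * inc
  }'
}

# Five evaluation points spanning x = 0 to x = 10
equidistant 0 10 5
```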

**-W[w][x][y][r]**

Specifies weighted regression and which weights will be provided. Append **x** if giving 1-sigma uncertainties in the __x__-observations, **y** if giving 1-sigma uncertainties in __y__, and **r** if giving correlations between __x__ and __y__ observations, in the order these columns appear in the input (after the two required and leading __x__, __y__ columns). Giving both **x** and **y** (and optionally **r**) implies an orthogonal regression; otherwise giving **x** requires **-Ex** and **y** requires **-Ey**. We convert uncertainties in __x__ and __y__ to regression weights via the relationship weight = 1/sigma. Use **-Ww** if we should interpret the input columns to have precomputed weights instead. Note: residuals with respect to the regression line will be scaled by the given weights. Most norms will then square this weighted residual (**-N1** is the only exception).
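If you prefer to supply precomputed weights, the stated relationship weight = 1/sigma can be applied before calling the module. The sketch below assumes a three-column input of __x__, __y__ and the 1-sigma uncertainty in __y__; the function name `sigma_to_weight` is hypothetical, and records with a non-positive sigma are skipped because no weight can be formed from them.

```shell
# sigma_to_weight: read "x y sigma_y" records from stdin and write
# "x y weight" records, using weight = 1/sigma.
# Records whose sigma is not positive are skipped.
sigma_to_weight() {
  awk '$3 > 0 { printf "%s %s %g\n", $1, $2, 1 / $3 }'
}

printf '1 2.0 0.5\n2 4.1 0.25\n3 5.9 2\n' | sigma_to_weight
```

Going by the option syntax above, the converted table would then be read with **-Wwy** in place of **-Wy**; treat that pairing as an assumption to verify against your GMT version.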

**-V[**__level__**]**

Select verbosity level [c].

**-a**__col__**=**__name__**[**__...__**]**

Set aspatial column associations __col__=__name__.

**-bi[**__ncols__**][t]**

Select native binary input.

**-bo[**__ncols__**][**__type__**]**

Select native binary output. [Default is same as input].

**-g[a]x|y|d|X|Y|D|[**__col__**]z[+|-]**__gap__**[u]**

Determine data gaps and line breaks.

**-h[i|o][**__n__**][+c][+d][+r**__remark__**][+t**__title__**]**

Skip or produce header record(s).

**-i**__cols__**[l][s**__scale__**][o**__offset__**][,**__...__**]**

Select input columns (0 is first column).

**-o**__cols__**[,...]**

Select output columns (0 is first column).

**-^** or just **-**

Print a short message about the syntax of the command, then exit (NOTE: on Windows use just **-**).

**-+** or just **+**

Print an extensive usage (help) message, including the explanation of any module-specific option (but not the GMT common options), then exit.

**-?** or no arguments

Print a complete usage (help) message, including the explanation of options, then exit.

**--version**

Print GMT version and exit.

**--show-datadir**

Print full path to GMT share directory and exit.

**ASCII** **FORMAT** **PRECISION**

The ASCII output formats of numerical data are controlled by parameters in your **gmt.conf** file. Longitude and latitude are formatted according to FORMAT_GEO_OUT, whereas other values are formatted according to FORMAT_FLOAT_OUT. Be aware that the format in effect can lead to loss of precision in the output, which can lead to various problems downstream. If you find the output is not written with enough precision, consider switching to binary output (**-bo** if available) or specify more decimals using the FORMAT_FLOAT_OUT setting.

**EXAMPLES**

To do a standard least-squares regression on the __x-y__ data in points.txt and return x, y, and model prediction with 99% confidence intervals, try

gmt regress points.txt -Fxymc -C99 > points_regressed.txt

To just get the slope for the above regression, try

slope=`gmt regress points.txt -Fp -o5`
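When more than one parameter is needed, the whole **-Fp** record can be split into named shell variables instead of selecting a single column with **-o**. The sketch below assumes the record holds the nine fields in the order listed under **-F** and that no header lines precede it; the function name `parse_fp` is hypothetical and the sample record is made up purely for illustration.

```shell
# parse_fp: read one -Fp record from stdin and report a few fields by name.
# Field order follows the -Fp description: npoints xmean ymean angle misfit
# slope intercept sigma_slope sigma_intercept.
parse_fp() {
  read -r npoints xmean ymean angle misfit slope intercept sigma_slope sigma_intercept
  echo "slope=$slope intercept=$intercept n=$npoints"
}

# Made-up record for illustration only; with GMT installed one would run
#   gmt regress points.txt -Fp | parse_fp
echo "5 2 5 63.4349 0 2 1 0 0" | parse_fp
```

Because `read` splits on any whitespace, the same function works whether the record is separated by tabs or by spaces.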

To do a reweighted least-squares regression on the data rough.txt and return x, y, model prediction and the RLS weights, try

gmt regress rough.txt -Nw -Fxymw > points_regressed.txt

To do an orthogonal least-squares regression on the data crazy.txt but first take the logarithm of both x and y, then return x, y, model prediction and the normalized residuals (z-scores), try

gmt regress crazy.txt -Eo -Fxymz -i0-1l > points_regressed.txt

To examine how the orthogonal LMS misfits vary with angle between 0 and 90 in steps of 0.2 degrees for the data in points.txt, try

gmt regress points.txt -A0/90/0.2 -Eo -Nr > points_analysis.txt

**REFERENCES**

Draper, N. R., and H. Smith, 1998, __Applied regression analysis__, 3rd ed., 736 pp., John Wiley and Sons, New York.

Rousseeuw, P. J., and A. M. Leroy, 1987, __Robust regression and outlier detection__, 329 pp., John Wiley and Sons, New York.

York, D., N. M. Evensen, M. L. Martinez, and J. De Basabe Delgado, 2004, Unified equations for the slope, intercept, and standard errors of the best straight line, __Am. J. Phys.__, 72(3), 367-375.
