Logo

CORREL

Updated 2023-10-31 15:54:30.627000

Syntax

SELECT [westclintech].[wct].[CORREL] (
 ,<@Known_y, float,>
 ,<@Known_x, float,>)

Description

Use the aggregate function CORREL to calculate the correlation coefficient between two datasets. The equation for the correlation coefficient is

r_{xy} = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2\sum(y-\bar{y})^2}

Arguments

@Known_x

the x-values to be used in the CORREL calculation. @Known_x is an expression of type float or of a type that can be implicitly converted to float.

@Known_y

the y-values to be used in the CORREL calculation. @Known_y is an expression of type float or of a type that can be implicitly converted to float.

Return Type

float

Remarks

CORREL is an AGGREGATE function and follows the same conventions as all other AGGREGATE functions in SQL Server.

Examples

In this example, we calculate the slope for a single set of x- and y-values

SELECT wct.CORREL(y, x) as CORREL

FROM

(

    SELECT 0.75,

           1

    UNION ALL

    SELECT 2.5,

           2

    UNION ALL

    SELECT 6.75,

           3

    UNION ALL

    SELECT 10,

           4

) n(x, y);

This produces the following result

{"columns":[{"field":"CORREL","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"CORREL":"0.988719187867937"}]}

In this example, we will populate a temporary table with some historical financial information and then calculate the correlation. First, create the table and put some data in it:

CREATE TABLE #c

(

    SYM NVARCHAR(5),

    YE BIGINT,

    REV FLOAT,

    GPROF FLOAT,

    OPINC FLOAT,

    NETINC FLOAT

);

INSERT INTO #c

VALUES

('YHOO', 2009, 6460.32, 3588.57, 386.69, 597.99);

INSERT INTO #c

VALUES

('YHOO', 2008, 72.5, 4185.14, 12.96, 418.92);

INSERT INTO #c

VALUES

('YHOO', 2007, 6969.27, 4130.52, 695.41, 639.16);

INSERT INTO #c

VALUES

('YHOO', 2006, 6425.68, 3749.96, 940.97, 751.39);

INSERT INTO #c

VALUES

('YHOO', 2005, 5257.67, 3161.47, 1107.73, 1896.23);

INSERT INTO #c

VALUES

('GOOG', 2009, 23650.56, 14806.45, 8312.19, 6520.45);

INSERT INTO #c

VALUES

('GOOG', 2008, 21795.55, 13174.04, 5537.21, 4226.86);

INSERT INTO #c

VALUES

('GOOG', 2007, 16593.99, 9944.9, 54.44, 4203.72);

INSERT INTO #c

VALUES

('GOOG', 2006, 10604.92, 6379.89, 3550, 3077.45);

INSERT INTO #c

VALUES

('GOOG', 2005, 6138.56, 3561.47, 2017.28, 1465.4);

INSERT INTO #c

VALUES

('MSFT', 2010, 62484, 509, 24167, 18760);

INSERT INTO #c

VALUES

('MSFT', 2009, 58437, 46282, 21225, 14569);

INSERT INTO #c

VALUES

('MSFT', 2008, 60420, 48822, 22271, 17681);

INSERT INTO #c

VALUES

('MSFT', 2007, 51122, 40429, 18438, 14065);

INSERT INTO #c

VALUES

('MSFT', 2006, 44282, 36632, 16064, 12599);

INSERT INTO #c

VALUES

('ORCL', 2010, 26820, 21056, 9062, 6135);

INSERT INTO #c

VALUES

('ORCL', 2009, 23252, 18458, 8321, 5593);

INSERT INTO #c

VALUES

('ORCL', 2008, 22430, 17449, 7844, 5521);

INSERT INTO #c

VALUES

('ORCL', 2007, 17996, 13805, 5974, 4274);

INSERT INTO #c

VALUES

('ORCL', 2006, 14380, 11145, 4736, 3381);

INSERT INTO #c

VALUES

('SAP', 2009, 10672, 6980, 2588, 1748);

INSERT INTO #c

VALUES

('SAP', 2008, 11575, 7370, 2701, 1847);

INSERT INTO #c

VALUES

('SAP', 2007, 10256, 6631, 2698, 1906);

INSERT INTO #c

VALUES

('SAP', 2006, 9393, 6064, 2578, 1871);

INSERT INTO #c

VALUES

('SAP', 2005, 8509, 5460, 2337, 1496);

Now, calculate the correlation of the revenue (REV) against the year (YE) for each company (SYM)

SELECT #c.SYM,

       wct.CORREL(REV, YE) as CORREL

FROM #c

GROUP BY SYM;

This produces the following result.

{"columns":[{"field":"SYM"},{"field":"CORREL","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"SYM":"GOOG","CORREL":".988604792733014"},{"SYM":"MSFT","CORREL":".91861026921264"},{"SYM":"ORCL","CORREL":".983795721235544"},{"SYM":"SAP","CORREL":".873067973316442"},{"SYM":"YHOO","CORREL":".-0.219384585146269"}]}

In this example, we will calculate the correlation of the operating income (OPINC) against the revenue (REV)

SELECT #c.SYM,

       wct.CORREL(OPINC, REV) as CORREL

FROM #c

GROUP BY SYM;

This produces the following result.

{"columns":[{"field":"SYM"},{"field":"CORREL","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"SYM":"GOOG","CORREL":"0.651906713868849"},{"SYM":"MSFT","CORREL":".987612258172035"},{"SYM":"ORCL","CORREL":".9924157389967"},{"SYM":"SAP","CORREL":".844595495520328"},{"SYM":"YHOO","CORREL":"0.677389856742323"}]}

Let’s say we wanted to perform the original analysis, but we only want to return the results where the correlation is positive.

SELECT #c.SYM,

       wct.CORREL(OPINC, REV) as CORREL

FROM #c

GROUP BY SYM

HAVING wct.CORREL(OPINC, REV) > 0;

This produces the following result.

{"columns":[{"field":"SYM"},{"field":"CORREL","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"SYM":"GOOG","CORREL":".988604792733014"},{"SYM":"MSFT","CORREL":".91861026921264"},{"SYM":"ORCL","CORREL":".983795721235544"},{"SYM":"SAP","CORREL":".873067973316442"}]}

See Also

CORREL - Aggregate function to calculate the correlation coefficient

COVAR - the average of the products of the deviations in known x- and y-values

FISHER - the Fisher transformation at x

FISHERINV - the inverse Fisher transformation

PEARSON - Aggregate function to calculate the correlation coefficient

INTERCEPT - the point at which a line will intersect the y-axis by using existing x-values and y-values

SLOPE - slope of the linear regression through the data points in the known x-values and y-values

LINEST - the Ordinary Least Squares (OLS) solution for a series of x-values and y-values

RSQ - the Pearson product moment correlation coefficient through data points in known-y’s and known-x’s

STEYX - the standard error of the predicted y-value for each x in the regression