RSQ
Updated 2023-11-02 12:39:03.887000
Syntax
SELECT [westclintech].[wct].[RSQ] (
<@Known_y, float,>
,<@Known_x, float,>)
Description
Use the aggregate function RSQ to return the Pearson product moment correlation coefficient through data points in known-y’s and known-x’s. The r-squared value can be interpreted as the proportion of the variance in y attributable to the variance in x. The equation for the Pearson product moment correlation coefficient is:
r_{xy}=\frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2\sum(y-\bar{y})^2}}
RSQ returns r2 which is the square of this correlation coefficient
Arguments
@Known_x
the x-values to be used in the RSQ calculation. @Known_x is an expression of type float or of a type that can be implicitly converted to float.
@Known_y
the y-values to be used in the RSQ calculation. @Known_y is an expression of type float or of a type that can be implicitly converted to float.
Return Type
float
Remarks
RSQ is an AGGREGATE function and follows the same conventions as all other AGGREGATE functions in SQL Server.
Examples
In this example, we calculate the RSQ coefficient as correlation for a single set of x- and y-values
SELECT wct.RSQ(y, x) as RSQ
FROM
(
SELECT 0.75,
1
UNION ALL
SELECT 2.5,
2
UNION ALL
SELECT 6.75,
3
UNION ALL
SELECT 10,
4
) n(x, y);
This produces the following result
{"columns":[{"field":"RSQ","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"RSQ":"0.977565632458234"}]}
In this example, we will populate some temporary table with some historical financial information and then calculate the RSQ value. First, create the table and put some data in it:
CREATE TABLE #c
(
SYM NVARCHAR(5),
YE BIGINT,
REV FLOAT,
GPROF FLOAT,
OPINC FLOAT,
NETINC FLOAT
);
INSERT INTO #c
VALUES
('YHOO', 2009, 6460.32, 3588.57, 386.69, 597.99);
INSERT INTO #c
VALUES
('YHOO', 2008, 72.5, 4185.14, 12.96, 418.92);
INSERT INTO #c
VALUES
('YHOO', 2007, 6969.27, 4130.52, 695.41, 639.16);
INSERT INTO #c
VALUES
('YHOO', 2006, 6425.68, 3749.96, 940.97, 751.39);
INSERT INTO #c
VALUES
('YHOO', 2005, 5257.67, 3161.47, 1107.73, 1896.23);
INSERT INTO #c
VALUES
('GOOG', 2009, 23650.56, 14806.45, 8312.19, 6520.45);
INSERT INTO #c
VALUES
('GOOG', 2008, 21795.55, 13174.04, 5537.21, 4226.86);
INSERT INTO #c
VALUES
('GOOG', 2007, 16593.99, 9944.9, 54.44, 4203.72);
INSERT INTO #c
VALUES
('GOOG', 2006, 10604.92, 6379.89, 3550, 3077.45);
INSERT INTO #c
VALUES
('GOOG', 2005, 6138.56, 3561.47, 2017.28, 1465.4);
INSERT INTO #c
VALUES
('MSFT', 2010, 62484, 509, 24167, 18760);
INSERT INTO #c
VALUES
('MSFT', 2009, 58437, 46282, 21225, 14569);
INSERT INTO #c
VALUES
('MSFT', 2008, 60420, 48822, 22271, 17681);
INSERT INTO #c
VALUES
('MSFT', 2007, 51122, 40429, 18438, 14065);
INSERT INTO #c
VALUES
('MSFT', 2006, 44282, 36632, 16064, 12599);
INSERT INTO #c
VALUES
('ORCL', 2010, 26820, 21056, 9062, 6135);
INSERT INTO #c
VALUES
('ORCL', 2009, 23252, 18458, 8321, 5593);
INSERT INTO #c
VALUES
('ORCL', 2008, 22430, 17449, 7844, 5521);
INSERT INTO #c
VALUES
('ORCL', 2007, 17996, 13805, 5974, 4274);
INSERT INTO #c
VALUES
('ORCL', 2006, 14380, 11145, 4736, 3381);
INSERT INTO #c
VALUES
('SAP', 2009, 10672, 6980, 2588, 1748);
INSERT INTO #c
VALUES
('SAP', 2008, 11575, 7370, 2701, 1847);
INSERT INTO #c
VALUES
('SAP', 2007, 10256, 6631, 2698, 1906);
INSERT INTO #c
VALUES
('SAP', 2006, 9393, 6064, 2578, 1871);
INSERT INTO #c
VALUES
('SAP', 2005, 8509, 5460, 2337, 1496);
Now, calculate the correlation of the revenue (REV) againt the year (YE) for each company (SYM)
SELECT #c.SYM,
wct.RSQ(REV, YE) as RSQ
FROM #c
GROUP BY SYM;
This produces the following result.
{"columns":[{"field":"SYM"},{"field":"RSQ","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"SYM":"GOOG","RSQ":".977339436214685"},{"SYM":"MSFT","RSQ":".84384482670292"},{"SYM":"ORCL","RSQ":".967854021121364"},{"SYM":"SAP","RSQ":".7622476860379"},{"SYM":"YHOO","RSQ":"0.0481295961998007"}]}
In this example, we will calculate the RSQ of the operating income (OPINC) against the revenue (REV)
SELECT #c.SYM,
wct.RSQ(OPINC, REV) as RSQ
FROM #c
GROUP BY SYM;
This produces the following result.
{"columns":[{"field":"SYM"},{"field":"RSQ","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"SYM":"GOOG","RSQ":"0.424982363587281"},{"SYM":"MSFT","RSQ":".975377972491666"},{"SYM":"ORCL","RSQ":".9848889990367"},{"SYM":"SAP","RSQ":".713341551053228"},{"SYM":"YHOO","RSQ":"0.458857018017384"}]}