COVARIANCE_P
Updated 2023-10-23 19:29:19.693000
Syntax
SELECT [westclintech].[wct].[COVARIANCE_P] (
<@Known_y, float,>
,<@Known_x, float,>)
Description
Use the aggregate function COVARIANCE _P to calculate the population covariance for a set of x- and y-values. The formula for the population covariance is
\sigma_{xy}=\frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{N}
Arguments
@Known_x
the x-values to be used in the calculation. @Known_x is an expression of type float or of a type that can be implicitly converted to float.
@Known_y
the y-values to be used in the calculation. @Known_y is an expression of type float or of a type that can be implicitly converted to float.
Return Type
float
Remarks
COVARIANCE_P is an AGGREGATE function and follows the same conventions as all other AGGREGATE functions in SQL Server.
Use COVARIANCE_S to calculate the sample covariance.
Examples
In this example, we calculate the population covariance for a single set of x- and y-values
SELECT wct.COVARIANCE_P(y, x) as COVARIANCE_P
FROM
(
SELECT 0.75,
1
UNION ALL
SELECT 2.5,
2
UNION ALL
SELECT 6.75,
3
UNION ALL
SELECT 10,
4
) n(x, y);
This produces the following result.
{"columns":[{"field":"COVARIANCE_P","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"COVARIANCE_P":"4"}]}
In this example, we will populate some temporary table with some historical financial information and then calculate the population covariance. First, create the table and put some data in it:
CREATE TABLE #c
(
SYM NVARCHAR(5),
YE BIGINT,
REV FLOAT,
GPROF FLOAT,
OPINC FLOAT,
NETINC FLOAT
);
INSERT INTO #c
VALUES
('YHOO', 2009, 6460.32, 3588.57, 386.69, 597.99);
INSERT INTO #c
VALUES
('YHOO', 2008, 72.5, 4185.14, 12.96, 418.92);
INSERT INTO #c
VALUES
('YHOO', 2007, 6969.27, 4130.52, 695.41, 639.16);
INSERT INTO #c
VALUES
('YHOO', 2006, 6425.68, 3749.96, 940.97, 751.39);
INSERT INTO #c
VALUES
('YHOO', 2005, 5257.67, 3161.47, 1107.73, 1896.23);
INSERT INTO #c
VALUES
('GOOG', 2009, 23650.56, 14806.45, 8312.19, 6520.45);
INSERT INTO #c
VALUES
('GOOG', 2008, 21795.55, 13174.04, 5537.21, 4226.86);
INSERT INTO #c
VALUES
('GOOG', 2007, 16593.99, 9944.9, 54.44, 4203.72);
INSERT INTO #c
VALUES
('GOOG', 2006, 10604.92, 6379.89, 3550, 3077.45);
INSERT INTO #c
VALUES
('GOOG', 2005, 6138.56, 3561.47, 2017.28, 1465.4);
INSERT INTO #c
VALUES
('MSFT', 2010, 62484, 509, 24167, 18760);
INSERT INTO #c
VALUES
('MSFT', 2009, 58437, 46282, 21225, 14569);
INSERT INTO #c
VALUES
('MSFT', 2008, 60420, 48822, 22271, 17681);
INSERT INTO #c
VALUES
('MSFT', 2007, 51122, 40429, 18438, 14065);
INSERT INTO #c
VALUES
('MSFT', 2006, 44282, 36632, 16064, 12599);
INSERT INTO #c
VALUES
('ORCL', 2010, 26820, 21056, 9062, 6135);
INSERT INTO #c
VALUES
('ORCL', 2009, 23252, 18458, 8321, 5593);
INSERT INTO #c
VALUES
('ORCL', 2008, 22430, 17449, 7844, 5521);
INSERT INTO #c
VALUES
('ORCL', 2007, 17996, 13805, 5974, 4274);
INSERT INTO #c
VALUES
('ORCL', 2006, 14380, 11145, 4736, 3381);
INSERT INTO #c
VALUES
('SAP', 2009, 10672, 6980, 2588, 1748);
INSERT INTO #c
VALUES
('SAP', 2008, 11575, 7370, 2701, 1847);
INSERT INTO #c
VALUES
('SAP', 2007, 10256, 6631, 2698, 1906);
INSERT INTO #c
VALUES
('SAP', 2006, 9393, 6064, 2578, 1871);
INSERT INTO #c
VALUES
('SAP', 2005, 8509, 5460, 2337, 1496);
Now, calculate the population covariance of the revenue (REV) againt the year (YE) for each company (SYM)
SELECT #c.SYM,
wct.COVARIANCE_P(REV, YE) as COVARIANCE_P
FROM #c
GROUP BY SYM;
This produces the following result.
{"columns":[{"field":"SYM"},{"field":"COVARIANCE_P","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"SYM":"GOOG","COVARIANCE_P":"9242.926"},{"SYM":"MSFT","COVARIANCE_P":"8743.8"},{"SYM":"ORCL","COVARIANCE_P":"6027.2"},{"SYM":"SAP","COVARIANCE_P":"1301.6"},{"SYM":"YHOO","COVARIANCE_P":"-789.576"}]}
Let’s say we wanted to perform the same analysis as above, but we only want to return the results where the covariance is positive.
SELECT #c.SYM,
wct.COVARIANCE_P(REV, YE) as COVARIANCE_P
FROM #c
GROUP BY SYM
HAVING wct.COVARIANCE_P(REV, YE) > 0;
This produces the following result.
SYM COVARIANCE_P
----- ----------------------
GOOG 9242.926
MSFT 8743.8
ORCL 6027.2
SAP 1301.6
In this example, we will calculate the correlation of the operating income (OPINC) against the revenue (REV).
SELECT #c.SYM
,wct.COVARIANCE_P(OPINC,REV) as COVARIANCE_P
FROM #c
GROUP BY SYM;
This produces the following result.
{"columns":[{"field":"SYM"},{"field":"COVARIANCE_P","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"SYM":"GOOG","COVARIANCE_P":"12281522.966476"},{"SYM":"MSFT","COVARIANCE_P":"19038574.4"},{"SYM":"ORCL","COVARIANCE_P":"6855059.76"},{"SYM":"SAP","COVARIANCE_P":"117904.8"},{"SYM":"YHOO","COVARIANCE_P":"676127.183864"}]}