WLS
Updated 2024-02-13 20:16:54.467000
Syntax
SELECT * FROM [westclintech].[wct].[WLS] (
<@TableName, nvarchar(max),>
,<@ColumnNames, nvarchar(max),>
,<@GroupedColumnName, nvarchar(max),>
,<@GroupedColumnValue, sql_variant,>
,<@LConst, bit,>
,<@y_Column, nvarchar(4000),>
,<@w_Column, nvarchar(4000),>)
Description
Use the table-valued function WLS to calculate the Ordinary Least Squares (OLS) solution for a series of x- and y-values and an associated column of weights; sometimes referred to as Weighted Least-Squares (WLS). WLS returns the coefficients of regression, standard errors, Student’s T and associated p-value for each of the independent variables. It also returns summary statistics about the regression including the standard error of y, R2, adjusted R2, the F-statistic and its p-value, the regression sum of squares, the residual sum of squares and the quartiles of the residuals. WLS is closely related to LINEST and the regression coefficients and their standard errors, T statistics and p-values can be calculated in LINEST, though LINEST will not produce the correct summary statistics. See Example 1 to find out more.
Arguments
@GroupedColumnName
The name, as text, of the column in the table or view specified by @TableName which will be used for grouping the results.
@w_Column
The column name or column number containing the weight (w) variable.
@ColumnNames
he names, as text, of the columns in @TableName that contain the values used in the calculation. The column names include the independent variables (x0,x1…xn), the dependent variable (y) and the weight (w). Data returned from the @ColumnNames must be of a type float or of a type that intrinsically converts to float.
@TableName
The name, as text, of the table or view that contains the values used in the calculation.
@GroupedColumnValue
The column value to do the grouping on.
@LConst
A bit value specifying the calculation of a y-intercept (@LConst =1) or regression through the origin (@LConst = 0).
@y_Column
The column name or column number containing the dependent (y) variable.
Return Type
table
{"columns": [{"field": "colName", "headerName": "Name", "header": "name"}, {"field": "colDatatype", "headerName": "Type", "header": "type"}, {"field": "colDesc", "headerName": "Description", "header": "description", "minWidth": 1000}], "rows": [{"id": "60470735-9118-4d77-b6b9-03be34240dc2", "colName": "stat_name", "colDatatype": "nvarchar(4000)", "colDesc": "Identifies the statistic being returned: m \u2013 estimated coefficient se \u2013standard error of the estimated coefficient tstat \u2013 Student\u2019s T statistic pval \u2013 p-values of the tstat rsq \u2013 R2 rsqa \u2013 adjusted R2 rsqm \u2013 multiple R2 sey \u2013 standard error for the y estimate F \u2013 F statistic F_pval \u2013 p-value of F df \u2013 residual degrees of freedom ss_resid \u2013 weighted sum of squares mss \u2013 modified sum-of-squares w_resid_quart \u2013 weighted residual quartile"}, {"id": "9290d1a2-0f84-406f-9490-81f1b2fa9167", "colName": "idx", "colDatatype": "int", "colDesc": "Uniquely identifies a return value for the stat_names where multiple values are returned: m, se, tstat, pval, and w_resid_quart. For m, se, tstat, and pval, idx identifies that subscript of the estimated coefficient. For example, the stat_name m with an idx of 0, specifies that the stat_val is for m0, or the y-intercept (which is b in y = mx + b). An idx of 1 for the same stat_name identifies m1. For w_resid_quart idx identifies the quartile being returned. For all other stat_names returning a single value, the idx will be NULL."}, {"id": "f5a0dc85-303c-4a95-9b2d-5c12d9c9c72a", "colName": "stat_val", "colDatatype": "float", "colDesc": "The return value."}, {"id": "e8cfde42-6aed-4e56-9e88-5f3e55fe7d93", "colName": "col_name", "colDatatype": "nvarchar(4000)", "colDesc": "The column name from the resultant table produced by the dynamic SQL for the m, se, tstat, and pval stat_names."}]}
Remarks
If @y_Column is NULL then @y_Column is the left-most column in @ColumnNames .
I f @w_Column is NULL the @w_Column is the right-most column in @ColumnNames .
If @y_Column = @w_Column then no rows are returned.
I f @y_Column is numeric and less than 1 or greater than the number of columns in @ColumnNames then no rows are returned.
Weight values must be greater than zero.
The number of rows in the regression must be greater than or equal to the number of columns.
Examples
This example explains the calculation of the regression coefficients and the summary statistics in WLS. We will put the WLS results into a temp table, #wls.
SELECT *
INTO #t
FROM
(
VALUES
(103, 126.8, 62.3, 0.420928305104083),
(127.2, 115.7, 98, 0.642347072957175),
(118, 103.4, 92.2, 0.503672280805613),
(121.8, 95.2, 74.2, 0.349193063289055),
(106.1, 96, 78.9, 0.321793289794097),
(124.6, 124.7, 96.1, 1.34249371606786),
(116.9, 122.2, 94.1, 0.401800920329203),
(118.6, 128.2, 79.2, 0.67140606947821),
(125.2, 116.9, 79.6, 0.336969408869812),
(123.3, 112.3, 87.8, 0.556210387357181)
) n (y, x1, x2, w);
Use the WLS function to calculate the coefficients of regression and the associated statistics.
SELECT *
INTO #wls
FROM wct.WLS('#t', 'y,x1,x2,w', '', NULL, 1, 'y', 'w');
SELECT *
FROM #wls;
This produces the following result.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"stat_val","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"col_name"}],"rows":[{"stat_name":"m","idx":"0","stat_val":"76.2158913852846","col_name":"Intercept"},{"stat_name":"m","idx":"1","stat_val":"0.0222877042102519","col_name":"x1"},{"stat_name":"m","idx":"2","stat_val":"0.47373007655067","col_name":"x2"},{"stat_name":"se","idx":"0","stat_val":"24.1127459683416","col_name":"Intercept"},{"stat_name":"se","idx":"1","stat_val":"0.171731881726625","col_name":"x1"},{"stat_name":"se","idx":"2","stat_val":"0.173375419548835","col_name":"x2"},{"stat_name":"tstat","idx":"0","stat_val":"3.16081343391378","col_name":"Intercept"},{"stat_name":"tstat","idx":"1","stat_val":"0.129781983322882","col_name":"x1"},{"stat_name":"tstat","idx":"2","stat_val":"2.73239469460799","col_name":"x2"},{"stat_name":"pval","idx":"0","stat_val":"0.0159102496734965","col_name":"Intercept"},{"stat_name":"pval","idx":"1","stat_val":"0.900389539811493","col_name":"x1"},{"stat_name":"pval","idx":"2","stat_val":"0.0292374089699169","col_name":"x2"},{"stat_name":"rsq","idx":"NULL","stat_val":"0.520577705092377","col_name":"NULL"},{"stat_name":"sey","idx":"NULL","stat_val":"4.29237110562461","col_name":"NULL"},{"stat_name":"F","idx":"NULL","stat_val":"3.80045314366198","col_name":"NULL"},{"stat_name":"F_pval","idx":"NULL","stat_val":"0.0762981122063386","col_name":"NULL"},{"stat_name":"df","idx":"NULL","stat_val":"7","col_name":"NULL"},{"stat_name":"mss","idx":"NULL","stat_val":"140.04251562907","col_name":"NULL"},{"stat_name":"ss_resid","idx":"NULL","stat_val":"128.971147958807","col_name":"NULL"},{"stat_name":"rsqm","idx":"NULL","stat_val":"0.721510710310233","col_name":"NULL"},{"stat_name":"rsqa","idx":"NULL","stat_val":"0.383599906547341","col_name":"NULL"},{"stat_name":"w_resid_quart","idx":"0","stat_val":"-5.46438974663061","col_name":"NULL"},{"stat_name":"w_resid_quart","idx":"1","stat_val":"-3.44808553096924","col_name":"NULL"},{"stat_name":"w_resid_quart","idx":"2","stat_val":"0.839382652539098","col_name":"NULL"},{"stat_name":"w_resid_quart","idx":"3","stat_val":"2.08237170553033","col_name":"NULL"},{"stat_name":"w_resid_quart","idx":"4","stat_val":"5.03271582569117","col_name":"NULL"}]}
There is nothing further that needs to be done in terms of getting the regression results. The rest of this example serves as an explanation of how the results are calculated. To calculate weighted least squares the dependent variable and all the independent variables are multiplied by the square root of the weights. We can achieve that result in LINEST by setting @LConst = 0 and by manually creating the intercept in the result table. The following SQL does that and puts the results into a temp tale #ols.
SELECT stat_name,
idx - 1 as idx,
stat_val,
col_name
INTO #ols
FROM wct.LINEST('#t', 'y*SQRT(w) as y, sqrt(w) as Intercept, x1*SQRT(w) as x1,
x2*SQRT(w) as x2', '', NULL, 1, 0)
WHERE idx > 0
OR idx IS NULL;
SELECT *
FROM #ols;
This produces the following result.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"stat_val","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"col_name"}],"rows":[{"stat_name":"m","idx":"0","stat_val":"76.2158913852846","col_name":"Intercept"},{"stat_name":"m","idx":"1","stat_val":"0.0222877042102519","col_name":"x1"},{"stat_name":"m","idx":"2","stat_val":"0.47373007655067","col_name":"x2"},{"stat_name":"se","idx":"0","stat_val":"24.1127459683416","col_name":"Intercept"},{"stat_name":"se","idx":"1","stat_val":"0.171731881726625","col_name":"x1"},{"stat_name":"se","idx":"2","stat_val":"0.173375419548835","col_name":"x2"},{"stat_name":"tstat","idx":"0","stat_val":"3.16081343391378","col_name":"Intercept"},{"stat_name":"tstat","idx":"1","stat_val":"0.129781983322882","col_name":"x1"},{"stat_name":"tstat","idx":"2","stat_val":"2.73239469460799","col_name":"x2"},{"stat_name":"pval","idx":"0","stat_val":"0.0159102496734965","col_name":"Intercept"},{"stat_name":"pval","idx":"1","stat_val":"0.900389539811493","col_name":"x1"},{"stat_name":"pval","idx":"2","stat_val":"0.0292374089699169","col_name":"x2"},{"stat_name":"rsq","idx":"NULL","stat_val":"0.998391679632623","col_name":"NULL"},{"stat_name":"sey","idx":"NULL","stat_val":"4.29237110562461","col_name":"NULL"},{"stat_name":"F","idx":"NULL","stat_val":"1448.45556461448","col_name":"NULL"},{"stat_name":"df","idx":"NULL","stat_val":"7","col_name":"NULL"},{"stat_name":"ss_reg","idx":"NULL","stat_val":"80060.9901152796","col_name":"NULL"},{"stat_name":"ss_resid","idx":"NULL","stat_val":"128.971147958807","col_name":"NULL"},{"stat_name":"rsqm","idx":"NULL","stat_val":"0.999195516219235","col_name":"NULL"},{"stat_name":"rsqa","idx":"NULL","stat_val":"0.997702399475176","col_name":"NULL"}]}
The following SQL produces a side-by-side comparison of the regression results.
SELECT w.stat_name,
w.idx,
w.stat_val as [wls],
o.stat_val as [ols]
FROM #wls w
FULL OUTER JOIN #ols o
ON (
w.stat_name = o.stat_name
AND ISNULL(w.idx, 0) = ISNULL(o.idx, 0)
)
OR
(
w.stat_name = 'mss'
AND o.stat_name = 'ss_reg'
);
This produces the following result.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"wls","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"ols"}],"rows":[{"stat_name":"m","idx":"0","wls":"76.2158913852846","ols":"76.2158913852846"},{"stat_name":"m","idx":"1","wls":"0.0222877042102519","ols":"0.0222877042102519"},{"stat_name":"m","idx":"2","wls":"0.47373007655067","ols":"0.47373007655067"},{"stat_name":"se","idx":"0","wls":"24.1127459683416","ols":"24.1127459683416"},{"stat_name":"se","idx":"1","wls":"0.171731881726625","ols":"0.171731881726625"},{"stat_name":"se","idx":"2","wls":"0.173375419548835","ols":"0.173375419548835"},{"stat_name":"tstat","idx":"0","wls":"3.16081343391378","ols":"3.16081343391378"},{"stat_name":"tstat","idx":"1","wls":"0.129781983322882","ols":"0.129781983322882"},{"stat_name":"tstat","idx":"2","wls":"2.73239469460799","ols":"2.73239469460799"},{"stat_name":"pval","idx":"0","wls":"0.0159102496734965","ols":"0.0159102496734965"},{"stat_name":"pval","idx":"1","wls":"0.900389539811493","ols":"0.900389539811493"},{"stat_name":"pval","idx":"2","wls":"0.0292374089699169","ols":"0.0292374089699169"},{"stat_name":"rsq","idx":"NULL","wls":"0.520577705092377","ols":"0.998391679632623"},{"stat_name":"sey","idx":"NULL","wls":"4.29237110562461","ols":"4.29237110562461"},{"stat_name":"F","idx":"NULL","wls":"3.80045314366198","ols":"1448.45556461448"},{"stat_name":"df","idx":"NULL","wls":"7","ols":"7"},{"stat_name":"mss","idx":"NULL","wls":"140.04251562907","ols":"80060.9901152796"},{"stat_name":"ss_resid","idx":"NULL","wls":"128.971147958807","ols":"128.971147958807"},{"stat_name":"rsqm","idx":"NULL","wls":"0.721510710310233","ols":"0.999195516219235"},{"stat_name":"rsqa","idx":"NULL","wls":"0.383599906547341","ols":"0.997702399475176"},{"stat_name":"F_pval","idx":"NULL","wls":"0.0762981122063386","ols":"NULL"},{"stat_name":"w_resid_quart","idx":"0","wls":"-5.46438974663061","ols":"NULL"},{"stat_name":"w_resid_quart","idx":"1","wls":"-3.44808553096924","ols":"NULL"},{"stat_name":"w_resid_quart","idx":"2","wls":"0.839382652539098","ols":"NULL"},{"stat_name":"w_resid_quart","idx":"3","wls":"2.08237170553033","ols":"NULL"},{"stat_name":"w_resid_quart","idx":"4","wls":"5.03271582569117","ols":"NULL"}]}
Notice that LINEST does a pretty good job with the m, se, tstat, pval, sey, df, and ss_resid statistics. And LINEST does not calculate F_pval nor does it calculate the quartiles of the residuals. The real difference arises with the calculation of the sum of squares of regression (ss_reg in LINEST; mss in WLS ) which is used in the calculation of rsq, F, rsqm and rsqa. The ss_reg value in LINEST is calculated as sum of yhat (ŷ) squared, where yhat is simply the independent variables multiplied by the coefficients of regression. We can see that calculation in the following SQL.
SELECT SUM(SQUARE([m0] * n.Intercept + [m1] * x1 + [m2] * x2)) as ss_reg
FROM
(
SELECT 'm' + cast(idx as char(1)) as coef,
stat_val
FROM #ols
WHERE stat_name = 'm'
) d
PIVOT
(
max(stat_val)
FOR coef in (m0, m1, m2)
) pvt
CROSS JOIN
(
SELECT sqrt(w) as Intercept,
x1 * SQRT(w) as x1,
x2 * SQRT(w) as x2
FROM #t
) n;
This produces the following result.
{"columns":[{"field":"ss_reg","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"ss_reg":"80060.9901152796"}]}
For weighted least squares, this calculation needs to be adjusted for the weights. The following SQL shows how to make this adjustment.
DECLARE @wavg_y as float =
(
SELECT wct.WAVG(w, y)FROM #t
);
DECLARE @mss as float =
(
SELECT SUM(POWER([m0] * n.Intercept + [m1] * x1 + [m2] * x2 - @wavg_y,
2) * w) as mss
FROM
(
SELECT 'm' + cast(idx as char(1)) as coef,
stat_val
FROM #ols
WHERE stat_name = 'm'
) d
PIVOT
(
max(stat_val)
FOR coef in (m0, m1, m2)
) pvt
CROSS JOIN
(SELECT 1 as Intercept, x1, x2, w FROM #t) n
);
SELECT @mss as mss;
This produces the following result.
{"columns":[{"field":"mss","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"mss":"140.04251562907"}]}
Having gotten the regression modified sum-of-squares value, we can then use the same formulas as in ordinary least squares to calculate the remaining statistics.
DECLARE @ssresid as float =
(
SELECT stat_val from #wls WHERE stat_name = 'ss_resid'
);
DECLARE @df as float =
(
SELECT stat_val from #wls WHERE stat_name = 'df'
);
DECLARE @rsq as float = @mss / (@mss + @ssresid);
DECLARE @rsqm as float = SQRT(@rsq);
DECLARE @p as float =
(
SELECT COUNT(*) - 1 FROM #t
);
DECLARE @rsqa as float = 1 - (1 - @rsq) * @p / @df;
DECLARE @Fobs as float = @mss / ((@p - @df) * @ssresid / @df);
DECLARE @Fdist as float = wct.F_DIST_RT(@Fobs, @p - @df, @df);
SELECT stat_name,
NULL as idx,
stat_val
FROM
(
VALUES
('rsq', @rsq),
('mss', @mss),
('rsqm', @rsqm),
('rsqa', @rsqa),
('F', @Fobs),
('F_pval', @Fdist)
) x (stat_name, stat_val);
This produces the following result.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"stat_val","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"stat_name":"rsq","idx":"NULL","stat_val":"0.520577705092377"},{"stat_name":"mss","idx":"NULL","stat_val":"140.04251562907"},{"stat_name":"rsqm","idx":"NULL","stat_val":"0.721510710310233"},{"stat_name":"rsqa","idx":"NULL","stat_val":"0.383599906547341"},{"stat_name":"F","idx":"NULL","stat_val":"3.80045314366198"},{"stat_name":"F_pval","idx":"NULL","stat_val":"0.0762981122063386"}]}
Finally, the w_resid_quart values are simply the quartiles of the residuals.
SELECT 'w_resid_quart' as stat_name,
x.k as idx,
wct.QUARTILE(y - ([m0] * n.Intercept + [m1] * x1 + [m2] * x2), x.k) as
stat_val
FROM
(
SELECT 'm' + cast(idx as char(1)) as coef,
stat_val
FROM #ols
WHERE stat_name = 'm'
) d
PIVOT
(
max(stat_val)
FOR coef in (m0, m1, m2)
) pvt
CROSS JOIN
(
SELECT y * sqrt(w) as y,
sqrt(w) as Intercept,
x1 * sqrt(w) as x1,
x2 * sqrt(w) as x2
FROM #t
) n
CROSS APPLY
(
VALUES
(0),
(1),
(2),
(3),
(4)
) x (k)
GROUP BY x.k;
This produces the following result.
{"columns":[{"field":"stat_name"},{"field":"idx","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"stat_val","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"stat_name":"w_resid_quart","idx":"0","stat_val":"-5.46438974663062"},{"stat_name":"w_resid_quart","idx":"1","stat_val":"-3.44808553096925"},{"stat_name":"w_resid_quart","idx":"2","stat_val":"0.839382652539094"},{"stat_name":"w_resid_quart","idx":"3","stat_val":"2.08237170553032"},{"stat_name":"w_resid_quart","idx":"4","stat_val":"5.03271582569117"}]}
See Also
LINEST - the Ordinary Least Squares (OLS) solution for a series of x-values and y-values
LINEST_q - the Ordinary Least Squares (OLS) solution for a series of x-values and y-values