SQL Server WLS_q Function

Updated 2024-02-13 19:59:02.277000

Description

Use the SQL Server table-valued function WLS_q to calculate the least-squares solution for a series of x- and y-values weighted by an associated column of weights; this is commonly referred to as Weighted Least Squares (WLS). WLS_q returns the regression coefficients, their standard errors, the Student's t statistic and the associated p-value for each of the independent variables. It also returns summary statistics about the regression, including the standard error of y, R², adjusted R², the F-statistic and its p-value, the regression sum of squares, the residual sum of squares and the quartiles of the residuals. WLS_q is closely related to LINEST_q: the regression coefficients and their standard errors, t statistics and p-values can also be calculated in LINEST_q, though LINEST_q will not produce the correct summary statistics. See Examples to find out more.
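For orientation, the textbook weighted least-squares estimator can be written as below. The notation is an assumption introduced here and does not appear in the original: X is the matrix of x-values (with a leading column of ones when an intercept is fitted), y the vector of y-values, and W the diagonal matrix of weights. This is the standard formulation, not a description of how WLS_q is implemented internally.

```latex
\hat{\beta}
  = \arg\min_{\beta} \sum_{i=1}^{n} w_i \left( y_i - x_i^{\top} \beta \right)^2
  = \left( X^{\top} W X \right)^{-1} X^{\top} W y,
\qquad
W = \operatorname{diag}(w_1, \dots, w_n)
```

The same coefficients result from an unweighted regression through the origin after every row, including the column of ones, is multiplied by \sqrt{w_i}. That equivalence is one plausible reading of why the coefficients can be reproduced with LINEST_q while its summary statistics differ.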

Syntax

SELECT * FROM [westclintech].[wct].[WLS_q] (
   <@Matrix_RangeQuery, nvarchar(max),>
  ,<@LConst, bit,>
  ,<@y_Column, nvarchar(4000),>
  ,<@w_Column, nvarchar(4000),>)

Arguments

@Matrix_RangeQuery

The SELECT statement, as a string, which, when executed, returns the resultant table of w-, x- and y-values used in the calculation. Data returned by @Matrix_RangeQuery must be of type float or of a type that implicitly converts to float.

@LConst

A bit value specifying the calculation of a y-intercept (@LConst = 1) or regression through the origin (@LConst = 0).
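The two settings correspond to the following model forms. The coefficient names m_0 through m_k follow the idx description under Return Type; k, the number of x-columns, and the error term \varepsilon are notation assumed here.

```latex
\begin{align*}
\text{@LConst} = 1:\quad & y = m_0 + m_1 x_1 + \dots + m_k x_k + \varepsilon \\
\text{@LConst} = 0:\quad & y = m_1 x_1 + \dots + m_k x_k + \varepsilon
\end{align*}
```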

@y_Column

The column name or column number containing the dependent (y) variable.

@w_Column

The column name or column number containing the weight (w) variable.
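As a sketch of how the two column arguments might be supplied, the calls below refer to the columns by position and by default. They assume a temp table #t with columns y, x1, x2 and w, like the one built under Examples. Passing the column numbers as strings is an assumption based on the nvarchar(4000) parameter type.

```sql
-- Assumes #t(y, x1, x2, w) already exists (see Examples).

-- By position: y is column 1 and w is column 4 of the SELECT list.
-- Supplying the numbers as strings is an assumption, not confirmed by the source.
SELECT *
FROM wct.WLS_q('SELECT y,x1,x2,w FROM #t', 1, '1', '4');

-- By default: per the Remarks, a NULL @y_Column selects the left-most column
-- and a NULL @w_Column selects the right-most column.
SELECT *
FROM wct.WLS_q('SELECT y,x1,x2,w FROM #t', 1, NULL, NULL);
```

With this column order, both calls should describe the same regression as the call that uses the names 'y' and 'w'.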

Return Type

table

stat_name (nvarchar(4000))

Identifies the statistic being returned:

m – estimated coefficient
se – standard error of the estimated coefficient
tstat – Student's T statistic
pval – p-value of the tstat
rsq – R²
rsqa – adjusted R²
rsqm – multiple R²
sey – standard error for the y estimate
F – F statistic
F_pval – p-value of F
df – residual degrees of freedom
ss_resid – weighted sum of squares
mss – modified sum-of-squares
w_resid_quart – weighted residual quartile

idx (int)

Uniquely identifies a return value for the stat_names where multiple values are returned: m, se, tstat, pval, and w_resid_quart. For m, se, tstat, and pval, idx identifies the subscript of the estimated coefficient. For example, the stat_name m with an idx of 0 specifies that the stat_val is for m0, or the y-intercept (which is b in y = mx + b). An idx of 1 for the same stat_name identifies m1. For w_resid_quart, idx identifies the quartile being returned. For all other stat_names, which return a single value, idx is NULL.

stat_val (float)

The return value.

col_name (nvarchar(4000))

The column name from the resultant table produced by the dynamic SQL for the m, se, tstat, and pval stat_names.

Remarks

If @y_Column is NULL then the left-most column of the resultant table returned by @Matrix_RangeQuery is used as the y column.

If @w_Column is NULL then the right-most column of the resultant table returned by @Matrix_RangeQuery is used as the weight column.

If @y_Column = @w_Column then no rows are returned.

If @y_Column is numeric and less than 1 or greater than the number of columns in the resultant table then no rows are returned.

If @w_Column is numeric and less than 1 or greater than the number of columns in the resultant table then no rows are returned.

Weight values must be greater than zero.

The number of rows in the regression must be greater than or equal to the number of columns.

Examples

This example explains the calculation of the regression coefficients and the summary statistics in WLS_q. Let's set up an example in SQL Server and take a closer look at those calculations. We will put the WLS_q results into a temp table, #wls.

SELECT *
INTO #t
FROM
(
    VALUES
        (103, 126.8, 62.3, 0.420928305104083),
        (127.2, 115.7, 98, 0.642347072957175),
        (118, 103.4, 92.2, 0.503672280805613),
        (121.8, 95.2, 74.2, 0.349193063289055),
        (106.1, 96, 78.9, 0.321793289794097),
        (124.6, 124.7, 96.1, 1.34249371606786),
        (116.9, 122.2, 94.1, 0.401800920329203),
        (118.6, 128.2, 79.2, 0.67140606947821),
        (125.2, 116.9, 79.6, 0.336969408869812),
        (123.3, 112.3, 87.8, 0.556210387357181)
) n (y, x1, x2, w);
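Before running the regression, the constraints listed under Remarks can be checked against the input. The query below is a minimal sketch of such a check on #t. The output column names n_rows and n_bad_weights are hypothetical labels chosen for illustration.

```sql
-- Pre-flight check on the sample table #t created above.
SELECT
    COUNT(*)                                AS n_rows,        -- compare against the number of columns
    SUM(CASE WHEN w <= 0 THEN 1 ELSE 0 END) AS n_bad_weights  -- weights must be greater than zero
FROM #t;
```

For the sample data this reports 10 rows and 0 non-positive weights.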

Use the WLS_q function to calculate the coefficients of regression and the associated statistics.

SELECT *
INTO #wls
FROM wct.WLS_q('SELECT y,x1,x2,w FROM #t', 1, 'y', 'w');
SELECT *
FROM #wls;
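Because WLS_q returns one row per statistic, it can help to reshape the per-coefficient rows into one row per coefficient. The following is a sketch rather than part of the original example. It assumes #wls was populated by the statement above and that col_name is the same for every statistic sharing an idx. The tstat_check column is a hypothetical addition that applies the usual definition t = m / se, so it should agree with tstat if the function follows that definition.

```sql
-- One row per coefficient: idx 0 is the intercept when @LConst = 1.
SELECT
    p.idx,
    p.col_name,
    p.m,
    p.se,
    p.tstat,
    p.pval,
    p.m / NULLIF(p.se, 0) AS tstat_check  -- usual definition of the t statistic
FROM
(
    SELECT
        idx,
        MAX(col_name)                                      AS col_name,
        MAX(CASE WHEN stat_name = 'm'     THEN stat_val END) AS m,
        MAX(CASE WHEN stat_name = 'se'    THEN stat_val END) AS se,
        MAX(CASE WHEN stat_name = 'tstat' THEN stat_val END) AS tstat,
        MAX(CASE WHEN stat_name = 'pval'  THEN stat_val END) AS pval
    FROM #wls
    WHERE stat_name IN ('m', 'se', 'tstat', 'pval')
    GROUP BY idx
) p
ORDER BY p.idx;

-- Single-valued summary statistics have a NULL idx (see Return Type).
SELECT stat_name, stat_val
FROM #wls
WHERE idx IS NULL;
```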