LLS_q
Updated 2024-03-28 21:24:59.197000
Syntax
SELECT * FROM [westclintech].[wct].[LLS_q] (
<@Matrix_RangeQuery, nvarchar(max),>
,<@y_ColumnNumber, int,>
,<@LConst, bit,>
,<@Tol, float,>)
Description
Use the table-valued function LLS_q when you want to perform a Linear Least Squares calculation and you to adjust the tolerance for collinearity. For more information on Linear Least Squares see the LINEST_q function.
Arguments
@Tol
The tolerance values. The tolerance is used internally in QR decomposition to determine when to stop the pivoting. When the tolerance ins sufficiently small, the regression will continue for columns that might otherwise be considered collinear.
@Y_ColumnNumber
the index into the array identifying the column containing the y-values. The index value must be between 1 and n, where n is the number of columns in the resultant table created by the @Matrix_RangeQuery. @y_ColumnNumber must be of the type int or of a type that implicitly converts to int.
@Matrix_RangeQuery
The SELECT statement, as a string, which, when executed, creates the resultant table of x- and y-values which will be used in the calculation. Data returned from the @Matrix_RangeQuery must be of a type float or of a type that intrinsically converts to float.
@Lconst
A logical value specifying whether to force the y-intercept value (m0) equal to zero. @LConst must be of the type bit or of a type that implicitly converts to bit.
Return Type
table
{"columns": [{"field": "colName", "headerName": "Name", "header": "name"}, {"field": "colDatatype", "headerName": "Type", "header": "type"}, {"field": "colDesc", "headerName": "Description", "header": "description", "minWidth": 1000}], "rows": [{"id": "2a80ceb3-27cb-4091-a3fa-ae9900538535", "colName": "stat_name", "colDatatype": "nvarchar(4000)", "colDesc": "Identifies the statistic being returned: m the estimated coefficient se the standard error of the estimated coefficient tstat the t statistic pval the p-value (t distribution) for the t statistic rsq the coefficient of determination (r2) rsqa adjusted r square rsqm multiple r square sey the standard error for the y estimate f the f-observed value df the residual degrees of freedom ss_reg the regression sum of squares ss_resid the residual sum of squares"}, {"id": "e6ae9b16-4980-424d-869c-4f2404d9ac6b", "colName": "idx", "colDatatype": "int", "colDesc": "Identifies the subscript for the estimated coefficient, the standard error of the estimated coefficient, the t statistic and the p-values. For example, the stat_name m with an idx of 0, specifies that the stat_val is for m0, or the y-intercept (which is b in y = mx + b). An idx of 1 for the same stat_name identifies m1. The stat_name se with an idx of 0 identifies the standard error of the m0 coefficient (which is sometimes referred to as the standard error of b or seb). Idx values are only supplied for the m, se, tstat, and pval stat_names. All others will have an idx of NULL."}, {"id": "6b90c99d-d18b-4793-9458-0d67a13df05a", "colName": "stat_val", "colDatatype": "float", "colDesc": "the calculated value of the statistic."}, {"id": "60352909-bfdc-4160-a93b-be2480cc8df3", "colName": "col_name", "colDatatype": "nvarchar(4000)", "colDesc": "the column name from the resultant table produced by the dynamic SQL. col_name values are produced only for the m, se, tstat, and pval statistics; all other stat_names have NULL for col_name."}]}
Remarks
If @Lconst is NULL then @LConst is set to 'True'.
If @LConst is true than the number of rows must be greater than the number of columns.
If @Lconst is false than the number of rows must be greater than or equal to the number of columns.
For simpler queries, you can try the LINEST function.
If @Tol is NULL then @Tol = 0.000000015036712.
Examples
Example #1
We put the x- and y-data into a temp table, #xy.
SELECT *
INTO #xy
FROM ( VALUES (6.1, 1.21, 4.35, 5.42, 6.45, 138.08),
(7.95, 0.97, 2.79, 4.73, 8.14, 80.3),
(8.53, 9.73, 9.7, 1.16, 9.05, 284.45),
(7.4, 1.61, 9.9, 8.8, 4.38, 226.66),
(7.42, 4.58, 0.06, 8.97, 8.75, 112.37),
(6.19, 3.56, 9.69, 8.7, 5.67, 168.73),
(8.44, 3.85, 8.23, 1.05, 0.92, 160.38),
(0.84, 3.86, 7.85, 2.14, 3.03, 129.26),
(0.37, 7.33, 5.07, 8.06, 3.25, 170.39),
(7.48, 0.68, 8.34, 2.98, 2.81, 188.53),
(5.33, 3.51, 7.03, 6.49, 7.54, 131.1)) n (x1, x2, x3, x4, x5, y);
This is what the data look like.
{"columns":[{"field":"x1","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"x2","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"x3","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"x4","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"x5","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"y","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"x1":"6.10","x2":"1.21","x3":"4.35","x4":"5.42","x5":"6.45","y":"138.08"},{"x1":"7.95","x2":"0.97","x3":"2.79","x4":"4.73","x5":"8.14","y":"80.30"},{"x1":"8.53","x2":"9.73","x3":"9.70","x4":"1.16","x5":"9.05","y":"284.45"},{"x1":"7.40","x2":"1.61","x3":"9.90","x4":"8.80","x5":"4.38","y":"226.66"},{"x1":"7.42","x2":"4.58","x3":"0.06","x4":"8.97","x5":"8.75","y":"112.37"},{"x1":"6.19","x2":"3.56","x3":"9.69","x4":"8.70","x5":"5.67","y":"168.73"},{"x1":"8.44","x2":"3.85","x3":"8.23","x4":"1.05","x5":"0.92","y":"160.38"},{"x1":"0.84","x2":"3.86","x3":"7.85","x4":"2.14","x5":"3.03","y":"129.26"},{"x1":"0.37","x2":"7.33","x3":"5.07","x4":"8.06","x5":"3.25","y":"170.39"},{"x1":"7.48","x2":"0.68","x3":"8.34","x4":"2.98","x5":"2.81","y":"188.53"},{"x1":"5.33","x2":"3.51","x3":"7.03","x4":"6.49","x5":"7.54","y":"131.10"}]}
To invoke the table-valued function LINEST_q , we enter the following SQL.
SELECT *
FROM wct.LINEST_q( 'SELECT * FROM #xy', --@Matrix_RangeQuery
6, --@Y_columnNumber
'True' --@Lconst
);
The following results are produced.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"stat_val","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"col_name"}],"rows":[{"stat_name":"m","idx":"0","stat_val":"-2.00793721243439","col_name":"Intercept"},{"stat_name":"m","idx":"1","stat_val":"6.75602376546862","col_name":"x1"},{"stat_name":"m","idx":"2","stat_val":"10.7172546474712","col_name":"x2"},{"stat_name":"m","idx":"3","stat_val":"11.482088782086","col_name":"x3"},{"stat_name":"m","idx":"4","stat_val":"2.66947583235558","col_name":"x4"},{"stat_name":"m","idx":"5","stat_val":"-1.11015571062543","col_name":"x5"},{"stat_name":"se","idx":"0","stat_val":"56.6362962641145","col_name":"Intercept"},{"stat_name":"se","idx":"1","stat_val":"5.18738179897952","col_name":"x1"},{"stat_name":"se","idx":"2","stat_val":"5.0471020356278","col_name":"x2"},{"stat_name":"se","idx":"3","stat_val":"4.44876625513428","col_name":"x3"},{"stat_name":"se","idx":"4","stat_val":"4.35402063583205","col_name":"x4"},{"stat_name":"se","idx":"5","stat_val":"5.97904930603055","col_name":"x5"},{"stat_name":"tstat","idx":"0","stat_val":"-0.0354531871764828","col_name":"Intercept"},{"stat_name":"tstat","idx":"1","stat_val":"1.30239570312671","col_name":"x1"},{"stat_name":"tstat","idx":"2","stat_val":"2.12344719243191","col_name":"x2"},{"stat_name":"tstat","idx":"3","stat_val":"2.58096023112803","col_name":"x3"},{"stat_name":"tstat","idx":"4","stat_val":"0.613105921085155","col_name":"x4"},{"stat_name":"tstat","idx":"5","stat_val":"-0.185674285961434","col_name":"x5"},{"stat_name":"pval","idx":"0","stat_val":"0.973090230335637","col_name":"Intercept"},{"stat_name":"pval","idx":"1","stat_val":"0.249542367556677","col_name":"x1"},{"stat_name":"pval","idx":"2","stat_val":"0.087124225219611","col_name":"x2"},{"stat_name":"pval","idx":"3","stat_val":"0.0493746060236316","col_name":"x3"},{"stat_name":"pval","idx":"4","stat_val":"0.566619010443336","col_name":"x4"},{"stat_name":"pval","idx":"5","stat_val":"0.859997582994016","col_name":"x5"},{"stat_name":"rsq","idx":"NULL","stat_val":"0.777597428923374","col_name":"NULL"},{"stat_name":"sey","idx":"NULL","stat_val":"37.5668536635793","col_name":"NULL"},{"stat_name":"F","idx":"NULL","stat_val":"3.49635089720011","col_name":"NULL"},{"stat_name":"df","idx":"NULL","stat_val":"5","col_name":"NULL"},{"stat_name":"ss_reg","idx":"NULL","stat_val":"24671.4493290961","col_name":"NULL"},{"stat_name":"ss_resid","idx":"NULL","stat_val":"7056.3424709039","col_name":"NULL"},{"stat_name":"rsqm","idx":"NULL","stat_val":"0.881814849570687","col_name":"NULL"},{"stat_name":"rsqa","idx":"NULL","stat_val":"0.555194857846747","col_name":"NULL"}]}
The results are returned in 3rd normal form. You can use standard SQL commands to re-format the results. For example, if you wanted to produce the the coefficients in a format similar to output of the Excel Data Analysis Regression Tool, you could use the following SQL.
;WITH mycte
as (SELECT *
FROM wct.LINEST_q('SELECT * FROM #xy', 6, 1) p )
SELECT d.col_name,
d.m,
d.se,
d.tstat,
d.pval,
m - wct.T_INV_2T(.05, m.stat_val) * se as [Lower Confidence Level],
m + wct.T_INV_2T(.05, m.stat_val) * se as [Upper Confidence Level]
FROM mycte p
PIVOT ( MAX(stat_val)
FOR stat_name in (m, se, tstat, pval)) d
CROSS JOIN mycte m
WHERE stat_name = 'df'
AND d.col_name IS NOT NULL;
This produces the following result.
{"columns":[{"field":"col_name"},{"field":"m","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"se","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"tstat","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"pval","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"Lower Confidence Level","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"Upper Confidence Level","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"col_name":"Intercept","m":"-2.00793721243439","se":"56.6362962641145","tstat":"-0.0354531871764828","pval":"0.973090230335637","Lower Confidence Level":"-147.596171626684","Upper Confidence Level":"143.580297201815"},{"col_name":"x1","m":"6.75602376546862","se":"5.18738179897952","tstat":"1.30239570312671","pval":"0.249542367556677","Lower Confidence Level":"-6.57856566149857","Upper Confidence Level":"20.0906131924358"},{"col_name":"x2","m":"10.7172546474712","se":"5.0471020356278","tstat":"2.12344719243191","pval":"0.087124225219611","Lower Confidence Level":"-2.25673416791668","Upper Confidence Level":"23.6912434628591"},{"col_name":"x3","m":"11.482088782086","se":"4.44876625513428","tstat":"2.58096023112803","pval":"0.0493746060236316","Lower Confidence Level":"0.0461710556459689","Upper Confidence Level":"22.9180065085259"},{"col_name":"x4","m":"2.66947583235558","se":"4.35402063583205","tstat":"0.613105921085155","pval":"0.566619010443336","Lower Confidence Level":"-8.52289052609997","Upper Confidence Level":"13.8618421908111"},{"col_name":"x5","m":"-1.11015571062543","se":"5.97904930603055","tstat":"-0.185674285961434","pval":"0.859997582994016","Lower Confidence Level":"-16.4797912510815","Upper Confidence Level":"14.2594798298306"}]}
Similarly, if you wanted to reformat the results to produce the equivalent of the ANOVA table from the Excel Data Analysis Regression tool, you could use the following SQL.
;WITH mycte
as (SELECT df as [Regression df],
Obs - df - 1 as [Residual df],
ss_reg as [Regression SS],
ss_resid as [Residual SS],
F
FROM ( SELECT stat_name,
stat_val
FROM wct.LINEST_q('SELECT * FROM #xy', 6, 1) p
WHERE stat_name in ( 'ss_reg', 'ss_resid', 'F', 'df' )
UNION
SELECT 'Obs',
COUNT(*)
FROM #xy) d
PIVOT ( MAX(stat_val)
FOR stat_name in (df, F, ss_reg, ss_resid, Obs)) p)
SELECT 'Regression',
[Regression df] as DF,
[Regression SS] as SS,
[Regression SS] / [Regression df] as MS,
F,
wct.F_DIST_RT(F, [Regression df], [Residual df]) as [Significance F]
FROM mycte
UNION ALL
SELECT 'Residual',
[Residual df],
[Residual SS],
[Residual SS] / [Residual df] as [Residual MS],
NULL,
NULL
FROM mycte
UNION ALL
SELECT 'Total',
[Regression df] + [Residual df],
[Regression SS] + [Residual SS],
NULL,
NULL,
NULL
FROM mycte;
This produces the following result.
{"columns":[{"field":"(No Column Name)"},{"field":"DF","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"SS","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"MS"},{"field":"F"},{"field":"Significance F"}],"rows":[{"(No Column Name)":"Regression","DF":"5","SS":"24671.4493290961","MS":"4934.28986581922","F":"3.49635089720011","Significance F":"0.0978733240631539"},{"(No Column Name)":"Residual","DF":"5","SS":"7056.3424709039","MS":"1411.26849418078","F":"NULL","Significance F":"NULL"},{"(No Column Name)":"Total","DF":"10","SS":"31727.7918","MS":"NULL","F":"NULL","Significance F":"NULL"}]}
Since the function executes the @Matrix_RangeQuery values, we did not need to put the x- and y-values into a temporary table and could have simply included them in @Matrix_RangeQuery , like this.
SELECT *
FROM wct.LINEST_q(
'SELECT
*
FROM (VALUES
(6.1,1.21,4.35,5.42,6.45,138.08)
,(7.95,0.97,2.79,4.73,8.14,80.3)
,(8.53,9.73,9.7,1.16,9.05,284.45)
,(7.4,1.61,9.9,8.8,4.38,226.66)
,(7.42,4.58,0.06,8.97,8.75,112.37)
,(6.19,3.56,9.69,8.7,5.67,168.73)
,(8.44,3.85,8.23,1.05,0.92,160.38)
,(0.84,3.86,7.85,2.14,3.03,129.26)
,(0.37,7.33,5.07,8.06,3.25,170.39)
,(7.48,0.68,8.34,2.98,2.81,188.53)
,(5.33,3.51,7.03,6.49,7.54,131.1)
)n(x1,x2,x3,x4,x5,y)' ,
--@Matrix_RangeQuery
6,--@Y_columnNumber
'True' --@Lconst
);
Example #2
Using the same data as Example #1, if we wanted to calculate the coefficients without a y-intercept, we would modify function call to make the last parameter FALSE. This time, however, we will put the results into a temporary table, #L0, and reformat the results using the #L0 table.
SELECT *
INTO #L0
FROM wct.LINEST_q( 'SELECT * FROM #xy', --@Matrix_RangeQuery
6, --@Y_columnNumber
'False' --@Lconst
);
The #L0 table should contain the following data.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"stat_val"},{"field":"col_name"}],"rows":[{"stat_name":"m","idx":"0","stat_val":"0","col_name":"Intercept"},{"stat_name":"m","idx":"1","stat_val":"6.68653600898501","col_name":"x1"},{"stat_name":"m","idx":"2","stat_val":"10.669110550903","col_name":"x2"},{"stat_name":"m","idx":"3","stat_val":"11.3894353187733","col_name":"x3"},{"stat_name":"m","idx":"4","stat_val":"2.58577750738032","col_name":"x4"},{"stat_name":"m","idx":"5","stat_val":"-1.15990773149072","col_name":"x5"},{"stat_name":"se","idx":"0","stat_val":"0","col_name":"Intercept"},{"stat_name":"se","idx":"1","stat_val":"4.38493471629212","col_name":"x1"},{"stat_name":"se","idx":"2","stat_val":"4.4380097835218","col_name":"x2"},{"stat_name":"se","idx":"3","stat_val":"3.28695713905425","col_name":"x3"},{"stat_name":"se","idx":"4","stat_val":"3.34008793898134","col_name":"x4"},{"stat_name":"se","idx":"5","stat_val":"5.30630339494569","col_name":"x5"},{"stat_name":"tstat","idx":"0","stat_val":"NULL","col_name":"Intercept"},{"stat_name":"tstat","idx":"1","stat_val":"1.52488838297668","col_name":"x1"},{"stat_name":"tstat","idx":"2","stat_val":"2.40403042609709","col_name":"x2"},{"stat_name":"tstat","idx":"3","stat_val":"3.46503919489816","col_name":"x3"},{"stat_name":"tstat","idx":"4","stat_val":"0.774164499443965","col_name":"x4"},{"stat_name":"tstat","idx":"5","stat_val":"-0.218590541316492","col_name":"x5"},{"stat_name":"pval","idx":"0","stat_val":"NULL","col_name":"Intercept"},{"stat_name":"pval","idx":"1","stat_val":"0.178129170463663","col_name":"x1"},{"stat_name":"pval","idx":"2","stat_val":"0.0530030468411133","col_name":"x2"},{"stat_name":"pval","idx":"3","stat_val":"0.0133846278949679","col_name":"x3"},{"stat_name":"pval","idx":"4","stat_val":"0.468231464499349","col_name":"x4"},{"stat_name":"pval","idx":"5","stat_val":"0.834214519463804","col_name":"x5"},{"stat_name":"rsq","idx":"NULL","stat_val":"0.978154399885257","col_name":"NULL"},{"stat_name":"sey","idx":"NULL","stat_val":"34.2979988105928","col_name":"NULL"},{"stat_name":"F","idx":"NULL","stat_val":"53.7309697924091","col_name":"NULL"},{"stat_name":"df","idx":"NULL","stat_val":"6","col_name":"NULL"},{"stat_name":"ss_reg","idx":"NULL","stat_val":"316032.862965531","col_name":"NULL"},{"stat_name":"ss_resid","idx":"NULL","stat_val":"7058.11633446853","col_name":"NULL"},{"stat_name":"rsqm","idx":"NULL","stat_val":"0.989016885541019","col_name":"NULL"},{"stat_name":"rsqa","idx":"NULL","stat_val":"0.959949733122971","col_name":"NULL"}]}
We can use the same technique as in Example #1 to reformat the coefficient statistics.
SELECT d.col_name,
d.m,
d.se,
d.tstat,
d.pval,
m - wct.T_INV_2T(.05, m.stat_val) * se as [Lower Confidence Level],
m + wct.T_INV_2T(.05, m.stat_val) * se as [Upper Confidence Level]
FROM #L0 p
PIVOT ( MAX(stat_val)
FOR stat_name in (m, se, tstat, pval)) d
CROSS JOIN #L0 m
WHERE m.stat_name = 'df'
AND d.col_name IS NOT NULL;
This produces the following result.
{"columns":[{"field":"col_name"},{"field":"m","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"se","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"tstat"},{"field":"pval"},{"field":"Lower Confidence Level","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"Upper Confidence Level","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"col_name":"Intercept","m":"0","se":"0","tstat":"NULL","pval":"NULL","Lower Confidence Level":"0","Upper Confidence Level":"0"},{"col_name":"x1","m":"6.68653600898501","se":"4.38493471629212","tstat":"1.52488838297668","pval":"0.178129170463663","Lower Confidence Level":"-4.04301271480719","Upper Confidence Level":"17.4160847327772"},{"col_name":"x2","m":"10.669110550903","se":"4.4380097835218","tstat":"2.40403042609709","pval":"0.0530030468411133","Lower Confidence Level":"-0.190308183893837","Upper Confidence Level":"21.5285292856998"},{"col_name":"x3","m":"11.3894353187733","se":"3.28695713905425","tstat":"3.46503919489816","pval":"0.0133846278949679","Lower Confidence Level":"3.3465409410159","Upper Confidence Level":"19.4323296965307"},{"col_name":"x4","m":"2.58577750738032","se":"3.34008793898134","tstat":"0.774164499443965","pval":"0.468231464499349","Lower Confidence Level":"-5.58712325437951","Upper Confidence Level":"10.7586782691401"},{"col_name":"x5","m":"-1.15990773149072","se":"5.30630339494569","tstat":"-0.218590541316492","pval":"0.834214519463804","Lower Confidence Level":"-14.1439643943541","Upper Confidence Level":"11.8241489313727"}]}
Note that even though we specified no intercept, an intercept row is still created with a regression coefficient of 0 (keeping the results consistent with the Excel Data Analysis Regression). It is simple enough to exclude this from the output (like R does) simply by adding another condition to the WHERE clause.
SELECT d.col_name,
d.m,
d.se,
d.tstat,
d.pval,
m - wct.T_INV_2T(.05, m.stat_val) * se as [Lower Confidence Level],
m + wct.T_INV_2T(.05, m.stat_val) * se as [Upper Confidence Level]
FROM #L0 p
PIVOT ( MAX(stat_val)
FOR stat_name in (m, se, tstat, pval)) d
CROSS JOIN #L0 m
WHERE m.stat_name = 'df'
AND d.col_name IS NOT NULL
AND d.col_name <> 'Intercept';
The following SQL can be used to reproduce the ANOVA table.
;WITH mycte
as (SELECT df as [Residual df],
--,Obs - df - 1 as [Regression df]
Obs - df as [Regression df],
ss_reg as [Regression SS],
ss_resid as [Residual SS],
F
FROM ( SELECT stat_name,
stat_Val
FROM #L0
WHERE stat_name in ( 'ss_reg', 'ss_resid', 'F', 'df' )
UNION
SELECT 'Obs',
COUNT(*)
FROM #xy) d
PIVOT ( MAX(stat_val)
FOR stat_name in (df, F, ss_reg, ss_resid, Obs)) p)
SELECT 'Regression',
[Regression df] as DF,
[Regression SS] as SS,
[Regression SS] / [Regression df] as MS,
F,
wct.F_DIST_RT(F, [Regression df], [Residual df]) as [Significance F]
FROM mycte
UNION ALL
SELECT 'Residual',
[Residual df],
[Residual SS],
[Residual SS] / [Residual df] as [Residual MS],
NULL,
NULL
FROM mycte
UNION ALL
SELECT 'Total',
[Regression df] + [Residual df],
[Regression SS] + [Residual SS],
NULL,
NULL,
NULL
FROM mycte;
This produces the following result.
{"columns":[{"field":"(No Column Name)"},{"field":"DF","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"SS","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"MS"},{"field":"F"},{"field":"Significance F"}],"rows":[{"(No Column Name)":"Regression","DF":"5","SS":"316032.862965531","MS":"63206.5725931063","F":"53.7309697924091","Significance F":"6.67425061507056E-05"},{"(No Column Name)":"Residual","DF":"6","SS":"7058.11633446853","MS":"1176.35272241142","F":"NULL","Significance F":"NULL"},{"(No Column Name)":"Total","DF":"11","SS":"323090.9793","MS":"NULL","F":"NULL","Significance F":"NULL"}]}
Note that the calculation of the regression degrees of freedom needs to be adjusted to reflect the absence on intercept. It also worthwhile noting that the calculation in this SQL produces a Significance F that agrees with R while Excel produces a different value (which seems to be caused be subtracting 1 from the residual degrees of freedom).
Example #3
Here’s another example. Instead of just having a matrix of x- and y-values in our table, we are going to add a column, testid, which is a way of grouping x- and y-values together for purposes of doing the LINEST_q calculation. This allows us to compute the ordinary least squares values for multiple sets of data in a single SELECT statement.
Second, we take the x-values and raise them to the powers of 2, 3, 4, 5 thus solving for the equation:
y = B0 + B1x + B2x2 + B3x3 + B4x4 + B5x5
SELECT *
INTO #xy
FROM ( VALUES ('Wampler5', 7590001, 0),
('Wampler5', -20479994, 1),
('Wampler5', 20480063, 2),
('Wampler5', -20479636, 3),
('Wampler5', 25231365, 4),
('Wampler5', -20476094, 5),
('Wampler5', 20489331, 6),
('Wampler5', -20460392, 7),
('Wampler5', 18417449, 8),
('Wampler5', -20413570, 9),
('Wampler5', 20591111, 10),
('Wampler5', -20302844, 11),
('Wampler5', 18651453, 12),
('Wampler5', -20077766, 13),
('Wampler5', 21059195, 14),
('Wampler5', -19666384, 15),
('Wampler5', 26348481, 16),
('Wampler5', -18971402, 17),
('Wampler5', 22480719, 18),
('Wampler5', -17866340, 19),
('Wampler5', 10958421, 20),
('Wampler4', 75901, 0),
('Wampler4', -204794, 1),
('Wampler4', 204863, 2),
('Wampler4', -204436, 3),
('Wampler4', 253665, 4),
('Wampler4', -200894, 5),
('Wampler4', 214131, 6),
('Wampler4', -185192, 7),
('Wampler4', 221249, 8),
('Wampler4', -138370, 9),
('Wampler4', 315911, 10),
('Wampler4', -27644, 11),
('Wampler4', 455253, 12),
('Wampler4', 197434, 13),
('Wampler4', 783995, 14),
('Wampler4', 608816, 15),
('Wampler4', 1370781, 16),
('Wampler4', 1303798, 17),
('Wampler4', 2205519, 18),
('Wampler4', 2408860, 19),
('Wampler4', 3444321, 20),
('Wampler3', 760, 0),
('Wampler3', -2042, 1),
('Wampler3', 2111, 2),
('Wampler3', -1684, 3),
('Wampler3', 3888, 4),
('Wampler3', 1858, 5),
('Wampler3', 11379, 6),
('Wampler3', 17560, 7),
('Wampler3', 39287, 8),
('Wampler3', 64382, 9),
('Wampler3', 113159, 10),
('Wampler3', 175108, 11),
('Wampler3', 273291, 12),
('Wampler3', 400186, 13),
('Wampler3', 581243, 14),
('Wampler3', 811568, 15),
('Wampler3', 1121004, 16),
('Wampler3', 1506550, 17),
('Wampler3', 2002767, 18),
('Wampler3', 2611612, 19),
('Wampler3', 3369180, 20),
('Wampler2', 1, 0),
('Wampler2', 1.11111, 1),
('Wampler2', 1.24992, 2),
('Wampler2', 1.42753, 3),
('Wampler2', 1.65984, 4),
('Wampler2', 1.96875, 5),
('Wampler2', 2.38336, 6),
('Wampler2', 2.94117, 7),
('Wampler2', 3.68928, 8),
('Wampler2', 4.68559, 9),
('Wampler2', 6, 10),
('Wampler2', 7.71561, 11),
('Wampler2', 9.92992, 12),
('Wampler2', 12.75603, 13),
('Wampler2', 16.32384, 14),
('Wampler2', 20.78125, 15),
('Wampler2', 26.29536, 16),
('Wampler2', 33.05367, 17),
('Wampler2', 41.26528, 18),
('Wampler2', 51.16209, 19),
('Wampler2', 63, 20),
('Wampler1', 1, 0),
('Wampler1', 6, 1),
('Wampler1', 63, 2),
('Wampler1', 364, 3),
('Wampler1', 1365, 4),
('Wampler1', 3906, 5),
('Wampler1', 9331, 6),
('Wampler1', 19608, 7),
('Wampler1', 37449, 8),
('Wampler1', 66430, 9),
('Wampler1', 111111, 10),
('Wampler1', 177156, 11),
('Wampler1', 271453, 12),
('Wampler1', 402234, 13),
('Wampler1', 579195, 14),
('Wampler1', 813616, 15),
('Wampler1', 1118481, 16),
('Wampler1', 1508598, 17),
('Wampler1', 2000719, 18),
('Wampler1', 2613660, 19),
('Wampler1', 3368421, 20)) n (testid, y, x);
Let’s say wanted to run LINEST_q for the all the data where the testid is equal to Wampler3. We could simply enter the following statement.
SELECT *
FROM wct.LINEST_q(
'SELECT y,x,POWER(x,2),POWER(x,3),POWER(x,4),POWER(x,5) FROM #xy WHERE testid = ''Wampler3''', 1, 'True');
This produces the following result.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"stat_val","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"col_name"}],"rows":[{"stat_name":"m","idx":"0","stat_val":"1.00000000017179","col_name":"Intercept"},{"stat_name":"m","idx":"1","stat_val":"0.999999999679294","col_name":"x"},{"stat_name":"m","idx":"2","stat_val":"1.0000000000945","col_name":"Column1"},{"stat_name":"m","idx":"3","stat_val":"0.999999999989123","col_name":"Column2"},{"stat_name":"m","idx":"4","stat_val":"1.00000000000055","col_name":"Column3"},{"stat_name":"m","idx":"5","stat_val":"0.99999999999999","col_name":"Column4"},{"stat_name":"se","idx":"0","stat_val":"2152.32624678169","col_name":"Intercept"},{"stat_name":"se","idx":"1","stat_val":"2363.55173469678","col_name":"x"},{"stat_name":"se","idx":"2","stat_val":"779.343524331576","col_name":"Column1"},{"stat_name":"se","idx":"3","stat_val":"101.475507550349","col_name":"Column2"},{"stat_name":"se","idx":"4","stat_val":"5.64566512170747","col_name":"Column3"},{"stat_name":"se","idx":"5","stat_val":"0.112324854679311","col_name":"Column4"},{"stat_name":"tstat","idx":"0","stat_val":"0.000464613578757896","col_name":"Intercept"},{"stat_name":"tstat","idx":"1","stat_val":"0.000423092071563048","col_name":"x"},{"stat_name":"tstat","idx":"2","stat_val":"0.00128313121091521","col_name":"Column1"},{"stat_name":"tstat","idx":"3","stat_val":"0.00985459471087597","col_name":"Column2"},{"stat_name":"tstat","idx":"4","stat_val":"0.177127048530663","col_name":"Column3"},{"stat_name":"tstat","idx":"5","stat_val":"8.90274910975851","col_name":"Column4"},{"stat_name":"pval","idx":"0","stat_val":"0.999635414800478","col_name":"Intercept"},{"stat_name":"pval","idx":"1","stat_val":"0.999667996985275","col_name":"x"},{"stat_name":"pval","idx":"2","stat_val":"0.998993119116665","col_name":"Column1"},{"stat_name":"pval","idx":"3","stat_val":"0.992267170697082","col_name":"Column2"},{"stat_name":"pval","idx":"4","stat_val":"0.86177817377114","col_name":"Column3"},{"stat_name":"pval","idx":"5","stat_val":"2.25341849383758E-07","col_name":"Column4"},{"stat_name":"rsq","idx":"NULL","stat_val":"0.99999555902582","col_name":"NULL"},{"stat_name":"sey","idx":"NULL","stat_val":"2360.14502379269","col_name":"NULL"},{"stat_name":"F","idx":"NULL","stat_val":"675524.458240117","col_name":"NULL"},{"stat_name":"df","idx":"NULL","stat_val":"15","col_name":"NULL"},{"stat_name":"ss_reg","idx":"NULL","stat_val":"18814317208116.7","col_name":"NULL"},{"stat_name":"ss_resid","idx":"NULL","stat_val":"83554268.0000007","col_name":"NULL"},{"stat_name":"rsqm","idx":"NULL","stat_val":"0.999997779510445","col_name":"NULL"},{"stat_name":"rsqa","idx":"NULL","stat_val":"0.999994078701093","col_name":"NULL"}]}
You will notice that the column names for the calculated columns are Column1, Column2, Column3, and Column4. This is because we have not assigned them a name and these are the default column names. Even though Column1 does not refer to the first column in the resultant table from our dynamic SQL, it is the first column with no name. To get more descriptive column names, you can simply assign the names in the @ColumnNames variable as in this example.
SELECT *
FROM wct.LINEST_q(
'SELECT y,x,POWER(x,2) as [x^2],POWER(x,3) as [x^3],POWER(x,4) as [x^4],POWER(x,5) as [x^5] FROM #xy WHERE testid = ''Wampler3''',
1,
'True');
This produces the following result.
{"columns":[{"field":"stat_name"},{"field":"idx"},{"field":"stat_val","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"col_name"}],"rows":[{"stat_name":"m","idx":"0","stat_val":"1.00000000017179","col_name":"Intercept"},{"stat_name":"m","idx":"1","stat_val":"0.999999999679294","col_name":"x"},{"stat_name":"m","idx":"2","stat_val":"1.0000000000945","col_name":"x^2"},{"stat_name":"m","idx":"3","stat_val":"0.999999999989123","col_name":"x^3"},{"stat_name":"m","idx":"4","stat_val":"1.00000000000055","col_name":"x^4"},{"stat_name":"m","idx":"5","stat_val":"0.99999999999999","col_name":"x^5"},{"stat_name":"se","idx":"0","stat_val":"2152.32624678169","col_name":"Intercept"},{"stat_name":"se","idx":"1","stat_val":"2363.55173469678","col_name":"x"},{"stat_name":"se","idx":"2","stat_val":"779.343524331576","col_name":"x^2"},{"stat_name":"se","idx":"3","stat_val":"101.475507550349","col_name":"x^3"},{"stat_name":"se","idx":"4","stat_val":"5.64566512170747","col_name":"x^4"},{"stat_name":"se","idx":"5","stat_val":"0.112324854679311","col_name":"x^5"},{"stat_name":"tstat","idx":"0","stat_val":"0.000464613578757896","col_name":"Intercept"},{"stat_name":"tstat","idx":"1","stat_val":"0.000423092071563048","col_name":"x"},{"stat_name":"tstat","idx":"2","stat_val":"0.00128313121091521","col_name":"x^2"},{"stat_name":"tstat","idx":"3","stat_val":"0.00985459471087597","col_name":"x^3"},{"stat_name":"tstat","idx":"4","stat_val":"0.177127048530663","col_name":"x^4"},{"stat_name":"tstat","idx":"5","stat_val":"8.90274910975851","col_name":"x^5"},{"stat_name":"pval","idx":"0","stat_val":"0.999635414800478","col_name":"Intercept"},{"stat_name":"pval","idx":"1","stat_val":"0.999667996985275","col_name":"x"},{"stat_name":"pval","idx":"2","stat_val":"0.998993119116665","col_name":"x^2"},{"stat_name":"pval","idx":"3","stat_val":"0.992267170697082","col_name":"x^3"},{"stat_name":"pval","idx":"4","stat_val":"0.86177817377114","col_name":"x^4"},{"stat_name":"pval","idx":"5","stat_val":"2.25341849383758E-07","col_name":"x^5"},{"stat_name":"rsq","idx":"NULL","stat_val":"0.99999555902582","col_name":"NULL"},{"stat_name":"sey","idx":"NULL","stat_val":"2360.14502379269","col_name":"NULL"},{"stat_name":"F","idx":"NULL","stat_val":"675524.458240117","col_name":"NULL"},{"stat_name":"df","idx":"NULL","stat_val":"15","col_name":"NULL"},{"stat_name":"ss_reg","idx":"NULL","stat_val":"18814317208116.7","col_name":"NULL"},{"stat_name":"ss_resid","idx":"NULL","stat_val":"83554268.0000007","col_name":"NULL"},{"stat_name":"rsqm","idx":"NULL","stat_val":"0.999997779510445","col_name":"NULL"},{"stat_name":"rsqa","idx":"NULL","stat_val":"0.999994078701093","col_name":"NULL"}]}
In the following SQL we return selected values for each test.
SELECT p.testid,
p.rsq,
p.rsqa,
p.rsqm,
p.F,
p.df,
p.ss_reg,
p.ss_resid
FROM ( SELECT n.testid,
k.stat_name,
k.stat_val
FROM (SELECT DISTINCT testid FROM #xy) n
CROSS APPLY wct.LINEST_q(
'SELECT y,x,POWER(x,2) as [x^2],POWER(x,3) as [x^3],POWER(x,4) as [x^4],POWER(x,5) as [x^5] FROM #xy WHERE testid = '''
+ n.testid + '''',
1,
'True') k ) d
PIVOT ( SUM(stat_val)
FOR stat_name IN (F, df, ss_reg, ss_resid, rsq, rsqm, rsqa)) p;
This produces the following result.
{"columns":[{"field":"testid"},{"field":"rsq","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"rsqa","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"rsqm","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"F","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"df","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"ss_reg","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"ss_resid","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"testid":"Wampler1","rsq":"1","rsqa":"1","rsqm":"1","F":"1.38198630394956E+32","df":"15","ss_reg":"18814317208116.7","ss_resid":"4.08419037605817E-19"},{"testid":"Wampler2","rsq":"1","rsqa":"1","rsqm":"1","F":"6.62975864402057E+31","df":"15","ss_reg":"6602.91858365167","ss_resid":"2.98785473417206E-28"},{"testid":"Wampler3","rsq":"0.99999555902582","rsqa":"0.999994078701093","rsqm":"0.999997779510445","F":"675524.458240117","df":"15","ss_reg":"18814317208116.7","ss_resid":"83554268.0000007"},{"testid":"Wampler4","rsq":"0.957478440825662","rsqa":"0.94330458776755","rsqm":"0.978508273253559","F":"67.5524458240123","df":"15","ss_reg":"18814317208116.7","ss_resid":"835542680000"},{"testid":"Wampler5","rsq":"0.0022466892157494","rsqa":"-0.330337747712334","rsqm":"0.0473992533248088","F":"0.00675524458240124","df":"15","ss_reg":"18814317208116.7","ss_resid":"8.3554268E+15"}]}
In this SQL we select the coefficient statistics for each of the tests.
SELECT p.testid,
p.col_name,
p.m,
p.se,
p.tstat,
p.pval
FROM ( SELECT n.testid,
k.stat_name,
k.idx,
k.col_name,
k.stat_val
FROM (SELECT DISTINCT testid FROM #xy) n
CROSS APPLY wct.LINEST_q(
'SELECT y,x,POWER(x,2) as [x^2],POWER(x,3) as [x^3],POWER(x,4) as [x^4],POWER(x,5) as [x^5] FROM #xy WHERE testid = '''
+ n.testid + '''',
1,
'True') k
WHERE k.idx IS NOT NULL) d
PIVOT ( SUM(stat_val)
FOR stat_name IN (m, se, tstat, pval)) p
ORDER BY testid,
idx;
This produces the following result.
{"columns":[{"field":"testid"},{"field":"col_name"},{"field":"m","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"se","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"tstat","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"pval","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"testid":"Wampler1","col_name":"Intercept","m":"0.999999999993022","se":"1.50479303077928E-10","tstat":"6645432159.36582","pval":"6.15882032963405E-140"},{"testid":"Wampler1","col_name":"x","m":"0.999999999844435","se":"1.65247075510795E-10","tstat":"6051544311.77277","pval":"2.50814535309203E-139"},{"testid":"Wampler1","col_name":"x^2","m":"1.00000000004225","se":"5.44875901481339E-11","tstat":"18352802855.1745","pval":"1.4852856151972E-146"},{"testid":"Wampler1","col_name":"x^3","m":"0.999999999995625","se":"7.09463246033833E-12","tstat":"140951628655.325","pval":"7.78564309561968E-160"},{"testid":"Wampler1","col_name":"x^4","m":"1.00000000000021","se":"3.94715138653456E-13","tstat":"2533472628923.33","pval":"1.17924139944035E-178"},{"testid":"Wampler1","col_name":"x^5","m":"0.999999999999996","se":"7.85316160862982E-15","tstat":"127337249611812","pval":"3.57284346398899E-204"},{"testid":"Wampler2","col_name":"Intercept","m":"0.999999999999995","se":"4.0700853992358E-15","tstat":"245695090375194","pval":"1.86809429176221E-208"},{"testid":"Wampler2","col_name":"x","m":"0.0999999999999979","se":"4.46951637564801E-15","tstat":"22373785348420.3","pval":"7.60771017143417E-193"},{"testid":"Wampler2","col_name":"x^2","m":"0.0100000000000011","se":"1.47375180882262E-15","tstat":"6785403037428.73","pval":"4.5051705244268E-185"},{"testid":"Wampler2","col_name":"x^3","m":"0.000999999999999853","se":"1.91891904063468E-16","tstat":"5211267275085.79","pval":"2.36154241344595E-183"},{"testid":"Wampler2","col_name":"x^4","m":"0.000100000000000008","se":"1.06760483988871E-17","tstat":"9366761582912.28","pval":"3.57687675436682E-187"},{"testid":"Wampler2","col_name":"x^5","m":"9.99999999999986E-06","se":"2.12408203303352E-19","tstat":"47079160995106.7","pval":"1.0837154071571E-197"},{"testid":"Wampler3","col_name":"Intercept","m":"1.00000000017179","se":"2152.32624678169","tstat":"0.000464613578757896","pval":"0.999635414800478"},{"testid":"Wampler3","col_name":"x","m":"0.999999999679294","se":"2363.55173469678","tstat":"0.000423092071563048","pval":"0.999667996985275"},{"testid":"Wampler3","col_name":"x^2","m":"1.0000000000945","se":"779.343524331576","tstat":"0.00128313121091521","pval":"0.998993119116665"},{"testid":"Wampler3","col_name":"x^3","m":"0.999999999989123","se":"101.475507550349","tstat":"0.00985459471087597","pval":"0.992267170697082"},{"testid":"Wampler3","col_name":"x^4","m":"1.00000000000055","se":"5.64566512170747","tstat":"0.177127048530663","pval":"0.86177817377114"},{"testid":"Wampler3","col_name":"x^5","m":"0.99999999999999","se":"0.112324854679311","tstat":"8.90274910975851","pval":"2.25341849383758E-07"},{"testid":"Wampler4","col_name":"Intercept","m":"1.00000001085174","se":"215232.624678168","tstat":"4.64613583719948E-06","pval":"0.999996354191052"},{"testid":"Wampler4","col_name":"x","m":"0.999999977160948","se":"236355.173469677","tstat":"4.23092062035716E-06","pval":"0.999996679970601"},{"testid":"Wampler4","col_name":"x^2","m":"1.00000000841373","se":"77934.3524331573","tstat":"1.28313122158987E-05","pval":"0.999989931165944"},{"testid":"Wampler4","col_name":"x^3","m":"0.999999998874462","se":"10147.5507550348","tstat":"9.85459469989148E-05","pval":"0.999922670373091"},{"testid":"Wampler4","col_name":"x^4","m":"1.00000000006258","se":"564.566512170745","tstat":"0.0017712704854165","pval":"0.99861007362054"},{"testid":"Wampler4","col_name":"x^5","m":"0.99999999999877","se":"11.2324854679311","tstat":"0.0890274910974769","pval":"0.930237861013585"},{"testid":"Wampler5","col_name":"Intercept","m":"1.00000109952421","se":"21523262.4678168","tstat":"4.64614089531958E-08","pval":"0.999999967977304"},{"testid":"Wampler5","col_name":"x","m":"0.999997728516215","se":"23635517.3469677","tstat":"4.23091110651957E-08","pval":"0.999999967977304"},{"testid":"Wampler5","col_name":"x^2","m":"1.00000083489823","se":"7793435.24331573","tstat":"1.28313228207793E-07","pval":"0.999999898735342"},{"testid":"Wampler5","col_name":"x^3","m":"0.999999888409967","se":"1014755.07550348","tstat":"9.85459361130866E-07","pval":"0.999999226799412"},{"testid":"Wampler5","col_name":"x^4","m":"1.00000000620071","se":"56456.6512170745","tstat":"1.7712704962888E-05","pval":"0.999986100711028"},{"testid":"Wampler5","col_name":"x^5","m":"0.999999999878203","se":"1123.2485467931","tstat":"0.000890274910867431","pval":"0.999301395744685"}]}
See Also
LINEST - the Ordinary Least Squares (OLS) solution for a series of x-values and y-values
LOGEST_Q - Exponential curve that best fits a series of x- and y-values
GROWTH - calculate predicted exponential growth using existing values
TRENDMX - Calculate the values along a linear trend for multiple x values