Westclintech

A few weeks ago a client contacted us about the **XLeratorDB LINEST** function and whether or not it could a handle a linear system where the number of rows was less than the number of columns. I told him that in such a case the **LINEST** function would generate an error message because it requires that the number of rows to at least be equal to the number of regressors.

He then told me that he could do it in Excel, so I asked him to send me his workbook, which he did. After reviewing his workbook I discovered that his particular example wasn’t actually doing what he thought it was doing and that he actually did have a linear system that satisfied the requirements with regards to the number of rows and the number of regressors. Further, XLeratorDB and Excel produced identical results. A few days later he came back to me and told me that he was able to get Excel to produce values in LINEST where the number of rows was less than the number of regressors. I really had a hard time believing this and because of the difference in time zones and language I started investigating on my own. I use Google Sheets a lot more than I use Excel these days. I created a Sheet with 4 regressors plus the intercept but with only three rows of data: ![17A.jpg](https://wct-staging-lb-918786685.us-east-1.elb.amazonaws.com/uploads/17_A_f644715df2.jpg) This produced the expected error message. I then tried the same thing in R using Jupyter Notebooks. I was actually a little surprised by this, as I expected R not to return a result at all. However, there is a warning message in the summary results for the coefficients: "2 not defined because of singularities". I then zeroed in on the Multiple R-squared of 1. This means that the fitted y-values were exactly the same as the supplied y-values; there was no residual error. In effect, the x3 and x4 columns were dropped in regression and the number of rows was equal to the number of regressors, providing an exact fit for this linear system. This is easy enough to check in R, as the QR decomposition is stored as part of the regression and we can use that to reconstruct the x matrix, including the intercept. ```sql SELECT * INTO #t FROM (VALUES (99,92,19,-138,64) ,(98,59,55,-156,102) ,(108,144,66,-161,102) )n(x1,x2,x3,x4,y) SELECT ISNULL(Intercept, 0) as Intercept ,ISNULL(x1,0) as x1 ,ISNULL(x2,0) as x2 ,ISNULL(x3,0) as x3 ,ISNULL(x4,0) as x4 FROM ( SELECT n.regressors, L.stat_val, L.stat_name, L.col_name FROM (VALUES ('x1,x2'), ('x1,x3'), ('x1,x4'), ('x2,x3'), ('x2,x4'), ('x3,x4') )n(regressors) CROSS APPLY wct.LINEST('#t',CONCAT(n.regressors,',y'),'',NULL,3,'1')L WHERE stat_name = 'm' )d PIVOT (max(stat_val) for col_name in (Intercept,x1,x2,x3,x4))pvt ORDER BY regressors ``` Notice that row 1 has returned the same results as R. Row 4 has returned the same results as Excel. But there are 4 other results that are exactly as good a fit, depending on which columns are used in the regression. We can run the following SQL to look at some of the summary statistics from the 6 different regressions. ```sql SELECT pvt.regressors ,pvt.rsq ,pvt.ss_resid ,pvt.ss_reg ,pvt.df ,pvt.F FROM ( SELECT n.regressors ,L.stat_val ,L.stat_name FROM (VALUES ('x1,x2'), ('x1,x3'), ('x1,x4'), ('x2,x3'), ('x2,x4'), ('x3,x4') )n(regressors) CROSS APPLY wct.LINEST('#t',CONCAT(n.regressors,',y'),'',NULL,3,'1')L )d PIVOT (MAX(stat_val) FOR stat_name in (rsq, ss_resid, ss_reg, df, F))pvt ORDER BY 1 ``` First, if you are using ordinary least squares to make predictions you should not use underdetermined system; their predictive value will be extremely low. And you might not realize that you are doing so, as the TREND and GROWTH functions perform Ordinary Least Squares under the covers and you might not realize how poor those predicted values might be. Second, if you are using ordinary least squares to explain the linear system, the explanation won’t make much sense. In our example, there are 6 possible values for the intercept and 3 possible values for each of the regressors (x1, x2, x3, x4). There’s not much explanatory value in that. That’s why we think that the best solution is the Google Sheets solution; explicitly trap the problem and report the error. GIGO (Garbage In, Garbage Out) is an acronym for a reason. Let us know what you think! Send an e-mail so [support@westclintech.com](mailto:support@westclintech.com). If you want to try doing linear regression in SQL Server download the 15-day Trial and try some of the examples in this article or in the documentation on the website. >Notice that row 1 has returned the same results as R. Row 4 has returned the same results as Excel. But there are 4 other results that are exactly as good a fit, depending on which columns are used in the regression. >We can run the following SQL to look at some of the summary statistics from the 6 different regressions.

Underdetermined Systems and Ordinary Least Squares

Try for 15 Days!