KSTEST
Updated 2023-10-26 14:11:48.413000
Syntax
SELECT [westclintech].[wct].[KSTEST] (
<@x, float,>
,<@cdf, float,> <@statistic, nvarchar(4000),>)
Description
Use the aggregate function KSTEST to compare a sample with a reference probability distribution. The Kolmogorov-Smirnov statistic qualifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference sample.
Arguments
@x
the sample values. @x is an expression of type float or of a type that can be implicitly converted to float.
@cdf
the values of the cumulative distribution function. @cdf is an expression of type float or of a type that can be implicitly converted to float.
@statistic
identifies the return value as either the Kolmogorov-Smirnov statistic (k) or the p-value (p).
Return Type
float
Remarks
If @x is null then @x and the corresponding @cdf are not included in the aggregate.
If @cdf is null then @cdf and the corresponding @x are not included in the aggregate.
@statistic must be invariant for the GROUP.
KSTEST is an aggregate function and follows the same conventions as all other aggregate functions in SQL Server.
To evaluate two samples, use KSTEST2.
Examples
In this example, we compare the sample (-2,-1, 0, 1, 2, 3, 4) to the standard normal distribution.
SELECT wct.KSTEST(x, wct.NORMSDIST(x), 'k') as k_observed,
wct.KSTEST(x, wct.NORMSDIST(x), 'p') as p_value
FROM
(
VALUES
(-2),
(-1),
(0),
(1),
(2),
(3),
(4)
) n (x);
This produces the following result.
{"columns":[{"field":"k_observed","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"p_value","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"k_observed":"0.412773317497114","p_value":"0.135890127950776"}]}
In this example, we compare the sample x, consisting of 1,000 randomly generated numbers in the standard normal distribution, to the standard normal distribution.
SELECT wct.KSTEST(x, wct.NORMSDIST(x), 'P') as p_value,
wct.KSTEST(x, wct.NORMSDIST(x), 'k') as k_observed
FROM
(
SELECT k.seriesvalue as x
FROM wctMath.wct.SeriesFloat(0, 1, NULL, 1000, 'N') k
) n(x);
This produces the following result. Your results will be different.
{"columns":[{"field":"p_value","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"k_observed","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"p_value":"0.248195490618504","k_observed":"0.0321196162767327"}]}
In this example, we compare the uniformaly random numbers to the gamma distribution with shape 1 and scale 1.
SELECT wct.KSTEST(x, wct.GAMMADIST(x, 1, 1, 'True'), 'P') as p_value,
wct.KSTEST(x, wct.GAMMADIST(x, 1, 1, 'True'), 'k') as k_observed
FROM
(
SELECT k.seriesvalue as x
FROM wctMath.wct.SeriesFloat(0, 1, NULL, 100, 'R') k
) n(x);
This produces the following result. Your results will be different.
{"columns":[{"field":"p_value","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"k_observed","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"p_value":"1.09546692948201E-12","k_observed":"0.368492604785191"}]}