KSTEST

Updated 2023-10-26 14:11:48.413000

Syntax

SELECT [westclintech].[wct].[KSTEST] (
  <@x, float,> 
 ,<@cdf, float,> <@statistic, nvarchar(4000),>)

Description

Use the aggregate function KSTEST to compare a sample with a reference probability distribution. The Kolmogorov-Smirnov statistic qualifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference sample.

Arguments

@x

the sample values. @x is an expression of type float or of a type that can be implicitly converted to float.

@cdf

the values of the cumulative distribution function. @cdf is an expression of type float or of a type that can be implicitly converted to float.

@statistic

identifies the return value as either the Kolmogorov-Smirnov statistic (k) or the p-value (p).

Return Type

float

Remarks

If @x is null then @x and the corresponding @cdf are not included in the aggregate.

If @cdf is null then @cdf and the corresponding @x are not included in the aggregate.

@statistic must be invariant for the GROUP.

KSTEST is an aggregate function and follows the same conventions as all other aggregate functions in SQL Server.

To evaluate two samples, use KSTEST2.

Examples

In this example, we compare the sample (-2,-1, 0, 1, 2, 3, 4) to the standard normal distribution.

SELECT wct.KSTEST(x, wct.NORMSDIST(x), 'k') as k_observed,

       wct.KSTEST(x, wct.NORMSDIST(x), 'p') as p_value

FROM

(

    VALUES

        (-2),

        (-1),

        (0),

        (1),

        (2),

        (3),

        (4)

) n (x);

This produces the following result.

{"columns":[{"field":"k_observed","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"p_value","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"k_observed":"0.412773317497114","p_value":"0.135890127950776"}]}

In this example, we compare the sample x, consisting of 1,000 randomly generated numbers in the standard normal distribution, to the standard normal distribution.

SELECT wct.KSTEST(x, wct.NORMSDIST(x), 'P') as p_value,

       wct.KSTEST(x, wct.NORMSDIST(x), 'k') as k_observed

FROM

(

    SELECT k.seriesvalue as x

    FROM wctMath.wct.SeriesFloat(0, 1, NULL, 1000, 'N') k

) n(x);

This produces the following result. Your results will be different.

{"columns":[{"field":"p_value","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"k_observed","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"p_value":"0.248195490618504","k_observed":"0.0321196162767327"}]}

In this example, we compare the uniformaly random numbers to the gamma distribution with shape 1 and scale 1.

SELECT wct.KSTEST(x, wct.GAMMADIST(x, 1, 1, 'True'), 'P') as p_value,

       wct.KSTEST(x, wct.GAMMADIST(x, 1, 1, 'True'), 'k') as k_observed

FROM

(

    SELECT k.seriesvalue as x

    FROM wctMath.wct.SeriesFloat(0, 1, NULL, 100, 'R') k

) n(x);

This produces the following result. Your results will be different.

{"columns":[{"field":"p_value","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"},{"field":"k_observed","headerClass":"ag-right-aligned-header","cellClass":"ag-right-aligned-cell"}],"rows":[{"p_value":"1.09546692948201E-12","k_observed":"0.368492604785191"}]}