This is something that came up at work recently. A user wanted to migrate a report from Excel to SSRS and needed to be able to implement the NORMDIST function in T-SQL. NORMDIST takes four parameters:
- x, a number indicating the value of interest on the normal distribution curve
- mu, a number indicating the sample mean
- sigma, a number indicating the sample standard deviation
- cumulative, a boolean indicating whether to compute the cumulative value of the normal distribution curve (in other words, the integral of the normal distribution from negative infinity to x). For what I have done here, I am assuming that cumulative is TRUE; the results for cumulative = FALSE (the probability density function) will be completely different. A quick sketch of that other case follows this list.
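Before getting to the cumulative case, here is roughly what cumulative = FALSE would amount to: it is simply the normal probability density function, f(x) = 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2)), evaluated at x. The snippet below is just my own illustration of that formula; nothing later in this post depends on it.

-- cumulative = FALSE: the normal probability density function
-- f(x) = 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
declare @x float = 1.5;
declare @mu float = 0.0;
declare @sigma float = 1.0;

select exp(-power(@x - @mu, 2.0) / (2.0 * power(@sigma, 2.0)))
       / (@sigma * sqrt(2.0 * pi())) as normDistPdf;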
My statistics knowledge is, to say the very least, quite rusty, but with the help of a couple of Wikipedia pages (this and this), I came up with the following T-SQL that (at least for the smattering of values I tested) comes pretty close to the Excel NORMDIST function.
-- Initialize parameters
declare @x float = 1.5;
declare @mu float = 0.0;
declare @sigma float = 1.0;

-- Compute NORMDIST (cumulative = TRUE)
-- The error function is approximated with the Abramowitz & Stegun formula 7.1.26,
-- which has a maximum absolute error of roughly 1.5e-7.
declare @erfx float = (@x - @mu) / sqrt(2.0 * power(@sigma, 2.0));
declare @sign float = sign(@erfx);
select @erfx = abs(@erfx);

declare @t float = 1.0 / (1.0 + 0.3275911 * @erfx);
declare @erf float = @sign * (1.0 - (0.254829592 * @t
                                     - 0.284496736 * power(@t, 2.0)
                                     + 1.421413741 * power(@t, 3.0)
                                     - 1.453152027 * power(@t, 4.0)
                                     + 1.061405429 * power(@t, 5.0)) * exp(-power(@erfx, 2.0)));

-- CDF: Phi(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
declare @normDist float = 0.5 * (1.0 + @erf);
select @normDist;
Of course, the usual caveats apply. Be sure to validate this script against Excel for any values of interest, and please let me know if you find any errors in the results.
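One way I might do that spot check is to run the same arithmetic over a handful of inputs at once with a VALUES list and CROSS APPLY, comparing against the familiar standard normal CDF values (for example, NORMDIST(1.5, 0, 1, TRUE) in Excel comes out to roughly 0.9332). This is only a sketch for mu = 0 and sigma = 1:

-- Quick sanity check for mu = 0, sigma = 1: compare the approximation
-- against the usual textbook values of the standard normal CDF.
select  v.x,
        v.expected,
        0.5 * (1.0 + c.erf) as normDist
from (values (0.0, 0.5000),
             (1.0, 0.8413),
             (1.5, 0.9332),
             (2.0, 0.9772)) as v(x, expected)
cross apply (select abs(v.x / sqrt(2.0)) as e, sign(v.x) as s) as a
cross apply (select 1.0 / (1.0 + 0.3275911 * a.e) as t) as b
cross apply (select a.s * (1.0 - (0.254829592 * b.t
                                  - 0.284496736 * power(b.t, 2.0)
                                  + 1.421413741 * power(b.t, 3.0)
                                  - 1.453152027 * power(b.t, 4.0)
                                  + 1.061405429 * power(b.t, 5.0)) * exp(-power(a.e, 2.0))) as erf) as c;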
A final tip: Tempting as it may be, don’t implement this as a user-defined function. Scalar UDFs really are quite bad for performance.
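If you still want a reusable building block, one pattern that avoids the scalar UDF penalty is an inline table-valued function, since the optimizer expands it into the calling query instead of invoking it row by row. The sketch below assumes cumulative = TRUE, and the name dbo.NormDistCdf is just my illustration:

-- Inline table-valued function sketch (cumulative = TRUE only).
-- Inline TVFs are expanded into the calling plan, so they dodge the
-- per-row overhead of scalar UDFs.
create function dbo.NormDistCdf (@x float, @mu float, @sigma float)
returns table
as
return
    select 0.5 * (1.0 + c.erf) as normDist
    from (select abs((@x - @mu) / sqrt(2.0 * power(@sigma, 2.0))) as e,
                 sign(@x - @mu) as s) as a
    cross apply (select 1.0 / (1.0 + 0.3275911 * a.e) as t) as b
    cross apply (select a.s * (1.0 - (0.254829592 * b.t
                                      - 0.284496736 * power(b.t, 2.0)
                                      + 1.421413741 * power(b.t, 3.0)
                                      - 1.453152027 * power(b.t, 4.0)
                                      + 1.061405429 * power(b.t, 5.0)) * exp(-power(a.e, 2.0))) as erf) as c;
go

-- Usage: cross apply against whatever table or value list holds the inputs.
select t.x, n.normDist
from (values (1.5)) as t(x)
cross apply dbo.NormDistCdf(t.x, 0.0, 1.0) as n;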