Thursday 21 January 2010

Standard Deviation at n+1

I faced a little problem recently @ work: the need to generate real time standard deviation for millions of data (streamed continuously).

So the idea is to find a recurrence relationship for the standard deviation. From the initial Wiki formula, and after 5 lines of easy maths, you end up with:



From that you see that you just need to store the previous standard deviation, the sum of xi, and the sum of the xi square. You just have to initialize with two values, and after that, you calculate the new standard deviation on the fly for the new xi.

Looks like this in Java


public class StandardDeviation {
private double sumXis = Double.NaN, sumXis2 = Double.NaN;
private double n = Double.NaN;
private double stdev2 = Double.NaN;
private double result = Double.NaN;

public StandardDeviation() {
}

public double stdDev(final double[] xs) {
final int length = xs.length;
n = length;

double sdx1 = 0.0;
double sdx2 = 0.0;
for(int i = 0; i < length; i++) {
sdx1 += xs[i] * xs[i];
sdx2 += xs[i];
}
sumXis = sdx2;
sumXis2 = sdx1;
sdx2 = sdx2 * sdx2;

final double std2 = (length * sdx1 - sdx2) / (length * (length-1));
stdev2 = std2;
final double std = Math.sqrt(std2);

result = std;

return std;
}

public double stdDev(final double xs) {
final double stdev2n1 = (n * (n - 1.0) * stdev2 + n * xs * xs + sumXis2 - 2 * xs * sumXis)
/ (n * (n + 1));

sumXis += xs;
sumXis2 += xs * xs;
stdev2 = stdev2n1;
n += 1;

final double result = Math.sqrt(stdev2n1);

this.result = result;

return result;
}

public double getResult() {
return result;
}
}

No comments:

Blog Archive