Summary Statistics
Source: adapted from here
Introduction
Summary statistics are available in the statistical packages of most programming languages. Microsoft Excel provides the summary statistics listed below. Given a list of numerical values $[x_1, x_2, \cdots, x_N]$, and assuming the list contains no null values,
- Mean $$ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \bar{x} $$
- Standard Error $$ s_{\bar{x}} = \frac{\sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i-\bar{x})^2}}{\sqrt{N}} = \frac{s}{\sqrt{N}} $$
- Median: Assuming that sorting the data in ascending order gives $[x^o_1, x^o_2, \cdots, x^o_N]$, then $$ x_{med} = \frac{1}{2}( x_{\left\lfloor \frac{N+1}{2} \right\rfloor}^{o} + x_{\left\lceil \frac{N+1}{2} \right\rceil}^{o} ) $$ where $\lfloor \cdot \rfloor$ and $\lceil \cdot \rceil$ denote the floor and ceiling functions, respectively. For odd $N$ the two indices coincide and the median is the middle value; for even $N$ it is the average of the two middle values.
- Mode: the most frequent value(s) in the list
- Standard Deviation $$ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i-\bar{x})^2} $$
- Sample Variance $$ s^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i-\bar{x})^2 $$
- Kurtosis: The sample kurtosis is defined as: $$ K = \frac{\frac{1}{N} \sum_{i=1}^N (x_i-\bar{x})^4}{\left[\frac{1}{N-1} \sum_{i=1}^N (x_i-\bar{x})^2 \right]^2} - 3 $$
- Skewness: The sample skewness is defined as: $$ \frac{\sqrt{N(N-1)}}{N-1} \cdot \frac{\frac{1}{N} \sum_{i=1}^N (x_i-\bar{x})^3}{\left[\frac{1}{N-1} \sum_{i=1}^N (x_i-\bar{x})^2 \right]^{\frac{3}{2}}} $$
- Range $$ r = \max(x_1, x_2, \cdots, x_N) - \min(x_1, x_2, \cdots, x_N) = x_{max} - x_{min} $$
- Minimum $$ x_{min} = \min(x_1, x_2, \cdots, x_N) $$
- Maximum $$ x_{max} = \max(x_1, x_2, \cdots, x_N) $$
- Sum $$ x_{sum} = \sum_{i=1}^N x_i $$
- Count: The number of data points in the list.
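
As a quick check of the definitions above, consider the made-up list $[2, 4, 4, 6, 9]$ with $N = 5$. This list is only an illustration and is not part of the original exercise. The deviations from the mean are $-3, -1, -1, 1, 4$, so

$$ \bar{x} = \frac{2+4+4+6+9}{5} = 5, \qquad s^2 = \frac{9+1+1+1+16}{4} = 7, \qquad s = \sqrt{7} \approx 2.646, \qquad s_{\bar{x}} = \frac{\sqrt{7}}{\sqrt{5}} \approx 1.183 $$

Since $\left\lfloor \frac{5+1}{2} \right\rfloor = \left\lceil \frac{5+1}{2} \right\rceil = 3$, the median is $x^o_3 = 4$. The mode is $4$, the range is $9 - 2 = 7$, and the sum is $25$. Using the kurtosis and skewness definitions exactly as written above,

$$ K = \frac{\frac{1}{5}(81+1+1+1+256)}{7^2} - 3 = \frac{68}{49} - 3 \approx -1.612, \qquad \frac{\sqrt{5 \cdot 4}}{4} \cdot \frac{\frac{1}{5}(-27-1-1+1+64)}{7^{3/2}} = \frac{\sqrt{20}}{4} \cdot \frac{7.2}{7^{3/2}} \approx 0.435 $$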
Question
Let's replicate the summary statistics reported by Excel. Write a function summaryStats[data] that calculates the summary statistics of a given list of numbers.
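You can use the following code snippet to generate a random list of numbers. The first line fixes the random seed so that the draw is reproducible; the second draws 10000 values, with replacement, from the integers 0 to 4999: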
system "S -314159";
data:10000?til 5000;
The output of the summary statistics might look like:
statsName | statsValue |
---|---|
Count | 10000 |
Mean | 2478.983 |
Sample Variance | 2087419 |
Standard Deviation | 1444.79 |
Standard Error | 14.4479 |
Median | 2464 |
Answer
The suggested answer is as follows.
summaryStats:{
  / x is the input list of numbers; stats collects one row per statistic
  stats:([] statsName:"s"$();statsValue:"f"$());
  n:count x;
  stats:stats,`statsName`statsValue!(`Count;n);
  mu:sum[x]%n;
  stats:stats,`statsName`statsValue!(`Mean;mu);
  / q evaluates right to left, so the sum is divided by (n-1); dm holds the deviations from the mean
  variance:sum[dm*dm:x-mu]%n-1; / or svar x, or s*s with s:sdev x
  stats:stats,`statsName`statsValue!(`$"Sample Variance";variance);
  sd:sqrt variance;
  stats:stats,`statsName`statsValue!(`$"Standard Deviation";sd);
  se:sd%sqrt n;
  stats:stats,`statsName`statsValue!(`$"Standard Error";se);
  xo:asc x;
  / Median1 uses the built-in med; Median applies the floor/ceiling formula to the sorted list
  median1:med x;
  median:0.5*xo[-1+floor 0.5*n+1]+xo[-1+ceiling 0.5*n+1];
  stats:stats,`statsName`statsValue!(`$"Median1";median1);
  stats:stats,`statsName`statsValue!(`$"Median";median);
  / group maps each distinct value to its positions; keep the value(s) with the highest count
  mode:where max[c]=c:count each group x;
  stats:stats,`statsName`statsValue!(`$"Mode";mode);
  / Kurtosis: fourth central moment over the squared sample variance, minus 3
  v1:sum[dm xexp 4]%n;
  v2:variance*variance;
  kurtosis:-3+v1%v2;
  stats:stats,`statsName`statsValue!(`$"Kurtosis";kurtosis);
  / Skewness: third central moment over variance^1.5, scaled by sqrt(n/(n-1))
  v1:sum[dm xexp 3]%n;
  v2:variance xexp 1.5;
  skewness:sqrt[n%n-1]*v1%v2;
  stats:stats,`statsName`statsValue!(`$"Skewness";skewness);
  / xo is sorted ascending, so its first and last items are the minimum and maximum
  minimum:first xo;
  stats:stats,`statsName`statsValue!(`$"Minimum";minimum);
  maximum:last xo;
  stats:stats,`statsName`statsValue!(`$"Maximum";maximum);
  range:maximum-minimum;
  stats:stats,`statsName`statsValue!(`$"Range";range);
  stats
  };
summaryStats[data]
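
As a sanity check, the hand-written formulas can be compared with q's built-in aggregates on a small input that is easy to verify by hand. The sketch below is illustrative only: the five-element list `x` is a made-up example rather than part of the original exercise, and it assumes a q version that provides `svar` and `sdev` (sample variance and sample standard deviation) alongside `avg` and `med`.

```q
/ Hypothetical five-element input, chosen so the arithmetic is easy to follow
x:2 4 4 6 9f
n:count x
mu:sum[x]%n                     / manual mean
variance:sum[dm*dm:x-mu]%n-1    / manual sample variance, divides by (n-1)
xo:asc x
median:0.5*xo[-1+floor 0.5*n+1]+xo[-1+ceiling 0.5*n+1]
/ Typed at the console, each comparison returns 1b when the manual value agrees with the built-in
mu=avg x
variance=svar x
sqrt[variance]=sdev x
median=med x
```

If any comparison returns `0b`, the usual suspects are the `n-1` divisor (`var` and `dev` divide by `n`, while `svar` and `sdev` divide by `n-1`) and q's right-to-left evaluation order.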