Skip to content

Summary Statistics

image Source: adopted from here

Introduction

Summary statistics is common in all statistical package in most programming language. In Microsoft Excel, the following summary statistics are provided. Given a list of numerical values [x_1, x_2, \cdots, x_N] and assuming no null values in this list,

  • Mean $$ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \bar{x} $$
  • Standard Error $$ s_{\bar{x}} = \frac{\sqrt{\frac{1}{N-1} \sum_{i=1}^N (x-\bar{x})^2}}{\sqrt{N}} = \frac{s}{\sqrt{N}} $$
  • Median: Assuming that sorting the data ascendingly gives [x^o_1, x^o_2, \cdots, x^o_N], then $$ x_{med} = \frac{1}{2}( x_{\left\lfloor \frac{N+1}{2} \right\rfloor}^{o} + x_{\left\lceil \frac{N+1}{2} \right\rceil}^{o} ) $$ where \lfloor \cdot \rfloor and \lceil \cdot \rceil denote the floor and ceiling functions, respectively.
  • Mode: the most frequent value(s) in the list
  • Standard Deviation $$ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x-\bar{x})^2} $$
  • Sample Variance $$ s^2 = \frac{1}{N-1} \sum_{i=1}^N (x-\bar{x})^2 $$
  • Kurtosis: The sample kurtosis is defined as: $$ K = \frac{\frac{1}{N} \sum_{i=1}^N (x-\bar{x})^4}{\left[\frac{1}{N-1} \sum_{i=1}^N (x-\bar{x})^2 \right]^2} - 3 $$
  • Skewness: The sample skewness is defined as: $$ \frac{\sqrt{N(N-1)}}{N-1} \cdot \frac{\frac{1}{N} \sum_{i=1}^N (x-\bar{x})^3}{\left[\frac{1}{N-1} \sum_{i=1}^N (x-\bar{x})^2 \right]^{\frac{3}{2}}} $$
  • Range $$ r = max(x_1, x_2, \cdots, x_N) - min(x_1, x_2, \cdots, x_N) = x_{max} - x_{min} $$
  • Minimum $$ x_{min} = min(x_1, x_2, \cdots, x_N) $$
  • Maximum $$ x_{max} = max(x_1, x_2, \cdots, x_N) $$
  • Sum $$ x_{sum} = \sum_{i=1}^N x_i $$
  • Count: The number of data points in the list.

Question

Let's replicate the summary statistics reported by Excel. Write a function summaryStats[data] to calculate the summary statistics of a given list of numbers.

You can use the following code snippet to generate a randome list of numbers:

system "S -314159";
data:10000?til 5000;

The output of the summary statistics might look like:

statsName statsValue
Count 10000
Mean 2478.983
Sample Variance 2087419
Standard Deviation 1444.79
Standard Error 14.4479
Median 2464

Answer

The suggested answer is as follows.

summaryStats:{
  stats:([] statsName:"s"$();statsValue:"f"$());

  n:count x;
  stats:stats,`statsName`statsValue!(`Count;n);

  mu:sum[x]%n;
  stats:stats,`statsName`statsValue!(`Mean;mu);

  variance:sum[dm*dm:x-mu]%n-1; / or s*s:sdev x
  stats:stats,`statsName`statsValue!(`$"Sample Variance";variance);

  sd:sqrt variance;
  stats:stats,`statsName`statsValue!(`$"Standard Deviation";sd);

  se:sd%sqrt n;
  stats:stats,`statsName`statsValue!(`$"Standard Error";se);

  xo:asc x;
  median1:med x;
  median:0.5*xo[-1+floor 0.5*n+1]+xo[-1+ceiling 0.5*n+1];
  stats:stats,`statsName`statsValue!(`$"Median1";median1);
  stats:stats,`statsName`statsValue!(`$"Median";median);

  mode:where max[c]=c:count each group x;
  stats:stats,`statsName`statsValue!(`$"Mode";mode);

  dm:x-mu;
  v1:sum[dm xexp 4]%n;
  v2:variance*variance;
  kurtosis:-3+v1%v2;
  stats:stats,`statsName`statsValue!(`$"Kurtosis";kurtosis);

  v1:sum[dm xexp 3]%n;
  v2:variance xexp 1.5;
  skewness:sqrt[n%n-1]*v1%v2;
  stats:stats,`statsName`statsValue!(`$"Skewness";skewness);

  minimum:first xo;
  stats:stats,`statsName`statsValue!(`$"Minimum";minimum);

  maximum:last xo;
  stats:stats,`statsName`statsValue!(`$"Maximum";maximum);

  range:maximum-minimum;
  stats:stats,`statsName`statsValue!(`$"Range";range);

  stats
  };

summaryStats[data]