
Histogram

image Source: adapted from here

Introduction

A histogram is a graphical display of numerical data using bars of different heights; it is an approximate representation of the distribution of the data. The height of each bar shows how many data points fall into the corresponding range (bin), and you decide which ranges to use. This allows you to inspect the data for its underlying distribution (e.g. normal distribution), outliers, skewness, etc.

There are several guidelines for choosing the number of bins of a histogram. For a summary, please see the Histogram page on Wikipedia. Let's take a look at Sturges' formula.

The number of bins k can be calculated from a suggested bin width w as:

k = \left\lceil \frac{\max(x)-\min(x)}{w} \right\rceil
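The bin-width formula can be sketched in q as follows. The helper name binsFromWidth and the sample values are assumptions made for illustration only; they are not part of the original exercise.

```q
// Hypothetical helper: number of bins needed to cover x with bins of width w
binsFromWidth:{[x;w] ceiling (max[x]-min x)%w};

// Range is 10-0=10 and the width is 3, so 10%3 is rounded up
binsFromWidth[0 1 2.5 7 10f;3f]   // 4
```

Rounding up guarantees that the bins cover the whole range, at the cost of the last bin possibly extending past the maximum.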

By Sturges' formula, k can be calculated as:

k = \left\lceil \log_2 n \right\rceil + 1

where n is the total number of data points used to calculate the histogram.
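As a worked example, take n = 10000, which is the sample size used in the question below. Since \log_2 10000 is approximately 13.29, Sturges' formula gives:

```latex
k = \left\lceil \log_2 10000 \right\rceil + 1 = \left\lceil 13.29 \right\rceil + 1 = 14 + 1 = 15
```

With k fixed, the implied bin width follows from the first formula as w = (\max(x)-\min(x))/k.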

Question

First, let's generate some random numbers:

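genNormalNumber:{
  // Box-Muller transform: turns pairs of uniform draws into standard normal values
  pi:acos -1;
  // Even x: build x values from n=x div 2 radii and angles (sin and cos branches);
  // odd x: generate x+1 values and drop the last one
  $[x=2*n:x div 2;
    raze sqrt[-2*log n?1f]*/:(sin;cos)@\:(2*pi)*n?1f;
    -1_.z.s 1+x
  ]
  };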
data:asc genNormalNumber[10000];

The data generated above is a sorted list of 10,000 random numbers drawn from a standard normal distribution.

Create a histogram using Sturges' formula. The output table should have three columns: the first column binIdx is the bin index, the second column binVal is the median value of all data points in each bin and the third column binCnt is the number of data points falling into each bin.

Answer

The suggested answer is as follows.

histogramSturges:{[data]
  // Calculate the total number of bins
  k:1+ceiling xlog[2;count data];

  // Find the min/max value of the list
  minVal:min data;
  maxVal:max data;

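  // Find the k+1 equally spaced bin edges from minVal to maxVal
  bins:minVal+((maxVal-minVal)%k)*til 1+k;

  // Find the bin index (0 to k-1) of each data item: bin i runs from bins[i]
  // up to but excluding bins[i+1]; the maximum is clamped into the last bin
  binData:([] binIdx:(k-1)&bins bin data;data);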

  // 1) Calculate the number of data points in each bin, and
  // 2) Compute the median value of all data points in each bin
  0!select binVal:med data,binCnt:count binIdx by binIdx from binData
  };
histogramSturges[data]
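The keywords bin and binr differ in how they treat values that coincide with an edge, which decides the bin that the minimum and maximum fall into. The following is a small sketch with made-up edges and values (the names edges and vals are illustrative assumptions):

```q
edges:0 1 2 3f;          // four edges, i.e. three intervals
vals:0 0.5 1 2.5 3f;

// binr: index of the first edge that is greater than or equal to the value
edges binr vals          // 0 1 1 3 3

// bin: index of the last edge that is less than or equal to the value
edges bin vals           // 0 0 1 2 3
```

With binr, a value equal to the first edge is the only one that gets index 0; with bin, a value equal to the last edge is the only one that gets the highest index. Either way, one boundary value ends up in an extra bin of its own unless the index is adjusted.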

If you plot binCnt against binVal, you will get something like the image below. The bell shape suggests that our normal random number generator works as expected.

image
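If no plotting tool is at hand, a rough text plot can be sketched in q itself. The table demo below is made-up illustrative data with the same three columns as the histogram output, and the scale of 10 data points per star is an arbitrary choice.

```q
// Illustrative table only; in practice use the output of histogramSturges
demo:([] binIdx:til 5; binVal:-2 -1 0 1 2f; binCnt:10 40 60 40 10);

// Add a column with one star per 10 data points in the bin
plot:update bar:(binCnt div 10)#\:"*" from demo;
show plot
```

Each row then ends with a bar of stars whose length is proportional to binCnt, so a bell shape shows up as bars that grow towards the middle rows and shrink again.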