Histogram
A histogram is an approximate visualization of the distribution of a numeric variable. The range of the variable is divided into several bins, and the number of observations in each bin is represented by the height of its bar. Histograms are used to study the distribution of one or a few variables. Checking the distribution of your variables one by one is probably the first thing you should do when you get a new dataset.
This functionality is provided by the live-exploratory-viz plugin. It is available on the marketplace.
In this plugin, there is an aggregation function for generating a histogram, histogram(x, min, max, count), where the inputs are:
- x: the numerical sequence to be binned
- min: the minimum value to be binned
- max: the maximum value to be binned
- count: the number of bins
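The source does not say how min, max and count define the bin boundaries. A common choice, assumed here, is equal-width bins: the interval from min to max is split into count pieces of width (max - min) / count. The hypothetical Python helper bin_edges below sketches that assumption; its parameters are named lo and hi only to avoid shadowing Python's built-in min and max.

```python
def bin_edges(lo: float, hi: float, count: int) -> list[tuple[float, float]]:
    """Split [lo, hi] into `count` equal-width bins, returned as (min, max) pairs.

    Equal-width binning is an assumption, not confirmed plugin behaviour.
    """
    if count <= 0 or hi <= lo:
        raise ValueError("need count > 0 and hi > lo")
    width = (hi - lo) / count
    # Compute each edge from lo directly (not by repeated addition) so rounding
    # errors do not accumulate; pin the final edge to hi exactly.
    edges = [lo + i * width for i in range(count)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


print(bin_edges(0.5, 1.5, 4))
# [(0.5, 0.75), (0.75, 1.0), (1.0, 1.25), (1.25, 1.5)]
```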
The histogram function returns an object with the following structure:
An array called bins, in which each bin is an object containing:
- the bin number
- the minimum value of the bin
- the maximum value of the bin
- the count of input values that fall in the bin
Outside the bins array, the object has the following fields:
- totalCount: the total count of input values
- belowMinCount: the count of input values below min
- aboveMaxCount: the count of input values above max
- invalidCount: the count of invalid input values
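To make the returned structure concrete, here is a minimal Python sketch of a function that produces an object of that shape. It is not the plugin's implementation: the equal-width, half-open bins, the treatment of None and NaN as invalid, the inclusion of invalid values in totalCount, and the bin key names "bin" and "max" are all assumptions ("min" and "count" match the fields accessed in the queries on this page).

```python
import math
from typing import Any


def histogram(x, lo: float, hi: float, count: int) -> dict[str, Any]:
    """Hypothetical sketch of the result structure described above.

    Assumptions: equal-width bins, half-open [min, max) except the last bin,
    which also includes hi; None, non-numeric values and NaN are invalid;
    totalCount counts every input value, valid or not.
    """
    if count <= 0 or hi <= lo:
        raise ValueError("need count > 0 and hi > lo")
    width = (hi - lo) / count
    # Key names "bin" and "max" are hypothetical; "min" and "count" match the page's queries.
    bins = [
        {"bin": i, "min": lo + i * width, "max": lo + (i + 1) * width, "count": 0}
        for i in range(count)
    ]
    bins[-1]["max"] = hi
    below = above = invalid = total = 0
    for v in x:
        total += 1
        if not isinstance(v, (int, float)) or math.isnan(v):
            invalid += 1
        elif v < lo:
            below += 1
        elif v > hi:
            above += 1
        else:
            # Clamp so that v == hi lands in the last bin instead of overflowing.
            idx = min(int((v - lo) / width), count - 1)
            bins[idx]["count"] += 1
    return {
        "bins": bins,
        "totalCount": total,
        "belowMinCount": below,
        "aboveMaxCount": above,
        "invalidCount": invalid,
    }


result = histogram([0.2, 0.6, 0.9, 1.0, 1.5, 2.0, None, float("nan")], 0.5, 1.5, 2)
print([b["count"] for b in result["bins"]], result["belowMinCount"],
      result["aboveMaxCount"], result["invalidCount"])  # [2, 2] 1 1 2
```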
Example (return values will change according to the input):
This plugin also provides an aggregation function for recommending the number of bins, called binsCalculator, whose input is a numerical sequence. To recommend the number of bins, binsCalculator uses three different formulas:
- Scott's rule: Scott, David W. "On optimal and data-based histograms." Biometrika 66.3 (1979): 605-610.
- Sturges' rule: Sturges, Herbert A. "The choice of a class interval." Journal of the American Statistical Association 21.153 (1926): 65-66.
- Freedman–Diaconis rule: Freedman, David, and Persi Diaconis. "On the histogram as a density estimator: L2 theory." Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57.4 (1981): 453-476.
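For reference, the three rules are usually stated as below, with n the number of observations, R the range of the data (maximum minus minimum), \(\hat{\sigma}\) the sample standard deviation and IQR the interquartile range. These are the commonly cited textbook forms; the exact constants and rounding used by binsCalculator are not stated in this page and may differ. Scott's and the Freedman–Diaconis rules give a bin width h, which is turned into a bin count by dividing the range by h.

```latex
\begin{aligned}
\text{Sturges:} \quad & k = \lceil \log_2 n \rceil + 1 \\
\text{Scott:} \quad & h = 3.49\,\hat{\sigma}\,n^{-1/3}, \qquad k = \lceil R / h \rceil \\
\text{Freedman--Diaconis:} \quad & h = 2\,\mathrm{IQR}\,n^{-1/3}, \qquad k = \lceil R / h \rceil
\end{aligned}
```

Sturges' rule depends only on n, while the other two adapt to the spread of the data; the Freedman–Diaconis rule uses the IQR and is therefore less sensitive to outliers than Scott's rule.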
There is also a histogram chart type available among the "query based charts", more specifically the "pipes charts".
Functional Tutorial
We created the aggregation binsCalculator to suggest the number of bins to be used to generate the histogram. The input must be a numerical sequence:
=> normrandom(1, 0.1) as x every min => binsCalculator(x) over last 24 hours
As output, binsCalculator returns three calculated values: the first is the result of the Freedman–Diaconis formula, the second is the result of Sturges' formula, and the third is the result of Scott's formula.
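A minimal Python sketch of the kind of computation binsCalculator performs, assuming the textbook forms of the three rules and returning them in the order described above (Freedman–Diaconis, Sturges, Scott). The function name bins_calculator, the use of statistics.quantiles (default "exclusive" method) for the quartiles, and rounding widths up to whole bins are assumptions, so the plugin's numbers may differ slightly.

```python
import math
import statistics


def bins_calculator(x: list[float]) -> tuple[int, int, int]:
    """Hypothetical sketch: (Freedman-Diaconis, Sturges, Scott) bin counts."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    data_range = max(x) - min(x)

    # Sturges: k = ceil(log2 n) + 1, independent of the data's spread.
    sturges = math.ceil(math.log2(n)) + 1

    # Scott: bin width h = 3.49 * sample standard deviation * n^(-1/3).
    scott_h = 3.49 * statistics.stdev(x) * n ** (-1 / 3)

    # Freedman-Diaconis: h = 2 * IQR * n^(-1/3); more robust to outliers.
    q1, _, q3 = statistics.quantiles(x, n=4)
    fd_h = 2 * (q3 - q1) * n ** (-1 / 3)

    def to_count(h: float) -> int:
        # Convert a bin width into the number of bins needed to cover the range.
        return max(1, math.ceil(data_range / h)) if h > 0 else 1

    return to_count(fd_h), sturges, to_count(scott_h)
```

For example, bins_calculator(list(range(1, 101))) returns (5, 8, 5): the two spread-based rules agree, while Sturges' rule, which looks only at n, suggests more bins for this small uniform sample.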
Once we have a suggested number of bins, we can move on to the main aggregation, histogram, which generates the data for the pipes histogram widget. Its inputs are a numerical sequence, a minimum value, a maximum value and the number of bins.
=> normrandom(1, 0.1) as x every min => histogram(x, 0.5, 1.5, 18) over last 24 hours
=> normrandom(1, 0.1) as x every min
=> histogram(x, 0.5, 1.5, 10) over last 24 hours
=> normrandom(1, 0.1) as x every min
=> histogram(x, 0.5, 1.5, 14) over last 24 hours
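The three queries above differ only in the bin count. Assuming equal-width bins (an assumption, as the source does not state how bins are laid out), each bin's width is (max - min) / count, so over [0.5, 1.5]:

```latex
h = \frac{\max - \min}{\text{count}} = \frac{1.5 - 0.5}{\text{count}}
\quad\Rightarrow\quad
h_{18} = \tfrac{1}{18} \approx 0.056, \qquad
h_{10} = \tfrac{1}{10} = 0.1, \qquad
h_{14} = \tfrac{1}{14} \approx 0.071
```

If normrandom(1, 0.1) draws normal values with mean 1 and standard deviation 0.1 (an assumption about its parameters), the interval [0.5, 1.5] spans five standard deviations on each side of the mean, so belowMinCount and aboveMaxCount should almost always be zero in these queries.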
As mentioned before, the histogram chart type can handle a query that outputs a sequence of bin objects:
=> normrandom(1, 0.1) as x every min => histogram(x, 0.5, 1.5, 3) over last 24 hours => _->bins
Or a query that outputs only the min and count of each bin:
=> normrandom(1, 0.1) as x every min => histogram(x, 0.5, 1.5, 3) over last 24 hours => _->bins |> newmap('min', _->min, 'count', _->count)
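In Python terms, the newmap step above keeps only two keys from each bin. The sketch below mirrors that projection on a hand-written, hypothetical result object; the counts are invented for illustration, and min_count_pairs is a made-up name, not part of the plugin.

```python
def min_count_pairs(result: dict) -> list[dict]:
    """Keep only 'min' and 'count' from each bin, like the newmap(...) step."""
    return [{"min": b["min"], "count": b["count"]} for b in result["bins"]]


# Hypothetical 3-bin result over [0.5, 1.5]; counts are invented for illustration.
example = {
    "bins": [
        {"min": 0.5, "max": 5 / 6, "count": 3},
        {"min": 5 / 6, "max": 7 / 6, "count": 10},
        {"min": 7 / 6, "max": 1.5, "count": 2},
    ],
    "totalCount": 15,
    "belowMinCount": 0,
    "aboveMaxCount": 0,
    "invalidCount": 0,
}
print(min_count_pairs(example))
```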