# Histogram

A histogram is an approximate data visualization of the distribution of a numeric variable. The variable is cut into several bins, and the number of observations per bin is represented by the height of the bar. Histograms are used to study the distribution of one or a few variables. Checking the distribution of your variables one by one is probably the first task you should do when you get a new dataset.

{% hint style="info" %}
This functionality is provided by the **live-exploratory-viz** plugin. It is available on the [marketplace](https://marketplace.intelie.com/artifact/plugin-live-exploratory-viz/).
{% endhint %}

Is this plugin, there is an aggregation function for generate a histogram: `histogram(x, min, max, count)`, where the inputs are:

1. `x` - the numerical sequence to be binned
2. `min` - the minimum value to be binned
3. `max` - the maximum value to be binned
4. `count` - the number of bins

The histogram function returns an object with the following structure:

* There is an array of objects called bins, where each bin is an object, containing: &#x20;
  * bin number
  * minimum bin value
  * maximum bin value
  * count of values ​​existing in the bin.

And, outside the bins array, we have the following variables:

* totalCount: total count of values
* belowMinCount:  values ​​below the minimum existing in the input
* aboveMaxCount: values ​​above the maximum existing in the input
* invalidCount: count of invalid values

Example (return values will change according to the input):

```
HISTOGRAM_X_0_5_1_5_3:
{
  "bins": [
    {
      "bin": 0,
      "min": 0.5,
      "max": 0.8333333333333333,
      "count": 69
    },
    {
      "bin": 1,
      "min": 0.8333333333333333,
      "max": 1.1666666666666665,
      "count": 1291
    },
    {
      "bin": 2,
      "min": 1.1666666666666667,
      "max": 1.5,
      "count": 80
    }
  ],
  "totalCount": 1440,
  "belowMinCount": 0,
  "aboveMaxCount": 0,
  "invalidCount": 0
}
```

In this plugin, there is also a aggregation function for recommending the number of bins, called binscalculator, where the input is a numerical sequence. For aggregation function binscalculator, we used three different formulas, to recommend the number of bins to be used:

1. Scott's rule:

   *Freedman, David, and Persi Diaconis. "On the histogram as a density estimator: L 2 theory." Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57.4 (1981): 453-476.*
2. Sturges' rule:\
   \&#xNAN;*Sturges, Herbert A. "The choice of a class interval." Journal of the American statistical association 21.153 (1926): 65-66.*
3. Freedman–Diaconis rule:

   *Scott, David W. "On optimal and data-based histograms." Biometrika 66.3 (1979): 605-610.*

There is also a histogram chart type available on "query based charts", more specifically on "pipes charts''.

#### Functional Tutorial

* We created the aggregation `binsCalculator` to suggest the number of bins to be used to generate the histogram. The input must be a numerical sequence.

  `=> normrandom(1, 0.1) as x every min` \
  `=> binsCalculator(x) over last 24 hours`

![binsCalculator return](/files/btnnoo5wwR423HDfISue)

* As output, binsCalculator will return three calculated values. The first value is the result of the Freedman-Diaconis’ formula, the second value is the result of the Sturges’ formula, and the third is the result of the Scott’ formula.

* After the result of the suggestion of the number of bins, we can now move to the main aggregation, which generates the pipes widget, called "histogram".  The input must be a numerical sequence, a minimum value, a maximum value and a bins quantity.\
  `=> normrandom(1, 0.1) as x every min` \
  `=> histogram(x, 0.5, 1.5, 18) over last 24 hours`

![Histogram with 18 bins](/files/eYYEjfDFXfavq1Ws2Kss)

`=> normrandom(1, 0.1) as x every min` \
`=> histogram(x, 0.5, 1.5, 10) over last 24 hours`

![Histogram with 10 bins](/files/hGV6ueI8D9xQznQFsuec)

`=> normrandom(1, 0.1) as x every min` \
`=> histogram(x, 0.5, 1.5, 14) over last 24 hours`

![Histogram with 14 bins](/files/uH0un2WaWJfYUOU6uiab)

* As mentioned before, the histogram chart type can handle a query that outputs a seq of bin object:\
  `=> normrandom(1, 0.1) as x every min` \
  `=> histogram(x, 0.5, 1.5, 3) over last 24 hours` \
  `=> _->bins`

![Histogram is generated normally only with the bins object](/files/3pzJ6uCTgtPaclbMLYHT)

* Or just the array with min and count of the bins:

  `=> normrandom(1, 0.1) as x every min`\
  `=> histogram(x, 0.5, 1.5, 3) over last 24 hours`\
  `=> _->bins |> newmap('min', _->min, 'count', _->count)`

![Histogram is generated normally just using the array with min and count](/files/0Cpbkvn71qIFPVN8ZN75)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://platform.intelie.com/data-visualization/pipes-widgets/histogram.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
