# module Statistics

## Overview

Basic descriptive statistics functionality.

More flexible than a scientific calculator, but not yet exhaustive.

## Defined in:

lib/distributions.cr
statistics.cr

## Constant Summary

VERSION = `"1.0.0"`

## Instance Method Detail

def bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left) : Bins

Counts the number of values in each bin of size `(max - min) / bins`.

Returns a `Bins` object where `edges` and `counts` are ordered by edge.

NOTE Empty bins are included as well, with a count of zero.

Parameters

• values: a one-dimensional dataset.
• bins: the number of equally-sized bins to divide the datapoints into.
• min: the left edge of the first bin. If not provided, `values.min` is used.
• max: the right edge of the last bin. If not provided, `values.max` is used.
• edge: determines whether each bin is reported by its left edge, its midpoint or its right edge. Choices are `:left`, `:centre` and `:right`. Default is `:left`.
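The equal-width binning rule can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `sketch_bin_count` is a hypothetical name, it accepts only `Array(Float64)`, returns a named tuple instead of `Bins`, always reports left edges, assumes `max > min`, and assumes that a value equal to `max` belongs to the last bin.

```crystal
# Hypothetical sketch of equal-width binning; not the library's code.
# Assumptions: max > min, and a value equal to `max` is counted in the last bin.
def sketch_bin_count(values : Array(Float64), bins : Int32,
                     min : Float64? = nil, max : Float64? = nil)
  lo = min || values.min
  hi = max || values.max
  width = (hi - lo) / bins
  counts = Array(Int32).new(bins, 0) # empty bins keep a count of 0
  values.each do |v|
    next if v < lo || v > hi # ignore values outside [lo, hi]
    idx = ((v - lo) / width).floor.to_i
    idx = bins - 1 if idx >= bins # put v == hi into the last bin
    counts[idx] += 1
  end
  edges = (0...bins).map { |i| lo + i * width } # left edge of each bin
  {edges: edges, counts: counts}
end

result = sketch_bin_count([1.0, 2.0, 2.5, 4.0, 9.0], 4)
puts result[:edges]  # => [1.0, 3.0, 5.0, 7.0]
puts result[:counts] # => [3, 1, 0, 1]
```

Under the same assumptions, a `:centre` edge would be `lo + (i + 0.5) * width` and a `:right` edge would be `lo + (i + 1) * width`.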

def describe(values)

Computes several descriptive statistics of the given dataset.

Parameters

• values: a one-dimensional dataset.

def frequency(values : Enumerable(T)) forall T

Computes the number of occurrences of each value in the dataset.

Returns a `Hash` with each distinct value in the dataset as a key and the number of times that value appears as the corresponding value.

Parameters

• values: a one-dimensional dataset.
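Counting occurrences amounts to a single pass over the data with a hash whose missing keys default to zero. A minimal sketch, using the hypothetical name `sketch_frequency` (not the library's code):

```crystal
# Hypothetical sketch of frequency counting with a default-valued Hash.
def sketch_frequency(values : Enumerable(T)) forall T
  counts = Hash(T, Int32).new(0) # missing keys read as 0
  values.each { |v| counts[v] += 1 }
  counts
end

freq = sketch_frequency(["a", "b", "a", "c", "a"])
puts freq # => {"a" => 3, "b" => 1, "c" => 1}
```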

def kurtosis(values, corrected = false, excess = false)

Computes the kurtosis of a dataset.

Parameters

• values: a one-dimensional dataset.
• corrected: when `true`, the calculation is corrected for statistical bias. Default is `false`.
• excess: when set to `true`, computes the excess kurtosis. Default is `false`.

This implementation is based on SciPy's `scipy/stats.py`.
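As an illustration, the sketch below computes kurtosis as `m4 / m2**2` from the central moments, applies the bias correction used by `scipy.stats.kurtosis` (with `bias=False`) when `corrected` is `true`, and subtracts 3 when `excess` is `true`. The names `central_moment` and `sketch_kurtosis` are hypothetical, the input is assumed to be `Array(Float64)`, and the corrected branch assumes more than three values. It is a sketch of the formulas, not the library's code.

```crystal
# Hypothetical helper: the n-th central moment, i.e. the mean of (x - mean)**n.
def central_moment(values : Array(Float64), n : Int32) : Float64
  m = values.sum / values.size
  values.sum { |v| (v - m) ** n } / values.size
end

# Hypothetical sketch modeled on the formulas in scipy.stats.kurtosis.
def sketch_kurtosis(values : Array(Float64), corrected = false, excess = false) : Float64
  n = values.size.to_f
  m2 = central_moment(values, 2)
  m4 = central_moment(values, 4)
  k = m4 / m2 ** 2 # biased (population) estimate
  if corrected
    # bias-corrected excess kurtosis (needs n > 3), shifted back by 3
    k = ((n ** 2 - 1) * m4 / m2 ** 2 - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3)) + 3
  end
  excess ? k - 3 : k
end

data = [1.0, 2.0, 3.0, 4.0, 5.0]
puts sketch_kurtosis(data)                                # ≈ 1.7
puts sketch_kurtosis(data, excess: true)                  # ≈ -1.3
puts sketch_kurtosis(data, corrected: true, excess: true) # ≈ -1.2
```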

def mean(values)

Computes the mean of a dataset.

Parameters

• values: a one-dimensional dataset.

def median(values, sorted = false)

Computes the median of all elements in a dataset.

For an even number of elements, the median is the mean of the two middle elements.

Parameters

• values: a one-dimensional dataset.
• sorted: when `true`, the computations assume that the provided values are sorted. Default is `false`.

See Julia's `Statistics.median`.
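The even-size rule is easiest to see in a short sketch. `sketch_median` is a hypothetical name restricted to `Array(Float64)`; like the documented method, it skips sorting when `sorted` is `true`. It is not the library's code.

```crystal
# Hypothetical sketch: sort unless the caller says the input is already sorted.
def sketch_median(values : Array(Float64), sorted = false) : Float64
  v = sorted ? values : values.sort
  n = v.size
  mid = n // 2
  if n.odd?
    v[mid] # the single middle element
  else
    (v[mid - 1] + v[mid]) / 2 # mean of the two middle elements
  end
end

puts sketch_median([3.0, 1.0, 2.0])      # => 2.0
puts sketch_median([4.0, 1.0, 3.0, 2.0]) # => 2.5
```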

def middle(a, b)

Computes the middle of two values `a` and `b`, i.e. their mean.

def middle(values)

Computes the middle of a dataset, i.e. the mean of its minimum and maximum values.

Parameters

• values: a one-dimensional dataset.

See Julia's `Statistics.middle`.
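Both `middle` overloads can be sketched in a few lines. The name `sketch_middle` is hypothetical, the dataset overload is restricted to `Array(Float64)`, and halving before adding (to avoid integer overflow) is a design choice of this sketch rather than a documented property of the library.

```crystal
# Hypothetical sketches of the two `middle` overloads; not the library's code.
def sketch_middle(a : Number, b : Number) : Float64
  a / 2.0 + b / 2.0 # halve first so large integers cannot overflow
end

def sketch_middle(values : Array(Float64)) : Float64
  lo, hi = values.minmax # both extrema in one pass
  sketch_middle(lo, hi)
end

puts sketch_middle(1, 4)                  # => 2.5
puts sketch_middle([3.0, -1.0, 7.0, 2.0]) # => 3.0
```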

def mode(values : Enumerable)

Computes the modal (most common) value in a dataset.

Returns a pair consisting of the modal value and its count, i.e. the number of times it occurs. If several values are equally common, no guarantee is made about which one is returned.

NOTE Computing the mode requires traversing the entire dataset.

Parameters

• values: a one-dimensional dataset.
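A minimal sketch of the mode: count every value in a full pass, then take the entry with the largest count. `sketch_mode` is a hypothetical name; which value wins a tie here depends on `max_by` and hash insertion order, which matches the lack of guarantees stated above.

```crystal
# Hypothetical sketch: count occurrences, then pick the most common value.
def sketch_mode(values : Enumerable(T)) forall T
  counts = Hash(T, Int32).new(0)
  values.each { |v| counts[v] += 1 } # full pass over the dataset
  counts.max_by { |_, count| count } # a {value, count} tuple
end

value, count = sketch_mode([1, 3, 3, 2, 3, 1])
puts "#{value} occurs #{count} times" # => 3 occurs 3 times
```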

def moment(values, mean = nil, n = 1)

Calculates the n-th moment about the mean for a sample.

Parameters

• values: a one-dimensional dataset.
• mean: a pre-computed mean. If a mean is not provided, then the sample's mean will be computed. Default is `nil`.
• n: order of central moment that is returned. Default is `1`.
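The n-th central moment is the mean of `(x - mean)**n`. A minimal sketch, assuming `Array(Float64)` input and using the hypothetical name `sketch_moment`:

```crystal
# Hypothetical sketch of the n-th central moment; not the library's code.
def sketch_moment(values : Array(Float64), mean : Float64? = nil, n : Int32 = 1) : Float64
  m = mean || values.sum / values.size # compute the sample mean if none is given
  values.sum { |v| (v - m) ** n } / values.size
end

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
puts sketch_moment(data, n: 2) # => 4.0
```

With the default `n = 1` the result is 0 up to rounding error, since deviations from the sample mean sum to zero; `n = 2` gives the uncorrected variance.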

def quantile(values, p, sorted = false)

Computes the quantile of a dataset at a specified probability `p` on the interval [0,1].

Quantiles are computed via linear interpolation between the points `((k-1)/(n-1), v[k])`, for `k = 1:n`, where `v` is the sorted dataset (indexed from 1) and `n = values.size`.

Parameters

• values: a one-dimensional dataset.
• p: probability. Values of `p` should be in the interval `[0, 1]`.
• sorted: when `true`, the values are assumed to be sorted already. Default is `false`.

This implementation is based on Julia's `Statistics.quantile`.
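The interpolation rule can be sketched with 0-based indices: the fractional position of `p` in the sorted array is `h = (n - 1) * p`, and the result interpolates between the elements just below and just above `h`. This is the same set of points `((k-1)/(n-1), v[k])` with `k` shifted to the 0-based index `k - 1`. `sketch_quantile` is a hypothetical name restricted to `Array(Float64)`, and raising on `p` outside `[0, 1]` is an assumption of this sketch.

```crystal
# Hypothetical sketch of linear-interpolation quantiles; not the library's code.
def sketch_quantile(values : Array(Float64), p : Float64, sorted = false) : Float64
  raise ArgumentError.new("p must be in [0, 1]") unless p >= 0.0 && p <= 1.0
  v = sorted ? values : values.sort
  h = (v.size - 1) * p               # fractional 0-based position of p
  lo = h.floor.to_i
  hi = Math.min(lo + 1, v.size - 1)  # at p == 1, stay on the last element
  v[lo] + (h - lo) * (v[hi] - v[lo]) # interpolate between v[lo] and v[hi]
end

data = [4.0, 1.0, 3.0, 2.0]
puts sketch_quantile(data, 0.5)  # => 2.5
puts sketch_quantile(data, 0.25) # => 1.75
```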

def skew(values, corrected = false)

Computes the skewness of a dataset.

Parameters

• values: a one-dimensional dataset.
• corrected: when `true`, the calculation is corrected for statistical bias. Default is `false`.

This implementation is based on SciPy's `scipy/stats.py`.
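For illustration, the sketch below computes the skewness as `m3 / m2**1.5` and, when `corrected` is `true`, multiplies it by `sqrt(n * (n - 1)) / (n - 2)`, the adjusted Fisher-Pearson coefficient that `scipy.stats.skew` uses with `bias=False`. The names `central_moment` and `sketch_skew` are hypothetical, the input is assumed to be `Array(Float64)`, and the corrected branch assumes more than two values.

```crystal
# Hypothetical helper: the n-th central moment, i.e. the mean of (x - mean)**n.
def central_moment(values : Array(Float64), n : Int32) : Float64
  m = values.sum / values.size
  values.sum { |v| (v - m) ** n } / values.size
end

# Hypothetical sketch modeled on the formulas in scipy.stats.skew.
def sketch_skew(values : Array(Float64), corrected = false) : Float64
  n = values.size.to_f
  g1 = central_moment(values, 3) / central_moment(values, 2) ** 1.5 # biased estimate
  # adjusted Fisher-Pearson coefficient (needs n > 2)
  corrected ? Math.sqrt(n * (n - 1)) / (n - 2) * g1 : g1
end

puts sketch_skew([1.0, 2.0, 3.0])      # => 0.0 (symmetric data)
puts sketch_skew([0.0, 0.0, 0.0, 1.0]) # ≈ 1.1547, i.e. 2 / Math.sqrt(3)
```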

def std(values, mean = nil, corrected = false)

Computes the standard deviation of a dataset.

Parameters

• values: a one-dimensional dataset.
• mean: a pre-computed mean, either the sample mean or a known population mean. If not provided, the sample mean is computed. Default is `nil`.
• corrected: when `true`, the sum of squared deviations is divided by `values.size - 1` (Bessel's correction) rather than by `values.size`. Default is `false`.

def var(values, mean = nil, corrected = false)

Computes the variance of a dataset.

Parameters

• values: a one-dimensional dataset.
• mean: a pre-computed mean, either the sample mean or a known population mean. If not provided, the sample mean is computed. Default is `nil`.
• corrected: when `true`, the sum of squared deviations is divided by `values.size - 1` (Bessel's correction) rather than by `values.size`. Default is `false`.
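The variance and standard deviation described above can be sketched together, since the standard deviation is the square root of the variance. `sketch_var` and `sketch_std` are hypothetical names restricted to `Array(Float64)`; this is not the library's code.

```crystal
# Hypothetical sketch of variance with an optional known mean and Bessel's correction.
def sketch_var(values : Array(Float64), mean : Float64? = nil, corrected = false) : Float64
  m = mean || values.sum / values.size  # sample mean unless a mean is supplied
  ss = values.sum { |v| (v - m) ** 2 }  # sum of squared deviations
  ss / (corrected ? values.size - 1 : values.size)
end

# The standard deviation is the square root of the variance.
def sketch_std(values : Array(Float64), mean : Float64? = nil, corrected = false) : Float64
  Math.sqrt(sketch_var(values, mean, corrected))
end

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
puts sketch_var(data)                  # => 4.0
puts sketch_std(data)                  # => 2.0
puts sketch_var(data, corrected: true) # ≈ 4.5714, i.e. 32 / 7
```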