module Statistics

Overview

Basic descriptive statistics functionality.

More flexible than a scientific calculator, but not yet exhaustive.

Extended Modules

Statistics

Defined in:

lib/distributions.cr
statistics.cr

Constant Summary

VERSION = "1.0.1"

Instance Method Summary

#bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left, normed : Bool = false) : Bins
Counts the number of values in each bin of size (max - min) / bins.
#describe(values)
Computes several descriptive statistics of the passed array.
#frequency(values : Enumerable(T)) forall T
Computes the number of occurrences of each value in the dataset.
#kurtosis(values, corrected = false, excess = false)
Computes the kurtosis of a dataset.
#mean(values)
Computes the mean of a dataset.
#median(values, sorted = false)
Computes the median of all elements in a dataset.
#middle(a, b)
Computes the middle of two values a and b.
#middle(values)
Computes the middle of a dataset, i.e. the mean of its minimum and maximum values.
#mode(values : Enumerable)
Computes the modal (most common) value in a dataset.
#moment(values, mean = nil, n = 1)
Calculates the n-th moment about the mean for a sample.
#quantile(values, p, sorted = false)
Computes the quantile of a dataset at a specified probability p on the interval [0,1].
#skew(values, corrected = false)
Computes the skewness of a dataset.
#std(values, mean = nil, corrected = false)
Computes the standard deviation of a dataset.
#var(values, mean = nil, corrected = false)
Computes the variance of a dataset.

Instance Method Detail

def bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left, normed : Bool = false) : Bins

Counts the number of values in each bin of size (max - min) / bins.

Returns a Bins object where edges and counts are ordered by edge.

NOTE Any empty bin will also be included.

Parameters

values: a one-dimensional dataset.
bins: the number of equally-sized bins to divide the datapoints into.
min: the left edge of the first bin. If none is provided, values.min is used.
max: the right edge of the last bin. If none is provided, values.max is used.
edge: determines whether the left edge, the midpoint or the right edge of each bin is returned. Choices are :left, :centre and :right. Default is :left.
normed: if false, the result contains the number of samples in each bin. If true, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not equal 1 unless bins of unit width are chosen; it is not a probability mass function. Default is false.
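The binning rule can be illustrated with a small standalone sketch. It is not the library's source: the `StatsSketch` module, the `Array(Float64)` signature, the `{edges, counts}` tuple standing in for `Bins`, and the choice to place values equal to `max` in the last bin are all assumptions. It returns left edges only and omits the `edge` and `normed` options, and it assumes max > min.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # Counts values in `bins` equal-width bins spanning [min, max].
  # Returns the left edge of each bin and its count; empty bins are kept.
  def bin_count(values : Array(Float64), bins : Int32,
                min : Float64? = nil, max : Float64? = nil)
    lo = min || values.min
    hi = max || values.max
    width = (hi - lo) / bins
    counts = Array(Int32).new(bins, 0)
    values.each do |v|
      next if v < lo || v > hi # ignore values outside [lo, hi]
      # Index of the bin containing v; v == hi is put into the last bin.
      i = ((v - lo) / width).floor.to_i
      i = bins - 1 if i >= bins
      counts[i] += 1
    end
    edges = Array(Float64).new(bins) { |i| lo + i * width }
    {edges, counts}
  end
end

edges, counts = StatsSketch.bin_count([1.0, 2.0, 2.5, 4.0], 3)
puts edges  # => [1.0, 2.0, 3.0]
puts counts # => [1, 2, 1]
```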

def describe(values)

Computes several descriptive statistics of the passed array.

Parameters

values: a one-dimensional dataset.

def frequency(values : Enumerable(T)) forall T

Computes the number of occurrences of each value in the dataset.

Returns a Hash with each distinct value in the dataset as a key and the number of times it appears as the corresponding value.

Parameters

values: a one-dimensional dataset.
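A hash with a default value of 0 is one straightforward way to count occurrences in a single pass over the data. The sketch below is illustrative only; `StatsSketch` is a hypothetical module name, not part of the library.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # Tallies how often each distinct value occurs.
  def frequency(values : Enumerable(T)) forall T
    counts = Hash(T, Int32).new(0) # missing keys default to 0
    values.each { |v| counts[v] += 1 }
    counts
  end
end

puts StatsSketch.frequency([3, 1, 3, 2, 3]) # => {3 => 3, 1 => 1, 2 => 1}
```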

def kurtosis(values, corrected = false, excess = false)

Computes the kurtosis of a dataset.

Parameters

values: a one-dimensional dataset.
corrected: when true, the calculations are corrected for statistical bias. Default is false.
excess: when set to true, computes the excess kurtosis. Default is false.

This implementation is based on SciPy's scipy/stats.py.
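As a sketch of the underlying formulas, assuming the library follows SciPy: the uncorrected kurtosis is m4 / m2², where mk is the k-th central moment normalized by the number of values n, and the corrected form below is the bias adjustment SciPy applies with `bias=False` (it requires n > 3). Excess kurtosis subtracts 3, the kurtosis of a normal distribution. `StatsSketch` and `central_moment` are hypothetical names.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # k-th central moment: mean of (x - mean)**k.
  def central_moment(values : Array(Float64), k : Int32) : Float64
    m = values.sum / values.size
    values.sum(0.0) { |x| (x - m) ** k } / values.size
  end

  def kurtosis(values : Array(Float64), corrected = false, excess = false) : Float64
    n = values.size.to_f
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    k = m4 / m2 ** 2 # biased (Pearson) kurtosis
    if corrected
      # Same formula as SciPy's bias=False branch; needs n > 3.
      k = ((n * n - 1) * m4 / m2 ** 2 - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3)) + 3
    end
    excess ? k - 3 : k
  end
end

v = [1.0, 2.0, 3.0, 4.0]
puts StatsSketch.kurtosis(v)                                # approximately 1.64
puts StatsSketch.kurtosis(v, corrected: true, excess: true) # approximately -1.2
```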

def mean(values)

Computes the mean of a dataset.

Parameters

values: a one-dimensional dataset.

def median(values, sorted = false)

Computes the median of all elements in a dataset.

For an even number of elements, the mean of the two middle elements is returned.

Parameters

values: a one-dimensional dataset.
sorted: when true, the computations assume that the provided values are sorted. Default is false.

See Julia's Statistics.median.
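A minimal sketch of the odd/even rule, assuming a non-empty `Array(Float64)`. It fully sorts the input for simplicity, whereas Julia's `median` uses a partial sort; `StatsSketch` is a hypothetical module.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  def median(values : Array(Float64), sorted = false) : Float64
    v = sorted ? values : values.sort
    n = v.size
    mid = n // 2
    # Odd size: the middle element. Even size: mean of the two middle elements.
    n.odd? ? v[mid] : (v[mid - 1] + v[mid]) / 2
  end
end

puts StatsSketch.median([3.0, 1.0, 2.0])      # => 2.0
puts StatsSketch.median([4.0, 1.0, 3.0, 2.0]) # => 2.5
```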

def middle(a, b)

Computes the middle of two values a and b.

def middle(values)

Computes the middle of a dataset, i.e. the mean of its minimum and maximum values.

Parameters

values: a one-dimensional dataset.

See Julia's Statistics.middle.
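Both overloads can be sketched in a few lines. This is an illustration under assumptions (the hypothetical `StatsSketch` module and an `Array(Float64)` dataset); it uses `Enumerable#minmax` to obtain both extrema.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # Midpoint of two values.
  def middle(a : Number, b : Number) : Float64
    (a + b) / 2.0
  end

  # Midrange: mean of the minimum and maximum of the dataset.
  def middle(values : Array(Float64)) : Float64
    lo, hi = values.minmax
    middle(lo, hi)
  end
end

puts StatsSketch.middle(1, 4)                  # => 2.5
puts StatsSketch.middle([3.0, -1.0, 7.0, 2.0]) # => 3.0
```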

def mode(values : Enumerable)

Computes the modal (most common) value in a dataset.

Returns a pair consisting of the modal value and its count. If more than one value has the highest count, no guarantee is made as to which one is returned.

NOTE Computing the mode requires traversing the entire dataset.

Parameters

values: a one-dimensional dataset.
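One way to compute the pair described above is to tally the values and pick the entry with the largest count, which is why the whole dataset must be traversed. In this sketch (the `StatsSketch` module is hypothetical), tie-breaking is simply whatever `max_by` returns; the library makes no guarantee either.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # Returns {modal value, number of occurrences}.
  def mode(values : Enumerable(T)) forall T
    counts = Hash(T, Int32).new(0)
    values.each { |v| counts[v] += 1 } # one full pass over the data
    counts.max_by { |_, count| count }
  end
end

puts StatsSketch.mode([2, 7, 7, 1, 7, 2]) # => {7, 3}
```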

def moment(values, mean = nil, n = 1)

Calculates the n-th moment about the mean for a sample.

Parameters

values: a one-dimensional dataset.
mean: a pre-computed mean. If no mean is provided, the sample mean is computed. Default is nil.
n: the order of the central moment to return. Default is 1.
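The n-th central moment is the average of (x - mean)ⁿ over the dataset; the first central moment about the sample mean is therefore always 0, up to rounding. The sketch below divides by N, the number of values, as `scipy.stats.moment` does; whether the library normalizes the same way is an assumption, as is the hypothetical `StatsSketch` module.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # n-th central moment: (1/N) * sum((x - mean)**n), N = values.size.
  def moment(values : Array(Float64), mean : Float64? = nil, n : Int32 = 1) : Float64
    m = mean || values.sum / values.size # fall back to the sample mean
    values.sum(0.0) { |x| (x - m) ** n } / values.size
  end
end

v = [1.0, 2.0, 3.0, 4.0]
puts StatsSketch.moment(v)       # => 0.0
puts StatsSketch.moment(v, n: 2) # => 1.25
```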

def quantile(values, p, sorted = false)

Computes the quantile of a dataset at a specified probability p on the interval [0,1].

Quantiles are computed via linear interpolation between the points ((k-1)/(n-1), v[k]), for k = 1, ..., n, where n = values.size and v denotes the sorted values indexed from 1.

Parameters

values: a one-dimensional dataset.
p: probability. Values of p should be in the interval [0, 1].
sorted: when true, the computation assumes that the provided values are already sorted. Default is false.

Implementation based on Julia's Statistics.quantile.
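With 0-based indexing, the interpolation rule becomes: for sorted v, the position h = (n - 1)·p falls between v[⌊h⌋] and the next element, and the result interpolates linearly between the two. This is a sketch assuming a non-empty `Array(Float64)`; `StatsSketch` and the `ArgumentError` for out-of-range p are assumptions, not documented library behaviour.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  def quantile(values : Array(Float64), p : Float64, sorted = false) : Float64
    raise ArgumentError.new("p must be in [0, 1]") unless 0.0 <= p && p <= 1.0
    v = sorted ? values : values.sort
    n = v.size
    return v[0] if n == 1
    h = (n - 1) * p               # fractional 0-based position of p
    lo = h.floor.to_i             # element at or below the position
    hi = Math.min(lo + 1, n - 1)  # next element (clamped at the end)
    v[lo] + (h - lo) * (v[hi] - v[lo])
  end
end

data = [10.0, 20.0, 30.0, 40.0]
puts StatsSketch.quantile(data, 0.5)  # => 25.0
puts StatsSketch.quantile(data, 0.25) # => 17.5
```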

def skew(values, corrected = false)

Computes the skewness of a dataset.

Parameters

values: a one-dimensional dataset.
corrected: when true, the calculations are corrected for statistical bias. Default is false.

This implementation is based on SciPy's scipy/stats.py.
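Assuming SciPy-style definitions, the uncorrected skewness is g1 = m3 / m2^1.5 (central moments normalized by n), and the bias-corrected version is the adjusted Fisher-Pearson coefficient G1 = g1·√(n(n-1))/(n-2), which requires n > 2. The sketch below is illustrative; `StatsSketch` and `central_moment` are hypothetical names.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # k-th central moment: mean of (x - mean)**k.
  def central_moment(values : Array(Float64), k : Int32) : Float64
    m = values.sum / values.size
    values.sum(0.0) { |x| (x - m) ** k } / values.size
  end

  # Biased skewness g1, or the adjusted coefficient G1 when `corrected`.
  def skew(values : Array(Float64), corrected = false) : Float64
    n = values.size.to_f
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    g1 = m3 / m2 ** 1.5
    corrected ? g1 * Math.sqrt(n * (n - 1)) / (n - 2) : g1
  end
end

puts StatsSketch.skew([0.0, 0.0, 3.0])                  # approximately 0.7071
puts StatsSketch.skew([0.0, 0.0, 3.0], corrected: true) # approximately 1.7321
```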

def std(values, mean = nil, corrected = false)

Computes the standard deviation of a dataset.

Parameters

values: a one-dimensional dataset.
mean: a pre-computed mean, either the sample mean or a known population mean. If no mean is provided, the sample mean is computed. Default is nil.
corrected: when true, the underlying variance divides the sum of squared deviations by values.size - 1 (Bessel's correction) rather than by values.size. Default is false.

def var(values, mean = nil, corrected = false)

Computes the variance of a dataset.

Parameters

values: a one-dimensional dataset.
mean: a pre-computed mean, either the sample mean or a known population mean. If no mean is provided, the sample mean is computed. Default is nil.
corrected: when true, the sum of squared deviations is divided by values.size - 1 (Bessel's correction) rather than by values.size. Default is false.
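The `corrected` flag only changes the divisor, and the standard deviation is the square root of the variance, so both methods can be sketched together. The hypothetical `StatsSketch` module and the `Array(Float64)` signatures are assumptions; passing a known mean through `mean` skips the sample-mean computation.

```crystal
# Hypothetical sketch, not the Statistics module's implementation.
module StatsSketch
  extend self

  # Sum of squared deviations divided by N, or by N - 1 when `corrected`.
  def var(values : Array(Float64), mean : Float64? = nil, corrected = false) : Float64
    m = mean || values.sum / values.size
    ss = values.sum(0.0) { |x| (x - m) ** 2 }
    ss / (corrected ? values.size - 1 : values.size)
  end

  # Standard deviation is the square root of the variance.
  def std(values : Array(Float64), mean : Float64? = nil, corrected = false) : Float64
    Math.sqrt(var(values, mean, corrected))
  end
end

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
puts StatsSketch.var(data)                  # => 4.0
puts StatsSketch.std(data)                  # => 2.0
puts StatsSketch.var(data, corrected: true) # 32 / 7, about 4.571
```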