module Statistics

Overview

Basic descriptive statistics functionality.

More flexible than a scientific calculator, though not yet exhaustive.

Extended Modules

Defined in:

lib/distributions.cr
statistics.cr

Constant Summary

VERSION = "1.0.1"

Instance Method Summary

Instance Method Detail

def bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left, normed : Bool = false) : Bins

Counts the number of values in each bin of size (max - min) / bins.

Returns a Bins object where edges and counts are ordered by edge.

NOTE Any empty bin will also be included.

Parameters

  • values: a one-dimensional dataset.
  • bins: the number of equally sized bins to divide the data points into.
  • min: the left edge of the first bin. If none is provided, values.min is used.
  • max: the right edge of the last bin. If none is provided, values.max is used.
  • edge: determines whether the left edge of each bin, its midpoint, or its right edge is returned. Choices are :left, :centre and :right. Default is :left.
  • normed: when false, the result contains the number of samples in each bin. When true, the result is the value of the probability density function at each bin, normalized such that the integral over the range is 1. Note that the histogram values will not sum to 1 unless the bins have unit width; the result is not a probability mass function. Default is false.
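The binning rule and the normed option can be illustrated with a minimal sketch. This is not the library's implementation: `sketch_bin_count` is a hypothetical helper that returns a plain tuple instead of a Bins object, always reports left edges, and assumes that a value equal to max falls into the last bin.

```crystal
# Hypothetical helper: counts values per equal-width bin and returns
# the left edges together with the counts.
def sketch_bin_count(values : Array(Float64), bins : Int32,
                     min : Float64? = nil, max : Float64? = nil)
  lo = min || values.min
  hi = max || values.max
  width = (hi - lo) / bins
  counts = Array(Int32).new(bins, 0)
  values.each do |v|
    next if v < lo || v > hi
    # Assumption: the maximum value is placed in the last bin.
    index = Math.min(((v - lo) / width).to_i, bins - 1)
    counts[index] += 1
  end
  edges = Array(Float64).new(bins) { |i| lo + i * width }
  {edges, counts}
end

values = [1.0, 2.0, 2.5, 4.0, 9.0]
edges, counts = sketch_bin_count(values, bins: 4)
edges  # => [1.0, 3.0, 5.0, 7.0]
counts # => [3, 1, 0, 1] (the empty bin is kept)

# Normalized form: each density times the bin width sums to 1 overall,
# while the densities themselves do not sum to 1 here (width is 2).
width = edges[1] - edges[0]
density = counts.map { |c| c / (values.size * width) }
```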

def describe(values)

Computes several descriptive statistics of the passed array.

Parameters

  • values: a one-dimensional dataset.

def frequency(values : Enumerable(T)) forall T

Computes the number of occurrences of each value in the dataset.

Returns a Hash with the distinct dataset values as keys and the number of times each one appears as values.

Parameters

  • values: a one-dimensional dataset.
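The shape of the returned Hash can be sketched as follows. `sketch_frequency` is a hypothetical stand-in written for illustration, not the library's code.

```crystal
# Hypothetical stand-in: tallies how often each value occurs.
def sketch_frequency(values : Enumerable(T)) forall T
  counts = Hash(T, Int32).new(0) # missing keys start at 0
  values.each { |v| counts[v] += 1 }
  counts
end

sketch_frequency(["a", "b", "a", "c", "a"]) # => {"a" => 3, "b" => 1, "c" => 1}
```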

def kurtosis(values, corrected = false, excess = false)

Computes the kurtosis of a dataset.

Parameters

  • values: a one-dimensional dataset.
  • corrected: when set to true, the calculations are corrected for statistical bias. Default is false.
  • excess: when set to true, computes the excess kurtosis. Default is false.

This implementation is based on scipy/stats.py.
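As a rough illustration of the uncorrected case, kurtosis can be written as the fourth central moment divided by the squared second central moment, with 3 subtracted for the excess form. The sketch below assumes that definition (the one SciPy uses when no bias correction is requested); the helper names are hypothetical and the bias-corrected variant is not covered.

```crystal
# Hypothetical helper: n-th central moment of a sample.
def sketch_central_moment(values : Array(Float64), n : Int32) : Float64
  mean = values.sum / values.size
  values.sum { |v| (v - mean) ** n } / values.size
end

# Uncorrected kurtosis: m4 / m2^2, minus 3 when excess is requested.
def sketch_kurtosis(values : Array(Float64), excess : Bool = false) : Float64
  m2 = sketch_central_moment(values, 2)
  m4 = sketch_central_moment(values, 4)
  kurtosis = m4 / (m2 * m2)
  excess ? kurtosis - 3.0 : kurtosis
end

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sketch_kurtosis(data)               # => 2.78125
sketch_kurtosis(data, excess: true) # => -0.21875
```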


def mean(values)

Computes the mean of a dataset.

Parameters

  • values: a one-dimensional dataset.

def median(values, sorted = false)

Computes the median of all elements in a dataset.

For an even number of elements, the mean of the two middle elements is returned.

Parameters

  • values: a one-dimensional dataset.
  • sorted: when true, the computations assume that the provided values are sorted. Default is false.

See Julia's Statistics.median.
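The odd and even cases can be sketched as below. `sketch_median` is a hypothetical helper restricted to Array(Float64), not the library's implementation.

```crystal
# Hypothetical helper: median with an optional "already sorted" shortcut.
def sketch_median(values : Array(Float64), sorted : Bool = false) : Float64
  v = sorted ? values : values.sort
  mid = v.size // 2
  if v.size.odd?
    v[mid]
  else
    # Even count: average the two middle elements.
    (v[mid - 1] + v[mid]) / 2
  end
end

sketch_median([3.0, 1.0, 2.0])      # => 2.0
sketch_median([4.0, 1.0, 3.0, 2.0]) # => 2.5
```

Passing sorted as true only skips the sort; if the data is not actually sorted, the result is not the median.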


def middle(a, b)

Computes the middle of two values a and b.


def middle(values)

Computes the middle of a dataset, which consists of finding its extrema and then computing their mean.

Parameters

  • values: a one-dimensional dataset.

See Julia's Statistics.middle.
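A minimal sketch of the idea, using a hypothetical helper rather than the library's code:

```crystal
# Hypothetical helper: mean of the smallest and largest value.
def sketch_middle(values : Array(Float64)) : Float64
  lo, hi = values.minmax
  (lo + hi) / 2
end

sketch_middle([1.0, 2.0, 10.0]) # => 5.5
```

Note that the middle depends only on the extrema, so it generally differs from both the mean and the median of the same data.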


def mode(values : Enumerable)

Computes the modal (most common) value in a dataset.

Returns a pair with the modal value and the bin-count for the modal bin. If more than one value is tied for most common, no guarantee is made as to which one is returned.

NOTE Computing the mode requires traversing the entire dataset.

Parameters

  • values: a one-dimensional dataset.
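The returned pair can be sketched by tallying the values and picking the entry with the largest count. `sketch_mode` is a hypothetical helper, and its tie-breaking behaviour is not meant to mirror the library's.

```crystal
# Hypothetical helper: returns {modal value, number of occurrences}.
def sketch_mode(values : Enumerable(T)) forall T
  counts = Hash(T, Int32).new(0)
  values.each { |v| counts[v] += 1 } # one full pass over the data
  counts.max_by { |_value, count| count }
end

sketch_mode([1, 2, 2, 3, 2, 1]) # => {2, 3}
```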

def moment(values, mean = nil, n = 1)

Calculates the n-th moment about the mean for a sample.

Parameters

  • values: a one-dimensional dataset.
  • mean: a pre-computed mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
  • n: order of central moment that is returned. Default is 1.
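A minimal sketch of the n-th central moment, assuming the usual definition (the average of the n-th powers of deviations from the mean). `sketch_moment` is a hypothetical helper, not the library's code.

```crystal
# Hypothetical helper: n-th moment about the mean.
def sketch_moment(values : Array(Float64), mean : Float64? = nil, n : Int32 = 1) : Float64
  m = mean || values.sum / values.size # fall back to the sample mean
  values.sum { |v| (v - m) ** n } / values.size
end

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sketch_moment(data)                 # => 0.0 (first central moment)
sketch_moment(data, n: 2)           # => 4.0 (uncorrected variance)
sketch_moment(data, mean: 5.0, n: 2) # => 4.0 (pre-computed mean)
```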

def quantile(values, p, sorted = false)

Computes the quantile of a dataset at a specified probability p on the interval [0,1].

Quantiles are computed via linear interpolation between the points ((k-1)/(n-1), v[k]), for k = 1:n, where v holds the sorted values (indexed from 1) and n = values.size.

Parameters

  • values: a one-dimensional dataset.
  • p: probability. Values of p should be in the interval [0, 1].
  • sorted: indicates whether values can be assumed to be sorted. Default is false.

Implementation based on Julia's Statistics.quantile.
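The interpolation rule can be sketched as follows. With 0-based indexing, probability p maps to the fractional position p * (n - 1) in the sorted data, and the result is interpolated between the two neighbouring elements. `sketch_quantile` and its parameter name `prob` are hypothetical; this is an illustration, not the library's implementation.

```crystal
# Hypothetical helper: quantile by linear interpolation.
def sketch_quantile(values : Array(Float64), prob : Float64, sorted : Bool = false) : Float64
  raise ArgumentError.new("prob must be in [0, 1]") if prob < 0.0 || prob > 1.0
  v = sorted ? values : values.sort
  h = prob * (v.size - 1)                 # fractional 0-based position
  lower = h.floor.to_i
  upper = Math.min(lower + 1, v.size - 1) # clamp at the last element
  v[lower] + (h - lower) * (v[upper] - v[lower])
end

sketch_quantile([4.0, 1.0, 3.0, 2.0], 0.5)          # => 2.5
sketch_quantile([10.0, 20.0, 30.0, 40.0, 50.0], 0.25) # => 20.0
sketch_quantile([1.0, 2.0, 3.0, 4.0], 1.0)          # => 4.0
```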


def skew(values, corrected = false)

Computes the skewness of a dataset.

Parameters

  • values: a one-dimensional dataset.
  • corrected: when set to true, the calculations are corrected for statistical bias. Default is false.

This implementation is based on scipy/stats.py.
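For the uncorrected case, skewness can be sketched as the third central moment divided by the second central moment raised to the power 3/2, which is the definition SciPy uses when no bias correction is requested. `sketch_skew` is a hypothetical helper and does not cover the corrected variant.

```crystal
# Hypothetical helper: uncorrected skewness, m3 / m2^(3/2).
def sketch_skew(values : Array(Float64)) : Float64
  n = values.size
  mean = values.sum / n
  m2 = values.sum { |v| (v - mean) ** 2 } / n
  m3 = values.sum { |v| (v - mean) ** 3 } / n
  m3 / (m2 * Math.sqrt(m2))
end

sketch_skew([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]) # => 0.65625
sketch_skew([1.0, 2.0, 3.0])                          # => 0.0 (symmetric data)
```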


def std(values, mean = nil, corrected = false)

Computes the standard deviation of a dataset.

Parameters

  • values: a one-dimensional dataset.
  • mean: a pre-computed mean. This could be a pre-computed sample's mean or the population's known mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
  • corrected: when set to true, the sum of squares is scaled by values.size - 1 rather than by values.size. Default is false.

def var(values, mean = nil, corrected = false)

Computes the variance of a dataset.

Parameters

  • values: a one-dimensional dataset.
  • mean: a pre-computed mean. This could be a pre-computed sample's mean or the population's known mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
  • corrected: when set to true, the sum of squares is scaled by values.size - 1 rather than by values.size. Default is false.
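The effect of the mean and corrected parameters on std and var can be sketched as follows. Both helpers are hypothetical and assume the standard deviation is the square root of the variance; they are not the library's code.

```crystal
# Hypothetical helper: variance with optional pre-computed mean and
# optional scaling by size - 1 instead of size.
def sketch_var(values : Array(Float64), mean : Float64? = nil, corrected : Bool = false) : Float64
  m = mean || values.sum / values.size
  sum_of_squares = values.sum { |v| (v - m) ** 2 }
  sum_of_squares / (corrected ? values.size - 1 : values.size)
end

# Hypothetical helper: standard deviation as the square root of the variance.
def sketch_std(values : Array(Float64), mean : Float64? = nil, corrected : Bool = false) : Float64
  Math.sqrt(sketch_var(values, mean, corrected))
end

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sketch_var(data)                  # => 4.0 (32 / 8)
sketch_std(data)                  # => 2.0
sketch_var(data, corrected: true) # 32 / 7, slightly larger than 4.0
```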