module Statistics
Overview
Basic descriptive statistics functionality.
More flexible than a scientific calculator, but not yet exhaustive.
Extended Modules
Defined in:
lib/distributions.cr
statistics.cr
Constant Summary

VERSION = "1.0.0"
Instance Method Summary

#bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left) : Bins
Counts the number of values in each bin of size (max - min) / bins.

#describe(values)
Computes several descriptive statistics of the passed array.

#frequency(values : Enumerable(T)) forall T
Computes the number of occurrences of each value in the dataset.

#kurtosis(values, corrected = false, excess = false)
Computes the kurtosis of a dataset.

#mean(values)
Computes the mean of a dataset.

#median(values, sorted = false)
Computes the median of all elements in a dataset.

#middle(a, b)
Computes the middle of two values a and b.

#middle(values)
Computes the middle of an array, which consists of finding its extrema and then computing their mean.

#mode(values : Enumerable)
Computes the modal (most common) value in a dataset.

#moment(values, mean = nil, n = 1)
Calculates the nth moment about the mean for a sample.

#quantile(values, p, sorted = false)
Computes the quantile of a dataset at a specified probability p on the interval [0, 1].

#skew(values, corrected = false)
Computes the skewness of a dataset.

#std(values, mean = nil, corrected = false)
Computes the standard deviation of a dataset.

#var(values, mean = nil, corrected = false)
Computes the variance of a dataset.
Instance Method Detail
#bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left) : Bins
Counts the number of values in each bin of size (max - min) / bins.
Returns a Bins object where edges and counts are ordered by edge.
NOTE Any empty bin will also be included.
Parameters
 values: a one-dimensional dataset.
 bins: the number of equally-sized bins to divide the data points into.
 min: the left end of the first bin's edge. If none is provided, then values.min is used.
 max: the right end of the last bin's edge. If none is provided, then values.max is used.
 edge: determines whether the left edge of the bin, its midpoint or right edge should be returned. Choices are :left, :centre and :right. Default is :left.
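The binning rule can be illustrated with a short, language-neutral Python sketch. It is not the library's Crystal implementation: the function name, the `lo`/`hi` parameter names, the handling of out-of-range values, and the choice to close the last bin on the right are all assumptions made for illustration.

```python
def bin_count(values, bins, lo=None, hi=None, edge="left"):
    """Count values in `bins` equal-width bins spanning [lo, hi].

    Assumes hi > lo, so that the bin width is non-zero.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / bins
    counts = [0] * bins  # empty bins stay in the result with a count of 0
    for v in values:
        if v < lo or v > hi:
            continue  # assumption: out-of-range values are ignored
        # assumption: the last bin is closed, so v == hi lands in it
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    # Report the left edge, midpoint or right edge of every bin.
    offset = {"left": 0.0, "centre": 0.5, "right": 1.0}[edge]
    edges = [lo + (i + offset) * width for i in range(bins)]
    return edges, counts


edges, counts = bin_count([1, 2, 2, 3, 9], bins=4)
print(edges)   # [1.0, 3.0, 5.0, 7.0]
print(counts)  # [3, 1, 0, 1]
```

Here the bin width is (9 - 1) / 4 = 2, and the third bin is empty but still reported, matching the NOTE above.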
#describe(values)
Computes several descriptive statistics of the passed array.
Parameters
 values: a one-dimensional dataset.
#frequency(values : Enumerable(T)) forall T
Computes the number of occurrences of each value in the dataset.
Returns a Hash with the dataset values as keys and the number of times each one appears as values.
Parameters
 values: a one-dimensional dataset.
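As an illustration of the returned mapping, the following Python sketch builds an equivalent value-to-count dictionary. The function name mirrors the method above, but the code is only an analogy and not the library's Crystal implementation.

```python
from collections import Counter


def frequency(values):
    """Map every distinct value to the number of times it occurs."""
    return dict(Counter(values))


print(frequency(["a", "b", "a", "c", "a"]))  # {'a': 3, 'b': 1, 'c': 1}
```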
#kurtosis(values, corrected = false, excess = false)
Computes the kurtosis of a dataset.
Parameters
 values: a one-dimensional dataset.
 corrected: when set to true, the calculations are corrected for statistical bias. Default is false.
 excess: when set to true, computes the excess kurtosis. Default is false.
This implementation is based on scipy/stats.py.
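A minimal Python sketch of the moment-based kurtosis, following the formulas used by SciPy's `kurtosis` (ratio of the fourth central moment to the squared second central moment, with an optional bias correction). Whether the Crystal method matches it term by term is an assumption; the helper `_central_moment` is hypothetical.

```python
def _central_moment(values, order):
    """Hypothetical helper: mean of (x - mean) ** order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def kurtosis(values, corrected=False, excess=False):
    n = len(values)
    m2 = _central_moment(values, 2)
    m4 = _central_moment(values, 4)
    result = m4 / m2 ** 2  # biased (population) kurtosis
    if corrected:
        # Bias-corrected estimate as in SciPy; requires n > 3.
        result = ((n ** 2 - 1.0) * m4 / m2 ** 2 - 3.0 * (n - 1) ** 2) / ((n - 2) * (n - 3)) + 3.0
    return result - 3.0 if excess else result


data = [1, 2, 3, 4, 5]
print(kurtosis(data))               # approximately 1.7
print(kurtosis(data, excess=True))  # approximately -1.3
```

For this dataset the second and fourth central moments are 2 and 6.8, so the uncorrected kurtosis is 6.8 / 4 = 1.7.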
#median(values, sorted = false)
Computes the median of all elements in a dataset.
For an even number of elements, the mean of the two middle elements is returned.
Parameters
 values: a one-dimensional dataset.
 sorted: when true, the computations assume that the provided values are sorted. Default is false.
See Julia's Statistics.median.
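The odd and even cases can be sketched in Python as follows. This is an illustration of the rule described above, not the library's code; the parameter is named `is_sorted` here only to avoid shadowing Python's built-in `sorted`.

```python
def median(values, is_sorted=False):
    """Middle element, or the mean of the two middle elements."""
    data = list(values) if is_sorted else sorted(values)
    if not data:
        raise ValueError("median of an empty dataset is undefined")
    mid = len(data) // 2
    if len(data) % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```

Passing `is_sorted=True` skips the sort, which is the only cost the flag saves.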
#middle(values)
Computes the middle of an array, which consists of finding its extrema and then computing their mean.
Parameters
 values: a one-dimensional dataset.
See Julia's Statistics.middle.
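In other words, the middle is the midrange of the data. A Python sketch of the idea (an analogy, not the Crystal source):

```python
def middle(values):
    """Mean of the smallest and largest element (the midrange)."""
    return (min(values) + max(values)) / 2


print(middle([1, 2, 10]))  # 5.5
```

Unlike the median, the result depends only on the two extrema, so a single outlier moves it considerably.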
#mode(values : Enumerable)
Computes the modal (most common) value in a dataset.
Returns a pair with the modal value and the bin count for the modal bin. If there is more than one such value, no guarantee is made as to which one is picked.
NOTE Computing the mode requires traversing the entire dataset.
Parameters
 values: a one-dimensional dataset.
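The shape of the returned pair can be illustrated with a Python sketch based on `collections.Counter`. It is an analogy only; in particular, how the Crystal method breaks ties is not specified here.

```python
from collections import Counter


def mode(values):
    """Return (modal value, number of occurrences)."""
    counts = Counter(values)  # a single pass over the whole dataset
    value, count = counts.most_common(1)[0]
    return value, count


print(mode([1, 2, 2, 3]))  # (2, 2)
```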
#moment(values, mean = nil, n = 1)
Calculates the nth moment about the mean for a sample.
Parameters
 values: a one-dimensional dataset.
 mean: a precomputed mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
 n: order of central moment that is returned. Default is 1.
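The nth central moment is the average of the nth powers of the deviations from the mean. A minimal Python sketch, assuming the sum is divided by the sample size (an illustration, not the library's code):

```python
def moment(values, mean=None, n=1):
    """n-th central moment: average of (x - mean) ** n."""
    if mean is None:
        mean = sum(values) / len(values)
    return sum((x - mean) ** n for x in values) / len(values)


data = [1, 2, 3, 4, 5]
print(moment(data))       # 0.0  (the first central moment is always zero)
print(moment(data, n=2))  # 2.0  (the uncorrected variance)
```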
#quantile(values, p, sorted = false)
Computes the quantile of a dataset at a specified probability p on the interval [0, 1].
Quantiles are computed via linear interpolation between the points ((k - 1)/(n - 1), v[k]), for k = 1:n where n = values.size.
Parameters
 values: a one-dimensional dataset.
 p: probability. Values of p should be in the interval [0, 1].
 sorted: indicates whether values can be assumed to be sorted.
Implementation based on Julia's Statistics.quantile.
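The interpolation rule can be sketched in Python. With zero-based indexing, the sorted value at index i sits at probability i / (n - 1), so the quantile at p lies at the fractional index p * (n - 1). This is a sketch of that rule under these assumptions, not the library's implementation; `is_sorted` is a renamed stand-in for the `sorted` parameter.

```python
import math


def quantile(values, p, is_sorted=False):
    """Linear interpolation between the points (i / (n - 1), v[i])."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    data = list(values) if is_sorted else sorted(values)
    if not data:
        raise ValueError("quantile of an empty dataset is undefined")
    n = len(data)
    h = p * (n - 1)                # fractional zero-based index
    lower = math.floor(h)
    upper = min(lower + 1, n - 1)  # clamp so that p == 1 stays in range
    frac = h - lower
    return data[lower] + frac * (data[upper] - data[lower])


print(quantile([1, 2, 3, 4], 0.5))      # 2.5
print(quantile([1, 2, 3, 4, 5], 0.25))  # 2.0
```

At p = 0.5 this reproduces the median, and p = 0 and p = 1 return the minimum and maximum.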
#skew(values, corrected = false)
Computes the skewness of a dataset.
Parameters
 values: a one-dimensional dataset.
 corrected: when set to true, the calculations are corrected for statistical bias. Default is false.
This implementation is based on scipy/stats.py.
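A Python sketch of the moment-based skewness, following the formulas used by SciPy's `skew` (third central moment divided by the second central moment raised to 1.5, with an optional bias correction). That the Crystal method agrees with it exactly is an assumption; `_central_moment` is a hypothetical helper.

```python
import math


def _central_moment(values, order):
    """Hypothetical helper: mean of (x - mean) ** order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skew(values, corrected=False):
    n = len(values)
    m2 = _central_moment(values, 2)
    m3 = _central_moment(values, 3)
    result = m3 / m2 ** 1.5  # biased (population) skewness
    if corrected:
        # Bias correction as in SciPy; requires n > 2.
        result *= math.sqrt(n * (n - 1.0)) / (n - 2.0)
    return result


print(skew([1, 2, 3, 4, 5]))  # 0.0 for a symmetric dataset
print(skew([1, 2, 3, 10]))    # positive: the long tail is on the right
```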
#std(values, mean = nil, corrected = false)
Computes the standard deviation of a dataset.
Parameters
 values: a one-dimensional dataset.
 mean: a precomputed mean. This could be a precomputed sample mean or the population's known mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
 corrected: when set to true, the sum of squares is scaled with values.size - 1, rather than with values.size. Default is false.
#var(values, mean = nil, corrected = false)
Computes the variance of a dataset.
Parameters
 values: a one-dimensional dataset.
 mean: a precomputed mean. This could be a precomputed sample mean or the population's known mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
 corrected: when set to true, the sum of squares is scaled with values.size - 1, rather than with values.size. Default is false.
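The effect of the corrected flag on the variance, and on the standard deviation as its square root, can be shown with a short Python sketch. It illustrates the scaling rule described in the parameters and is not the library's Crystal code.

```python
import math


def var(values, mean=None, corrected=False):
    """Sum of squared deviations, scaled by n or by n - 1."""
    n = len(values)
    if mean is None:
        mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1 if corrected else n)


def std(values, mean=None, corrected=False):
    """Standard deviation: square root of the variance."""
    return math.sqrt(var(values, mean=mean, corrected=corrected))


data = [1, 2, 3, 4, 5]
print(var(data))                  # 2.0  (10 / 5)
print(var(data, corrected=True))  # 2.5  (10 / 4)
```

Supplying a known mean, for example `var(data, mean=3.0)`, skips the extra pass that computes it.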