module Statistics
Overview
Basic descriptive statistics functionality.
More flexible than a scientific-calculator, but not as exhaustive, yet.
Extended Modules
Defined in:
lib/distributions.cr
statistics.cr
Constant Summary
- VERSION = "1.0.1"
Instance Method Summary
- #bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left, normed : Bool = false) : Bins
  Counts the number of values in each bin of size (max - min) / bins.
- #describe(values)
  Computes several descriptive statistics of the passed array.
- #frequency(values : Enumerable(T)) forall T
  Computes the number of occurrences of each value in the dataset.
- #kurtosis(values, corrected = false, excess = false)
  Computes the kurtosis of a dataset.
- #mean(values)
  Computes the mean of a dataset.
- #median(values, sorted = false)
  Computes the median of all elements in a dataset.
- #middle(a, b)
  Computes the middle of two values a and b.
- #middle(values)
  Computes the middle of an array, which consists of finding its extrema and then computing their mean.
- #mode(values : Enumerable)
  Computes the modal (most common) value in a dataset.
- #moment(values, mean = nil, n = 1)
  Calculates the n-th moment about the mean for a sample.
- #quantile(values, p, sorted = false)
  Computes the quantile of a dataset at a specified probability p on the interval [0, 1].
- #skew(values, corrected = false)
  Computes the skewness of a dataset.
- #std(values, mean = nil, corrected = false)
  Computes the standard deviation of a dataset.
- #var(values, mean = nil, corrected = false)
  Computes the variance of a dataset.
Instance Method Detail
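#bin_count(values : Enumerable, bins : Int32, min = nil, max = nil, edge : Edge = :left, normed : Bool = false) : Bins
Counts the number of values in each bin of size (max - min) / bins.
Returns a Bins object where edges and counts are ordered by edge.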
NOTE Any empty bin will also be included.
Parameters
- values: a one-dimensional dataset.
- bins: the number of equally-sized bins to divide the datapoints into.
- min: the left edge of the first bin. If none is provided, then values.min is used.
- max: the right edge of the last bin. If none is provided, then values.max is used.
- edge: determines whether the left edge of the bin, its mid-point or its right edge should be returned. Choices are :left, :centre and :right. Default is :left.
- normed: when false, the result contains the number of samples in each bin. When true, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not equal 1 unless bins of unit width are chosen; it is not a probability mass function. Default is false.
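The bin-size rule and the normed option of bin_count can be sketched in plain Crystal, without the library. This is only an illustration under stated assumptions: `bin_counts_sketch` and `densities_sketch` are hypothetical helpers rather than the library's code, min and max are taken from the data, the data are assumed to satisfy max > min, only left edges are produced, and a value equal to max is assumed to fall into the last bin.

```crystal
# Hypothetical helper: equal-width binning with left edges (not the library's code).
def bin_counts_sketch(values : Array(Float64), bins : Int32)
  min = values.min
  width = (values.max - min) / bins
  counts = Array.new(bins, 0)
  values.each do |v|
    # Index of the bin containing v.
    i = ((v - min) / width).floor.to_i
    # Assumption: the maximum value is counted in the last bin.
    i = bins - 1 if i >= bins
    counts[i] += 1
  end
  edges = (0...bins).map { |i| min + i * width }
  {edges: edges, counts: counts, width: width}
end

# Hypothetical helper: turns counts into density values so that
# the sum of density * width over all bins is 1.
def densities_sketch(counts : Array(Int32), width : Float64)
  total = counts.sum
  counts.map { |c| c / (total * width) }
end

result = bin_counts_sketch([1.0, 2.0, 2.5, 4.0, 5.0], 2)
puts result[:edges]  # [1.0, 3.0]
puts result[:counts] # [3, 2]
densities = densities_sketch(result[:counts], result[:width])
```

With a bin width of 2, the densities are 0.3 and 0.2. They sum to 0.5 rather than 1, and only their sum multiplied by the width equals 1, which is the point made in the description of normed.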
#describe(values)
Computes several descriptive statistics of the passed array.
Parameters
- values: a one-dimensional dataset.
#frequency(values : Enumerable(T)) forall T
Computes the number of occurrences of each value in the dataset.
Returns a Hash with the dataset values as keys and the number of times each one appears as values.
Parameters
- values: a one-dimensional dataset.
#kurtosis(values, corrected = false, excess = false)
Computes the kurtosis of a dataset.
Parameters
- values: a one-dimensional dataset.
- corrected: when set to true, the calculations are corrected for statistical bias. Default is false.
- excess: when set to true, computes the excess kurtosis. Default is false.
This implementation is based on scipy/stats.py.
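For reference, the uncorrected kurtosis as scipy defines it is the fourth central moment divided by the square of the second central moment, and the excess variant subtracts 3. The following is a minimal sketch in plain Crystal, assuming that definition carries over here. `central_moment_sketch` and `kurtosis_sketch` are hypothetical helpers, not the library's code, and the bias correction selected by corrected is left out.

```crystal
# Hypothetical helper: n-th moment about the mean, dividing by the sample size.
def central_moment_sketch(values : Array(Float64), n : Int32)
  mean = values.sum / values.size
  values.sum { |v| (v - mean) ** n } / values.size
end

# Hypothetical helper: uncorrected kurtosis, optionally as excess kurtosis.
def kurtosis_sketch(values : Array(Float64), excess = false)
  k = central_moment_sketch(values, 4) / central_moment_sketch(values, 2) ** 2
  excess ? k - 3 : k
end

puts kurtosis_sketch([1.0, 10.0, 7.0])       # 1.5
puts kurtosis_sketch([1.0, 10.0, 7.0], true) # -1.5
```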
#median(values, sorted = false)
Computes the median of all elements in a dataset.
For an even number of elements the mean of the two median elements will be computed.
Parameters
- values: a one-dimensional dataset.
- sorted: when true, the computations assume that the provided values are sorted. Default is false.
See Julia's Statistics.median.
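The even and odd cases of the median can be sketched in plain Crystal as follows. `median_sketch` is a hypothetical helper, not the library's code, and it assumes a non-empty array of Float64 values.

```crystal
# Hypothetical helper: median of a non-empty array.
def median_sketch(values : Array(Float64), sorted = false)
  # Skip the sort when the caller guarantees sorted input.
  v = sorted ? values : values.sort
  mid = v.size // 2
  # Odd size: the middle element. Even size: the mean of the two middle elements.
  v.size.odd? ? v[mid] : (v[mid - 1] + v[mid]) / 2
end

puts median_sketch([3.0, 1.0, 2.0])      # 2.0
puts median_sketch([4.0, 1.0, 3.0, 2.0]) # 2.5
```

Passing sorted = true to a routine like this only saves the sort; on unsorted input it would silently return a wrong value.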
#middle(values)
Computes the middle of an array, which consists of finding its extrema and then computing their mean.
Parameters
- values: a one-dimensional dataset.
See Julia's Statistics.middle.
#mode(values : Enumerable)
Computes the modal (most common) value in a dataset.
Returns a pair with the modal value and the bin-count for the modal bin. If there is more than one such value, no guarantee is made as to which one is returned.
NOTE Computing the mode requires traversing the entire dataset.
Parameters
- values: a one-dimensional dataset.
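The value-and-count pair returned by mode can be illustrated by counting occurrences and then picking the entry with the largest count. This is a sketch in plain Crystal; `mode_sketch` is a hypothetical helper, not the library's code, and it assumes a non-empty dataset.

```crystal
# Hypothetical helper: returns a {value, count} pair for the most common value.
def mode_sketch(values : Array(T)) forall T
  counts = Hash(T, Int32).new(0)
  # One full pass over the dataset to count each value.
  values.each { |v| counts[v] += 1 }
  # Each hash entry is a {key, value} tuple; compare entries by their count.
  counts.max_by { |pair| pair[1] }
end

puts mode_sketch([1, 2, 2, 3, 2]) # {2, 3}
```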
#moment(values, mean = nil, n = 1)
Calculates the n-th moment about the mean for a sample.
Parameters
- values: a one-dimensional dataset.
- mean: a pre-computed mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
- n: order of the central moment that is returned. Default is 1.
#quantile(values, p, sorted = false)
Computes the quantile of a dataset at a specified probability p on the interval [0, 1].
Quantiles are computed via linear interpolation between the points ((k-1)/(n-1), v[k]), for k = 1:n where n = values.size.
Parameters
- values: a one-dimensional dataset.
- p: probability. Values of p should be in the interval [0, 1].
- sorted: indicates whether values can be assumed to be sorted. Default is false.
Implementation based on Julia's Statistics.quantile.
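The interpolation rule for quantile can be sketched in plain Crystal. With zero-based indexing, the point ((k-1)/(n-1), v[k]) corresponds to a fractional position p * (n - 1) in the sorted values. `quantile_sketch` and its parameter name `prob` are hypothetical, the code is not the library's implementation, and it assumes a non-empty array and a probability within [0, 1].

```crystal
# Hypothetical helper: quantile by linear interpolation between order statistics.
def quantile_sketch(values : Array(Float64), prob : Float64, sorted = false)
  v = sorted ? values : values.sort
  n = v.size
  # Fractional zero-based position of prob among the n sorted values.
  h = prob * (n - 1)
  lo = h.floor.to_i
  hi = Math.min(lo + 1, n - 1)
  # Interpolate between the two neighbouring sorted values.
  v[lo] + (h - lo) * (v[hi] - v[lo])
end

data = [4.0, 1.0, 3.0, 2.0]
puts quantile_sketch(data, 0.0) # 1.0
puts quantile_sketch(data, 0.5) # 2.5
puts quantile_sketch(data, 1.0) # 4.0
```

At p = 0.5 this reduces to the median, and at p = 0 and p = 1 it returns the minimum and maximum.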
#skew(values, corrected = false)
Computes the skewness of a dataset.
Parameters
- values: a one-dimensional dataset.
- corrected: when set to true, the calculations are corrected for statistical bias. Default is false.
This implementation is based on scipy/stats.py.
#std(values, mean = nil, corrected = false)
Computes the standard deviation of a dataset.
Parameters
- values: a one-dimensional dataset.
- mean: a pre-computed mean. This could be a pre-computed sample's mean or the population's known mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
- corrected: when set to true, the sum of squares is scaled with values.size - 1, rather than with values.size. Default is false.
#var(values, mean = nil, corrected = false)
Computes the variance of a dataset.
Parameters
- values: a one-dimensional dataset.
- mean: a pre-computed mean. This could be a pre-computed sample's mean or the population's known mean. If a mean is not provided, then the sample's mean will be computed. Default is nil.
- corrected: when set to true, the sum of squares is scaled with values.size - 1, rather than with values.size. Default is false.
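The effect of the mean and corrected parameters on var and std can be sketched in plain Crystal. `var_sketch` and `std_sketch` are hypothetical helpers that mirror the documented parameters; they are not the library's code and assume an array of Float64 values with at least two elements.

```crystal
# Hypothetical helper: variance with an optional pre-computed mean.
def var_sketch(values : Array(Float64), mean : Float64? = nil, corrected = false)
  # Use the supplied mean if there is one, otherwise compute the sample mean.
  m = mean || values.sum / values.size
  sum_sq = values.sum { |v| (v - m) ** 2 }
  # corrected divides by size - 1 instead of size.
  sum_sq / (corrected ? values.size - 1 : values.size)
end

# Hypothetical helper: standard deviation as the square root of the variance.
def std_sketch(values : Array(Float64), mean : Float64? = nil, corrected = false)
  Math.sqrt(var_sketch(values, mean, corrected))
end

x = [1.0, 10.0, 7.0]
puts var_sketch(x)                  # 14.0
puts var_sketch(x, corrected: true) # 21.0
puts std_sketch(x, mean: 6.0)       # square root of 14.0
```

Supplying the mean avoids a second pass over the data when it is already known, and corrected: true gives the usual unbiased estimate when the data are a sample rather than the whole population.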