# Summary Statistics Using NumPy

Last updated: October 31st, 2019

In descriptive statistics, summary statistics condense a set of observations so that as much information as possible is communicated as simply as possible. Statisticians commonly describe the observations in terms of:

• a measure of location, or central tendency, such as the arithmetic mean.
• a measure of statistical dispersion like the standard deviation.
• a measure of the shape of the distribution like skewness or kurtosis.
• if more than one variable is measured, a measure of statistical dependence such as a correlation coefficient.
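
As a rough illustration of the four kinds of measures listed above, the sketch below computes one of each. The arrays `x` and `y` are made-up sample data, not part of the original notebook. Skewness is computed by hand as the third standardized moment, because NumPy itself does not ship a skewness function (SciPy's `scipy.stats.skew` is one alternative).

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
y = 2.0 * x + 1.0  # perfectly linearly related to x

location = x.mean()                     # central tendency
dispersion = x.std()                    # population standard deviation
z = (x - location) / dispersion         # standardized observations
skewness = (z ** 3).mean()              # shape: third standardized moment
correlation = np.corrcoef(x, y)[0, 1]   # dependence between x and y

print(location, dispersion, skewness, correlation)
```

Here the positive skewness reflects the longer tail toward larger values, and the correlation is 1 (up to rounding) because `y` is an exact linear function of `x`.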

NumPy provides functions and array methods for computing the sum, mean, standard deviation, variance, and similar statistics over the elements of an array.

## Hands on!

In [1]:
import sys
import numpy as np


### Summary statistics

In [2]:
a = np.array([1, 2, 3, 4])

In [3]:
a.shape

Out[3]:
(4,)
In [4]:
a.sum()

Out[4]:
10
In [5]:
a.mean()

Out[5]:
2.5
In [6]:
a.std()

Out[6]:
1.118033988749895
In [7]:
a.var()

Out[7]:
1.25
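
The values above are population statistics: by default `std()` and `var()` divide by the number of elements (`ddof=0`). A minimal sketch of the difference from the sample versions, which divide by N - 1, could look like this. The variable names are chosen only for this example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Default (ddof=0): population statistics, dividing by N.
pop_var = a.var()
pop_std = a.std()

# ddof=1: sample statistics, dividing by N - 1.
sample_var = a.var(ddof=1)
sample_std = a.std(ddof=1)

# In both cases the standard deviation is the square root of the variance.
print(np.isclose(pop_std, np.sqrt(pop_var)))
print(np.isclose(sample_std, np.sqrt(sample_var)))
```

Which divisor is appropriate depends on whether the array holds the whole population or a sample drawn from it.
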
In [8]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [9]:
A.sum()

Out[9]:
45
In [10]:
A.mean()

Out[10]:
5.0
In [11]:
A.std()

Out[11]:
2.581988897471611
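
When no `axis` is given, as in the three calls above, the reduction runs over every element of the 2-D array. The following sketch checks that against an explicitly flattened copy and against the definition of the standard deviation. The variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# With no axis argument the reduction covers every element,
# as if the array had first been flattened to one dimension.
overall_std = A.std()
flattened_std = A.ravel().std()

# The same value from the definition: root of the mean squared deviation.
by_hand = np.sqrt(((A - A.mean()) ** 2).mean())

print(np.isclose(overall_std, flattened_std), np.isclose(overall_std, by_hand))
```
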
In [12]:
A.sum(axis=0)

Out[12]:
array([12, 15, 18])
In [13]:
A.sum(axis=1)

Out[13]:
array([ 6, 15, 24])
In [14]:
A.mean(axis=0)

Out[14]:
array([4., 5., 6.])
In [15]:
A.mean(axis=1)

Out[15]:
array([2., 5., 8.])
In [16]:
A.std(axis=0)

Out[16]:
array([2.44948974, 2.44948974, 2.44948974])
In [17]:
A.std(axis=1)

Out[17]:
array([0.81649658, 0.81649658, 0.81649658])
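
The `axis` argument names the dimension that is collapsed: `axis=0` reduces across rows and leaves one value per column, while `axis=1` reduces across columns and leaves one value per row. A small sketch of that reading, using an explicit loop for comparison and `keepdims=True` as an optional extra not used in the cells above:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# axis=0 collapses the rows, leaving one value per column.
col_sums = A.sum(axis=0)
manual_col_sums = np.array([A[:, j].sum() for j in range(A.shape[1])])

# axis=1 collapses the columns, leaving one value per row.
row_means = A.mean(axis=1)

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts against A, for example to center each row on its own mean.
centered = A - A.mean(axis=1, keepdims=True)

print(col_sums, row_means)
print(centered)
```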

#### Cumulative sum of elements starting from 0

In [18]:
A

Out[18]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [19]:
A.cumsum()

Out[19]:
array([ 1,  3,  6, 10, 15, 21, 28, 36, 45])
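
Like the other reductions, `cumsum()` works on the flattened array unless an `axis` is given. The sketch below, with illustrative variable names, shows the running totals along each axis and checks that the last flattened total equals the overall sum.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

flat_running = A.cumsum()        # running total over the flattened array
down_columns = A.cumsum(axis=0)  # running total down each column
along_rows = A.cumsum(axis=1)    # running total along each row

# The last running total equals the overall sum.
print(flat_running[-1] == A.sum())
print(down_columns)
print(along_rows)
```

Note that the last row of `down_columns` matches `A.sum(axis=0)` and the last column of `along_rows` matches `A.sum(axis=1)`.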

#### Cumulative product of elements starting from 1

In [20]:
A.cumprod()

Out[20]:
array([     1,      2,      6,     24,    120,    720,   5040,  40320,
       362880])
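
Running products grow quickly, and NumPy's fixed-width integer arrays wrap around silently when they overflow. As a hedged sketch, the example below shows `cumprod()` along one axis and then requests a floating-point accumulator through the `dtype` argument, trading exactness for range. The array `n` and the choice of 25 terms are assumptions made for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Running product along each row.
along_rows = A.cumprod(axis=1)

# 1!, 2!, ..., 25!: the later values exceed the 64-bit integer range,
# so accumulate in float64 instead (approximate, but no wrap-around).
n = np.arange(1, 26)
factorials = n.cumprod(dtype=np.float64)

print(along_rows)
print(factorials[-1])
```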