
Numpy Universal Functions

Last updated: October 31st, 2019

rmotr


Numpy Universal functions (ufuncs)

In this lesson we'll learn about NumPy "Universal Functions", or ufuncs. Ufuncs are functions that operate on arrays on an element-by-element basis, in a really efficient way.

In our previous lecture we introduced vectorized operations, which are one example of ufuncs. The important trait of ufuncs is that they're optimized internally using C code, which makes them really fast and efficient.

We'll start by comparing the efficiency of regular "python loops" vs numpy vectorized operations (ufuncs).


Hands on!

Let's start with an example: given an array of numbers, we want to compute the reciprocal of each element. Our first approach uses a regular Python for-loop (example taken from the Python Data Science Handbook):

In [1]:
import numpy as np
In [2]:
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
In [3]:
values = np.random.randint(1, 999, size=10)
values
Out[3]:
array([330, 674, 494, 866, 802, 994, 657, 795, 459, 634])
In [4]:
compute_reciprocals(values)
Out[4]:
array([0.0030303 , 0.00148368, 0.00202429, 0.00115473, 0.00124688,
       0.00100604, 0.00152207, 0.00125786, 0.00217865, 0.00157729])

The NumPy vectorized counterpart is much easier to write:

In [5]:
1 / values
Out[5]:
array([0.0030303 , 0.00148368, 0.00202429, 0.00115473, 0.00124688,
       0.00100604, 0.00152207, 0.00125786, 0.00217865, 0.00157729])

As you can see, both return the same results. The vectorized version is declarative: it states what to compute (the reciprocal of every element). The for-loop version is imperative: it spells out how to compute it, one step at a time.

Now let's explore how long it takes to process a large array with our naive, loop-based function:
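
Rather than comparing the printed arrays by eye, the agreement can be checked programmatically. This is a minimal sketch: the `sample` array is made up for illustration, and the loop function is repeated so the block runs on its own.

```python
import numpy as np

def compute_reciprocals(values):
    """Loop-based reciprocal, repeated here so the block is self-contained."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

# Illustrative sample data (not the random values used in the lesson)
sample = np.array([1, 2, 4, 5, 8])

loop_result = compute_reciprocals(sample)
vectorized_result = 1 / sample

# allclose compares element by element within a floating-point tolerance
print(np.allclose(loop_result, vectorized_result))
```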

In [6]:
big_array = np.random.randint(1, 999, size=1_000_000)

For loop version:

In [7]:
%time compute_reciprocals(big_array)
CPU times: user 1.78 s, sys: 7.05 ms, total: 1.79 s
Wall time: 1.83 s
Out[7]:
array([0.00103734, 0.0028169 , 0.00387597, ..., 0.00205761, 0.00107527,
       0.00232558])

Numpy vectorized operation version:

In [8]:
%time (1 / big_array)
CPU times: user 7.8 ms, sys: 0 ns, total: 7.8 ms
Wall time: 6.79 ms
Out[8]:
array([0.00103734, 0.0028169 , 0.00387597, ..., 0.00205761, 0.00107527,
       0.00232558])

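The vectorized operation is far faster: in this run, about 270 times faster by wall time (1.83 s versus 6.79 ms). The exact ratio varies by machine, but the reason is the same: the operation is implemented as a NumPy ufunc, which runs as a compiled C loop instead of a Python-level loop.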
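
The `%time` magic only works inside IPython or Jupyter. Outside a notebook, a similar comparison can be sketched with the standard library's `timeit` module. The array size and repeat count below are arbitrary choices for illustration (smaller than in the lesson so the loop finishes quickly), and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    """Loop-based reciprocal, repeated here so the block is self-contained."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

# Assumed size: 100,000 elements instead of the lesson's 1,000,000
test_array = np.random.randint(1, 999, size=100_000)

repeats = 3
# timeit returns the total time for all repeats, so divide to get per-call time
loop_seconds = timeit.timeit(lambda: compute_reciprocals(test_array), number=repeats) / repeats
ufunc_seconds = timeit.timeit(lambda: 1 / test_array, number=repeats) / repeats

print(f"loop:    {loop_seconds:.6f} s per call")
print(f"ufunc:   {ufunc_seconds:.6f} s per call")
print(f"speedup: {loop_seconds / ufunc_seconds:.0f}x")
```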


Understanding NumPy ufuncs

Technically speaking, universal functions are instances of the numpy.ufunc class, and many of them are implemented in C.

They can be accessed through multiple interfaces: as we saw, you can use a regular operator with an ndarray (like 1 / big_array), or you can call the function directly (like np.divide):

In [9]:
%time np.divide(1, big_array)
CPU times: user 3.69 ms, sys: 4 ms, total: 7.69 ms
Wall time: 7.2 ms
Out[9]:
array([0.00103734, 0.0028169 , 0.00387597, ..., 0.00205761, 0.00107527,
       0.00232558])
In [10]:
%time (1 / big_array)
CPU times: user 3.18 ms, sys: 3.97 ms, total: 7.15 ms
Wall time: 6.78 ms
Out[10]:
array([0.00103734, 0.0028169 , 0.00387597, ..., 0.00205761, 0.00107527,
       0.00232558])
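
That these functions are instances of `numpy.ufunc` can be checked directly. The short sketch below inspects three of them; `nin` and `nout` are ufunc attributes giving the number of inputs and outputs.

```python
import numpy as np

# Each of these names refers to a ufunc object, not a plain Python function
for func in (np.add, np.divide, np.negative):
    print(func.__name__, isinstance(func, np.ufunc), func.nin, func.nout)

# Binary ufuncs take two inputs, unary ufuncs take one
binary_inputs = np.add.nin
unary_inputs = np.negative.nin
```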

All the regular arithmetic operators applied to numpy arrays will be performed by ufuncs internally:

Operator   ufunc             Description
+          np.add            Addition (e.g., 1 + 1 = 2)
-          np.subtract       Subtraction (e.g., 3 - 2 = 1)
-          np.negative       Unary negation (e.g., -2)
*          np.multiply       Multiplication (e.g., 2 * 3 = 6)
/          np.divide         Division (e.g., 3 / 2 = 1.5)
//         np.floor_divide   Floor division (e.g., 3 // 2 = 1)
**         np.power          Exponentiation (e.g., 2 ** 3 = 8)
%          np.mod            Modulus/remainder (e.g., 9 % 4 = 1)

In [11]:
values + 10
Out[11]:
array([ 340,  684,  504,  876,  812, 1004,  667,  805,  469,  644])
In [12]:
np.add(values, 10)
Out[12]:
array([ 340,  684,  504,  876,  812, 1004,  667,  805,  469,  644])
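
The same equivalence holds for every operator in the table. As a rough check, the sketch below pairs each operator (via the standard library's `operator` module) with its ufunc and confirms that both give identical results. The arrays `a` and `b` are made-up sample data.

```python
import operator
import numpy as np

# Illustrative sample arrays
a = np.array([9, 7, 5, 3])
b = np.array([4, 3, 2, 1])

# (operator function, ufunc) pairs taken from the table above
pairs = [
    (operator.add, np.add),
    (operator.sub, np.subtract),
    (operator.mul, np.multiply),
    (operator.truediv, np.divide),
    (operator.floordiv, np.floor_divide),
    (operator.pow, np.power),
    (operator.mod, np.mod),
]

# True only if every operator matches its ufunc on these inputs
all_match = all(np.array_equal(op(a, b), ufunc(a, b)) for op, ufunc in pairs)

# Unary negation has its own ufunc
negation_matches = np.array_equal(-a, np.negative(a))

print(all_match, negation_matches)
```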


Other useful ufuncs

Aside from the regular operators described above, there are more ufuncs that are worth mentioning, for example:

Other basic arithmetic functions

In [13]:
np.abs(np.array([-5, -4, -3]))
Out[13]:
array([5, 4, 3])
In [14]:
values = np.arange(1, 6)
In [15]:
values
Out[15]:
array([1, 2, 3, 4, 5])
In [16]:
np.log(values)
Out[16]:
array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791])
In [17]:
np.log2(values)
Out[17]:
array([0.        , 1.        , 1.5849625 , 2.        , 2.32192809])
In [18]:
np.log10(values)
Out[18]:
array([0.        , 0.30103   , 0.47712125, 0.60205999, 0.69897   ])
In [19]:
np.exp(values)
Out[19]:
array([  2.71828183,   7.3890561 ,  20.08553692,  54.59815003,
       148.4131591 ])
In [20]:
np.exp2(values)
Out[20]:
array([ 2.,  4.,  8., 16., 32.])
In [21]:
np.power(3, values)
Out[21]:
array([  3,   9,  27,  81, 243])
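
These functions are related by the usual logarithm and exponent identities, which makes for a handy sanity check. The sketch below is illustrative only and uses `np.allclose` because floating-point results are rarely bit-for-bit equal.

```python
import numpy as np

values = np.arange(1, 6)

# Change of base: log2(x) = ln(x) / ln(2) and log10(x) = ln(x) / ln(10)
log2_ok = np.allclose(np.log2(values), np.log(values) / np.log(2))
log10_ok = np.allclose(np.log10(values), np.log(values) / np.log(10))

# exp undoes log, and exp2(x) is the same as 2 ** x
roundtrip_ok = np.allclose(np.exp(np.log(values)), values)
exp2_ok = np.allclose(np.exp2(values), np.power(2.0, values))

print(log2_ok, log10_ok, roundtrip_ok, exp2_ok)
```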

Trigonometric functions

NumPy has standard trigonometric functions which return trigonometric ratios for a given angle in radians.

In [22]:
degrees = np.linspace(0, 360, 5)
degrees
Out[22]:
array([  0.,  90., 180., 270., 360.])

Convert degrees to radians:

In [23]:
radians = np.multiply(degrees, np.pi/180)
radians
Out[23]:
array([0.        , 1.57079633, 3.14159265, 4.71238898, 6.28318531])
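
NumPy also provides ufuncs for this conversion, so the multiplication by pi/180 does not have to be written by hand. A minimal sketch comparing the two approaches:

```python
import numpy as np

degrees = np.linspace(0, 360, 5)

# Manual conversion, as in the lesson
manual = np.multiply(degrees, np.pi / 180)

# Built-in conversion ufunc (np.radians is an equivalent alternative)
builtin = np.deg2rad(degrees)

same_result = np.allclose(manual, builtin)

# np.rad2deg converts back to degrees
back_to_degrees = np.rad2deg(builtin)

print(same_result)
print(back_to_degrees)
```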

Now calculate trigonometric functions using those radians:

In [24]:
np.sin(radians)
Out[24]:
array([ 0.0000000e+00,  1.0000000e+00,  1.2246468e-16, -1.0000000e+00,
       -2.4492936e-16])
In [25]:
np.cos(radians)
Out[25]:
array([ 1.0000000e+00,  6.1232340e-17, -1.0000000e+00, -1.8369702e-16,
        1.0000000e+00])
In [26]:
np.tan(radians)
Out[26]:
array([ 0.00000000e+00,  1.63312394e+16, -1.22464680e-16,  5.44374645e+15,
       -2.44929360e-16])
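
Values such as 1.2246468e-16 in these outputs are floating-point rounding residue: mathematically they are exactly 0. Likewise, the tangent is undefined at 90 and 270 degrees, which is why np.tan returns enormous numbers there instead of a meaningful value. When checking results like these, a tolerance-based comparison is usually more appropriate than exact equality. A minimal sketch, with the tolerance `atol=1e-12` chosen arbitrarily for illustration:

```python
import numpy as np

radians = np.linspace(0, 360, 5) * np.pi / 180

# Exact mathematical values at 0, 90, 180, 270 and 360 degrees
expected_sin = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
expected_cos = np.array([1.0, 0.0, -1.0, 0.0, 1.0])

# Exact equality fails because of rounding residue...
exactly_equal = np.array_equal(np.sin(radians), expected_sin)

# ...but the results agree within a small absolute tolerance
sin_close = np.allclose(np.sin(radians), expected_sin, atol=1e-12)
cos_close = np.allclose(np.cos(radians), expected_cos, atol=1e-12)

print(exactly_equal, sin_close, cos_close)
```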
