Data Analytics Features of NumPy
import numpy as np
1 Aggregation
In this section, we introduce the concept of aggregation, and cover a number of vectorized aggregation functions that come with the NumPy library.
1.1 Aggregation functions
x = np.arange(12).reshape(3, 4)
x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
np.sum(x)
66
np.sum(x, axis=0)
array([12, 15, 18, 21])
np.sum(x, axis=1)
array([ 6, 22, 38])
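The axis argument names the dimension that gets collapsed: axis=0 sums down the rows and leaves one total per column, while axis=1 sums across the columns and leaves one total per row. The sketch below checks that reading against explicit Python loops, which are there only for comparison; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# axis=0 collapses the rows: one total per column (4 values).
col_totals = np.sum(x, axis=0)
# axis=1 collapses the columns: one total per row (3 values).
row_totals = np.sum(x, axis=1)

# The same totals computed with explicit loops, for comparison only.
loop_col_totals = [sum(x[i, j] for i in range(3)) for j in range(4)]
loop_row_totals = [sum(x[i, j] for j in range(4)) for i in range(3)]

print(np.array_equal(col_totals, loop_col_totals))  # True
print(np.array_equal(row_totals, loop_row_totals))  # True

# keepdims=True keeps the collapsed axis with length 1.
print(np.sum(x, axis=1, keepdims=True).shape)       # (3, 1)
```

The keepdims form is handy when the totals need to be combined with the original array afterwards, for example to divide each row by its own sum.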
1.2 Other aggregations
x.sum(axis=1)
array([ 6, 22, 38])
x.max()
11
x.max(axis=1)
array([ 3,  7, 11])
x.min(axis=1)
array([0, 4, 8])
x.prod(axis=1)
array([   0,  840, 7920])
x.any(axis=1)
array([ True,  True,  True])
x.all(axis=1)
array([False,  True,  True])
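On a numeric array, any and all treat nonzero entries as True, which is why the first row (it contains 0) fails x.all(axis=1) above. They are often applied to the result of a comparison instead. A minimal sketch, with a threshold of 5 chosen arbitrarily for illustration:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# An element-wise comparison produces a boolean array of the same shape.
above_five = x > 5

# Does each row contain at least one value above 5?
row_has_any = above_five.any(axis=1)
# Are all values in each row above 5?
row_has_all = above_five.all(axis=1)

print(row_has_any.tolist())  # [False, True, True]
print(row_has_all.tolist())  # [False, False, True]
```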
1.3 Statistical metrics as aggregations
np.mean(x)
5.5
np.median(x)
5.5
np.percentile(x, 75)
8.25
np.std(x)
3.452052529534663
np.var(x)
11.916666666666666
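A few relationships between these metrics can be checked directly. The sketch below is an illustration rather than part of the original notebook; it relies on the facts that np.std and np.var default to the population formulas (ddof=0) and that the statistical functions accept the same axis argument as np.sum.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# The variance is the square of the standard deviation.
var_matches_std = np.isclose(np.var(x), np.std(x) ** 2)
# The median is the 50th percentile.
median_is_p50 = np.isclose(np.percentile(x, 50), np.median(x))
print(bool(var_matches_std), bool(median_is_p50))  # True True

# Like np.sum, these functions accept an axis argument.
col_means = np.mean(x, axis=0)
row_means = np.mean(x, axis=1)
print(col_means.tolist())  # [4.0, 5.0, 6.0, 7.0]
print(row_means.tolist())  # [1.5, 5.5, 9.5]

# ddof=1 switches from the population to the sample standard deviation,
# which divides by n - 1 and is therefore slightly larger.
sample_std = np.std(x, ddof=1)
print(bool(sample_std > np.std(x)))  # True
```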
1.4 Argmin and Argmax
x = np.random.uniform(0, 1, (3, 4)).round(2)
x
array([[0.97, 0.18, 0.87, 0.09],
       [0.52, 0.92, 0.12, 0.46],
       [0.48, 0.09, 0.85, 0.4 ]])
x.argmin(axis=0)
array([2, 2, 1, 0])
x.argmin(axis=1)
array([3, 2, 1])
x.argmin()
3
np.argmax(x, axis=0)
array([0, 1, 0, 1])
np.argmax(x, axis=1)
array([0, 1, 2])
np.argmax(x)
0
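Without an axis, argmin and argmax return a position in the flattened array (3 and 0 above), not a (row, column) pair. One way to recover the 2-D position is np.unravel_index. The sketch below hard-codes the sample values shown above so that the result is reproducible; a fresh random draw would give different numbers.

```python
import numpy as np

# Values copied from the sample output above (a fresh random draw would differ).
x = np.array([
    [0.97, 0.18, 0.87, 0.09],
    [0.52, 0.92, 0.12, 0.46],
    [0.48, 0.09, 0.85, 0.40],
])

# Position of the minimum in the flattened (row-major) array.
flat_index = x.argmin()
# Convert the flat position into a (row, column) pair.
row, col = np.unravel_index(flat_index, x.shape)

print(int(flat_index), (int(row), int(col)))  # 3 (0, 3)
print(bool(x[row, col] == x.min()))           # True
```

Note that 0.09 occurs twice in this sample; when there are ties, argmin and argmax report the first occurrence.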
2 Selection using boolean arrays
To illustrate the concepts of boolean arrays and how to use them for selection, let’s consider an example.
Suppose we record the performance of five students across three subjects:
Name      Index  Math  CS  Biology
Jack      0      90    80  75
Jill      1      93    95  87
Joe       2      67    98  88
Jason     3      77    89  80
Jennifer  4      93    97  95
grades = np.array([
[90, 80, 75],
[93, 95, 87],
[67, 98, 88],
[77, 89, 80],
[93, 97, 95],
])
names = np.array([
'Jack',
'Jill',
'Joe',
'Jason',
'Jennifer',
])
A boolean array can be obtained using the comparison and logical operators that NumPy overloads to work element-wise:
== (equality)
<, >, <=, >= (ordering comparisons)
np.logical_not (negation)
& (and)
| (or)
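The operators listed above all act element by element and return a boolean array. As a quick illustration of the ones not used elsewhere in this section, here is a minimal sketch on a small made-up score array (hypothetical values, not the grades used in this section):

```python
import numpy as np

# Hypothetical scores, made up for this illustration.
scores = np.array([55, 72, 90, 100, 38])

is_perfect = scores == 100   # equality
is_failing = scores < 60     # ordering comparison

# Parentheses are needed because & and | bind more tightly than comparisons.
is_extreme = (scores < 60) | (scores == 100)  # element-wise "or"
is_passing = np.logical_not(is_failing)       # element-wise "not"

print(is_extreme.tolist())          # [True, False, False, True, True]
print(is_passing.tolist())          # [False, True, True, True, False]
print(scores[is_extreme].tolist())  # [55, 100, 38]
```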
# here are the math grades
grades[:, 0]
array([90, 93, 67, 77, 93])
# boolean mask of who got A+ in math
grades[:, 0] >= 90
array([ True,  True, False, False,  True])
# boolean mask can be used as a selection index
names[grades[:, 0] >= 90]
array(['Jack', 'Jill', 'Jennifer'], dtype='<U8')
We can use the logical predicates to express more complex selection conditions.
# boolean mask for A+ in math and CS
(grades[:, 0] >= 90) & (grades[:, 1] >= 90)
array([False,  True, False, False,  True])
names[(grades[:, 0] >= 90) & (grades[:, 1] >= 90)]
array(['Jill', 'Jennifer'], dtype='<U8')
# boolean mask for A+ in math and CS, but not in biology
(grades[:, 0] >= 90) & (grades[:, 1] >= 90) & np.logical_not(grades[:, 2] >= 90)
array([False,  True, False, False, False])
names[
    (grades[:, 0] >= 90) &
    (grades[:, 1] >= 90) &
    np.logical_not(grades[:, 2] >= 90)
]
array(['Jill'], dtype='<U8')
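Boolean masks combine naturally with the aggregation functions from the first section. The following sketch reuses the grades and names arrays defined above; the three questions it answers (how many, who averages at least 90, who tops each subject) are chosen for illustration and do not appear in the original notebook.

```python
import numpy as np

grades = np.array([
    [90, 80, 75],
    [93, 95, 87],
    [67, 98, 88],
    [77, 89, 80],
    [93, 97, 95],
])
names = np.array(['Jack', 'Jill', 'Joe', 'Jason', 'Jennifer'])

# True counts as 1, so summing a mask counts the matches.
math_mask = grades[:, 0] >= 90
math_count = int(math_mask.sum())
print(math_count)  # 3

# Select students whose average across the three subjects is at least 90.
averages = grades.mean(axis=1)
high_average = names[averages >= 90]
print(high_average.tolist())  # ['Jill', 'Jennifer']

# argmax along axis=0 gives the row index of the top grade in each subject
# (Math, CS, Biology); ties go to the first occurrence.
top_per_subject = names[grades.argmax(axis=0)]
print(top_per_subject.tolist())  # ['Jill', 'Joe', 'Jennifer']
```

Jill and Jennifer both scored 93 in math; argmax reports Jill because she appears first.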