Data Analytics Features of NumPy

import numpy as np

1 Aggregation

This section introduces aggregation, which reduces an array (or one axis of it) to a summary value, and covers several of the vectorized aggregation functions that NumPy provides.
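
To make "vectorized" concrete, here is a minimal sketch that computes the same total twice: once with an explicit Python loop and once with a single NumPy call. The array `values` and the variable names are illustrative and not part of the original notes.

```python
import numpy as np

values = np.arange(1_000)  # illustrative data: 0, 1, ..., 999

# Explicit Python loop: one interpreted iteration per element.
loop_total = 0
for v in values:
    loop_total += v

# Vectorized aggregation: one call reduces the whole array.
vectorized_total = np.sum(values)

print(loop_total, vectorized_total)
```

Both approaches give the same result. The vectorized call is usually much faster on large arrays because the loop runs inside NumPy's compiled code.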

1.1 Aggregation functions

x = np.arange(12).reshape(3, 4)
x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
np.sum(x)
66
np.sum(x, axis=0)
array([12, 15, 18, 21])
np.sum(x, axis=1)
array([ 6, 22, 38])
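
The `axis` argument names the dimension that is collapsed. The sketch below repeats the sums above and checks the resulting shapes. It also uses the `keepdims=True` option, which keeps the reduced axis with length 1. The variable names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

col_sums = np.sum(x, axis=0)  # collapses the 3 rows    -> shape (4,)
row_sums = np.sum(x, axis=1)  # collapses the 4 columns -> shape (3,)

# keepdims=True keeps the reduced axis as length 1 -> shape (3, 1),
# so the result lines up with the rows of x in later arithmetic.
row_sums_2d = np.sum(x, axis=1, keepdims=True)

# Example use: each element's share of its row total.
row_fractions = x / row_sums_2d

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)
```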

1.2 Other aggregations

x.sum(axis=1)
array([ 6, 22, 38])
x.max()
11
x.max(axis=1)
array([ 3,  7, 11])
x.min(axis=1)
array([0, 4, 8])
x.prod(axis=1)
array([   0,  840, 7920])
x.any(axis=1)
array([ True,  True,  True])
x.all(axis=1)
array([False,  True,  True])
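
On a numeric array, `any` and `all` treat zero as False and every other value as True. That is why `x.all(axis=1)` is False for the first row, which contains 0. In practice these two aggregations are often applied to the result of a comparison, as in this small sketch (the threshold 5 is arbitrary):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# Does each row contain at least one value greater than 5?
row_has_large = (x > 5).any(axis=1)

# Is every value in the row greater than 5?
row_all_large = (x > 5).all(axis=1)

print(row_has_large)
print(row_all_large)
```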

1.3 Statistical metrics as aggregations

np.mean(x)
5.5
np.median(x)
5.5
np.percentile(x, 75)
8.25
np.std(x)
3.452052529534663
np.var(x)
11.916666666666666
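
By default, `np.var` and `np.std` compute the population versions, dividing by the number of elements N. The sketch below recomputes them from the definition as a check, and shows the `ddof=1` argument, which divides by N - 1 to give the sample variance. The variable names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

mean = x.mean()

# Population variance: mean squared deviation from the mean.
manual_var = ((x - mean) ** 2).mean()
manual_std = np.sqrt(manual_var)

# Sample variance: ddof=1 divides by N - 1 instead of N.
sample_var = np.var(x, ddof=1)

print(manual_var, manual_std, sample_var)
```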

1.4 Argmin and Argmax

x = np.random.uniform(0, 1, (3, 4)).round(2)
x
array([[0.97, 0.18, 0.87, 0.09],
       [0.52, 0.92, 0.12, 0.46],
       [0.48, 0.09, 0.85, 0.4 ]])
x.argmin(axis=0)
array([2, 2, 1, 0])
x.argmin(axis=1)
array([3, 2, 1])
x.argmin()
3
np.argmax(x, axis=0)
array([0, 1, 0, 1])
np.argmax(x, axis=1)
array([0, 1, 2])
np.argmax(x)
0
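
Without an `axis` argument, `argmin` and `argmax` return an index into the flattened array. One way to turn it back into a (row, column) position is `np.unravel_index`. The sketch below hard-codes the random values printed above so that it is reproducible.

```python
import numpy as np

# Same values as the random array shown above, fixed for reproducibility.
x = np.array([
    [0.97, 0.18, 0.87, 0.09],
    [0.52, 0.92, 0.12, 0.46],
    [0.48, 0.09, 0.85, 0.40],
])

flat_index = x.argmin()  # position in the flattened array

# Convert the flat position to (row, column) coordinates.
row, col = np.unravel_index(flat_index, x.shape)

print(flat_index, (row, col), x[row, col])
```

When the minimum occurs more than once, as 0.09 does here, `argmin` reports the first occurrence.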

2 Selection using boolean arrays

To illustrate the concepts of boolean arrays and how to use them for selection, let’s consider an example.

Suppose we record the grades of five students in three subjects:

            Index   Math    CS    Biology
Jack            0     90    80     75
Jill            1     93    95     87
Joe             2     67    98     88
Jason           3     77    89     80
Jennifer        4     93    97     95

grades = np.array([
    [90, 80, 75],
    [93, 95, 87],
    [67, 98, 88],
    [77, 89, 80],
    [93, 97, 95],
])

names = np.array([
    'Jack',
    'Jill',
    'Joe',
    'Jason',
    'Jennifer',
])

A boolean array can be obtained by applying Python comparison and logical operators, which NumPy overloads to work element by element:

  • == (equality)
  • <, >, <=, >= (ordering comparisons)
  • np.logical_not (negation)
  • & (logical and) and | (logical or)
# here are the math grades

grades[:, 0]
array([90, 93, 67, 77, 93])
# boolean mask of who got A+ in math

grades[:, 0] >= 90
array([ True,  True, False, False,  True])
# boolean mask can be used as a selection index

names[grades[:, 0] >= 90]
array(['Jack', 'Jill', 'Jennifer'], dtype='<U8')

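We can combine these operators to express more complex selection conditions. Each comparison must be wrapped in parentheses, because & and | bind more tightly than the comparison operators.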

# boolean mask for A+ in math and CS.

(grades[:, 0] >= 90) & (grades[:, 1] >= 90)
array([False,  True, False, False,  True])
names[(grades[:, 0] >= 90) & (grades[:, 1] >= 90)]
array(['Jill', 'Jennifer'], dtype='<U8')
# boolean mask for A+ in math and CS, but not in biology

(grades[:, 0] >= 90) & (grades[:, 1] >= 90) & np.logical_not(grades[:, 2] >= 90)
array([False,  True, False, False, False])
names[
    (grades[:, 0] >= 90) &
    (grades[:, 1] >= 90) &
    np.logical_not(grades[:, 2] >= 90)
]
array(['Jill'], dtype='<U8')
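
Long conditions are easier to read when each mask gets a name. The sketch below rewrites the last selection that way. It uses the `~` operator, which on boolean arrays is equivalent to `np.logical_not`. It also shows two other uses of a mask: selecting whole rows of `grades` and counting matches. The mask names (`math_a`, `cs_a`, `bio_a`) are illustrative.

```python
import numpy as np

grades = np.array([
    [90, 80, 75],
    [93, 95, 87],
    [67, 98, 88],
    [77, 89, 80],
    [93, 97, 95],
])
names = np.array(['Jack', 'Jill', 'Joe', 'Jason', 'Jennifer'])

# One named mask per condition (columns: math, CS, biology).
math_a = grades[:, 0] >= 90
cs_a = grades[:, 1] >= 90
bio_a = grades[:, 2] >= 90

# ~ negates a boolean array element by element.
mask = math_a & cs_a & ~bio_a

selected_names = names[mask]  # names of the matching students
selected_rows = grades[mask]  # full grade rows of the matching students
count = mask.sum()            # True counts as 1, so this counts matches

print(selected_names, selected_rows, count)
```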