import pandas as pd

Series

obj = pd.Series([4, 7, -5, 3])

obj

0    4
1    7
2   -5
3    3
dtype: int64

obj.values

array([ 4,  7, -5,  3])

obj.index

RangeIndex(start=0, stop=4, step=1)

Pandas builds on NumPy: a Series holds its values in a NumPy array and pairs them with an index of labels that carries semantic information.

We can supply custom index labels to indicate the meaning (or semantics) of the values.
The index can be specified during construction.

obj2 = pd.Series([4, 7, -5, 3], index=['jack', 'jill', 'joe', 'albert'])

obj2

jack      4
jill      7
joe      -5
albert    3
dtype: int64

# Extract a single value by its index label

obj2['jack']

4

obj2['albert']

3

# Passing a list of labels (similar to NumPy's fancy indexing) extracts
# a sub-series

obj2[['jack', 'jill', 'joe']]

jack    4
jill    7
joe    -5
dtype: int64
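Plain bracket indexing on a labeled Series looks values up by label. When the intent needs to be explicit, pandas also offers the .loc (label-based) and .iloc (position-based) accessors. The following is a small sketch that rebuilds obj2 so it runs on its own; the variable names on the left-hand side are only illustrative.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['jack', 'jill', 'joe', 'albert'])

# .loc always selects by index label
by_label = obj2.loc['jill']

# .iloc always selects by integer position (0-based)
by_position = obj2.iloc[1]

# Label slices include both endpoints; positional slices exclude the stop
label_slice = obj2.loc['jack':'joe']
position_slice = obj2.iloc[0:2]

# The `in` operator tests index labels, not values
has_jack = 'jack' in obj2
has_value_4 = 4 in obj2
```

Here by_label and by_position both refer to jill's value, label_slice keeps three entries while position_slice keeps two, and has_value_4 is False because 4 is a value rather than a label.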

# A comparison builds a boolean Series, just as it builds a boolean array in NumPy

obj2 > 0

jack       True
jill       True
joe       False
albert     True
dtype: bool

# We can use the boolean series to index the series.
# (similar to numpy's indexing with boolean arrays)
obj2[obj2 > 0]

jack      4
jill      7
albert    3
dtype: int64
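Filtering is not the only NumPy-style operation that works on a Series. Element-wise arithmetic and NumPy functions can also be applied, and the index labels stay attached to the results. A minimal sketch, again rebuilding obj2 so the block is self-contained (doubled and magnitudes are illustrative names):

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['jack', 'jill', 'joe', 'albert'])

# Scalar arithmetic is applied to every value; labels are preserved
doubled = obj2 * 2

# NumPy universal functions accept a Series and return a Series
magnitudes = np.abs(obj2)
```

Both results are still Series indexed by jack, jill, joe and albert, so doubled['joe'] is -10 and magnitudes['joe'] is 5.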

We can construct a Series from a dictionary.

dict_data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

dict_data

{'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

dict_data.keys()

dict_keys(['Ohio', 'Texas', 'Oregon', 'Utah'])

obj3 = pd.Series(dict_data)

obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

Note:

dict_data has no entry for California.
What if we insist that California be part of the index?
The missing value is then filled with NaN, and Utah, which is not in the requested index, is dropped.

states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(dict_data, index=states)
obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
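The dtype changes from int64 to float64 here because NaN is a floating-point value, so the integer counts are upcast to make room for it. The sketch below checks this, shows that reindexing obj3 gives the same result, and converts back to integers by filling the gap with 0. Treating a missing count as 0 is an assumption made only for this illustration.

```python
import pandas as pd

dict_data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

obj3 = pd.Series(dict_data)
obj4 = pd.Series(dict_data, index=states)

# Reindexing an existing Series against the same labels gives the same data
same_as_reindex = obj3.reindex(states).equals(obj4)

# Assumption for illustration: replace the missing count with 0,
# after which the values can be stored as integers again
filled = obj4.fillna(0).astype('int64')
```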

#
# Check which index entries have a missing value
# using pd.isnull(...)
#
pd.isnull(obj4)

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

pd.notnull(obj4)

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

#
# Extract just the entries whose values are not missing
#
obj4[pd.notnull(obj4)]

Ohio      35000.0
Oregon    16000.0
Texas     71000.0
dtype: float64
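The boolean-mask filter above can also be written with the Series method dropna(), which returns a new Series without the missing entries. A short sketch comparing the two (present and dropped are illustrative names):

```python
import pandas as pd

dict_data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(dict_data, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Filter with a boolean mask, as in the cell above
present = obj4[pd.notnull(obj4)]

# dropna() removes the NaN entries and leaves obj4 itself unchanged
dropped = obj4.dropna()
```

Both approaches keep Ohio, Oregon and Texas, and obj4 still contains the California entry afterwards.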

obj4.isnull()

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

obj4.notnull()

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

obj3 + obj4

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
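The sum above is aligned by index label rather than by position: the result covers the union of both indexes, and any label that is missing on either side yields NaN. If a label present on only one side should instead be treated as 0 on the other, the add() method with fill_value is one option. This is a sketch under that assumption, not the only way to handle it.

```python
import numpy as np
import pandas as pd

dict_data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(dict_data)
obj4 = pd.Series(dict_data, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Plain + gives NaN wherever a label is missing on either side
aligned_sum = obj3 + obj4

# add() with fill_value substitutes 0 when only one side is missing;
# a label missing on both sides still produces NaN
filled_sum = obj3.add(obj4, fill_value=0)
```

With fill_value=0, Utah becomes 5000.0 because only obj4 lacks it, while California stays NaN because neither Series holds a value for it.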

Pandas also allows us to replace the index of a Series after it is created, by assigning to its index attribute.

obj

0    4
1    7
2   -5
3    3
dtype: int64

obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

obj

Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64

obj.name = 'trading_income'

obj

Bob      4
Steve    7
Jeff    -5
Ryan     3
Name: trading_income, dtype: int64

obj.index.name = 'customer_names'

obj

customer_names
Bob      4
Steve    7
Jeff    -5
Ryan     3
Name: trading_income, dtype: int64
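Assigning to obj.index replaces every label at once, so the new list must have exactly as many entries as the Series. To change only some labels, rename() with a mapping is an alternative that returns a new Series. A minimal sketch; the replacement label 'Robert' is made up for illustration.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

# A list of the wrong length is rejected with a ValueError
try:
    obj.index = ['Bob', 'Steve', 'Jeff']
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# rename() maps selected labels and returns a new Series;
# 'Robert' is a hypothetical label used only in this example
renamed = obj.rename(index={'Bob': 'Robert'})
```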

DataFrames

import pandas as pd

Construct a DataFrame object from a dictionary that maps each column name to a list of values of equal length.

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
       'year': [2000, 2001, 2002, 2001, 2002],
       'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
data

{'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
 'year': [2000, 2001, 2002, 2001, 2002],
 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

frame = pd.DataFrame(data)

frame
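When building a DataFrame from a dictionary, the columns argument can be used to choose the column order, much like index did for a Series. A requested column that is absent from the dictionary is created and filled with missing values. In this sketch the column name 'debt' is hypothetical and does not appear in the data.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

# Reorder the columns and request one that the dictionary does not contain;
# 'debt' is a made-up name used only for this example
frame_ordered = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'])
```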

# An individual column can be extracted
# as a pandas Series

frame['state']

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object

# Multiple columns can be extracted
# as a Pandas DataFrame
frame[['state', 'year']]
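The type of the result depends on what goes inside the brackets: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds only one name. A short sketch that rebuilds frame so it can run on its own:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

# A single name returns a Series named after the column
state_series = frame['state']

# A one-element list returns a DataFrame with one column
state_frame = frame[['state']]
```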

Another way of constructing a DataFrame is from a collection of rows.

data = [['Ohio', 2000, 1.5],
       ['Ohio', 2001, 1.7],
       ['Ohio', 2002, 3.6],
       ['Nevada', 2001, 2.4],
       ['Nevada', 2002, 2.9]]

data

[['Ohio', 2000, 1.5],
 ['Ohio', 2001, 1.7],
 ['Ohio', 2002, 3.6],
 ['Nevada', 2001, 2.4],
 ['Nevada', 2002, 2.9]]

frame2 = pd.DataFrame(data, columns=['state', 'year', 'pop'])
frame2

frame

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
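The two construction routes, from a dictionary of columns and from a list of rows, are expected to produce the same table. One way to check this is DataFrame.equals(), which compares shape, labels and values. A sketch using distinct variable names (column_data and row_data are illustrative) so the two inputs do not overwrite each other:

```python
import pandas as pd

# Column-oriented input: one list per column
column_data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
               'year': [2000, 2001, 2002, 2001, 2002],
               'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(column_data)

# Row-oriented input: one list per row, with column names given separately
row_data = [['Ohio', 2000, 1.5],
            ['Ohio', 2001, 1.7],
            ['Ohio', 2002, 3.6],
            ['Nevada', 2001, 2.4],
            ['Nevada', 2002, 2.9]]
frame2 = pd.DataFrame(row_data, columns=['state', 'year', 'pop'])

# True when both frames hold the same data under the same labels
frames_match = frame.equals(frame2)
```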