{:check ["true"], :draft ["true"], :rank ["about_series" "Series" "about_dataframe" "DataFrame"]}
A Series is a one-dimensional, labeled data structure provided by Pandas.
import pandas as pd
obj = pd.Series([4, 7, -5, 3])
obj
obj.values
obj.index
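As a rough illustration of what the two attributes above hold, the sketch below rebuilds the same Series and inspects its values and its default index. The variable name `values` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# The data itself is exposed as a NumPy array.
values = obj.values
print(type(values))

# With no explicit index, pandas supplies integer labels 0..n-1.
print(list(obj.index))
```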
Pandas builds on NumPy by attaching semantic information, such as index labels, to the underlying arrays.
obj2 = pd.Series([4, 7, -5, 3], index=['jack', 'jill', 'joe', 'albert'])
obj2
# Extract a value by its index label
obj2['jack']
obj2['albert']
# We can pass a list of labels, as in NumPy's generalized indexing,
# to extract a sub-series
obj2[['jack', 'jill', 'joe']]
# Build a boolean mask with a NumPy-style comparison
obj2 > 0
# We can use the boolean series to index the series.
# (similar to numpy's indexing with boolean arrays)
obj2[obj2 > 0]
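The two steps above (build a mask, then index with it) can be sketched separately to make the intermediate result visible. The names `mask` and `positive` are illustrative and do not appear in the original cells.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['jack', 'jill', 'joe', 'albert'])

# Comparing a Series with a scalar gives a boolean Series on the same index.
mask = obj2 > 0

# Indexing with the mask keeps only the entries where the mask is True;
# the labels travel with the values.
positive = obj2[mask]
print(positive)
```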
We can construct a Series from a dictionary.
dict_data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
dict_data
dict_data.keys()
obj3 = pd.Series(dict_data)
obj3
Note: dict_data does not have a count for California, but California is part of the index, so its value in the resulting Series is missing (NaN).
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(dict_data, index=states)
obj4
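A minimal sketch of what happens when the explicit index and the dictionary keys disagree, using the same data as above. It only checks the two mismatched labels and the resulting dtype.

```python
import pandas as pd

dict_data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(dict_data, index=states)

# 'California' has no entry in dict_data, so its value is NaN.
print(pd.isnull(obj4['California']))

# 'Utah' is in dict_data but not in states, so it is left out.
print('Utah' in obj4.index)

# NaN is a float, so the integer counts are stored as floats.
print(obj4.dtype)
```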
#
# Check which index entries have missing values
# using pd.isnull(...)
#
pd.isnull(obj4)
pd.notnull(obj4)
#
# Extract just the non-missing value entries
#
obj4[pd.notnull(obj4)]
obj4.isnull()
obj4.notnull()
obj3
obj4
obj3 + obj4
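The addition above aligns the two Series by index label before adding. The sketch below repeats it with the same data and checks a few labels; the name `total` is introduced for illustration.

```python
import pandas as pd

dict_data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(dict_data)
obj4 = pd.Series(dict_data, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Values are matched by label, not by position.
total = obj3 + obj4

# Labels present in both operands are summed.
print(total['Ohio'])

# A label that is missing, or NaN, in either operand yields NaN.
print(pd.isnull(total['California']), pd.isnull(total['Utah']))
```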
Pandas also allows us to replace the index of a Series after it has been created.
obj
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj
obj.name = 'trading_income'
obj
obj.index.name = 'customer_names'
obj
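The relabeling steps above can be sketched in one place. The final part is an addition not shown in the original cells: it assumes, as pandas does, that a replacement index must have the same length as the Series, and the name `length_error` is illustrative.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# Replace the default integer index with labels of the same length.
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

# Both the Series and its index can carry a name.
obj.name = 'trading_income'
obj.index.name = 'customer_names'
print(obj['Steve'])

# Assigning an index of the wrong length is rejected.
length_error = None
try:
    obj.index = ['Bob', 'Steve', 'Jeff']
except ValueError as err:
    length_error = err
    print('rejected:', err)
```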
import pandas as pd
Construct a DataFrame object from a dictionary of values.
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
data
frame = pd.DataFrame(data)
frame
# Individual columns can be extracted
# as a Pandas series
frame['state']
# Multiple columns can be extracted
# as a Pandas DataFrame
frame[['state', 'year']]
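To make the difference between the two selections above concrete, this sketch checks the type of each result. The names `one_column` and `two_columns` are illustrative.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

# A single column name returns a Series.
one_column = frame['state']
print(type(one_column).__name__)

# A list of column names returns a DataFrame, even for a one-element list.
two_columns = frame[['state', 'year']]
print(type(two_columns).__name__, two_columns.shape)
```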
Another way of constructing a DataFrame is from a collection of rows.
data = [['Ohio', 2000, 1.5],
        ['Ohio', 2001, 1.7],
        ['Ohio', 2002, 3.6],
        ['Nevada', 2001, 2.4],
        ['Nevada', 2002, 2.9]]
data
frame2 = pd.DataFrame(data, columns=['state', 'year', 'pop'])
frame2
frame
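As a quick check that the column-wise and row-wise constructions describe the same table, the sketch below builds both and compares them. The names `by_column` and `rows` are introduced here so that neither input overwrites the other.

```python
import pandas as pd

by_column = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
             'year': [2000, 2001, 2002, 2001, 2002],
             'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
rows = [['Ohio', 2000, 1.5],
        ['Ohio', 2001, 1.7],
        ['Ohio', 2002, 3.6],
        ['Nevada', 2001, 2.4],
        ['Nevada', 2002, 2.9]]

# Dictionary keys become column names.
frame = pd.DataFrame(by_column)

# Rows carry no names, so the columns are given explicitly.
frame2 = pd.DataFrame(rows, columns=['state', 'year', 'pop'])

# Compare the two frames cell by cell.
print(frame.equals(frame2))
```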