Python: Data Analytics and Visualization

Indexing and selecting data

In this section, we focus on how to get, set, or slice subsets of Pandas data structures. As we learned in previous sections, Series and DataFrame objects carry axis labels. These labels can be used to identify the items that we want to select or assign a new value to:

>>> s4[['024', '002']]  # selecting data of Series object
024     NaN
002    Mary
dtype: object
>>> s4[['024', '002']] = 'unknown'  # assigning data
>>> s4
024    unknown
065        NaN
002    unknown
001        Nam
dtype: object
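The session above relies on s4 from an earlier section. The following is a minimal self-contained sketch of the same selection and assignment; the way s4 is built here is an assumption inferred from the printed output, not the original definition.

```python
import pandas as pd

# Assumed reconstruction of s4: labels and values are inferred from the
# output shown above. Labels missing from the dict become NaN.
s4 = pd.Series(
    {"001": "Nam", "002": "Mary"},
    index=["024", "065", "002", "001"],
)

# A list of labels returns a new Series in the requested order.
selected = s4[["024", "002"]]
print(selected)

# Assigning a scalar writes it to every selected label.
s4[["024", "002"]] = "unknown"
print(s4)
```

Passing a list of labels, rather than a single label, is what makes the result a Series instead of a scalar value.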

If the data object is a DataFrame, we can proceed in a similar way; here, a list of labels selects columns:

>>> df5[['b', 'c']]
   b  c
0  1  2
1  4  5
2  7  8

For label indexing on the rows of a DataFrame, older Pandas releases offered the ix indexer, which selects a set of rows and columns in the object. It takes two selectors inside square brackets: the rows and the columns that we want to get. ix was deprecated in Pandas 0.20 and removed in 1.0; use loc for label-based selection and iloc for position-based selection instead. By default, if we do not specify the columns, the indexer returns the selected rows with all columns in the object:

>>> df5.loc[0]
a    0
b    1
c    2
Name: 0, dtype: int64
>>> df5.iloc[0, 1:3]
b    1
c    2
Name: 0, dtype: int64
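As a self-contained sketch of row and column selection on current Pandas releases, the snippet below uses loc (label-based) and iloc (position-based). The construction of df5 is an assumption inferred from the output printed in this section; the variable names other than df5 are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of df5: a 3x3 frame holding 0..8,
# with columns a, b, c and the default integer row index.
df5 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["a", "b", "c"])

cols = df5[["b", "c"]]          # list of column labels -> DataFrame
row0 = df5.loc[0]               # row whose index label is 0 -> Series
row0_bc = df5.loc[0, "b":"c"]   # label slice: both endpoints are included
row0_pos = df5.iloc[0, 1:3]     # positional slice: end position is excluded

print(row0_bc)
print(row0_pos.equals(row0_bc))
```

Note the difference in slice semantics: a label slice with loc includes its end label, while a positional slice with iloc follows the usual Python half-open convention.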

Moreover, we have many ways to select and edit data contained in a Pandas object. We summarize these functions in the following table:

Tip

Pandas data objects may contain duplicate index labels. In this case, getting or setting a value by label affects every row or column that carries that label.
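To illustrate the tip, here is a small sketch with a hypothetical Series s (not from the text) whose index repeats the label "a". It shows that reading by that label returns several entries and that writing by it updates all of them.

```python
import pandas as pd

# Hypothetical Series with the label "a" appearing twice.
s = pd.Series([1, 2, 3], index=["a", "b", "a"])

# Reading a duplicated label returns a Series, not a scalar.
dup = s.loc["a"]
print(dup)

# Writing to a duplicated label updates every matching entry.
s.loc["a"] = 0
print(s)

# index.is_unique is a quick check before relying on label lookups.
print(s.index.is_unique)
```

Checking index.is_unique up front is one way to avoid being surprised by a lookup that returns more than one row.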