2. Accessing data in Pandas#

NumPy used an efficent indexing syntax, similar to lists, to extract values. pandas’ greater complexity means it has more options to access data, and these require a bit of practice to master. There are three ways to get at data in a DataFrame:

  • Standard notation - acts as if the DataFrame was a dictionary with lists in it.

  • .iloc() method - the traditional NumPy indexing approach, accessing rows and columns with numbers and colons.

  • .loc() method - allows you to use labels as indexes - no numbers required!

# Demonstrate standard notation for accessing Pandas. How to look at the 'mpg' column? Just the last 5 rows!
display(mtcars['mpg'].tail())
27    30.4
28    15.8
29    19.7
30    15.0
31    21.4
Name: mpg, dtype: float64
# You can access two or more columns with a list of column names!
display(mtcars[['mpg', 'hp']].head())
mpg hp
0 21.0 110
1 21.0 110
2 22.8 93
3 21.4 110
4 18.7 175
# Manipulation works like NumPy
display(mtcars[['mpg', 'hp']].head() + 500)
mpg hp
0 521.0 610
1 521.0 610
2 522.8 593
3 521.4 610
4 518.7 675

2.1. Accessing with .iloc#

Accessing data in a DataFrame with iloc is as simple as knowing what rows and columns you want, and passing them to the iloc method - this is slightly unusual, as the method is an indexer - so you call it with square brackets, and not parentheses.

# Get all rows and first two columns
display(mtcars.iloc[:,:2].head())
model mpg
0 Mazda RX4 21.0
1 Mazda RX4 Wag 21.0
2 Datsun 710 22.8
3 Hornet 4 Drive 21.4
4 Hornet Sportabout 18.7
# Just like in NumPy, various types of subsetting is allowed
display(mtcars.iloc[[0, 1, 2], [3, 4, 7]])
disp hp qsec
0 160.0 110 16.46
1 160.0 110 17.02
2 108.0 93 18.61

2.2. Accessing with .loc#

.loc exists to provide label based indexing. So, rather than specifying numbers, you specify labels, but using the [row, column] approach we are familiar with.

# Get mtcars first two columns - notice how the stride is label based but ALSO INCLUSIVE - the final element IS included!!
display(mtcars.loc[:, 'model':'cyl'].head())

# What if you want specific rows? Easy with iloc, impossible with standard notation - see here the indexes are inclusive too, and they are 'numbers'
display(mtcars.loc[1:3, 'model':'cyl'])
model mpg cyl
0 Mazda RX4 21.0 6
1 Mazda RX4 Wag 21.0 6
2 Datsun 710 22.8 4
3 Hornet 4 Drive 21.4 6
4 Hornet Sportabout 18.7 8
model mpg cyl
1 Mazda RX4 Wag 21.0 6
2 Datsun 710 22.8 4
3 Hornet 4 Drive 21.4 6
# But bear in mind - the numbers in the index are only integers because THE INDEX CONTAINS INTEGERS. We can reassign them -
mtcars.index = ['row_' + str(ind) for ind in mtcars.index]
display(mtcars.loc[['row_4', 'row_10'], ['hp', 'drat']])
hp drat
row_4 175 3.15
row_10 123 3.92