Contents

Accessing data in Pandas

Contents

2. Accessing data in Pandas#

NumPy used an efficent indexing syntax, similar to lists, to extract values. pandas’ greater complexity means it has more options to access data, and these require a bit of practice to master. There are three ways to get at data in a DataFrame:

Standard notation - acts as if the DataFrame was a dictionary with lists in it.
.iloc() method - the traditional NumPy indexing approach, accessing rows and columns with numbers and colons.
.loc() method - allows you to use labels as indexes - no numbers required!

# Demonstrate standard notation for accessing Pandas. How to look at the 'mpg' column? Just the last 5 rows!
display(mtcars['mpg'].tail())

  30.4
  15.8
  19.7
  15.0
  21.4
Name: mpg, dtype: float64

# You can access two or more columns with a list of column names!
display(mtcars[['mpg', 'hp']].head())

	mpg	hp
0	21.0	110
1	21.0	110
2	22.8	93
3	21.4	110
4	18.7	175

# Manipulation works like NumPy
display(mtcars[['mpg', 'hp']].head() + 500)

	mpg	hp
0	521.0	610
1	521.0	610
2	522.8	593
3	521.4	610
4	518.7	675

2.1. Accessing with `.iloc`#

Accessing data in a DataFrame with iloc is as simple as knowing what rows and columns you want, and passing them to the iloc method - this is slightly unusual, as the method is an indexer - so you call it with square brackets, and not parentheses.

# Get all rows and first two columns
display(mtcars.iloc[:,:2].head())

	model	mpg
0	Mazda RX4	21.0
1	Mazda RX4 Wag	21.0
2	Datsun 710	22.8
3	Hornet 4 Drive	21.4
4	Hornet Sportabout	18.7

# Just like in NumPy, various types of subsetting is allowed
display(mtcars.iloc[[0, 1, 2], [3, 4, 7]])

	disp	hp	qsec
0	160.0	110	16.46
1	160.0	110	17.02
2	108.0	93	18.61

2.2. Accessing with `.loc`#

.loc exists to provide label based indexing. So, rather than specifying numbers, you specify labels, but using the [row, column] approach we are familiar with.

# Get mtcars first two columns - notice how the stride is label based but ALSO INCLUSIVE - the final element IS included!!
display(mtcars.loc[:, 'model':'cyl'].head())

# What if you want specific rows? Easy with iloc, impossible with standard notation - see here the indexes are inclusive too, and they are 'numbers'
display(mtcars.loc[1:3, 'model':'cyl'])

	model	mpg	cyl
0	Mazda RX4	21.0	6
1	Mazda RX4 Wag	21.0	6
2	Datsun 710	22.8	4
3	Hornet 4 Drive	21.4	6
4	Hornet Sportabout	18.7	8

	model	mpg	cyl
1	Mazda RX4 Wag	21.0	6
2	Datsun 710	22.8	4
3	Hornet 4 Drive	21.4	6

# But bear in mind - the numbers in the index are only integers because THE INDEX CONTAINS INTEGERS. We can reassign them -
mtcars.index = ['row_' + str(ind) for ind in mtcars.index]

display(mtcars.loc[['row_4', 'row_10'], ['hp', 'drat']])

	hp	drat
row_4	175	3.15
row_10	123	3.92