{
"cells": [
{
"cell_type": "markdown",
"id": "7c064e80-f169-469e-ab13-cd508234e6ee",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"## Accessing data in Pandas\n",
"NumPy used an efficent indexing syntax, similar to lists, to extract values. pandas' greater complexity means it has more options to access data, and these require a bit of practice to master. There are three ways to get at data in a DataFrame:\n",
"\n",
"* Standard notation - acts as if the DataFrame was a dictionary with lists in it.\n",
"* `.iloc()` method - the traditional NumPy indexing approach, accessing rows and columns with numbers and colons.\n",
"* `.loc()` method - allows you to use labels as indexes - no numbers required!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0b047875-e24f-46d2-abda-f9eb219e3b0d",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"mtcars = pd.read_csv('https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "534fb243-f5ca-4578-963c-963d70b30c8d",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"27 30.4\n",
"28 15.8\n",
"29 19.7\n",
"30 15.0\n",
"31 21.4\n",
"Name: mpg, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Demonstrate standard notation for accessing Pandas. How to look at the 'mpg' column? Just the last 5 rows!\n",
"display(mtcars['mpg'].tail())"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "67f81f54-b8e5-4368-bf76-a3cf208f4423",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mpg | \n",
" hp | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 21.0 | \n",
" 110 | \n",
"
\n",
" \n",
" 1 | \n",
" 21.0 | \n",
" 110 | \n",
"
\n",
" \n",
" 2 | \n",
" 22.8 | \n",
" 93 | \n",
"
\n",
" \n",
" 3 | \n",
" 21.4 | \n",
" 110 | \n",
"
\n",
" \n",
" 4 | \n",
" 18.7 | \n",
" 175 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" mpg hp\n",
"0 21.0 110\n",
"1 21.0 110\n",
"2 22.8 93\n",
"3 21.4 110\n",
"4 18.7 175"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# You can access two or more columns with a list of column names!\n",
"display(mtcars[['mpg', 'hp']].head())"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "78dc9c24-5104-4a65-8367-56acb7b1d6c3",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mpg | \n",
" hp | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 521.0 | \n",
" 610 | \n",
"
\n",
" \n",
" 1 | \n",
" 521.0 | \n",
" 610 | \n",
"
\n",
" \n",
" 2 | \n",
" 522.8 | \n",
" 593 | \n",
"
\n",
" \n",
" 3 | \n",
" 521.4 | \n",
" 610 | \n",
"
\n",
" \n",
" 4 | \n",
" 518.7 | \n",
" 675 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" mpg hp\n",
"0 521.0 610\n",
"1 521.0 610\n",
"2 522.8 593\n",
"3 521.4 610\n",
"4 518.7 675"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Manipulation works like NumPy\n",
"display(mtcars[['mpg', 'hp']].head() + 500)"
]
},
{
"cell_type": "markdown",
"id": "6d456004-8438-446f-bd1f-8643ba4ad4d3",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Accessing with `.iloc`\n",
"Accessing data in a DataFrame with `iloc` is as simple as knowing what rows and columns you want, and passing them to the `iloc` method - this is slightly unusual, as the method is an indexer - so you call it with square brackets, and not parentheses."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "34a04f98-dee4-46ae-a8e6-a9af57c93bac",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" model | \n",
" mpg | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Mazda RX4 | \n",
" 21.0 | \n",
"
\n",
" \n",
" 1 | \n",
" Mazda RX4 Wag | \n",
" 21.0 | \n",
"
\n",
" \n",
" 2 | \n",
" Datsun 710 | \n",
" 22.8 | \n",
"
\n",
" \n",
" 3 | \n",
" Hornet 4 Drive | \n",
" 21.4 | \n",
"
\n",
" \n",
" 4 | \n",
" Hornet Sportabout | \n",
" 18.7 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" model mpg\n",
"0 Mazda RX4 21.0\n",
"1 Mazda RX4 Wag 21.0\n",
"2 Datsun 710 22.8\n",
"3 Hornet 4 Drive 21.4\n",
"4 Hornet Sportabout 18.7"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Get all rows and first two columns\n",
"display(mtcars.iloc[:,:2].head())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e95e692d-9b86-499a-ab42-f9dd146c8ea4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" disp | \n",
" hp | \n",
" qsec | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 160.0 | \n",
" 110 | \n",
" 16.46 | \n",
"
\n",
" \n",
" 1 | \n",
" 160.0 | \n",
" 110 | \n",
" 17.02 | \n",
"
\n",
" \n",
" 2 | \n",
" 108.0 | \n",
" 93 | \n",
" 18.61 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" disp hp qsec\n",
"0 160.0 110 16.46\n",
"1 160.0 110 17.02\n",
"2 108.0 93 18.61"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Just like in NumPy, various types of subsetting is allowed\n",
"display(mtcars.iloc[[0, 1, 2], [3, 4, 7]])"
]
},
{
"cell_type": "markdown",
"id": "62bf8280-fb87-43ee-b177-6d5a009fe966",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Accessing with `.loc`\n",
"`.loc` exists to provide *label* based indexing. So, rather than specifying numbers, you specify labels, but using the **[row, column]** approach we are familiar with."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d0607bf1-3be7-4b6c-8d41-4f3297c3d2a6",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" model | \n",
" mpg | \n",
" cyl | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Mazda RX4 | \n",
" 21.0 | \n",
" 6 | \n",
"
\n",
" \n",
" 1 | \n",
" Mazda RX4 Wag | \n",
" 21.0 | \n",
" 6 | \n",
"
\n",
" \n",
" 2 | \n",
" Datsun 710 | \n",
" 22.8 | \n",
" 4 | \n",
"
\n",
" \n",
" 3 | \n",
" Hornet 4 Drive | \n",
" 21.4 | \n",
" 6 | \n",
"
\n",
" \n",
" 4 | \n",
" Hornet Sportabout | \n",
" 18.7 | \n",
" 8 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" model mpg cyl\n",
"0 Mazda RX4 21.0 6\n",
"1 Mazda RX4 Wag 21.0 6\n",
"2 Datsun 710 22.8 4\n",
"3 Hornet 4 Drive 21.4 6\n",
"4 Hornet Sportabout 18.7 8"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" model | \n",
" mpg | \n",
" cyl | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" Mazda RX4 Wag | \n",
" 21.0 | \n",
" 6 | \n",
"
\n",
" \n",
" 2 | \n",
" Datsun 710 | \n",
" 22.8 | \n",
" 4 | \n",
"
\n",
" \n",
" 3 | \n",
" Hornet 4 Drive | \n",
" 21.4 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" model mpg cyl\n",
"1 Mazda RX4 Wag 21.0 6\n",
"2 Datsun 710 22.8 4\n",
"3 Hornet 4 Drive 21.4 6"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Get mtcars first two columns - notice how the stride is label based but ALSO INCLUSIVE - the final element IS included!!\n",
"display(mtcars.loc[:, 'model':'cyl'].head())\n",
"\n",
"# What if you want specific rows? Easy with iloc, impossible with standard notation - see here the indexes are inclusive too, and they are 'numbers'\n",
"display(mtcars.loc[1:3, 'model':'cyl'])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4c769875-8b3e-4d04-a4a7-1e9559dc6056",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# But bear in mind - the numbers in the index are only integers because THE INDEX CONTAINS INTEGERS. We can reassign them -\n",
"mtcars.index = ['row_' + str(ind) for ind in mtcars.index]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "51d89891-bd2f-47bd-9f5a-0558b2e21a8f",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" hp | \n",
" drat | \n",
"
\n",
" \n",
" \n",
" \n",
" row_4 | \n",
" 175 | \n",
" 3.15 | \n",
"
\n",
" \n",
" row_10 | \n",
" 123 | \n",
" 3.92 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" hp drat\n",
"row_4 175 3.15\n",
"row_10 123 3.92"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(mtcars.loc[['row_4', 'row_10'], ['hp', 'drat']])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}