{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "mtcars = pd.read_csv('https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## The power of `groupby` and aggregation\n",
    "So far, pandas probably seems like a more user-friendly NumPy. However, it offers a great deal of flexibility that NumPy does not.\n",
    "\n",
    "A common operation in psychology is to examine how some measure varies between or across groups. For example, if we measure depression and want to see how it differs between men and women, we will need to average depression scores separately for men and women. You're experienced enough to know that many psychology experiments have more complex designs, with one score measured under different levels of several variables.\n",
    "\n",
    "How could you calculate those means and standard deviations from your raw data? You could use some very complex subsetting of an array, but pandas has a method built around the **split-apply-combine** approach.\n",
    "\n",
    "This can be confusing, but it is very powerful - it's worth learning in detail."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![split_apply](sac.png)\n",
    "- From [The Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/03.08-aggregation-and-grouping.html)\n",
    "\n",
    "The **split-apply-combine** approach works by taking a set of data, and subsetting it (i.e., split) into sub-groups where the grouping variable of choice is constant.\n",
    "\n",
    "Then, the desired function is applied - this could be the mean, standard deviation, or some other more complex function. \n",
    "\n",
    "Finally, the newly calculated data is combined back into a DataFrame that looks similar in appearance to the original. \n",
    "\n",
    "Sounds like a lot of work - but pandas makes this easy using the `.groupby()` method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "`.groupby()` is called on a DataFrame and takes a keyword argument, `by`. This tells pandas to split the DataFrame by the unique values of that variable. It's worth noting that pandas won't compute anything at this point - it has only worked out how to split your data, and returns an object that is ready to have functions applied to it.\n",
    "\n",
    "A simple example - compute the mean of every numeric variable in the `mtcars` dataset separately for *automatic and manual* cars. Transmission type is stored in the `am` variable (0 = automatic, 1 = manual)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x110127850>\n"
     ]
    }
   ],
   "source": [
    "# Demonstrate groupby\n",
    "grouped = mtcars.groupby(by='am')\n",
    "\n",
    "# Look\n",
    "print(grouped)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "You get a special `DataFrameGroupBy` object that indicates your data has been successfully split. It shares many of the usual DataFrame methods, which you can use to work with the groups."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cyl</th>\n",
       "      <th>disp</th>\n",
       "      <th>hp</th>\n",
       "      <th>drat</th>\n",
       "      <th>wt</th>\n",
       "      <th>qsec</th>\n",
       "      <th>vs</th>\n",
       "      <th>gear</th>\n",
       "      <th>carb</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>am</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17.147368</td>\n",
       "      <td>6.947368</td>\n",
       "      <td>290.378947</td>\n",
       "      <td>160.263158</td>\n",
       "      <td>3.286316</td>\n",
       "      <td>3.768895</td>\n",
       "      <td>18.183158</td>\n",
       "      <td>0.368421</td>\n",
       "      <td>3.210526</td>\n",
       "      <td>2.736842</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24.392308</td>\n",
       "      <td>5.076923</td>\n",
       "      <td>143.530769</td>\n",
       "      <td>126.846154</td>\n",
       "      <td>4.050000</td>\n",
       "      <td>2.411000</td>\n",
       "      <td>17.360000</td>\n",
       "      <td>0.538462</td>\n",
       "      <td>4.384615</td>\n",
       "      <td>2.923077</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          mpg       cyl        disp          hp      drat        wt  \\\n",
       "am                                                                    \n",
       "0   17.147368  6.947368  290.378947  160.263158  3.286316  3.768895   \n",
       "1   24.392308  5.076923  143.530769  126.846154  4.050000  2.411000   \n",
       "\n",
       "         qsec        vs      gear      carb  \n",
       "am                                           \n",
       "0   18.183158  0.368421  3.210526  2.736842  \n",
       "1   17.360000  0.538462  4.384615  2.923077  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cyl</th>\n",
       "      <th>disp</th>\n",
       "      <th>hp</th>\n",
       "      <th>drat</th>\n",
       "      <th>wt</th>\n",
       "      <th>qsec</th>\n",
       "      <th>vs</th>\n",
       "      <th>gear</th>\n",
       "      <th>carb</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>am</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17.147368</td>\n",
       "      <td>6.947368</td>\n",
       "      <td>290.378947</td>\n",
       "      <td>160.263158</td>\n",
       "      <td>3.286316</td>\n",
       "      <td>3.768895</td>\n",
       "      <td>18.183158</td>\n",
       "      <td>0.368421</td>\n",
       "      <td>3.210526</td>\n",
       "      <td>2.736842</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24.392308</td>\n",
       "      <td>5.076923</td>\n",
       "      <td>143.530769</td>\n",
       "      <td>126.846154</td>\n",
       "      <td>4.050000</td>\n",
       "      <td>2.411000</td>\n",
       "      <td>17.360000</td>\n",
       "      <td>0.538462</td>\n",
       "      <td>4.384615</td>\n",
       "      <td>2.923077</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          mpg       cyl        disp          hp      drat        wt  \\\n",
       "am                                                                    \n",
       "0   17.147368  6.947368  290.378947  160.263158  3.286316  3.768895   \n",
       "1   24.392308  5.076923  143.530769  126.846154  4.050000  2.411000   \n",
       "\n",
       "         qsec        vs      gear      carb  \n",
       "am                                           \n",
       "0   18.183158  0.368421  3.210526  2.736842  \n",
       "1   17.360000  0.538462  4.384615  2.923077  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Now apply a function - mean (numeric_only=True skips the text column 'model')\n",
    "means = grouped.mean(numeric_only=True)\n",
    "\n",
    "display(means)\n",
    "\n",
    "# One line\n",
    "one_liner = mtcars.groupby(by='am').mean(numeric_only=True)\n",
    "display(one_liner)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Adding complexity to this is very simple. If you want to split by more variables, pass them as a list to `groupby()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cyl</th>\n",
       "      <th>disp</th>\n",
       "      <th>hp</th>\n",
       "      <th>drat</th>\n",
       "      <th>wt</th>\n",
       "      <th>qsec</th>\n",
       "      <th>vs</th>\n",
       "      <th>carb</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>3</th>\n",
       "      <td>16.106667</td>\n",
       "      <td>7.466667</td>\n",
       "      <td>326.3000</td>\n",
       "      <td>176.133333</td>\n",
       "      <td>3.132667</td>\n",
       "      <td>3.8926</td>\n",
       "      <td>17.692</td>\n",
       "      <td>0.20</td>\n",
       "      <td>2.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>21.050000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>155.6750</td>\n",
       "      <td>100.750000</td>\n",
       "      <td>3.862500</td>\n",
       "      <td>3.3050</td>\n",
       "      <td>20.025</td>\n",
       "      <td>1.00</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>4</th>\n",
       "      <td>26.275000</td>\n",
       "      <td>4.500000</td>\n",
       "      <td>106.6875</td>\n",
       "      <td>83.875000</td>\n",
       "      <td>4.133750</td>\n",
       "      <td>2.2725</td>\n",
       "      <td>18.435</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>21.380000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>202.4800</td>\n",
       "      <td>195.600000</td>\n",
       "      <td>3.916000</td>\n",
       "      <td>2.6326</td>\n",
       "      <td>15.640</td>\n",
       "      <td>0.20</td>\n",
       "      <td>4.400000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               mpg       cyl      disp          hp      drat      wt    qsec  \\\n",
       "am gear                                                                        \n",
       "0  3     16.106667  7.466667  326.3000  176.133333  3.132667  3.8926  17.692   \n",
       "   4     21.050000  5.000000  155.6750  100.750000  3.862500  3.3050  20.025   \n",
       "1  4     26.275000  4.500000  106.6875   83.875000  4.133750  2.2725  18.435   \n",
       "   5     21.380000  6.000000  202.4800  195.600000  3.916000  2.6326  15.640   \n",
       "\n",
       "           vs      carb  \n",
       "am gear                  \n",
       "0  3     0.20  2.666667  \n",
       "   4     1.00  3.000000  \n",
       "1  4     0.75  2.000000  \n",
       "   5     0.20  4.400000  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Group by transmission (am) and by number of gears (gear); numeric_only=True skips the text column 'model'\n",
    "trans_gear = mtcars.groupby(by=['am', 'gear']).mean(numeric_only=True)\n",
    "display(trans_gear)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "This is an easy way to group your data. But sometimes you don't want all of your variables out of a groupby object. In that case, simply *index* the groupby object before applying the function!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>3</th>\n",
       "      <td>16.106667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>21.050000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>4</th>\n",
       "      <td>26.275000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>21.380000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               mpg\n",
       "am gear           \n",
       "0  3     16.106667\n",
       "   4     21.050000\n",
       "1  4     26.275000\n",
       "   5     21.380000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Get mean miles per gallon from am and gear\n",
    "mean_mpg = mtcars.groupby(by=['am', 'gear'])[['mpg']].mean()\n",
    "display(mean_mpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### `.reset_index()` - a note\n",
    "You might notice that the index of these DataFrames looks a little unusual - there appears to be a kind of 'nested' structure to it. This is intentional on pandas' part - it allows you to store multidimensional (more than 2D) data in the essentially 2D data structure of the DataFrame. This is known as a `MultiIndex`, which we won't use much in this course.\n",
    "\n",
    "You can easily convert it back to a standard representation by using the `.reset_index()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th>mpg</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>16.106667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>21.050000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>26.275000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>21.380000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   am  gear        mpg\n",
       "0   0     3  16.106667\n",
       "1   0     4  21.050000\n",
       "2   1     4  26.275000\n",
       "3   1     5  21.380000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Reset \n",
    "display(mean_mpg.reset_index())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Other pandas data manipulation tools - `.transform()`\n",
    "There are two more tools to know about for manipulating data with pandas.\n",
    "\n",
    "The first is `.transform()`. This allows you to apply a function of your choice to a DataFrame, with the restriction that the output *will be forced* to be the same size as the original DataFrame, with one value for every original row. This is helpful when you want a group-level value repeated on every row rather than, as you have seen so far, collapsing the data down into a smaller or differently sized DataFrame. An example will help:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model</th>\n",
       "      <th>gear</th>\n",
       "      <th>hp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mazda RX4</td>\n",
       "      <td>4</td>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mazda RX4 Wag</td>\n",
       "      <td>4</td>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Datsun 710</td>\n",
       "      <td>4</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Hornet 4 Drive</td>\n",
       "      <td>3</td>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hornet Sportabout</td>\n",
       "      <td>3</td>\n",
       "      <td>175</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               model  gear   hp\n",
       "0          Mazda RX4     4  110\n",
       "1      Mazda RX4 Wag     4  110\n",
       "2         Datsun 710     4   93\n",
       "3     Hornet 4 Drive     3  110\n",
       "4  Hornet Sportabout     3  175"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Subset mtcars to just a few variables\n",
    "sub = mtcars[['model', 'gear', 'hp']].copy()\n",
    "display(sub.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model</th>\n",
       "      <th>gear</th>\n",
       "      <th>hp</th>\n",
       "      <th>Subgroup_Mean_HP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mazda RX4</td>\n",
       "      <td>4</td>\n",
       "      <td>110</td>\n",
       "      <td>89.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mazda RX4 Wag</td>\n",
       "      <td>4</td>\n",
       "      <td>110</td>\n",
       "      <td>89.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Datsun 710</td>\n",
       "      <td>4</td>\n",
       "      <td>93</td>\n",
       "      <td>89.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Hornet 4 Drive</td>\n",
       "      <td>3</td>\n",
       "      <td>110</td>\n",
       "      <td>176.133333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hornet Sportabout</td>\n",
       "      <td>3</td>\n",
       "      <td>175</td>\n",
       "      <td>176.133333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               model  gear   hp  Subgroup_Mean_HP\n",
       "0          Mazda RX4     4  110         89.500000\n",
       "1      Mazda RX4 Wag     4  110         89.500000\n",
       "2         Datsun 710     4   93         89.500000\n",
       "3     Hornet 4 Drive     3  110        176.133333\n",
       "4  Hornet Sportabout     3  175        176.133333"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model</th>\n",
       "      <th>gear</th>\n",
       "      <th>hp</th>\n",
       "      <th>Subgroup_Mean_HP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Lotus Europa</td>\n",
       "      <td>5</td>\n",
       "      <td>113</td>\n",
       "      <td>195.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Ford Pantera L</td>\n",
       "      <td>5</td>\n",
       "      <td>264</td>\n",
       "      <td>195.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Ferrari Dino</td>\n",
       "      <td>5</td>\n",
       "      <td>175</td>\n",
       "      <td>195.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Maserati Bora</td>\n",
       "      <td>5</td>\n",
       "      <td>335</td>\n",
       "      <td>195.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>Volvo 142E</td>\n",
       "      <td>4</td>\n",
       "      <td>109</td>\n",
       "      <td>89.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             model  gear   hp  Subgroup_Mean_HP\n",
       "27    Lotus Europa     5  113             195.6\n",
       "28  Ford Pantera L     5  264             195.6\n",
       "29    Ferrari Dino     5  175             195.6\n",
       "30   Maserati Bora     5  335             195.6\n",
       "31      Volvo 142E     4  109              89.5"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Group by the number of gears, select HP, then 'transform' by computing the group mean - add this back to the original DF!\n",
    "sub['Subgroup_Mean_HP'] = sub.groupby('gear')['hp'].transform('mean')\n",
    "display(sub.head(), sub.tail())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Other pandas data manipulation tools - `.agg()`\n",
    "The second tool is `.agg()`, which gives you access to a range of 'aggregation' functions. It is in a sense the opposite of `.transform()`: `.agg()` collapses the DataFrame down into aggregated summaries. It is also very flexible, letting you request multiple functions at once and apply them across either rows or columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/mw/xt4ddf0j2n3dr4qcr__qhqlr0000gn/T/ipykernel_38375/3822526560.py:2: FutureWarning: ['model'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.\n",
      "  display(mtcars.agg(['mean', 'std', 'var', 'sum']))\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model</th>\n",
       "      <th>mpg</th>\n",
       "      <th>cyl</th>\n",
       "      <th>disp</th>\n",
       "      <th>hp</th>\n",
       "      <th>drat</th>\n",
       "      <th>wt</th>\n",
       "      <th>qsec</th>\n",
       "      <th>vs</th>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th>carb</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>NaN</td>\n",
       "      <td>20.090625</td>\n",
       "      <td>6.187500</td>\n",
       "      <td>230.721875</td>\n",
       "      <td>146.687500</td>\n",
       "      <td>3.596563</td>\n",
       "      <td>3.217250</td>\n",
       "      <td>17.848750</td>\n",
       "      <td>0.437500</td>\n",
       "      <td>0.406250</td>\n",
       "      <td>3.687500</td>\n",
       "      <td>2.812500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>NaN</td>\n",
       "      <td>6.026948</td>\n",
       "      <td>1.785922</td>\n",
       "      <td>123.938694</td>\n",
       "      <td>68.562868</td>\n",
       "      <td>0.534679</td>\n",
       "      <td>0.978457</td>\n",
       "      <td>1.786943</td>\n",
       "      <td>0.504016</td>\n",
       "      <td>0.498991</td>\n",
       "      <td>0.737804</td>\n",
       "      <td>1.615200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>var</th>\n",
       "      <td>NaN</td>\n",
       "      <td>36.324103</td>\n",
       "      <td>3.189516</td>\n",
       "      <td>15360.799829</td>\n",
       "      <td>4700.866935</td>\n",
       "      <td>0.285881</td>\n",
       "      <td>0.957379</td>\n",
       "      <td>3.193166</td>\n",
       "      <td>0.254032</td>\n",
       "      <td>0.248992</td>\n",
       "      <td>0.544355</td>\n",
       "      <td>2.608871</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sum</th>\n",
       "      <td>Mazda RX4Mazda RX4 WagDatsun 710Hornet 4 Drive...</td>\n",
       "      <td>642.900000</td>\n",
       "      <td>198.000000</td>\n",
       "      <td>7383.100000</td>\n",
       "      <td>4694.000000</td>\n",
       "      <td>115.090000</td>\n",
       "      <td>102.952000</td>\n",
       "      <td>571.160000</td>\n",
       "      <td>14.000000</td>\n",
       "      <td>13.000000</td>\n",
       "      <td>118.000000</td>\n",
       "      <td>90.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  model         mpg  \\\n",
       "mean                                                NaN   20.090625   \n",
       "std                                                 NaN    6.026948   \n",
       "var                                                 NaN   36.324103   \n",
       "sum   Mazda RX4Mazda RX4 WagDatsun 710Hornet 4 Drive...  642.900000   \n",
       "\n",
       "             cyl          disp           hp        drat          wt  \\\n",
       "mean    6.187500    230.721875   146.687500    3.596563    3.217250   \n",
       "std     1.785922    123.938694    68.562868    0.534679    0.978457   \n",
       "var     3.189516  15360.799829  4700.866935    0.285881    0.957379   \n",
       "sum   198.000000   7383.100000  4694.000000  115.090000  102.952000   \n",
       "\n",
       "            qsec         vs         am        gear       carb  \n",
       "mean   17.848750   0.437500   0.406250    3.687500   2.812500  \n",
       "std     1.786943   0.504016   0.498991    0.737804   1.615200  \n",
       "var     3.193166   0.254032   0.248992    0.544355   2.608871  \n",
       "sum   571.160000  14.000000  13.000000  118.000000  90.000000  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Grab multiple summary statistics from DataFrame\n",
    "display(mtcars.agg(['mean', 'std', 'var', 'sum']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how pandas warns us that for some columns, things are going wrong. For example, pandas cannot compute the mean of the `model` column - what does it mean to average of a bunch of strings? However, notice it can *sum* the model names together, because Python can 'add' strings together. It warns us we should drop these columns before we do anything, so let us do so, and follow it with a group-by."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">mpg</th>\n",
       "      <th colspan=\"2\" halign=\"left\">cyl</th>\n",
       "      <th colspan=\"2\" halign=\"left\">disp</th>\n",
       "      <th colspan=\"2\" halign=\"left\">hp</th>\n",
       "      <th colspan=\"2\" halign=\"left\">drat</th>\n",
       "      <th colspan=\"2\" halign=\"left\">wt</th>\n",
       "      <th colspan=\"2\" halign=\"left\">qsec</th>\n",
       "      <th colspan=\"2\" halign=\"left\">vs</th>\n",
       "      <th colspan=\"2\" halign=\"left\">carb</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>3</th>\n",
       "      <td>16.106667</td>\n",
       "      <td>3.371618</td>\n",
       "      <td>7.466667</td>\n",
       "      <td>1.187234</td>\n",
       "      <td>326.3000</td>\n",
       "      <td>94.852735</td>\n",
       "      <td>176.133333</td>\n",
       "      <td>47.689272</td>\n",
       "      <td>3.132667</td>\n",
       "      <td>0.273665</td>\n",
       "      <td>3.8926</td>\n",
       "      <td>0.832993</td>\n",
       "      <td>17.692</td>\n",
       "      <td>1.349916</td>\n",
       "      <td>0.20</td>\n",
       "      <td>0.414039</td>\n",
       "      <td>2.666667</td>\n",
       "      <td>1.175139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>21.050000</td>\n",
       "      <td>3.069745</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>1.154701</td>\n",
       "      <td>155.6750</td>\n",
       "      <td>13.978883</td>\n",
       "      <td>100.750000</td>\n",
       "      <td>29.010056</td>\n",
       "      <td>3.862500</td>\n",
       "      <td>0.115000</td>\n",
       "      <td>3.3050</td>\n",
       "      <td>0.156738</td>\n",
       "      <td>20.025</td>\n",
       "      <td>2.041854</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.154701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>4</th>\n",
       "      <td>26.275000</td>\n",
       "      <td>5.414465</td>\n",
       "      <td>4.500000</td>\n",
       "      <td>0.925820</td>\n",
       "      <td>106.6875</td>\n",
       "      <td>37.162978</td>\n",
       "      <td>83.875000</td>\n",
       "      <td>24.174588</td>\n",
       "      <td>4.133750</td>\n",
       "      <td>0.345912</td>\n",
       "      <td>2.2725</td>\n",
       "      <td>0.460814</td>\n",
       "      <td>18.435</td>\n",
       "      <td>1.158916</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.462910</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.309307</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>21.380000</td>\n",
       "      <td>6.658979</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>202.4800</td>\n",
       "      <td>115.490636</td>\n",
       "      <td>195.600000</td>\n",
       "      <td>102.833847</td>\n",
       "      <td>3.916000</td>\n",
       "      <td>0.389525</td>\n",
       "      <td>2.6326</td>\n",
       "      <td>0.818925</td>\n",
       "      <td>15.640</td>\n",
       "      <td>1.130487</td>\n",
       "      <td>0.20</td>\n",
       "      <td>0.447214</td>\n",
       "      <td>4.400000</td>\n",
       "      <td>2.607681</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               mpg                 cyl                disp              \\\n",
       "              mean       std      mean       std      mean         std   \n",
       "am gear                                                                  \n",
       "0  3     16.106667  3.371618  7.466667  1.187234  326.3000   94.852735   \n",
       "   4     21.050000  3.069745  5.000000  1.154701  155.6750   13.978883   \n",
       "1  4     26.275000  5.414465  4.500000  0.925820  106.6875   37.162978   \n",
       "   5     21.380000  6.658979  6.000000  2.000000  202.4800  115.490636   \n",
       "\n",
       "                 hp                  drat                wt              qsec  \\\n",
       "               mean         std      mean       std    mean       std    mean   \n",
       "am gear                                                                         \n",
       "0  3     176.133333   47.689272  3.132667  0.273665  3.8926  0.832993  17.692   \n",
       "   4     100.750000   29.010056  3.862500  0.115000  3.3050  0.156738  20.025   \n",
       "1  4      83.875000   24.174588  4.133750  0.345912  2.2725  0.460814  18.435   \n",
       "   5     195.600000  102.833847  3.916000  0.389525  2.6326  0.818925  15.640   \n",
       "\n",
       "                     vs                carb            \n",
       "              std  mean       std      mean       std  \n",
       "am gear                                                \n",
       "0  3     1.349916  0.20  0.414039  2.666667  1.175139  \n",
       "   4     2.041854  1.00  0.000000  3.000000  1.154701  \n",
       "1  4     1.158916  0.75  0.462910  2.000000  1.309307  \n",
       "   5     1.130487  0.20  0.447214  4.400000  2.607681  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">mpg</th>\n",
       "      <th colspan=\"2\" halign=\"left\">cyl</th>\n",
       "      <th colspan=\"2\" halign=\"left\">disp</th>\n",
       "      <th colspan=\"2\" halign=\"left\">hp</th>\n",
       "      <th colspan=\"2\" halign=\"left\">drat</th>\n",
       "      <th colspan=\"2\" halign=\"left\">wt</th>\n",
       "      <th colspan=\"2\" halign=\"left\">qsec</th>\n",
       "      <th colspan=\"2\" halign=\"left\">vs</th>\n",
       "      <th colspan=\"2\" halign=\"left\">carb</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>3</th>\n",
       "      <td>16.106667</td>\n",
       "      <td>3.371618</td>\n",
       "      <td>7.466667</td>\n",
       "      <td>1.187234</td>\n",
       "      <td>326.3000</td>\n",
       "      <td>94.852735</td>\n",
       "      <td>176.133333</td>\n",
       "      <td>47.689272</td>\n",
       "      <td>3.132667</td>\n",
       "      <td>0.273665</td>\n",
       "      <td>3.8926</td>\n",
       "      <td>0.832993</td>\n",
       "      <td>17.692</td>\n",
       "      <td>1.349916</td>\n",
       "      <td>0.20</td>\n",
       "      <td>0.414039</td>\n",
       "      <td>2.666667</td>\n",
       "      <td>1.175139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>21.050000</td>\n",
       "      <td>3.069745</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>1.154701</td>\n",
       "      <td>155.6750</td>\n",
       "      <td>13.978883</td>\n",
       "      <td>100.750000</td>\n",
       "      <td>29.010056</td>\n",
       "      <td>3.862500</td>\n",
       "      <td>0.115000</td>\n",
       "      <td>3.3050</td>\n",
       "      <td>0.156738</td>\n",
       "      <td>20.025</td>\n",
       "      <td>2.041854</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.154701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>4</th>\n",
       "      <td>26.275000</td>\n",
       "      <td>5.414465</td>\n",
       "      <td>4.500000</td>\n",
       "      <td>0.925820</td>\n",
       "      <td>106.6875</td>\n",
       "      <td>37.162978</td>\n",
       "      <td>83.875000</td>\n",
       "      <td>24.174588</td>\n",
       "      <td>4.133750</td>\n",
       "      <td>0.345912</td>\n",
       "      <td>2.2725</td>\n",
       "      <td>0.460814</td>\n",
       "      <td>18.435</td>\n",
       "      <td>1.158916</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.462910</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.309307</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>21.380000</td>\n",
       "      <td>6.658979</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>202.4800</td>\n",
       "      <td>115.490636</td>\n",
       "      <td>195.600000</td>\n",
       "      <td>102.833847</td>\n",
       "      <td>3.916000</td>\n",
       "      <td>0.389525</td>\n",
       "      <td>2.6326</td>\n",
       "      <td>0.818925</td>\n",
       "      <td>15.640</td>\n",
       "      <td>1.130487</td>\n",
       "      <td>0.20</td>\n",
       "      <td>0.447214</td>\n",
       "      <td>4.400000</td>\n",
       "      <td>2.607681</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               mpg                 cyl                disp              \\\n",
       "              mean       std      mean       std      mean         std   \n",
       "am gear                                                                  \n",
       "0  3     16.106667  3.371618  7.466667  1.187234  326.3000   94.852735   \n",
       "   4     21.050000  3.069745  5.000000  1.154701  155.6750   13.978883   \n",
       "1  4     26.275000  5.414465  4.500000  0.925820  106.6875   37.162978   \n",
       "   5     21.380000  6.658979  6.000000  2.000000  202.4800  115.490636   \n",
       "\n",
       "                 hp                  drat                wt              qsec  \\\n",
       "               mean         std      mean       std    mean       std    mean   \n",
       "am gear                                                                         \n",
       "0  3     176.133333   47.689272  3.132667  0.273665  3.8926  0.832993  17.692   \n",
       "   4     100.750000   29.010056  3.862500  0.115000  3.3050  0.156738  20.025   \n",
       "1  4      83.875000   24.174588  4.133750  0.345912  2.2725  0.460814  18.435   \n",
       "   5     195.600000  102.833847  3.916000  0.389525  2.6326  0.818925  15.640   \n",
       "\n",
       "                     vs                carb            \n",
       "              std  mean       std      mean       std  \n",
       "am gear                                                \n",
       "0  3     1.349916  0.20  0.414039  2.666667  1.175139  \n",
       "   4     2.041854  1.00  0.000000  3.000000  1.154701  \n",
       "1  4     1.158916  0.75  0.462910  2.000000  1.309307  \n",
       "   5     1.130487  0.20  0.447214  4.400000  2.607681  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Combine with drop/groupby chained operation\n",
    "grouped_data = mtcars.drop(columns='model').groupby(['am', 'gear'])\n",
    "display(grouped_data.agg(['mean', 'std']))\n",
    "\n",
    "# Works the same in a single line of course\n",
    "display(mtcars.drop(columns='model').groupby(by=['am', 'gear']).agg(['mean', 'std']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">mpg</th>\n",
       "      <th>hp</th>\n",
       "      <th colspan=\"2\" halign=\"left\">cyl</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>sum</th>\n",
       "      <th>var</th>\n",
       "      <th>median</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>3</th>\n",
       "      <td>16.106667</td>\n",
       "      <td>3.371618</td>\n",
       "      <td>2642</td>\n",
       "      <td>1.409524</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>21.050000</td>\n",
       "      <td>3.069745</td>\n",
       "      <td>403</td>\n",
       "      <td>1.333333</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>4</th>\n",
       "      <td>26.275000</td>\n",
       "      <td>5.414465</td>\n",
       "      <td>671</td>\n",
       "      <td>0.857143</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>21.380000</td>\n",
       "      <td>6.658979</td>\n",
       "      <td>978</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               mpg              hp       cyl       \n",
       "              mean       std   sum       var median\n",
       "am gear                                            \n",
       "0  3     16.106667  3.371618  2642  1.409524    8.0\n",
       "   4     21.050000  3.069745   403  1.333333    5.0\n",
       "1  4     26.275000  5.414465   671  0.857143    4.0\n",
       "   5     21.380000  6.658979   978  4.000000    6.0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Even cooler, pass specific functions to specific columns using a dictionary, omitting the need to drop nuisance columns\n",
    "various = grouped_data.agg({'mpg':['mean', 'std'], 'hp':'sum', 'cyl':['var', 'median']})\n",
    "display(various)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "Sometimes, you don't want the grouping variables to be in the index. If so, you can pass `as_index=False` in the call to `groupby`. In addition, the `agg` function supports a named-tuple assignment that allows you to change the names of the resulting aggregation outputs. Lets see what that looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>am</th>\n",
       "      <th>gear</th>\n",
       "      <th>average_mpg_right_here</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>16.106667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>21.050000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>26.275000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>21.380000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   am  gear  average_mpg_right_here\n",
       "0   0     3               16.106667\n",
       "1   0     4               21.050000\n",
       "2   1     4               26.275000\n",
       "3   1     5               21.380000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Demonstrate as_index=False and named tuple assignment\n",
    "cool = mtcars.groupby(['am', 'gear'], as_index=False).agg(average_mpg_right_here=('mpg', 'mean'))\n",
    "display(cool)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}