Statistical Modelling with Python

Python is one of the most popular programming languages in the world and has become central to the data science ecosystem. It emphasises clear, readable code, supports a range of programming approaches, and has a huge array of packages to support almost any need.

While Python is a general-purpose programming language used for a wide range of tasks, the last 20 years have seen significant investment in its data handling, visualisation, and statistical capabilities. However, if you have never programmed before, using Python can be intimidating, and its barrier to entry is arguably higher than that of specialised statistical languages like R.

This course takes a more advanced look at using Python to build and interpret statistical models. Much of the course focuses on linear regression and its variations, but later chapters expand to factor analysis and clustering models. The course emphasises principled model building and interpretation over traditional frequentist inference such as null hypothesis significance testing. There’s also little content on data handling and preparation, but see here for more content in that realm.

I strongly recommend using the Anaconda distribution together with the JupyterLab environment. Anaconda comes with JupyterLab already installed, as well as almost all of the packages you need to be productive with data in Python.
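If you want to confirm that your environment is ready before starting, a quick check along these lines may help. This is only a minimal sketch: the package list below is an assumption about a typical statistical modelling setup, not a list taken from the course, and `check_packages` is a hypothetical helper name used for illustration.

```python
import importlib.util

# Assumed package list: the course text does not name specific packages,
# so adjust this to whatever the chapters you are working through import.
ASSUMED_PACKAGES = ["numpy", "pandas", "scipy", "statsmodels", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print a simple installed/missing report for each assumed package.
    for name, available in check_packages(ASSUMED_PACKAGES).items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

Anything reported as missing can usually be added with your environment's package manager before you begin.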

Please reach out with any questions or if I have made mistakes anywhere - there are always some!

Python Basics

The GLM

Advancing the GLM

The art of prediction

Binary outcome models

Applications and other kinds of hypotheses

Linear Mixed Effects Models

Exploratory Factor Analysis

Confirmatory Factor Analysis

Clustering