An introduction to data analysis in Python

The Python programming language is one of the most popular programming languages in the world, and is leading the revolution in the data science ecosystem. It emphasises clear, readable code, supports a range of programming approaches, and has a huge array of packages to support almost any need.

While Python is a general-purpose programming language used for a wide range of tasks, the last 20 years have seen significant investment in its data handling, visualisation, and statistical capabilities. However, if you have never programmed before, using Python can be intimidating, and its barrier to entry is arguably higher than that of specialised statistical languages like R.

This short course represents the knowledge I wish I had available when I started out on my research journey in psychology over ten years ago and adopted Python as my language of choice. Because Python is a general-purpose language, finding help online was sometimes difficult, with answers requiring high-level programming or statistical knowledge to grasp. Coming from a psychological science background, I have acquired my own statistical and programming knowledge over many battles with real-world research and data problems, and I hope this applied perspective is accessible to new users of the language.

The content here has been adapted and expanded from my own teaching and research experiences, and absolutely represents only the start of the journey. I have purposefully omitted many advanced capabilities of the packages shown here, or written code in a more long-winded way that is more readable to those starting out.

I strongly recommend the use of the Anaconda distribution as well as the JupyterLab environment. Anaconda comes with JupyterLab readily installed, as well as almost all of the packages one uses to be productive with data in Python.

Please reach out with any questions or if I have made mistakes anywhere - there are always some!