Python 102 for scientific computing and data analysis
This tutorial covers topics that are essential for scientific computing and data analysis in Python, but typically not covered in an introductory course or workshop.
These are the things you need to know if you are writing software that meets any of the following criteria:
- You expect to be working on it for more than a couple of weeks.
- You expect that it will be composed of more than a hundred or so lines of code.
- You want it to produce results that can be trusted, for example because you are publishing a research paper based on them.
- You expect that it will be used by one or more other people.
- You are contributing to another project, such as an open-source software package.
What you will learn
- How to organize the code for your project, and how to make it an installable package rather than a loose collection of files.
- How to write tests for your code so that you can be confident it produces correct answers, even as you make changes to it.
- How to document your code so that it is easy for you and others to use and navigate.
- How to improve the usability of your code.
- How to improve the performance of your code.
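As a small taste of the testing topic, here is a minimal sketch of what a test can look like. The function `celsius_to_kelvin` is hypothetical, and the test follows the pytest convention (an assumption here: pytest collects functions named `test_*` from files named `test_*.py` and reports a failing `assert` as a test failure).

```python
# Hypothetical function under test.
def celsius_to_kelvin(t_celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_celsius + 273.15


# A pytest-style test: pytest would discover it by its "test_" prefix.
def test_celsius_to_kelvin():
    # Freezing point of water
    assert celsius_to_kelvin(0.0) == 273.15
    # Compare floats against a tolerance rather than relying on exact equality
    assert abs(celsius_to_kelvin(-273.15) - 0.0) < 1e-12
```

If a later change breaks the conversion, running `pytest` flags it immediately instead of letting a wrong number slip into your results.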
What you need to know
This tutorial assumes you know the very basics of programming with Python.
If you can write a loop and a function in Python, and you know how to run a .py script, you should be able to follow this tutorial easily.
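To calibrate, the sketch below shows roughly the level of Python assumed: a function containing a loop, saved in a script. The file name `prerequisites_check.py` and the function `mean` are hypothetical, chosen only for illustration.

```python
# prerequisites_check.py (hypothetical file name)
# Run it from a terminal with:  python prerequisites_check.py

def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0.0
    for v in values:  # a plain for-loop over the list
        total += v
    return total / len(values)


if __name__ == "__main__":
    measurements = [2.0, 4.0, 9.0]
    print(mean(measurements))  # prints 5.0
```

If reading and running this script feels comfortable, you have the background this tutorial expects.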
What you need to have
If you plan to participate in the hands-on exercises, you will need:
- A laptop with Anaconda installed on it
- One or more friends. Working in groups is highly encouraged, so if you haven’t already, please introduce yourself to your neighbour(s).
Contents
- Organizing code for a Python project
- Testing your code
- Documenting your code
- Improving the usability of Python programs
- Improving the performance of Python programs
  - Timing code and identifying bottlenecks
  - Install optimized versions of libraries
  - Choose the right algorithm
  - Choose the appropriate data format
  - Don’t reinvent the wheel
  - Benchmark, benchmark, benchmark!
  - Avoid explicit loops
  - Avoid repeatedly allocating, copying and rearranging data
  - Access data from memory efficiently
  - Interfacing with compiled code
  - Parallelization