A minimalist guide to learning Python for scientific use

Why another guide to Python programming? The main reason is that scientific programming differs from commercial software development, particularly for those who need results quickly and may have neither the desire nor the time to become Python experts. The following guide is an ordered list of the things I consider most useful to know for scientific code development in Python. It assumes you can write and run a single-file Python script (or a Jupyter Notebook), and it ends with writing and publishing your own package.

Mastering all of these topics takes years of practice (and I don’t claim to have mastered them), so start with the basics and, once you are familiar with one topic, move on to the next.

  • The most important libraries: numpy and matplotlib.pyplot. Get familiar with their functionality and how to use them.
  • Algorithms and data structures:
    • Learn a few of the basic algorithms and check if there is a library implementation.
    • Understand the difference between arrays and dictionaries, and know when and how to use each.
  • Use functions: once a block of code exceeds the space on your screen, refactor parts of it into functions (recursively, if needed).
    • Make use of the flexibility in return type.
    • Use keyword arguments and default values wherever appropriate.
  • Use the time module to determine the slowest part of your code.
  • Learn simple optimization techniques:
    • Avoid loops.
    • Save intermediate results of calculations in files (numpy or pickle) or in memory.
    • Use library functions whenever possible.
  • Create modules
    • In particular, packages (section 6.4 in ‘create modules’ link)
  • Document your functions and classes.
  • If you don’t do this already: use version control, e.g. git, for your project.
  • Get familiar with classes and object-oriented programming in Python. But:
    • Learn about design patterns and design principles. Using classes without having heard about these can be counterproductive.
    • Stick to the PEP 8 style guide and its naming conventions.
  • Write unit tests with pytest.
  • More advanced optimization:
    • Cython or other ways of using compiled code
    • Parallelism in Python, e.g. with mpi4py
  • Create your own package
    • Use setuptools to make your package installable as a library.
    • Continuous integration: automated tests and style checks
    • Versioning (maybe a bit over-the-top for most purposes, but one suggestion here)
    • Creating a Python package (e.g. for pip)
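
Three of the points in the list (timing code with the time module, avoiding loops, and using library functions) can be illustrated together. The following is only a minimal sketch: the function names sum_of_squares_loop, sum_of_squares_numpy and time_call are made up for this example, and the array size is arbitrary.

```python
import time

import numpy as np


def sum_of_squares_loop(values):
    """Sum the squares of all elements with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same calculation, delegated to a NumPy library function."""
    return float(np.dot(values, values))


def time_call(func, *args, repeats=3):
    """Hypothetical helper: best wall-clock time in seconds over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Arbitrary test data, chosen only to make the loop take a measurable time.
data = np.linspace(0.0, 1.0, 200_000)

t_loop = time_call(sum_of_squares_loop, data)
t_numpy = time_call(sum_of_squares_numpy, data)

print(f"loop:  {t_loop:.4f} s")
print(f"numpy: {t_numpy:.4f} s")
```

The actual timings depend on the machine, but for arrays of this size the library call is typically much faster than the explicit loop. Timing individual functions this way is often enough to find the part of a script worth optimizing.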

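The advice on functions (keyword arguments, default values, flexible return types) and on saving intermediate results in files could look roughly like the sketch below. The names running_mean and cached, the window size, and the file name are assumptions made for illustration, not a recommended interface.

```python
import os
import tempfile

import numpy as np


def running_mean(signal, window=5, return_lost=False):
    """Smooth a 1D signal with a moving average.

    window and return_lost are keyword arguments with default values,
    so the common case is simply running_mean(signal).
    """
    kernel = np.ones(window) / window
    smoothed = np.convolve(signal, kernel, mode="valid")
    if return_lost:
        # Flexible return type: also report how many samples were dropped.
        return smoothed, len(signal) - len(smoothed)
    return smoothed


def cached(path, compute, *args, **kwargs):
    """Hypothetical helper: load a result from a .npy file, or compute and save it.

    The path should end in ".npy", because np.save appends that suffix otherwise.
    """
    if os.path.exists(path):
        return np.load(path)
    result = compute(*args, **kwargs)
    np.save(path, result)
    return result


signal = np.arange(10, dtype=float)

# Ask for the extra return value only when it is needed.
smoothed, lost = running_mean(signal, window=3, return_lost=True)

# The first call computes and writes the file, the second call only reads it.
cache_file = os.path.join(tempfile.mkdtemp(), "smoothed.npy")
first = cached(cache_file, running_mean, signal, window=3)
second = cached(cache_file, running_mean, signal, window=3)
```

A cache like this does not notice when the input data or the parameters change, so the file has to be deleted (or the file name changed) whenever the calculation itself changes.
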
The question now is how to learn all of this. First and foremost, you learn to code by coding. There is no way around this. However, it is easy to fall back on familiar but somewhat complicated ways of implementing things, so spending some time trying new approaches is a good investment in your programming skills. Search engines are essential for solving concrete problems. For improving code structure, reading more experienced programmers’ code can be helpful. Once you are more advanced, I would therefore recommend looking at the implementations of some of the major libraries and trying to understand them.