Thursday, September 18, 2014

High Performance Python by Micha Gorelick & Ian Ozsvald; O'Reilly Media

One of the big draws of the Python programming language is that it lets you develop something relatively complex quite rapidly.  However, Python is much more than a prototyping language, and High Performance Python: Practical Performant Programming for Humans is a great resource to help you think about how you approach problems in Python, as well as track down and remove bottlenecks in your code.


This book is definitely not going to teach you Python.  There are many other tutorials and references out there for learning the language, and this book assumes you are already a proficient Python programmer who can read and understand the code examples it provides.  This book is about tuning your Python code to run faster.

The progression of the chapters is very logical, and some of the same toy problems reappear throughout the book as additional optimizations provide even greater efficiency improvements.  The book introduces a large number of tools, and for each one it mostly gives you an idea of what the tool is and why you might consider it.  To really use any of the tools in practice, you'll want to consult the online documentation, but this book gives you a good idea of where to start looking.


I was particularly interested in reading the "Clusters and Job Queues" chapter before I got the book, and it guided me to an IPython.parallel solution that fits my current problem quite nicely, as well as pointing me to some other tools I may investigate in the future.

The authors recommend the Anaconda Python distribution by Continuum Analytics on several occasions, and I definitely agree.  Some of the tools and techniques in the book use only the Standard Library, but most of the more advanced topics require external modules.  Many of the modules referenced (numpy, Cython, Tornado, & IPython to name just a few) are included in the Anaconda distribution as one simple download.

This book's use is twofold.  First, it is worth a full read-through for the discussion of the various things that tend to slow down Python code (or code in general) and what kinds of approaches you should be aware of.  Second, it provides good, brief examples of many different tools in practice, as well as listing other recommended resources at the end of each chapter, allowing it to serve as a good reference text.


One point the authors make repeatedly is that you must consider the trade-off between code execution time and development velocity.  Many of the things you can do to speed up your code will make it considerably harder to understand and work with in the future.  It's important to always have proof that you are optimizing the right portions of code and that the benefits are worth it.  The authors help you look for the "big wins" where you can get drastic speed improvements with minimal effort and complexity.
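To make the "have proof" point concrete, here is a minimal sketch of my own (not an example taken from the book) that uses the standard library's cProfile and pstats modules to see where time is actually spent before changing anything.  The function names slow_sum_of_squares and profile_call are hypothetical and exist only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Hypothetical toy workload: deliberately naive, builds a full list first.
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def profile_call(func, *args):
    # Run func under the profiler and return (result, text report).
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Capture the report in a string instead of printing it directly.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

The report lists each function with its call count and cumulative time, which is the kind of evidence you want in hand before deciding that a given piece of code is worth making more complicated.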

Disclaimer: I received a free ebook copy of this work under the O'Reilly Blogger Review Program.  I also happened to like it so much that I bought a hard copy so I can have it on my reference shelf at work.