Testing Django Performance

I presented at the MelbDjango user group on 11 July 2013, here's the slides:

Notes

Some quick scribblings re: the content of my talk, and the feedback from the Djangonauts present.

Performance Testing

I'm talking about Performance Testing, not particularly about Performance Improvement, although I'll touch on a few common traps later. Specifically, I'm splitting testing up into:

  • Code Profiling: where are the bottlenecks, what do we try next?
  • Relative Performance: we've made some changes, did it help?
  • Absolute Performance: can we survive success?

Test Platform

This is really just a plug for Amazon AWS :-)

Synthetic Test Data

Empty databases go fast! You need to work with the business people, or your own business plan, to work out what the requirements of "success" are for your project, and then generate synthetic test data which fits that profile, so that your testing actually has some meaningful results.

Django makes it very easy to generate test data from simple Python code. I like the 'random', 'inflect' and 'loremipsum' modules for this. Generating test data which looks a lot like real data not only makes your testing more relistic, it makes the test data useful for your designers etc as dummy data.

Note

Sam Stewart pointed out Factory Boy which I haven't used but looks like an easy way to set this up.

Simulating Load -- Siege

Siege is one of the easier load testing tools to get into.

The important thing here is to make sure your load testing model also matches the business model ... Three big things to consider are:

  • Concurrency: how many users do we expect, and how much will they overlap
  • URLs: we need a list of URLs for Siege to probe, generally we can generate this from python code too.
  • Cacheability: There's a big difference between 10000 articles accessed uniformly and 10000 articles for which 90% of accesses are for the "top 100".

I had collected some data showing this off but it didn't really fit into the time available, maybe next time!

System Under Load

Note

I somehow didn't end up with a slide for this, but ...

Look at where the CPU is going:

  • If it is largely in django's 'python' processes, suspect templates.
  • If it is largely in postgres/mysql processes, suspect bad queries.
  • If there doesn't seem to be much load but everything is going slow, suspect I/O bottleneck (probably bad queries doing table scans).

Antipatterns -- N + 1

Or: how to not torture your SQL layer.

Query Logging

Postgres & MySQL both offer quite good options for logging queries.

Note

There's also lots of middleware options, which I think I forgot to mention.

Other Bottlenecks

  • Disk IOPS, you'll want to learn about 'iostat' or your system's equivalent

  • Network: sometimes you run into limited bandwidth or resources such as sockets / ports.

  • Entropy: I thought I'd written a blog post about this but it turns out I never got around to finishing it. Some refs:

    I couldn't remeber the name of the system command to check the entropy pool in linux, because all you have to do is:

    cat /proc/sys/kernel/random/entropy_avail
    

    It may sound a bit farfetched but I've run into this problem on a couple of production systems which happen to be running on virtuals.

Other Stuff