I/O is no longer the bottleneck
When interviewing programmers, I often ask them to code a simple program to count word frequencies in a text file. It’s a good problem that tests a bunch of skills, and with some follow-up questions, allows you to go surprisingly deep.
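For reference, a minimal Python solution of the kind I'm looking for might look like this (a sketch, not any particular candidate's code — I'm assuming whitespace-separated words and simple lowercasing):

```python
from collections import Counter

def count_words(text):
    """Count frequencies of whitespace-separated words, case-insensitively."""
    return Counter(text.lower().split())

# Example: count words in a small sample, printed most frequent first.
sample = "The quick brown fox jumps over the lazy dog the end"
for word, count in count_words(sample).most_common():
    print(word, count)
```

The follow-up questions probe things like Unicode handling, punctuation, tie-breaking in the sort, and — as below — where the time actually goes.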
One of the follow-up questions I ask is, “What’s the performance bottleneck in your program?” Most people say something like “reading from the input file”.
Before I analyzed the performance of the count-words problem, I thought the same. It’s what we’ve all been taught, right? “I/O is slow.”
Not anymore! Disk I/O may have been slow 10 or 20 years ago, but in 2022, reading a file sequentially from disk is very fast.
So what is the bottleneck in a program that counts word frequencies like the one above? It’s processing or parsing the input and the associated memory allocations: splitting the input into words, converting to lowercase, and counting frequencies with a hash table.
I modified my Python and Go count-words programs to record the times of the various stages of the process: reading the input, processing (the slow part), sorting by most frequent, and outputting.
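A rough sketch of that instrumentation on the Python side might look like the following (the stage names and the `perf_counter` timing approach are illustrative — this isn't the exact harness from my programs):

```python
import time
from collections import Counter

def timed_stages(path):
    """Run the count-words pipeline, recording elapsed seconds per stage."""
    timings = {}

    start = time.perf_counter()
    with open(path) as f:
        text = f.read()
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    counts = Counter(text.lower().split())  # the slow part: split, lowercase, count
    timings["process"] = time.perf_counter() - start

    start = time.perf_counter()
    ordered = counts.most_common()  # sort by most frequent
    timings["sort"] = time.perf_counter() - start

    start = time.perf_counter()
    output = "\n".join(f"{word} {count}" for word, count in ordered)
    timings["output"] = time.perf_counter() - start

    return output, timings
```

Summing per-stage wall-clock time like this is crude but plenty accurate to show which stage dominates.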
The sorting and outputting are negligible here: because the input is 100 copies, the number of unique words is comparatively low.
The optimized Go version is significantly faster, but also quite a lot more complicated. We’re avoiding most memory allocations by converting to lowercase and splitting on word boundaries in place. That’s a good rule of thumb for optimizing CPU-bound code: reduce memory allocations.
I haven’t shown an optimized Python version because it’s hard to optimize Python much further! […] It’s as fast as it is because the core operations are happening in C code; that’s why it so often doesn’t matter that “Python is slow”.