Crista’s Five Laws of Performant Software
1. Programming Language ≪ Programmer’s awareness of performance
The programming language doesn’t matter as much as the programmer’s awareness about the implementation of that language and its libraries.
For better or for worse, high-level languages provide a large surface area of candy features and libraries that are really awesome to use… until you realize they require huge amounts of memory, or have super-linear behavior with the size of the input. It is critical that people question “how does this magic actually work?,” go search for the answer, and figure out the best way of scaling things if the convenient candy is not as good as needed.
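As a minimal sketch of that awareness (Python is assumed here purely for illustration; the law is language-agnostic): the same convenient `in` operator hides very different costs depending on the container behind it.

```python
import time

# The "in" operator is O(n) on a list but O(1) on average for a set.
words = [f"word{i}" for i in range(100_000)]
word_set = set(words)

def lookup_all(container):
    # Same syntax, very different cost depending on the container.
    start = time.perf_counter()
    for i in range(0, 100_000, 100):
        _ = f"word{i}" in container  # linear scan for a list, hash probe for a set
    return time.perf_counter() - start

print("list:", lookup_all(words))     # scans the list on every lookup
print("set: ", lookup_all(word_set))  # typically orders of magnitude faster
```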
2. d(f^t(x), f^t(y)) > e^{αt} d(x, y), or: small design details matter
Here, x and y denote two versions of the same code, and d(x, y) is the difference between them. f^t(x) and f^t(y) are the effects of running those two versions for some time t, and d(f^t(x), f^t(y)) is the difference between those effects.
What this law says is that small code differences can have huge effect differences. Some may recognize this law from chaotic systems.
In high-level languages, unless your application design is seriously broken, you don’t need big code changes to make your program perform better; very small code changes can have huge consequences for performance.
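Here is a minimal sketch of such a tiny difference (Python assumed, with illustrative function names): two loops that differ in one method call, one of which is quadratic.

```python
def build_front(n):
    items = []
    for i in range(n):
        items.insert(0, i)   # shifts every element on each call: O(n) per call, O(n^2) total
    return items

def build_back(n):
    items = []
    for i in range(n):
        items.append(i)      # amortized O(1) per call, O(n) total
    return items

# For n = 200_000, build_front takes seconds; build_back takes milliseconds.
```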
3. corr(performance degradation, unbounded resource usage) > 0.9
The slope from writing small apps to writing robust programs that survive all sorts of abuse from uncontrolled input is very steep, and requires a mindset focused on operation rather than on function. When you don’t limit the use of resources, chances are they will be exhausted.
Don’t use resources in an unbounded manner, or the operation of your program will degrade very quickly past a certain threshold.
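A minimal sketch of bounding resources instead of letting them grow with uncontrolled input (Python assumed; the cache and queue names are hypothetical placeholders):

```python
from collections import deque
from functools import lru_cache

# Unbounded: memory grows with every distinct key the outside world sends us.
unbounded_cache = {}

# Bounded: the runtime evicts old entries once maxsize is reached.
@lru_cache(maxsize=10_000)
def expensive_lookup(key):
    ...  # hypothetical expensive computation

# Bounded buffer: old items are dropped rather than exhausting memory.
recent_events = deque(maxlen=1_000)
```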
4. Performance improvements = log(controlled experiments)
It seems pretty obvious, but it’s amazing how many times I’ve seen people unable to answer simple questions about their programs such as “how long does this function actually take to run?” or “how much memory does this data structure use?” Not knowing is not a problem; being oblivious to the need to find out is a big problem!
There’s a law of diminishing returns on these experiments: a few parts of the code are bottlenecks and need to be tuned carefully; most parts don’t contribute much to performance, so there’s not much value in measuring those. A few well-placed performance measurements can get you 90% of the value of measuring everything.
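A minimal sketch of such a controlled experiment, using only the standard library (Python assumed; the data and the measured function are illustrative):

```python
import sys
import timeit

data = list(range(1_000_000))

# How long does this function actually take to run?
elapsed = timeit.timeit(lambda: sum(data), number=10) / 10
print(f"sum over 1M ints: {elapsed * 1000:.2f} ms per call")

# How much memory does this data structure use? (shallow size of the list object)
print(f"list object: {sys.getsizeof(data) / 1_000_000:.1f} MB (excluding the ints themselves)")
```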
5. N*bad ≠ good
With sufficiently large data, no number of nodes or cores, and no amount of memory, will save you from code written with performance-related design flaws.
Code that doesn’t perform, or that crashes, on a single core with just a few GB of RAM will be equally bad on more powerful hardware.
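A minimal sketch of why (Python assumed, with illustrative functions): parallelism divides a quadratic cost by N, but the cost stays quadratic; the design fix, not more hardware, is what changes the outcome.

```python
def has_duplicates_quadratic(items):
    # O(n^2): with n = 10M, no realistic number of cores makes this finish quickly.
    return any(items[i] == items[j]
               for i in range(len(items))
               for j in range(i + 1, len(items)))

def has_duplicates_linear(items):
    # O(n): removing the quadratic term beats throwing N machines at it.
    return len(set(items)) != len(items)
```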