Taco Bell Programming
Every item on the menu at Taco Bell is just a different configuration of roughly eight ingredients.
The more I write code and design systems, the more I understand that many times, you can achieve the desired functionality simply with clever reconfigurations of the basic Unix tool set. After all, functionality is an asset, but code is a liability.
Taco Bell Programming is about developers knowing enough about Ops (and Unix in general) so that they don’t overthink things, and arrive at simple, scalable solutions.
Suppose you have millions of web pages that you want to download and save to disk for later processing. The Taco Bell answer? xargs and wget. In the rare case that you saturate the network connection, add some split and rsync. A "distributed crawler" is really only about 10 lines of shell script.
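A minimal sketch of that crawler, assuming a file urls.txt with one URL per line (the filename, parallelism, and wget flags are illustrative, not from the post). The real fetch line is shown in a comment; echo stands in for wget below so the fan-out runs without touching the network.

```shell
# Hypothetical input: urls.txt, one URL per line.
printf '%s\n' 'http://example.com/a' 'http://example.com/b' > urls.txt

# The real crawl would look like:
#   xargs -n 1 -P 16 wget -q --directory-prefix=crawl_dir < urls.txt
# Here `echo` stands in for wget so the fan-out is visible offline.
# -n 1: one URL per invocation; -P 16: up to 16 invocations in parallel.
xargs -n 1 -P 16 echo GET < urls.txt
```

To push past one machine, split urls.txt into chunks with split and ship them to workers with rsync, exactly the tools named above.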
Moving on, once you have these millions of pages (or even tens of millions), how do you process them? Surely you need Hadoop MapReduce; after all, that's what Google uses to parse the web, right?
find crawl_dir/ -type f -print0 | xargs -n 1 -0 -P 32 ./process
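That pipeline assumes an executable ./process that takes one file path per invocation. The post never shows one, so here is a hypothetical worker (it counts links in a page) plus a one-file demo corpus; the file names and the link-counting task are illustrative only.

```shell
# Hypothetical per-file worker: prints "<link count> <path>".
cat > process <<'EOF'
#!/bin/sh
printf '%s %s\n' "$(grep -o '<a ' "$1" | wc -l | tr -d ' ')" "$1"
EOF
chmod +x process

# Demo corpus: one crawled page containing two links.
mkdir -p crawl_dir
printf '<a href=x>x</a> <a href=y>y</a>\n' > crawl_dir/page1.html

# -n 1: one file per worker; -P 32: up to 32 workers in parallel.
find crawl_dir/ -type f -print0 | xargs -n 1 -0 -P 32 ./process
# prints: 2 crawl_dir/page1.html
```

Because xargs hands each file to an independent process, you get the same "map" fan-out MapReduce gives you, with the kernel scheduler doing the work distribution.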
If you don’t want to think of it from a Zen perspective, be capitalist: you are writing software to put food on the table. You can minimize risk by using the well-proven tool set, or you can step into the land of the unknown.