Let 1000 flowers bloom. Then rip 999 of them out by the roots

2023-02-08

http://www.gigamonkeys.com/flowers/

Peter Seibel

2015-09-28

For the past eleven months I’ve been the tech lead for Twitter’s Engineering Effectiveness group [what might also be called] Developer Tools, Developer Productivity, Engineering Infrastructure, or Developer Efficiency.

as an industry we mostly don’t know how to do it [engineering effectiveness work] and consequently massively under-invest in making our engineering orgs actually effective.

Scaling our Software: let a thousand flowers bloom

What started as a simple Rails app grew into what was eventually probably the largest monolithic Rails app on the planet, known as the Monorail.

However, Twitter wasn’t all Ruby for long. In 2008, Twitter acquired a five-person search company whose technology stack was Java-based.

2008 was also the year the first Scala was written at Twitter.

[During the 2010 World Cup] More or less every time someone scored a goal, the Tweets per second would spike and knock over the site. GOOOOAAAAALLLL! Fail whale. GOOOOAAAAALLLL! Fail whale.

The “off the monorail” effort began in earnest and we started teaching Ruby developers Scala so they would write services to replace the Monorail. And over on the ads side some data scientists started work on a Scala DSL for writing MapReduce jobs that became Scalding. Soon we had three kinds of Scala written at Twitter: Scala written by people who wished it was Ruby, Scala written by people who wished it was Java, and Scala written by people who wished it was Haskell. Let a thousand flowers bloom.

Fast forward four years to the next World Cup and it’s a totally different story. We handled a massive tweet volume basically without problem.

The garden is overrun

The Science repo had grown up to be one of Twitter’s two monorepi. The other monorepo was formed from all those Scala services that had spun out of the off-the-monorail repo. They had started out as separate repos but were eventually consolidated into a single repo, Birdcage, in order to make it easier for Finagle developers to upgrade their library and all its clients together.

Why, you might ask, when the Scala developers decided they needed a monorepo didn’t they move their code into Science; why make another monorepo? Good question.

Conflicting build systems; posturing

Unfortunately Finagle had also really taken off and eventually code in Science had started taking dependencies on Finagle. And code in the Birdcage took dependencies on libraries in Science.

Pants had been open sourced in throw-it-over-the-wall fashion and picked up by a few engineers at other companies, such as Square and Foursquare, and moved forward. In the meantime, again because there weren’t enough people whose job it was to take care of these things, Science was still on the original internally developed version and had in fact evolved independently of the open source version.

All of which is to say, it was a mess.

How to think about engineering effectiveness

we, as an industry, are not very good at thinking about how to make engineers effective.

Engineers' effectiveness, on the other hand, is hard to measure. We don’t even really know what makes people productive; thus we talk about 10x engineers as though that’s a thing when even the studies that led to the notion of a 10x engineer pointed more strongly to the notion of a 10x office.

But we’d all agree, I think, that it is possible to affect engineers' productivity. At the very least it is possible to harm it.

The Twitter EE motto is “Quality, Speed, Joy”. […] Unlike that other famous triple, Fast, Cheap, Good, we believe you don’t have to pick just two. In fact they feed into each other.

One place to start is with simple time savings. […] Assuming a standard 8-hour (480-minute) day, we only have to save everyone about five minutes a day to get a 1% speed gain. Obviously to save everyone five minutes a day, every day, we have to be working on something that everyone uses all the time. […] An extra hour spent every couple of weeks debugging a problem due to confusing error messages or logs is roughly equivalent to five minutes a day.
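
The arithmetic here is easy to check. The following is a minimal Python sketch, not anything from the original talk: the function names are illustrative, and it assumes "every couple weeks" means every ten working days.

```python
# Back-of-the-envelope check of the time-savings arithmetic.
WORKDAY_MINUTES = 8 * 60  # the standard 8-hour day from the text


def speed_gain(minutes_saved_per_day: float) -> float:
    """Fraction of the workday recovered by a daily time saving."""
    return minutes_saved_per_day / WORKDAY_MINUTES


def per_day_cost(minutes_lost: float, every_n_workdays: int) -> float:
    """Spread a recurring loss evenly over the workdays between occurrences."""
    return minutes_lost / every_n_workdays


five_minute_gain = speed_gain(5)
# Assumption: "every couple weeks" is taken to mean every 10 working days.
debugging_cost = per_day_cost(60, 10)

print(f"5 min/day saved is {five_minute_gain:.2%} of the day")
print(f"1 hour per 10 workdays costs {debugging_cost:.0f} min/day")
```

Under that assumption the hour of debugging works out to a few minutes per working day, which is the same order as the five-minute saving that buys a 1% gain.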

Another more dramatic way we can influence people’s effectiveness is to help them stay in flow state. […] It’s generally thought that it takes about fifteen minutes to get into flow and only an instant to lose it, if you are interrupted.
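
To get a feel for what interruptions cost, here is a rough Python sketch. It assumes every interruption costs the full fifteen minutes of refocusing and nothing else, which is a simplification; the interruption counts are made-up examples, not measurements.

```python
WORKDAY_MINUTES = 8 * 60
REFOCUS_MINUTES = 15  # the text's rough figure for getting into flow


def flow_lost(interruptions: int) -> float:
    """Fraction of the workday spent getting back into flow."""
    lost = interruptions * REFOCUS_MINUTES / WORKDAY_MINUTES
    return min(1.0, lost)  # cannot lose more than the whole day


# Hypothetical interruption counts, purely for illustration.
for n in (1, 4, 8):
    print(f"{n} interruptions/day: {flow_lost(n):.1%} of the day refocusing")
```

Even under this crude assumption, four interruptions a day cost an hour of refocusing, an eighth of the workday.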

We know from Dune that fear is the mind killer. So how does fear manifest in the context of software development? I would say tech debt.

As with financial debt, a small amount, taken on with eyes wide open, can be a good thing. But also like financial debt, it compounds over time.

Engineering Effectiveness teams often have an intimate relationship with tech debt because it so often piles up in tooling: until you have people whose job it is to work on them, tools tend to be something hacked together just well enough to get something done. The good news is that after you’ve cleared the accumulated tech debt out of your tools, your team will be well positioned to help other teams tackle their own tech debt, which will lead to really massive gains in effectiveness.

Finally there’s a psychological aspect to providing good tools to engineers that I have to believe has a really good impact on people’s overall effectiveness. On one hand, good tools are just a pleasure to work with. On that basis alone, we should provide good tools for the same reason so many companies provide awesome food to their employees: it just makes coming to work every day that much more of a pleasure. But good tools play another important role: because the tools we use are themselves software, and we all spend all day writing software, having to do so with bad tools has this corrosive psychological effect of suggesting that maybe we don’t actually know how to write good software. Intellectually we may know that the groups working on internal tools are different from those working on the main features of the product, but if the tools you use get in your way or are obviously poorly engineered, it’s hard not to doubt your company’s overall competence.

Let’s build a model

Here’s a simple model for the total effectiveness of an engineering org:

E = (eng - ee) × (1 + (ee^s × b))

E is the total effectiveness of an org where eng is the total number of engineers, ee is the number of engineers devoted to an Engineering Effectiveness style team, b is the boost the first EE engineer gives to the remaining engineers' effectiveness, and s represents how each additional EE engineer scales the total productivity boost.
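
The formula can be written as a small Python function. This is only a transcription of the model as stated. The function name, the range check, and the sample values (ten engineers, b = 0.02) are illustrative choices, picked to show what each parameter does.

```python
def effectiveness(eng: int, ee: int, s: float, b: float) -> float:
    """Total effectiveness E = (eng - ee) * (1 + ee**s * b).

    eng: total number of engineers
    ee:  engineers assigned to engineering effectiveness work
    s:   how each additional EE engineer scales the boost
    b:   boost the first EE engineer gives to everyone else
    """
    if not 0 <= ee <= eng:
        raise ValueError("ee must be between 0 and eng")
    return (eng - ee) * (1 + (ee ** s) * b)


# With no EE engineers there is no boost, so E is just the headcount.
print(effectiveness(eng=10, ee=0, s=0.7, b=0.02))

# With exactly one EE engineer, 1**s is 1 for any s, so the boost is b:
# nine engineers each working at 1.02x.
print(effectiveness(eng=10, ee=1, s=0.7, b=0.02))
```

Note that s only matters from the second EE engineer onward. With ee = 1 the result is (eng - 1) × (1 + b) whatever s is, which is why b reads as "the boost the first EE engineer gives".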

Assuming your total number of engineers is more or less given, the two interesting parameters to this model are the scaling factor, s, and the boost, b.

let’s look at some graphs of total effectiveness depending on how many of our engineers we devote to effectiveness work assuming s = 0.7 and b = 0.02.

for an engineering org of ten people, it’s in fact not worth it to devote any engineers to tooling […]. In such a small engineering org, individual engineers will probably automate things that are bugging them.

when you devote one engineer to EE work, you lose their work but you gain back a bit because that engineer is making the other nine more effective. The problem is with only nine other engineers, the benefit doesn’t add up to enough to make up for the lost work of the EE engineer

At one hundred engineers, with these parameters, the curve starts to bend more noticeably as we have enough engineers for the effectiveness gains to make up for the cost of a couple of EE engineers. The model suggests we should devote two engineers to EE who will bring the total productivity up to 101 engineers' worth, so a free engineer's worth of work.

once we get to a thousand engineers, the small gains per engineer start to add up, even though each additional EE engineer is adding less and less of an effectiveness boost. If these parameters are right, for a thousand-person engineering org we should devote over a quarter of our engineers (255) to engineering effectiveness.
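
The three cases can be explored without graphs by brute force: try every whole number of EE engineers and keep the best. The Python sketch below does that with s = 0.7 and b = 0.02. The helper names are illustrative. Because the curve is nearly flat around its peak, the integer this search reports may differ by a unit or two from the figures quoted in the text.

```python
def effectiveness(eng: int, ee: int, s: float = 0.7, b: float = 0.02) -> float:
    """The model from the text: E = (eng - ee) * (1 + ee**s * b)."""
    return (eng - ee) * (1 + (ee ** s) * b)


def best_split(eng: int, s: float = 0.7, b: float = 0.02) -> tuple[int, float]:
    """Return (ee, E) for the whole number of EE engineers that maximizes E."""
    best_ee = max(range(eng + 1), key=lambda ee: effectiveness(eng, ee, s, b))
    return best_ee, effectiveness(eng, best_ee, s, b)


for eng in (10, 100, 1000):
    ee, total = best_split(eng)
    print(f"eng={eng}: best ee={ee}, E={total:.1f}")
```

The shape matches the text: at ten engineers the best choice is zero EE engineers, at a hundred the peak sits at a handful of engineers and yields about one extra engineer's worth of output, and at a thousand the peak is at roughly a quarter of the org.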

Weeding the garden

In order for engineering effectiveness engineers to be able to boost effectiveness across all of engineering, things need to be standardized. As we just saw, the big investments in engineering effectiveness work only start to pay off when you are doing the work for lots of engineers. You may work for a 1,000-person engineering org but if the tool or process you’re working on is only used by a hundred of them, one hundred is the relevant number, not 1,000.
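
The same model shows why the audience size matters more than the org size. The Python sketch below holds the EE investment fixed at two engineers and varies only how many engineers actually use the tool. It reuses s = 0.7 and b = 0.02 from the earlier section. Treating the tool's users as the "eng" in the formula is an assumption made for illustration, and net_gain is a hypothetical helper name.

```python
def net_gain(users: int, ee: int, s: float = 0.7, b: float = 0.02) -> float:
    """Engineer-equivalents gained (or lost) when `ee` of `users` engineers
    work on a tool instead of on the product.

    Assumption: only the tool's users get the boost, so `users` plays the
    role of `eng` in E = (eng - ee) * (1 + ee**s * b).
    """
    return (users - ee) * (1 + (ee ** s) * b) - users


# Same two EE engineers, three different audiences.
for users in (10, 100, 1000):
    print(f"tool used by {users}: net {net_gain(users, 2):+.1f} engineers' worth")
```

With these parameters the same two engineers are a net loss for a tool with ten users, roughly break even plus one for a hundred users, and return about thirty engineers' worth for a thousand users. That is the argument for standardizing before investing.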

This is where tearing out those 999 flowers by the roots comes in. Once your engineering org gets to be a certain size, the benefits you can obtain by investing in making all your engineers slightly more productive start to swamp the slight gains that one team might get from doing things their own, slightly different way. During the “let a thousand flowers bloom” phase people will have planted all kinds of exotic blossoms, some of which are lovely and even well adapted to their local micro-climate; you need to be able to decide which ones are going to be first-class, nurtured members of your garden and which ones are weeds.

Everyone would love for you to make some decisions about how to do things and bring some consistency, as long as you choose their current preferred way of doing things. On the other hand, when things are sufficiently chaotic, lots of people will be happy to have someone come in and make some decisions about how to do things and actually make those decisions work.

the good news is, each time you deliver some real improvement in quality, speed, and joy for your company’s engineers, you’ll earn a little more trust to make these decisions.

Your goal should be to pick the set of tools and processes you will support and support the heck out of them.

If you get behind, as Twitter did, you may need to invest even more to get your garden back in shape. When you’re behind and the “official” ways don’t really work well, you’ll get more flowers blooming because people and teams will need to find their own ways to get their work done.

once you get to the point where all the flowers you tend are awesome, people will use them. And if they don’t, it will be because they have a real reason not to.

In other words, when your garden is tidy and well tended, if a pretty new volunteer sprouts up you don’t have to freak out because you’re afraid it’s going to overrun the garden. You can watch it grow and if it looks like it might be a valuable contribution to the garden, you can start to nurture it like the rest of your flowers.