Learning from Giants #67
Small change-lists make Google fast, AWS S3's incredible emergent properties at scale, an introduction to CRDTs, and a fantastic re-post: the Power of Product Thinking.
Hi! This is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies. Guaranteed 100% GPT-free content.
Did a friend send this to you? Subscribe to get these weekly drops directly in your inbox. Read the archive for even more great content. Also, I share these articles daily on LinkedIn.
Small Change-Lists make Google fast
Google engineers' secret trick to ship 100x faster than the industry average.
Small change lists, really small ones, make a massive difference on a team's velocity. They're quicker, easier to visualize, better reviewed, and ultimately contain fewer bugs.
Why does everyone know about this rule, but most don't follow it?
They don't truly know what is "small"
"The right size for a CL is one self-contained change. [...] The CL makes a minimal change that addresses just one thing."
"There are no hard and fast rules about how large is "too large." 100 lines is usually a reasonable size for a CL, and 1000 lines is usually too large, but it's up to the judgment of your reviewer."
Yes, Google recommends 100-line changes. If you're experienced, you know that's truly small. Still, that limited size shouldn't make the changes harder to understand or split away tests.
The team isn't prioritizing reviews enough, and small CLs pile up
"You want to find some way to work that won't block you while you're waiting for review. This could involve having multiple projects to work on simultaneously, finding reviewers who agree to be immediately available, doing in-person reviews, pair programming, or splitting your CLs in a way that allows you to continue working immediately."
High-velocity teams usually use all the above: efficient tooling to stack CLs, strict review SLAs, and a lot of sync review and work.
They don't know how to split
"It's often useful to think about how to split and organize those CLs at a high level before diving into coding."
There are many ways to split work: by file, feature, or layer. Sometimes, it's harder to do, but there are some general rules you must always follow:
Separate refactorings. They're the hardest to review, so they should never be mixed up with new functionality.
CLs should include related test code.
Keep the CI green for every chunk.
📗 Google's Small CLs article is part of the company's published Code Review guidelines. It's small, super actionable, and probably the most impactful practice one can have on a team's velocity.
Building and operating a pretty big storage system called S3
AWS's S3 is a lot more than disk storage as a service. Its massive scale leads to emergent properties that no disk can match.
Andrew Warfield worked on S3 for many years – in a recent presentation, he explains how he understands and reasons about the system:
"S3 is an object storage service with an HTTP REST API. There is a frontend fleet with a REST API, a namespace service, a storage fleet that's full of hard disks, and a fleet that does background operations."
The main feat of S3 is managing a very spiky workload on cheap hard drives, all while enabling extreme I/O performance. The 20TB hard drives are incredibly cheap storage, but their I/O performance doesn't scale with size.
"This tension between HDDs growing in capacity but staying flat for performance is a central influence in S3's design. We need to scale the number of bytes we store by moving to the largest drives we can as aggressively as we can."
That's solved by splitting data into chunks and replicating it across hundreds of drives, which enables the distribution of load so widely that spiky workloads are averaged out in the massive number of requests.
"As I really started to understand the system I realized that it was the scale of customers and workloads using the system in aggregate that really allow it to be built differently, and building at this scale means that any one of those individual workloads is able to burst to a level of performance that just wouldn't be practical to build if they were building without this scale."
📗 Andy Warfield's Building and operating a pretty big storage system called S3 is a fascinating read about the unique scale of S3. While Andy makes it look simple, the data distribution is probably only one half of the problem, the write path. I imagine choosing which disk to read from and orchestrating these reads is at least as interesting!
An interactive guide to CRDTs
Every app has real-time collaboration now. Build yours with CRDTs!
And while you'll likely use an off-the-shelf solution (sponsors are welcome 😉), a little theory never hurts!
"CRDT stands for Conflict-free Replicated Data Type."
It's a kind of data structure that can be stored on different computers (peers). Peers may have different states at different points in time, but are guaranteed to eventually converge on a single agreed-upon state."
There's your real-time collaboration feature. Let's get a little more specific: there are two approaches to CRDTs: state-based or operation-based.
"State-based CRDTs transmit their full state between peers, and a new state is obtained by merging all the states together. Operation-based CRDTs transmit only the actions that users take, which can be used to calculate a new state."
As often happens in software, the best-looking solution is a lot more complex to get right. So, let's study the simplest first: state-based replication.
The theory behind state-based CRDTs is indeed simple. The peers exchange states to agree on a common one and rely on a merge function to apply others' changes to their local state. The CRDT works if this merge function satisfies the following mathematical properties: commutativity, associativity, and idempotency.
The magic is that because these properties are composable, you can build complex CRDTs by combining the simplest valid CRDT constructs.
One of these simple constructs is the Last Write Wins (LWW) register.
"Last Write Wins Registers, as the name suggests, simply overwrite their current value with the last value written. They determine which write occurred last using timestamps, represented here by integers that increment whenever the value is updated."
📗 Jake Lazaroff's An interactive guide to CRDTs then builds an LWW register using Typescript, and combines multiple ones into an LWW Map. All of this is illustrated with brilliant interactive examples. The smoothest introduction to CRDTs!
Top 0.1% re-post: The power of Product Thinking
If you're building a product, your job is to make hypotheses and guesses. Iterate and experiment, right?
Yet each of these iterations has a cost, so how do you avoid betting on guesswork?
💡 Product thinking.
"Product thinking is the skill of knowing what makes a product useful — and loved — by people. [...] Product thinking is a habit, an eye, a mindset."
In a fast-changing environment, product thinking can make the difference between 0 and 1. And lucky us, it can be developed!
"The two most important habits are observation and inquiry."
"Observation is about paying attention to people's reactions when they encounter products or services in their day-to-day lives."
"Inquiry comes from genuine curiosity about people and their behaviors, and can take different forms depending on how you learn best. The key is understanding the "why" behind the reactions."
📗
's The Power of Product Thinking is a reference article. After explaining that product thinking will make you and your products stand out, the author describes how to develop it with simple habits. With practice and curiosity, observation and inquiry become a defining mindset.