Learning from Giants #13
Idempotency lessons from Stripe, Engineering levels beyond scope, Instacart's in-store picking optimisation, Spotify's thoughtful execution framework, and Jepsen's consistency models guide.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
Implementing Stripe-like Idempotency Keys in Postgres
I woke up one morning with hundreds of alerts in our backend. Stripe had shipped a breaking change without telling their customers for the 1000th time. And our backend kept retrying requests.
Still, this had zero impact on our customers. The reason: used the right way, all Stripe endpoints are idempotent.
So I figured that was an excellent subject to cover this week!
"An idempotent endpoint is one that can be called any number of times while guaranteeing that the side effects will occur only once."
Idempotency is a powerful concept for APIs: it guards against many of the things that can go wrong in any system with side effects, such as retries after a timeout or a dropped connection.
"After we reach the scale of millions of API calls a day, basic probability will dictate that we’ll be seeing these sorts of things happening all the time."
📗 Brandur Leach's Implementing Stripe-like Idempotency Keys in Postgres is a well-illustrated introduction to idempotency keys in the real world. It goes from explaining the problem all the way to proposing an almost production-ready solution. What's interesting is that idempotency isn't just about caching a request/response pair, but also about ensuring that all mutations happening in between are idempotent.
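To make the idea concrete, here is a minimal sketch of an idempotency key check. It is not the article's implementation: an in-memory SQLite database stands in for Postgres, and the table names, column names, and the `create_charge` function are all hypothetical. The key lookup, the side effect, and the stored response share one transaction, so a retry with the same key replays the saved response instead of charging again.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; all names are hypothetical.
# isolation_level=None puts the connection in autocommit mode so that the
# transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE idempotency_keys ("
    "idempotency_key TEXT PRIMARY KEY, response TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE charges (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)"
)


def create_charge(idempotency_key: str, amount: int) -> dict:
    """Create a charge at most once per idempotency key."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT response FROM idempotency_keys WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        if row is not None:
            # Key already seen: replay the stored response, no new side effect.
            conn.execute("COMMIT")
            return json.loads(row[0])

        # First time we see this key: perform the side effect...
        cursor = conn.execute("INSERT INTO charges (amount) VALUES (?)", (amount,))
        response = {"charge_id": cursor.lastrowid, "amount": amount}
        # ...and record the response in the same transaction.
        conn.execute(
            "INSERT INTO idempotency_keys (idempotency_key, response) VALUES (?, ?)",
            (idempotency_key, json.dumps(response)),
        )
        conn.execute("COMMIT")
        return response
    except Exception:
        conn.execute("ROLLBACK")
        raise


first = create_charge("key-123", 500)
retry = create_charge("key-123", 500)  # same key: replayed, not re-executed
```

This only covers the easy case where every mutation lives in one local database. Requests that also call external services in between are where the article spends most of its effort.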
Engineering Levels at Honeycomb: Avoiding the Scope Trap
All tech companies end up building an engineering ladder. It makes sense for engineers to have a clear path forward. When companies build one, they often draw heavily on the dozens of publicly shared ladders.
That’s what Honeycomb did at first, but soon they started seeing gaps and small unsettling details. The biggest one? The importance that they gave to growing scope compared to other criteria.
“A ladder that conflates advancement with scope is a ladder that only rewards engineers who work on the largest projects. This doesn’t seem fair.”
“More often than not, scope is the enemy. We would rather reward engineers who find clever ways to limit scope by decomposing problems in both time and size.”
📗 Honeycomb’s Engineering Levels at Honeycomb: Avoiding the Scope Trap is an eye-opening post on how the company defines career growth for engineers. It’s great because Ben Darfler explains the problems with their first ladder and how they fixed them. Super actionable and relatable.
The main result is a two-dimensional ladder, adding ownership to balance the impact of scope.
“Scope progresses from focusing on tasks to features to projects, products, and the company as a whole.”
“Ownership progresses from focusing on the execution of work to the process of delivering work, to the discovery of solutions to deliver, to the discovery of problems to solve.”
Driving In-Store Picking Efficiencies at Instacart
How Instacart solved the in-store picking routing problem with partial item location data, heterogeneous data sources, and a pragmatic problem-solving mindset.
Give this problem to most data and engineering teams, and you will get a monstrous, ML-based, case-by-case data parsing solution that users in the field hate or ignore. That’s why this article, while seemingly simple, is an inspiring lesson.
Back to the problem: Instacart has hundreds of partner retailers, all of which have a different store management system and thus diverse item location data formats. Their job: build an optimized in-store routing application for their pickers, i.e., sort a shopping list in the best order.
“Unpacking this data and creating efficient sorts from it is a complicated task involving multiple heuristics and algorithms. The cost associated with building and maintaining the associated pipelines is not trivial.”
So instead of going down this rabbit hole, the team took a step back and looked at what cleaner data they could leverage.
“While product category data is extensible, we lack precise information regarding where within the store the product resides. To estimate this, we measure the amount of time elapsed between when certain products were acquired by an Instacart shopper.”
“The resulting distance matrix is our “source of truth,” and is routinely updated.”
A simpler and more elegant solution. But that’s not all. Sometimes algorithms can’t beat field experience, so the pragmatic team added a way for store managers to override the algo’s output manually. That solves day-to-day pains for pickers and gives invaluable live feedback to engineers.
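The elapsed-time idea quoted above can be sketched in a few lines. This is an illustration under stated assumptions, not Instacart's pipeline: the pick logs are made up, distances are keyed by product category, and the greedy nearest-neighbor ordering is a stand-in for whatever sorting logic they actually use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pick logs: one list per order, each entry is
# (category, seconds since the shopper started).
pick_logs = [
    [("produce", 0), ("bakery", 40), ("dairy", 160)],
    [("bakery", 0), ("produce", 50), ("dairy", 200)],
    [("dairy", 0), ("frozen", 30), ("bakery", 170)],
]


def build_distance_matrix(logs):
    """Average elapsed time between consecutive picks, per unordered category pair."""
    samples = defaultdict(list)
    for picks in logs:
        for (cat_a, t_a), (cat_b, t_b) in zip(picks, picks[1:]):
            samples[frozenset((cat_a, cat_b))].append(abs(t_b - t_a))
    return {pair: mean(times) for pair, times in samples.items()}


def sort_shopping_list(categories, distances, start):
    """Greedy ordering: always walk to the closest remaining category."""
    remaining = set(categories) - {start}
    route = [start]
    while remaining:
        here = route[-1]
        # Pairs never observed together are treated as infinitely far apart.
        # sorted() makes tie-breaking deterministic.
        nearest = min(
            sorted(remaining),
            key=lambda c: distances.get(frozenset((here, c)), float("inf")),
        )
        route.append(nearest)
        remaining.remove(nearest)
    return route


distances = build_distance_matrix(pick_logs)
route = sort_shopping_list(
    ["dairy", "produce", "frozen", "bakery"], distances, start="produce"
)
print(route)  # ['produce', 'bakery', 'dairy', 'frozen']
```

A manual override like the one described above could be layered on top by letting a store manager's ordering replace `route` whenever one exists.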
“By adding the capacity for manual sorting to our ‘toolbox,’ Instacart can simultaneously empower innovation while continuing to optimize solutions grounded in ML.”
📗 Instacart’s How Hundreds of Retailers Use the Instacart Partner Platform to Drive In-Store Efficiencies is a real gem because of that pragmatic approach that is so rare in mixed data & engineering teams. It reminds all of us in the field to take a step back before going for highly complex solutions.
Spotify’s Thoughtful Execution Framework
“After a business goal was set, teams felt tempted to quickly jump into generating ideas on how to reach it.”
Jumping to a solution just feels so good. It’s a classic bias of even the greatest problem solvers. And most of the time, you can get away with it.
“If you go from a goal directly to a single solution, and the solution doesn’t work, it’s really hard to backtrack why.”
How do you keep teams autonomous while helping them avoid that trap?
“We realized that we should remind our teams of the necessary steps in a thoughtful product development process.”
“Thoughtful Execution invites you to leverage data and insights in a way that leads to identifying multiple problems or opportunities that could be solved, and advocates for going wide in hypothesis generation and design explorations before zooming into a single solution.”
📗 Spotify’s Thoughtful Execution Framework, documented by Annina Koskinen, Principal Designer, is a step-by-step process to go from a Goal to a Plan. It’s a powerful tool that helps teams turn solution-oriented thinking into a tree of goal, data & insights, problems & opportunities, hypotheses, solutions, and learnings. That higher-level thinking makes failure a milestone rather than a “back to square one” situation.
Datastore Consistency Models
Consistency is one of the most overused, least understood concepts in databases and distributed systems. The C in ACID, and C in CAP. Central to many database marketing pitches.
But consistency is not a boolean feature. Instead, it’s a scale from nothing to “Strict Serializable”.
The weakest is read uncommitted.
"Read uncommitted is a consistency model which prohibits dirty writes, where two transactions modify the same object concurrently before committing." (Note: it doesn't prevent dirty reads.)
And the strongest is strict serializability.
"Strict serializability means that operations appear to have occurred in some order, consistent with the real-time ordering of those operations; e.g. if operation A completes before operation B begins, then A should appear to precede B in the serialization order."
And depending on where a database is on that scale, you have radically different guarantees and thus viable use cases.
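The real-time condition in the strict serializability quote above is easy to state in code. The sketch below is only an illustration: the `Op` type and the `respects_real_time` function are hypothetical names, and the check covers just the "A completes before B begins" rule. It does not verify that the proposed order is a valid serial execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    start: float  # wall-clock time the operation began
    end: float    # wall-clock time the operation completed


def respects_real_time(order: list[Op]) -> bool:
    """Check a proposed serialization order against real-time ordering.

    If an operation completed before another began, it must come first.
    So the order is rejected as soon as some op is placed ahead of an op
    that had already completed before it started.
    """
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if later.end < earlier.start:
                return False
    return True


a = Op("A", start=0.0, end=1.0)
b = Op("B", start=2.0, end=3.0)  # begins after A completes
c = Op("C", start=0.5, end=2.5)  # overlaps both A and B

print(respects_real_time([a, b]))     # True
print(respects_real_time([b, a]))     # False: A completed before B began
print(respects_real_time([c, a, b]))  # True: C overlaps, so it may go anywhere
```

Overlapping operations are what leave the database room to choose an order. Once two operations do not overlap, the clock decides.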
This was the starting point of Jepsen, an independent consulting firm that assesses the level of consistency of databases.
Jepsen made headlines on multiple occasions, including when it debunked MongoDB 4.2.6’s “full ACID transactions” and found numerous crucial consistency issues.
📗 Jepsen’s Consistency Models series is a clickable map of the different consistency models and their relationships, from strongest to weakest. Click on the boxes to read what they mean, and you’ll see that consistency is not that simple!