Learning from Giants #31
Airbnb's solution to scattered payment data in a service-oriented world, the perils of prudence for start-ups, and how SQLite scales read concurrency with the WAL.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
Did a friend send this to you? Subscribe to get these weekly drops directly in your inbox. Read the archive for even more great content. Also: I share these articles daily on LinkedIn.
Unified Payments Data Read at Airbnb
When Airbnb moved from a monolith to a service-oriented architecture (SOA), the engineers split data entities across many different services.
"Data was normalized and scattered across many payments domains according to each team's responsibilities. This subdivision of labor had an important side effect: presentation layers now often needed to integrate with multiple payments services to fetch all the required data."
What follows is the story of a critical step in a large company's SOA journey: creating domain-level abstractions.
When large SOA applications reach a critical size, engineering productivity comes to a halt. The reasons are often similar:
Simple tasks now require calling dozens of APIs.
API producers cannot update them for fear of breaking one of the (too) many clients.
The system performance could be better.
The solution: technical and conceptual abstraction into domains; here the "Payments" domain.
"Our first task was to unify the payments data read entry points. To accomplish this, we leveraged Viaduct, Airbnb's data-oriented service mesh, where clients query for the "entity" instead of needing to identify dozens of services and their APIs."
"Instead of making our clients deal with this complexity, we opted to hide payments internal details as much as possible by coming up with higher-level domain entities."
📗 Airbnb's Unified Payments Data Read article describes the issues the engineers faced when reaching that critical SOA application size and how they fixed them using that domain layer abstraction. I must end with a warning: do not add domain abstractions (or any abstraction) too early. The heavy build and maintenance costs will only be worth it if the problem really hurts.
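To make the idea of a domain-level entity concrete, here is a minimal Python sketch. Everything in it is hypothetical: the three service classes, the `PaymentSummary` entity, and the `PaymentsDomain` facade are illustrative names, not Airbnb's services or Viaduct's API. It only shows the shape of the pattern: one read entry point that fans out to the underlying services so that clients don't have to.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for separately owned payments services.
# Real services would be remote calls; these return canned data.
class PriceService:
    def get_total_cents(self, reservation_id: str) -> int:
        return {"r1": 12000}[reservation_id]


class RefundService:
    def get_refunded_cents(self, reservation_id: str) -> int:
        return {"r1": 2000}[reservation_id]


class PayoutService:
    def get_status(self, reservation_id: str) -> str:
        return {"r1": "PAID_OUT"}[reservation_id]


@dataclass(frozen=True)
class PaymentSummary:
    """Higher-level domain entity that presentation layers query for."""
    reservation_id: str
    total_cents: int
    refunded_cents: int
    net_cents: int
    payout_status: str


class PaymentsDomain:
    """Single read entry point; hides which service owns which field."""

    def __init__(self, prices: PriceService, refunds: RefundService,
                 payouts: PayoutService) -> None:
        self._prices = prices
        self._refunds = refunds
        self._payouts = payouts

    def get_payment_summary(self, reservation_id: str) -> PaymentSummary:
        total = self._prices.get_total_cents(reservation_id)
        refunded = self._refunds.get_refunded_cents(reservation_id)
        return PaymentSummary(
            reservation_id=reservation_id,
            total_cents=total,
            refunded_cents=refunded,
            net_cents=total - refunded,  # derived once, in the domain layer
            payout_status=self._payouts.get_status(reservation_id),
        )


if __name__ == "__main__":
    domain = PaymentsDomain(PriceService(), RefundService(), PayoutService())
    print(domain.get_payment_summary("r1"))
```

The client asks for one entity and never learns that three services were involved, which is also what lets the producers change their APIs behind the facade.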
The Perils of Prudence
Explained through the 1911 South Pole race between Scott and Amundsen.
In the early 20th century, no human had ever reached the Earth's South Pole. Two expeditions, one British and one Norwegian, attempted this feat around the same time. The race was on...
I won't spoil the results if you haven't heard of these expeditions, but you should know that one team won by a large margin. How? They understood something that the others didn't.
"In polar exploration, moving fast reduces your risk. [...] Humans cannot survive unassisted on the Polar Plateau. It's too cold; the air is too thin; there's no source of food or fuel or drinkable water. When your supplies run out, you die."
"<Winner> had a single strategic goal: to get to the Pole and back as quickly as possible. He was happy to take tactical risks if they served this overall strategic goal."
See where this is going?
"By now, gentle reader, you will have realized that this essay is not about polar exploration at all; it's about technology start-ups. [...] Tech start-ups, like Scott and Amundsen, operate in conditions that are unknown, and unforgiving."
And like Scott and Amundsen, start-ups are in a "default-dead" situation, where "slowly but surely" is a much bigger risk than "move fast and (occasionally) break things".
📗 The Perils of Prudence is a wonderful read, not only because of the thrilling story of Scott vs. Amundsen but also because of how well the analogy works. It shows the hidden risk of wanting to be super sure and checking every corner twice at the expense of speed. The winning team's story also shows that it's not about going unprepared. They were extremely prepared. It's about optimizing for speed.
How the WAL enables SQLite to scale concurrent read/write
First, WAL stands for Write-Ahead Log. Write transactions append their changes to that log before they are committed and visible to readers, so readers can keep reading the database in the meantime. But how does that work in practice?
"The WAL writes the new version of a page to another file and leaves the original page in-place in the main database file."
This other file is the ".db-wal" file: it sits next to the main database file and has the same name plus a "-wal" suffix.
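Here is a small sketch of that behavior using Python's built-in `sqlite3` module. The file name `demo.db`, the `kv` table, and the function name are my own choices for illustration. It switches a database to WAL mode, starts a write transaction on one connection, and reads from a second connection before and after the commit.

```python
import os
import sqlite3
import tempfile


def demo_wal_snapshot() -> dict:
    """Show that a reader is not blocked by, and does not see, an open write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # isolation_level=None: the module never opens transactions for us,
        # so every BEGIN/COMMIT below is explicit.
        writer = sqlite3.connect(path, isolation_level=None)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
        writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
        writer.execute("INSERT INTO kv VALUES ('a', 1)")

        reader = sqlite3.connect(path, isolation_level=None)

        # The writer appends the changed page to the WAL but has not committed.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("UPDATE kv SET v = 2 WHERE k = 'a'")
        during = reader.execute("SELECT v FROM kv WHERE k = 'a'").fetchall()[0][0]

        writer.execute("COMMIT")
        after = reader.execute("SELECT v FROM kv WHERE k = 'a'").fetchall()[0][0]

        # The WAL lives next to the database file while connections are open.
        wal_exists = os.path.exists(path + "-wal")

        reader.close()
        writer.close()

    return {
        "journal_mode": mode,
        "seen_during_write": during,
        "seen_after_commit": after,
        "wal_file_exists": wal_exists,
    }


if __name__ == "__main__":
    print(demo_wal_snapshot())
```

The reader gets the old value while the write is in flight and the new one once it is committed, without ever waiting on the writer.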
"Every transaction that occurs will simply write the new version of changed pages to the end of the WAL file. This append-only approach gives us an interesting property. The state of the database can be reconstructed at any point in time simply by using the latest version of each page seen in the WAL starting from a given transaction."
For the WAL to have acceptable performance, the SQLite engineers added two main features. First, checkpoints: the pages in the WAL are regularly copied back into the main database file, so the WAL can be reset instead of growing indefinitely. Second, SQLite keeps an index of the pages in the WAL to make page lookups fast.
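The three ideas above (append-only page versions, an index over them, and checkpoints) fit in a toy model. This is a deliberately simplified sketch, not SQLite's actual data structures or file format: `ToyWal`, `Frame`, and the in-memory dictionaries are all invented for illustration, and I read "a given transaction" as the snapshot a reader started from.

```python
from typing import Dict, List, NamedTuple


class Frame(NamedTuple):
    txn: int       # transaction that wrote this page version
    page_no: int
    data: str


class ToyWal:
    def __init__(self, main_db: Dict[int, str]) -> None:
        self.main_db = dict(main_db)            # stands in for the main database file
        self.frames: List[Frame] = []           # append-only log of page versions
        self.index: Dict[int, List[int]] = {}   # page_no -> positions in self.frames
        self.last_txn = 0

    def commit(self, changed_pages: Dict[int, str]) -> int:
        """Append new versions of the changed pages; never touch main_db."""
        self.last_txn += 1
        for page_no, data in changed_pages.items():
            self.index.setdefault(page_no, []).append(len(self.frames))
            self.frames.append(Frame(self.last_txn, page_no, data))
        return self.last_txn

    def read_page(self, page_no: int, as_of_txn: int) -> str:
        """Latest version at or before the snapshot, else the original page."""
        for pos in reversed(self.index.get(page_no, [])):
            if self.frames[pos].txn <= as_of_txn:
                return self.frames[pos].data
        return self.main_db[page_no]

    def checkpoint(self) -> None:
        """Copy the newest version of each page back, then reset the log.

        Simplification: a real checkpoint must not overwrite pages that an
        active reader's snapshot still needs. This toy assumes no readers.
        """
        for page_no, positions in self.index.items():
            self.main_db[page_no] = self.frames[positions[-1]].data
        self.frames.clear()
        self.index.clear()


if __name__ == "__main__":
    wal = ToyWal({1: "A0", 2: "B0"})
    t1 = wal.commit({1: "A1"})
    t2 = wal.commit({1: "A2", 2: "B2"})
    print(wal.read_page(1, t1), wal.read_page(2, t1))  # snapshot at t1
    print(wal.read_page(1, t2), wal.read_page(2, t2))  # snapshot at t2
    wal.checkpoint()
    print(wal.main_db, len(wal.frames))
```

Without the index, every page read would have to scan the whole log backwards, which is why lookups get slower as the log grows and why checkpoints matter.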
📗 Ben Johnson's How SQLite Scales Read Concurrency is part of Ben's series on the internals of SQLite. This one details how the WAL works in practice and why it's almost always a good idea to enable WAL mode in your SQLite databases.
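If you want to try this on your own database, the sketch below enables WAL mode and forces a checkpoint with Python's `sqlite3` module. The file, table, and function names are placeholders of mine. I also turn off automatic checkpoints with `PRAGMA wal_autocheckpoint=0` purely so the effect is easy to observe; you would normally leave the default in place.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def demo_checkpoint() -> Tuple[int, int]:
    """Return the WAL file size before and after a TRUNCATE checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path, isolation_level=None)  # explicit transactions

        conn.execute("PRAGMA journal_mode=WAL").fetchall()
        # Demo only: disable automatic checkpoints so the WAL visibly grows.
        conn.execute("PRAGMA wal_autocheckpoint=0").fetchall()

        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("BEGIN")
        conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
        conn.execute("COMMIT")

        wal_path = path + "-wal"
        size_before = os.path.getsize(wal_path)

        # Copy WAL pages into the main file and truncate the WAL to zero bytes.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        size_after = os.path.getsize(wal_path)

        conn.close()
    return size_before, size_after


if __name__ == "__main__":
    before, after = demo_checkpoint()
    print(f"WAL size before checkpoint: {before} bytes, after: {after} bytes")
```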