Learning from Giants #47
How Meta turned MySQL into a Raft-powered distributed system, Metric systems to structure your approach to growth metrics, and the multiple layers of abstraction of Encore, Spotify's design system.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies. Guaranteed 100% OpenAI-free content.
Building and deploying MySQL Raft at Meta
"At Meta, we run one of the largest deployments of MySQL in the world. The deployment powers the social graph along with many other services, like Messaging, Ads, and Feed."
Millions of shards, petabytes of data, all in MySQL.
👇 Read the story of how the operational burden of replication led the team to embed Raft inside MySQL and turn it into a large distributed database.
If you've read last week's article about sharding, you understand why Meta has to shard MySQL: one large server cannot hold that much data. Sharding solves the data size issue, but it doesn't solve resiliency and replication. At its core, MySQL is still a single-server database. It has built-in replication but lacks consistency guarantees.
One solution Meta used for years is MySQL's "semi-sync" replication protocol, in which the primary waits for a replica to acknowledge a transaction before committing it. While this worked in practice, it was an operational nightmare because failover and crash recovery had to be orchestrated by an in-house control plane that lived outside MySQL.
"To help guarantee safety and avoid data loss during the complex promotion and failover operations, several automation daemons and scripts would use locking, orchestration steps, a fencing mechanism, and SMC, a service discovery system."
For the next iteration, the team decided to embed distributed consensus and leader election directly inside MySQL.
"For this, we used the well-understood consensus protocol Raft. This also meant that the source of truth of membership and leadership moved inside the server (mysqld)."
Raft uses a "replicated log" as the basis for distributed consensus. Meta's MySQL-Raft uses that log to consistently replicate the transaction binlog, which records all database modifications.
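To make the replicated-log idea concrete, here is a minimal Python sketch of majority-based commit. It is not Meta's implementation and leaves out most of Raft (terms, leader election, log matching, retries). The `Node` class, the `replicate` function, and the node names are hypothetical, and the plain list only stands in for the binlog.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member holding a copy of the replicated log."""
    name: str
    log: list = field(default_factory=list)  # stand-in for the binlog
    reachable: bool = True

    def append(self, entry: str) -> bool:
        """Store the entry and acknowledge it, unless the node is unreachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def replicate(leader: Node, followers: list, entry: str) -> bool:
    """Return True when a majority of the cluster has stored the entry."""
    leader.append(entry)
    acks = 1  # the leader's own copy counts toward the majority
    acks += sum(1 for follower in followers if follower.append(entry))
    cluster_size = len(followers) + 1
    return acks > cluster_size // 2


leader = Node("primary")
followers = [Node("replica-1"), Node("replica-2", reachable=False)]
committed = replicate(leader, followers, "UPDATE users SET name = 'a' WHERE id = 1")
print(committed)
```

With one of two followers down, two of three nodes still hold the entry, so the write counts as committed. With both followers down, the same call would return False.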
📗 Meta's Building and deploying MySQL Raft at Meta is a deep dive into how the team designed that new system and the changes they had to make to MySQL to get there. Beyond the MySQL-Raft system, Meta's globally replicated setup and extreme-scale operations make for unique challenges worth reading about.
Your metrics belong in a system
Picking core metrics, evolving them, and choosing what input metrics to follow can quickly become highly complex. And if you lack a systemic approach, it will inevitably become a mess.
The only concept you must understand is Growth Accounting.
"A metric moves from one point to another, over a period of time, because some things add to it, and some things subtract from it. Growth Accounting is about breaking down the different components that cause a move."
You can apply that to any core metric of your company to start building your system:
Figure out your core metrics.
Pick a cadence (daily, monthly, ...)
Do your growth accounting breakdown: identify the components that add to or subtract from the metric.
"For monthly active users, there's:
New: Users that showed up for the first time this month and are Active
Churn: Users that were Active in the past month but not this month
Resurrected: Users that joined before, weren't active in the past month, but are now active"
Introduce metrics to make connections.
"Establish how different metrics relate and introduce any concepts needed to make connections. [...] If each core metric is like a currency, these are like your conversion rates."
Identify your blindspots and fill in the gaps.
Simplify with a small number of calculated metrics on top of the core metrics.
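As an illustration of the monthly active users breakdown quoted above, here is a small Python sketch using sets of user IDs. The function name, the sample IDs, and the extra "retained" bucket are assumptions added for the example and are not taken from the article. It also assumes everyone active last month is included in the set of users who joined before this month.

```python
def growth_accounting(joined_before, active_last_month, active_this_month):
    """Split this month's active users into growth accounting buckets.

    joined_before: users who signed up before this month (hypothetical input).
    """
    new = active_this_month - joined_before
    churned = active_last_month - active_this_month
    resurrected = (active_this_month & joined_before) - active_last_month
    retained = active_this_month & active_last_month  # extra bucket, assumed
    return {
        "new": new,
        "churned": churned,
        "resurrected": resurrected,
        "retained": retained,
    }


joined_before = {"ana", "bob", "cy", "dee"}
active_last_month = {"ana", "bob", "cy"}
active_this_month = {"ana", "bob", "dee", "eli"}

parts = growth_accounting(joined_before, active_last_month, active_this_month)

# The accounting identity: last month's MAU, plus what was added,
# minus what was lost, equals this month's MAU.
mau_check = (
    len(active_last_month)
    + len(parts["new"])
    + len(parts["resurrected"])
    - len(parts["churned"])
)
print(parts, mau_check == len(active_this_month))
```

Here "eli" is new, "cy" churned, and "dee" is resurrected, so MAU moves from 3 to 4 (3 + 1 + 1 - 1).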
📗 Amogh Sarda's Your metrics belong in a system introduces Growth Accounting and a step-by-step guide to building your company's own metrics system. If you want to go deeper, it links to many canonical resources around growth metrics that I highly recommend too!
Multiple layers of abstraction in Design Systems
Building your own design system is not just UX work. It's also a complex API design exercise, balancing the simplicity of use with the breadth of covered use cases.
An equilibrium of abstractions.
Spotify's Encore design system team faces this challenge, serving hundreds of designers and thousands of engineers.
"We have a lot of customers who all have different use cases to fulfill with the design system. When trying to meet many different needs at once, a 'this versus that' approach simply won't do."
They realized a single abstraction level couldn't cover all use cases, so they offer three, all available on the same components.
"We want to provide as much utility as we can out of the box while still offering the opportunity to modify some aspects of the component or to construct your own flows out of individual pieces."
"Config: Passing in just data to props."
Configuration works best for most use cases: you give the data, and the design system handles everything.
"Slots: Passing in subcomponents to props."
By exposing the subcomponents used by the "Config" level and accepting a ReactNode as input, Encore lets engineers swap or adjust individual parts of a component. In most cases, that gives enough customization.
"Custom: We provide just the base. Subcomponents are managed by the customer."
This level gives the engineer complete control for complex use cases that need fine-grained customization.
📗 Spotify's Multiple Layers of Abstraction in Design Systems describes how the Encore team reasons about the configurability and composability of their system's APIs. A foundational read if you're building reusable components.