Learning from Giants #7
Everything you need to know about Database Indexes, UUIDs, Defining your product strategy, Architecture of LinkedIn's Data Governance system, and Architecture decision records.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
Did a friend send this to you? Subscribe to get these weekly drops directly in your inbox. Read the archive for even more great content. Also: I share these articles daily on LinkedIn.
Use The Index Luke
☠️ Full Table Scan ☠️
Large sequential scans are the one thing you generally want to avoid when querying a database.
The first principle most of us learned when dealing with SQL databases is to use indexes to prevent full table scans. But once things get a little more complex, we're back to my (now repeated) saying: it's hard to build a career treating the database as a black box.
How do indexes play with composite primary keys or with the LIKE operator? Can they be combined? Can JOIN clauses use indexes?
📗 Markus Winand's Use the Index, Luke! is the free web edition of the SQL Performance Explained book. It's a comprehensive introduction to relational database indexes. It will answer all these questions and more, with detailed examples of why some approaches will work better than others. It is a reference that you should have read at least once and will probably revisit several times a year.
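To get a feel for the composite-key question, here is a minimal sketch using Python's built-in sqlite3 module. The employees table, its columns, and the idx_name index are made up for illustration, and other databases will plan these queries differently. It asks SQLite how it would run a query that filters on the leading column of a two-column index, then one that filters only on the trailing column.

```python
import sqlite3

# In-memory database; the table, columns, and index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, dept TEXT)"
)
# Composite index: last_name is the leading column, first_name the trailing one.
conn.execute("CREATE INDEX idx_name ON employees (last_name, first_name)")
conn.executemany(
    "INSERT INTO employees (last_name, first_name, dept) VALUES (?, ?, ?)",
    [("Smith", "Ann", "db"), ("Jones", "Bob", "product")],
)


def query_plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)


plan_leading = query_plan("SELECT * FROM employees WHERE last_name = 'Smith'")
plan_trailing = query_plan("SELECT * FROM employees WHERE first_name = 'Ann'")

print("filter on leading column: ", plan_leading)
print("filter on trailing column:", plan_trailing)
```

The first plan should report a SEARCH that uses idx_name, while the second falls back to a SCAN of the whole table, because the index is sorted by last_name first. The exact wording of the plan varies between SQLite versions.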
Defining your Product Strategy
✨🔭 Product Strategy 🔭✨
Most early-stage companies just have to focus on delighting users by solving a problem for them. But once that problem is solved, a question often arises: what next?
In other words: what's our Product Strategy for the coming years?
"Product strategy answers the question, "How will your product delight customers, in hard to copy, margin-enhancing ways?""
Answering that question will be the entry point to your strategy-defining journey. Then you will learn to turn these answers into an actionable Product Strategy.
📗 Gibson Biddle's How to Define Your Product Strategy is a step-by-step guide to defining your product strategy, illustrated with Netflix (where Gibson was VP of Product).
Data Governance at LinkedIn with DataHub
Datasets are like servers.
They scaled so much in volume in recent years that we have to treat them as cattle, not pets.
And even with the best naming conventions, field names are just field names. If you want to validate, anonymize fields, or create any generic process, you need dataset metadata to be up-to-date.
And like code documentation, the further dataset metadata lives from the production code, the less likely it is to stay up-to-date. So naturally, teams tend to "shift (metadata) left", i.e., push annotations to live with the production source of truth: the data schema.
"Let us suggest, instead of thinking about governance and annotation as an activity that happens post-hoc, embedding annotations directly into the schemas of our datasets as they are being created and updated."
📗 LinkedIn's Shifting left on governance: DataHub and schema annotations is a quick read detailing how LinkedIn moved dataset metadata from a central data-owned hub to a distributed schema annotation method. Generally, it follows a trend of "IDL-driven development" that you shouldn't miss.
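To make the "annotations live in the schema" idea concrete, here is a minimal sketch. It is not LinkedIn's implementation and does not use DataHub's API: it uses Python dataclass field metadata as a stand-in for schema annotations, and the Member record and the "pii" key are hypothetical. The point is that a generic process (here, anonymization) can be driven entirely by what the schema declares.

```python
from dataclasses import dataclass, field, fields, replace


@dataclass(frozen=True)
class Member:
    # The "pii" metadata key is a made-up annotation for this sketch.
    member_id: int
    email: str = field(metadata={"pii": True})
    country: str = field(metadata={"pii": False})


def anonymize(record):
    """Mask every field whose schema annotation marks it as PII."""
    masked = {
        f.name: "<redacted>"
        for f in fields(record)
        if f.metadata.get("pii", False)
    }
    # replace() builds a new record, leaving the original untouched.
    return replace(record, **masked)


row = Member(member_id=1, email="ann@example.com", country="FR")
print(anonymize(row))
# prints: Member(member_id=1, email='<redacted>', country='FR')
```

Because the annotation sits next to the field definition, adding a new sensitive field and marking it is a single change, and anonymize() picks it up without being edited.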
Writing Architecture Decision Records at Spotify
Do you make many code review comments like "The correct way to do X is Y"? Are you sometimes unable to explain why?
Your team needs ADRs!
As software projects grow, so does the number of decisions. After a few years, the people and context change, but these decisions stay. The cost of undocumented decisions only increases with time.
"An Architecture Decision Record (ADR) is a document that captures a decision, including the context of how the decision was made and the consequences of adopting the decision."
"An ADR should be written whenever a decision of significant impact is made; it is up to each team to align on what defines a significant impact."
📗 Spotify's When Should I Write an Architecture Decision Record is a short, actionable article discussing the benefits of using ADRs and specific use cases. Josef Blake explains that ADRs can be written for decisions small and large, present and past. It's never too late, and almost always a good idea, to write an ADR. Not actionable enough for you? The article ends with a decision diagram as a visual summary.
If you want a public example of Spotify teams using ADRs, read Backstage's (a Spotify-led open-source product) decision records.
Everything you need to know about UUIDs
When using UUIDs, many software engineers just use the default version their framework or library provides.
We encounter UUIDs daily in software, yet do we know enough about them? Did you know there are officially five different types of UUIDs, eight if you count recently proposed ones?
Each UUID version targets a specific use case, from entirely random (v4) to content-aware IDs and time-ordered ones.
And with storage systems evolving into large distributed systems, new constraints like database locality arise. These new constraints lead to updated UUID versions. For instance, the proposed UUID v6 is a replacement for UUID v1.
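As a quick illustration of how the versions differ, here is a small sketch with Python's standard uuid module, which covers the long-established versions only. The "example.com" name is an arbitrary input chosen for the example.

```python
import uuid

# v4: built from random bits, no embedded meaning.
random_id = uuid.uuid4()

# v5: derived from a namespace and a name via SHA-1, so it is content-aware.
name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
same_name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

# v1: built from a timestamp plus a node identifier.
time_id = uuid.uuid1()

print(random_id.version, name_id.version, time_id.version)
# prints: 4 5 1

# The same namespace and name always yield the same v5 UUID.
print(name_id == same_name_id)
# prints: True
```

So if you need the same input to map to the same identifier every time, a name-based version is the natural fit, while v4 gives you no structure at all.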
📗 Cockroach Labs' What is a UUID is a brief justification of UUIDs and an overview of the different types and use-cases of such identifiers. If you have read that and not learned enough, you can read the IETF draft for new UUID formats.
PS: I'll add one tip: there is a reason your database has a built-in UUID type. It's to prevent you from storing it as a string. You'll more than double the storage space if you store the 36-character string instead of the 16-byte UUID.
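The size difference is easy to check with Python's standard uuid module. This sketch only compares the raw lengths of the two representations; actual on-disk size depends on your database and column type.

```python
import uuid

u = uuid.uuid4()

text_form = str(u)      # canonical hyphenated hex representation
binary_form = u.bytes   # raw 128-bit value

print(len(text_form), len(binary_form))
# prints: 36 16

# The binary form round-trips back to the same UUID, so nothing is lost.
round_tripped = uuid.UUID(bytes=binary_form)
print(round_tripped == u)
# prints: True
```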