Learning from Giants #19
Understanding SQL JOINs, Three Feature Buckets for product planning, Engineering management 101, A large database migration at Stripe, and an introduction to Edge compute.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
A Visual Explanation of SQL JOINs
Most people in tech learn SQL by doing, with heavy use of Google and Stack Overflow.
Yet sometimes, it's worth it to take a minute to really understand concepts.
Take JOINs for instance; how much do you know about:
INNER JOIN
FULL OUTER JOIN
LEFT OUTER JOIN
CROSS JOIN
Let's learn about them through Venn diagrams.
▶︎ "Since SQL joins appear to be set-based, the use of Venn diagrams to explain them seems, at first blush, to be a natural fit."
📗 Jeff Atwood's A Visual Explanation of SQL Joins is a super accessible article from Stack Overflow's co-founder. If you first learned about JOINs on Stack Overflow, here's your opportunity to close the loop.
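To make the four JOIN types listed above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (table_a, table_b) and their rows are made up for illustration and are not taken from the article. Older SQLite builds do not support FULL OUTER JOIN natively, so this sketch emulates it with two LEFT JOINs and a UNION.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name TEXT);
    CREATE TABLE table_b (name TEXT);
    INSERT INTO table_a VALUES ('Pirate'), ('Monkey'), ('Ninja');
    INSERT INTO table_b VALUES ('Ninja'), ('Pirate'), ('Robot');
""")

def run(query):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(query).fetchall()

# INNER JOIN: only rows that have a match in both tables.
inner = run("""
    SELECT table_a.name, table_b.name
    FROM table_a INNER JOIN table_b ON table_a.name = table_b.name
    ORDER BY table_a.name
""")

# LEFT OUTER JOIN: every row of table_a, with NULL where table_b has no match.
left = run("""
    SELECT table_a.name, table_b.name
    FROM table_a LEFT OUTER JOIN table_b ON table_a.name = table_b.name
    ORDER BY table_a.name
""")

# FULL OUTER JOIN, emulated: rows from both sides, NULL where a match is missing.
full = run("""
    SELECT table_a.name, table_b.name
    FROM table_a LEFT JOIN table_b ON table_a.name = table_b.name
    UNION
    SELECT table_a.name, table_b.name
    FROM table_b LEFT JOIN table_a ON table_a.name = table_b.name
""")

# CROSS JOIN: every row of table_a paired with every row of table_b.
cross_count = run("SELECT COUNT(*) FROM table_a CROSS JOIN table_b")[0][0]

print("inner:", inner)
print("left: ", left)
print("full: ", full)
print("cross:", cross_count, "rows")
```

Note that CROSS JOIN produces 3 x 3 rows here, which is one reason the Venn diagram picture only goes so far: a cross product is not a set intersection or union.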
Guide to Product Planning: Three Feature Buckets
Product Managers: these Three Feature Buckets can help you reflect on strategy.
"Metrics Movers". These features should move your company's North Star metric or OKRs significantly.
"Customer Requests"
"Customer Delight"
"I’ve found that categorizing features into these buckets forces product teams to be intellectually honest with why they are implementing a certain feature."
It will also help you identify under-represented buckets and fix your product execution, or simply make sure your releases have a little bit of each.
📗 Adam Nash's Guide to Product Planning: Three Feature Buckets describes a technique that looks simple but will teach you something about your current strategy. Try it, and let me know what you discover.
Engineering Management 101 for new EMs (or Tech Leads)
"I wrote this for my tech leads who needed a crash course in engineering management. The advice you see in this article is the advice I used to help level up my team leads."
If you work in a modern tech organization, after a few years of experience you will have to pick either the individual contributor or the management path. While the choice isn't irreversible, the management path will expose you to new problems and demand new skills.
There's a lot to learn when starting in eng management.
"When I first became an engineering manager, my CTO gave me this analogy to think of engineering management skills like knobs on a guitar amp. For EMs, there are four dials: people, process, product, and prowess."
📗 Adam Conrad's Technical Lead Management is an actionable introduction to engineering management by a Meta EM. While it starts as a guide for Tech Leads to learn the management side of their job, the advice applies to all engineering managers.
Here are a few handpicked, precious quotes:
"Successful engineering management is all about maximizing the effectiveness of your people. If you take nothing else away from this guide, know that simply caring about your people will get you 80% of the way there."
"Delegation is the hardest part about being a TLM (or any manager for that matter). You are naturally going to want to do what you're comfortable doing: coding. Fortunately for you, coding is no longer your highest value skill."
"For TLMs, coding, tech leadership, and review should still be anywhere from 50 to 80% of your duties [...]. Do whatever you can to keep that number above 0%, even if that means coding after work. Why? No one wants to work for someone who can't remember what it's like to be in the trenches."
Executing a large database migration in small, meticulous steps at Stripe
The problem is quite simple: a team at Stripe wants to move some data fields from two tables to a new database table. But millions of companies use Stripe's services, so the team cannot afford any outage or discrepancy (it's legal KYC data).
They couldn't just run a SQL query and call it a day.
"During the time we are migrating all of this data, we are going to have ten thousand bajillion pesky users wanting to read and write new information."
So they broke this migration into small, independent, and safe steps that engineers could individually roll back at any moment. They did so by doing double writes, then switching over reads before decommissioning the old reads and writes. And because it's Ruby, they threw in a little bit of meta-programming magic, because why would you be using Ruby otherwise?
"This was conceptually relatively simple, but the devil and the ability to sleep at night is in the details."
📗 Robert Heaton's Migrating bajillions of database records at Stripe is a detailed tale of this multi-step migration at Stripe's scale. While it doesn't mean we should all be that cautious when migrating data, it's a good reminder that there are always alternatives to large, risky, one-shot migrations. And usually, these alternatives involve shadow traffic, double-writes, and feature flags.
"If you ever find yourself writing a single, enormous yolo-pull-request to migrate a very large amount of anything, think hard about whether it is possible to make the moment of deploying less pant-wettingly terrifying."
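The sequence described above (double writes, then switching reads, then decommissioning the old path) can be sketched as a set of flags that are flipped one at a time, each of which can be rolled back on its own. This is a minimal illustration in Python, not Stripe's Ruby code: the class names, flag names, and the in-memory dictionaries standing in for database tables are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class MigrationFlags:
    """Hypothetical feature flags; each one is a small, reversible step."""
    write_old: bool = True        # keep writing to the old table
    write_new: bool = False       # also write to the new table (double writes)
    read_from_new: bool = False   # serve reads from the new table

@dataclass
class RecordStore:
    flags: MigrationFlags
    old_table: dict = field(default_factory=dict)  # stand-in for the old tables
    new_table: dict = field(default_factory=dict)  # stand-in for the new table

    def write(self, key, value):
        if self.flags.write_old:
            self.old_table[key] = value
        if self.flags.write_new:
            self.new_table[key] = value

    def read(self, key):
        table = self.new_table if self.flags.read_from_new else self.old_table
        return table.get(key)

    def backfill(self):
        """Copy rows written before double writes began, keeping newer dual-written rows."""
        for key, value in self.old_table.items():
            self.new_table.setdefault(key, value)

    def mismatches(self):
        """Keys whose old and new values differ; should be empty before switching reads."""
        return [k for k, v in self.old_table.items() if self.new_table.get(k) != v]

store = RecordStore(MigrationFlags())
store.write("acct_1", "v1")            # step 0: only the old table is written

store.flags.write_new = True           # step 1: turn on double writes
store.write("acct_2", "v2")

store.backfill()                       # step 2: copy historical rows
consistent = store.mismatches() == []  # step 3: verify before touching reads

store.flags.read_from_new = True       # step 4: switch reads (flip back to roll back)
value_after_switch = store.read("acct_1")

store.flags.write_old = False          # step 5: decommission the old writes
store.write("acct_3", "v3")
```

The point of the design is that no single step is irreversible until the last one: if the verification fails or reads from the new table misbehave, flipping one flag back restores the previous behavior.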
What is edge compute, and what problems does it solve?
If you're reading software articles, you may have come across the idea of edge computing quite a few times. Spoiler: it's a lot about performance.
"When it comes to speed, there are three major factors:
Distance a request and response has to travel (aka, latency).
Download size for a response to be parsed and executed.
Device capabilities based on the hardware, software, and available resources."
How does Edge computing solve all three?
Latency is a "speed-of-light" limitation we can only solve by moving the content closer.
"To solve this latency problem, very smart folks came up with the idea of deploying multiple copies of a program and distributing it around the world. When a user makes a request, it can be handled by the closest copy, thus reducing the distance traveled and the time spent in transit."
CDNs do that, but they're "edge", not "compute", because they serve static content.
"A Content Delivery Network (CDN) is a network of globally distributed servers designed to deliver static assets like CSS, JavaScript, images, fonts, etc."
Download size can be solved by sending just what the user needs, but this requires adding "compute" logic to the CDN.
Device limitations can be sidestepped by moving the computation to the server. If that server is close enough to the user, the extra round trip could be imperceptible.
"Edge compute is a programmable runtime (like cloud functions) that are globally distributed (like a CDN)."
📗 Austin Gil's What is Edge Compute? It's kind of like knitting dog hats uses the funny analogy of selling knitted dog hats to explain the different components of the "Edge" and "Edge computing". Edge computing will only grow more important with time, so now is an excellent time to understand what it means.
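To get a feel for the latency argument, here is a rough sketch in Python that routes a user to the closest copy of a program and estimates a lower bound on the round-trip time. The edge locations and coordinates are hypothetical, and the model assumes signals travel at roughly 200 km per millisecond in optical fiber along a straight great-circle path, so real latencies will be higher.

```python
import math

# Hypothetical edge locations as (latitude, longitude) in degrees; illustrative only.
EDGE_LOCATIONS = {
    "paris": (48.86, 2.35),
    "new_york": (40.71, -74.01),
    "singapore": (1.35, 103.82),
}

EARTH_RADIUS_KM = 6371.0
FIBER_SPEED_KM_PER_MS = 200.0  # assumption: about two thirds of the speed of light

def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def nearest_edge(user):
    """Name of the edge location closest to the user."""
    return min(EDGE_LOCATIONS, key=lambda name: great_circle_km(user, EDGE_LOCATIONS[name]))

def min_round_trip_ms(user, server):
    """Physical lower bound on round-trip time, ignoring routing and processing."""
    return 2 * great_circle_km(user, server) / FIBER_SPEED_KM_PER_MS

london = (51.51, -0.13)
tokyo = (35.68, 139.69)

for label, user in (("london", london), ("tokyo", tokyo)):
    edge = nearest_edge(user)
    rtt = min_round_trip_ms(user, EDGE_LOCATIONS[edge])
    print(f"{label} -> {edge}: at least {rtt:.1f} ms round trip")
```

Even in this idealized model, serving a London user from Singapore instead of Paris adds on the order of a hundred milliseconds per round trip, which is the gap that distributing copies around the world is meant to close.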