Learning from Giants #48
An introduction to Precision & Recall, Fly.io's custom workload Scheduler, and Learning as the meta-skill for accelerating impact.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies. Guaranteed 100% OpenAI-free content.
Precision & Recall
"Nearly all detection systems are probabilistic."
No system can be 100% sure of what it detects. It's up to the builders or operators of such systems to decide the operating point: the threshold between true and false.
A trade-off of precision vs. recall.
"In the ideal state you want to be in a state such that:"
Everything you detect is bad (high true positives, low false positives): we call this precision.
You detect all the bad stuff (high true positives, low false negatives): we call this recall.
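As a minimal sketch of those two definitions, here is how precision and recall fall out of raw counts. The function names and the counts are hypothetical, chosen only for illustration; they do not come from Simon's article.

```python
def precision(tp: int, fp: int) -> float:
    """Share of detections that were actually bad: TP / (TP + FP)."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of the bad stuff that was detected: TP / (TP + FN)."""
    actually_bad = tp + fn
    return tp / actually_bad if actually_bad else 0.0


# Hypothetical counts for one detection system.
tp, fp, fn = 80, 20, 40

p = precision(tp, fp)  # 80 of the 100 flagged items were bad
r = recall(tp, fn)     # 80 of the 120 bad items were flagged
print(f"precision={p:.2f} recall={r:.2f}")
```

With these made-up numbers, the system is fairly precise but still misses a third of the bad items.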
It takes a few discussions to get the concepts right, but once you get them, you'll see how the world is just a bunch of precision vs. recall trade-offs.
Beyond the glossary, how should you reason about the precision vs. recall trade-off?
Plot the precision-recall curve of your detection system. It'll help to visualize the trade-off.
Pick your operating point somewhere on the curve.
"In general terms, high-recall low-precision is favoured when the cost of acting is low, but the cost of not-acting is high. And vice versa." Think fire detection vs. justice systems.
Improve your algorithm by focusing on improving precision or recall, or on specific segments that may have a higher (or lower) cost of acting or not acting.
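The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the scores, labels, thresholds, and cost values are invented, and "total cost of false positives plus false negatives" is one possible way to pick an operating point, not a method taken from the article.

```python
def confusion(scores, labels, threshold):
    """Count (tp, fp, fn) when everything scoring >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return tp, fp, fn


def pr_curve(scores, labels, thresholds):
    """Return one (threshold, precision, recall) point per threshold."""
    points = []
    for t in thresholds:
        tp, fp, fn = confusion(scores, labels, t)
        p = tp / (tp + fp) if tp + fp else 1.0  # nothing flagged: no false alarms
        r = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, p, r))
    return points


def pick_operating_point(scores, labels, thresholds, cost_fp, cost_fn):
    """Pick the threshold with the lowest total cost of mistakes (an assumption)."""
    def total_cost(t):
        _, fp, fn = confusion(scores, labels, t)
        return cost_fp * fp + cost_fn * fn
    return min(thresholds, key=total_cost)


# Hypothetical detector scores and ground truth (1 = bad, 0 = fine).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 1, 0]
thresholds = [0.25, 0.5, 0.75]

curve = pr_curve(scores, labels, thresholds)

# Fire-detection style: a missed fire costs far more than a false alarm.
fire_like = pick_operating_point(scores, labels, thresholds, cost_fp=1, cost_fn=10)
# Justice style: acting on a false positive costs far more than a miss.
justice_like = pick_operating_point(scores, labels, thresholds, cost_fp=10, cost_fn=1)
```

On this toy data, the fire-detection costs push the choice to the lowest threshold (high recall), while the justice-style costs push it to the highest one (high precision).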
📗 Simon Cross's Precision & Recall is an excellent introduction to detection systems and precision vs. recall trade-offs. Such problems are best tackled with a structured approach, and the article gives you a great starting point. Thank you, Simon, for writing "the article I wish I could have sent myself back in 2018"!
Carving the Scheduler out of Fly.io’s orchestrator
"Orchestrators link clusters of worker servers together and offer up an API to run jobs on them."
"Scheduling means deciding which worker to run each task on."
Fly.io is a simple, micro-VM-based (edge) hosting platform. You can use them to deploy apps globally from a single Docker container.
They run a global cluster of machines with a common network overlay and run your workloads on it, where and when you want them. So orchestrating and scheduling these workloads on servers is pretty much their main job. And they started simple!
"For the year following our launch, Fly.io's platform was a Rust proxy and a Golang Nomad driver. The driver could check out a Docker image, convert it to a block device, and start Firecracker on it."
While Nomad was the perfect tool for them to start with, their use case slowly diverged from Nomad's.
"Bin packing is wrong for platforms like Fly.io. [...] We run one global cluster [when Nomad's model is smaller federated clusters]. We outgrew the orchestration model."
So Fly built their own scheduler, specialized to their specific need: picking a server to run a workload in a specific region, in a very short (real-time) timeframe.
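To make "picking a server to run a workload in a specific region" concrete, here is a toy sketch. Everything in it is hypothetical: the `Worker` fields, the region codes, and especially the placement rule (take the worker with the most free memory, rather than bin packing) are illustrative assumptions, not Fly.io's actual scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    name: str
    region: str
    free_mb: int  # memory still available on this worker


def pick_worker(workers: List[Worker], region: str, needed_mb: int) -> Optional[Worker]:
    """Return a worker in `region` that can fit the workload, or None.

    Assumed rule: prefer the worker with the most free memory, which spreads
    load instead of packing workers as tightly as possible.
    """
    candidates = [
        w for w in workers
        if w.region == region and w.free_mb >= needed_mb
    ]
    if not candidates:
        return None  # the caller decides what to do: retry, fail, or try another region
    return max(candidates, key=lambda w: w.free_mb)


# Hypothetical fleet.
workers = [
    Worker("worker-a", "ams", 512),
    Worker("worker-b", "ams", 2048),
    Worker("worker-c", "ord", 4096),
]

chosen = pick_worker(workers, region="ams", needed_mb=1024)
missing = pick_worker(workers, region="syd", needed_mb=256)
```

The point of the sketch is the shape of the decision: the region is a hard constraint, capacity is a filter, and the answer comes back immediately, including a plain "no" when nothing fits.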
📗 Thomas Ptacek's Carving the Scheduler Out of Our Orchestrator tells a brief history of orchestrators and schedulers, from Google's Borg to what became HashiCorp's Nomad, before explaining how Fly outgrew Nomad and built their own scheduler. Thomas then details how Fly's distributed scheduler works, an interesting peek!
Learning: the meta-skill for accelerating impact
"Your ability to learn new skills will expand the breadth of your impact dramatically."
A good part of growing as a professional is learning. Learning is a core skill that you'll rarely be directly evaluated against. A hidden force multiplier.
What does learning mean? How can you identify your gaps?
"1. Learning is about knowing what you know and what you don't"
"Knowing and clearly communicating your skills and where you can stretch enables you to spend as much time as possible on projects that get you to that optimal balance [of repetition and learning], creating a virtuous cycle of growth and impact."
That doesn't mean you have to be an expert at everything. Knowing something exists is often enough to get you started on the right path.
"2. Asking effective questions accelerates your progress"
There's a lot to say and learn about when and how to ask questions. The earlier in your career, the more the 'when' matters. The 'how' stays crucial your entire career.
"When you ask questions, make sure you've done some research first, and be very specific about what you know and what you don't"
"3. Growing your technical skills creates new opportunities for impact"
Learning must be intentional. You must have a plan to learn consistently and fast. But keep in mind it will be extra effective if you can get some dopamine out of it!
"Think critically about how you learn best. Are you a reader, a watcher, a doer, a talk-it-througher, or a combination of several of those? Know how you learn best and lean into those modalities."
"4. Teaching others multiplies your impact"
📗 Caitlin Moorman's Learning: The Meta-Skill for Accelerating Impact is a crucial article for all tech professionals. In each section, Caitlin details what underdeveloped and highly developed skills look like and tips for developing them, to help you figure out your path.