Learning from Giants #58
Cloudflare Waiting Rooms' massively distributed architecture, Kent Beck's 80/15/5 Rule for doing work, and EXIT traps as a little-known Bash scripting trick.
👋 Hi, this is Mathias, back from holidays 😎 with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies. Guaranteed 100% GPT-free content.
Did a friend send this to you? Subscribe to get these weekly drops directly in your inbox. Read the archive for even more great content. Also: I share these articles daily on LinkedIn.
The architecture challenges behind Cloudflare Waiting Rooms
Waiting Rooms protects web applications against traffic spikes by serving excess users a temporary branded waiting room, dynamically admitting users as spots become available on their sites.
The hard part about building Waiting Rooms is that to ensure good protection, the decision to queue a user or let them through has to be evaluated synchronously on every request. And worse, because it's spike protection, the decision is to let the user through most of the time, so the added latency must be minimal.
"Waiting Room is built on Workers that runs across a global network of Cloudflare data centers. To optimize for minimal latency and enhanced performance, these requests are routed to the data center with the most geographical proximity."
So here's the problem: make a millisecond-latency decision about every user while keeping the queue in sync for all users worldwide.
Considering a cross-continent request can take tens of milliseconds, all solutions based on a central state machine are off the table. The team had to build an eventually consistent solution, which brought a new set of problems:
Protect too aggressively, and you'll end up queuing users for no reason.
Protect reactively, and you'll miss the first spike of users.
The Waiting Room engineers designed a solution: distribute the remaining utilization slots between data centers based on each one's active users over the last minute. The centralized process only hands out tickets to the Workers, which can then freely allocate them to incoming users.
After a few iterations, engineers employed datacenter-local counters to ensure an even distribution between Workers of the same DC.
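To make the slot distribution idea concrete, here is a minimal Bash sketch. It assumes a plain proportional split, which is my simplification: the exact weighting Cloudflare uses isn't described here. The function name allocate_slots and the data center names are hypothetical.

```bash
#!/usr/bin/env bash

# allocate_slots REMAINING name=active_users ...
# Splits REMAINING slots between data centers in proportion to the active
# users each one saw in the last minute (an assumed, simplified weighting).
# Integer division rounds down, so a few slots may stay unassigned.
allocate_slots() {
  local remaining="$1"; shift
  local total=0 pair

  # First pass: total active users across all data centers.
  for pair in "$@"; do
    total=$(( total + ${pair#*=} ))
  done

  # Second pass: print each data center's share.
  for pair in "$@"; do
    local name="${pair%%=*}" active="${pair#*=}"
    if (( total == 0 )); then
      echo "$name 0"   # no recent traffic anywhere, nothing to base a split on
    else
      echo "$name $(( remaining * active / total ))"
    fi
  done
}

# Hypothetical numbers: 100 free slots, three data centers.
allocate_slots 100 fra=600 sfo=300 sin=100
```

With these made-up numbers, the busiest data center gets 60 of the 100 slots and the quietest gets 10. Each data center can then admit users against its own budget without a cross-continent round trip on every request.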
📗 George Thomas's How Waiting Room makes queueing decisions on Cloudflare's highly distributed network is an in-depth walkthrough of the thought process and iterations the Waiting Room engineers went through to solve that complex distributed systems problem.
Fresh Work and the 80/15/5 Rule
Are you feeling bored at your day job? Or that you need to be challenged or learn more?
You can do something about it! It doesn't have to come from your manager. Start by freeing up time to explore new things. They can be closely related to your current task or have a distant link. It's all about the balance.
Kent Beck designed a simple rule, 80/15/5:
"Doing things that are new keeps you energized and interested. [...] But doing things you're already good at ensures you accomplish goals other people care about."
"80% of your time goes to low-risk/reasonable-reward work
15% of your time goes to related high-risk/high-reward work
5% of your time goes to satisfying your own curiosity with no thought of reward"
The 80/15/5 rule is not just about carving out time for novelty. It's about continuous movement towards what energizes you, slowly moving out of the current 80% and using the 15 and 5 to create your next new 80%.
"While you're doing your 80% task you're gradually teaching the next generation of your team to take that task over. [...] About the time you've found a 15% task to become your new 80%, your successor is ready to take over your previous 80% task."
📗 Kent Beck's Fresh Work 80/15/5 rule is simple but powerful. Use it to take control of your work and career or as a manager to help unblock people and enable them to create their path.
How Exit Traps can make your Bash scripts more robust and reliable
"A simple, useful idiom to make your bash scripts more robust."
Bash scripts are often cobbled together, bypassing the high standards we hold ourselves to when writing code. Yet their impact is enormous compared to their relatively few lines of code, as is their ability to blow up in your face.
Not writing unit tests is one thing. But the most dangerous part is the undefined behavior of the script failing halfway.
This can lead to funny situations:
A database maintenance bash script failed and didn't reach the "restart database" line.
A large data file operation crashed and didn't reach the "cleanup" line.
See where I'm going?
"The secret sauce is a pseudo-signal provided by bash, called EXIT, that you can trap; commands or functions trapped on it will execute when the script exits for any reason."
That simple! Now you know what to do with your teardown and restart methods!
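Here is a minimal sketch of the idiom. The job, its lock file, and the name run_job are hypothetical; only trap ... EXIT itself is the Bash feature in question. The function body is wrapped in parentheses so it runs in a subshell, which stands in for a standalone script: the trap fires when that subshell exits.

```bash
#!/usr/bin/env bash

# Hypothetical job: takes the path of a lock file to create and clean up.
# The ( ... ) body runs in a subshell, so the EXIT trap set inside it fires
# when the subshell exits, just as it would at the end of a real script.
run_job() (
  lockfile="$1"

  # Teardown code lives here; it runs on any exit, success or failure.
  finish() {
    rm -f "$lockfile"
    echo "finish: removed lock file"
  }
  trap finish EXIT

  touch "$lockfile"
  echo "step 1: lock acquired"
  echo "step 2: failing halfway" >&2
  exit 1                         # simulated mid-script failure
  echo "step 3: never reached"
)

tmpdir="$(mktemp -d)"
run_job "$tmpdir/job.lock" || echo "job exited with status $?"
[ -e "$tmpdir/job.lock" ] || echo "lock file is gone"
rmdir "$tmpdir"
```

Even though the job bails out before its last step, the lock file is removed. In a real script you would drop the function wrapper and put the trap line near the top of the file.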
"You place any code that you want to be certain to run in this "finish" function."
📗 Aaron Maxwell's How Exit Traps can make your bash scripts way more robust and reliable is a short read introducing that trap method. I'm often reluctant to share code articles, because I want every resource to be actionable by all readers, but bash is not just code!