Learning from Giants #59
How computers run programs, Slack's globally distributed networking setup, and four foundational principles of Design communication.
👋 Hi, this is Mathias, back from holidays 😎 with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies. Guaranteed 100% GPT-free content.
Putting the "You" in CPU: a rabbit hole into how your computer runs programs
As modern software engineers, our day job happens a thousand abstraction layers above the CPU. Unless you come from a pure CS background, you've probably never bothered to learn how it works. Why should you?
Because it's cool, and it's beautifully simple!
"The one thing that surprised me over and over again while writing this article was how simple computers are."
That simple? Let's dive into it:
"The central processing unit (CPU) of a computer is in charge of computation. [...] The "instructions" that CPUs execute are just binary data: a byte or two to represent what instruction is being run (the opcode), followed by whatever data is needed to run the instruction."
"What we call machine code is nothing but a series of these binary instructions in a row. Assembly is a helpful syntax for reading and writing machine code that's easier for humans to read and write than raw bits; it is always compiled to the binary that your CPU knows how to read."
"The CPU always reads machine code directly from RAM, and code can't be run if it isn't loaded into RAM."
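To make the quotes above concrete, here is a minimal sketch of a toy machine. The three opcodes, their byte values, and the single accumulator register are all invented for illustration and don't match any real CPU. The "RAM" is just a bytes object holding an opcode followed by its data.

```python
# Toy machine: opcodes and encoding are invented for illustration only.
LOAD, ADD, HALT = 0x01, 0x02, 0xFF


def run(ram: bytes) -> int:
    """Fetch and execute instructions from `ram` until HALT; return the accumulator."""
    pc = 0   # program counter: address of the next instruction in "RAM"
    acc = 0  # a single accumulator register
    while True:
        opcode = ram[pc]               # fetch the opcode byte
        if opcode == HALT:
            return acc
        operand = ram[pc + 1]          # the data byte that follows the opcode
        if opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc = (acc + operand) & 0xFF   # wrap around like an 8-bit register
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at address {pc}")
        pc += 2                        # move on to the next instruction


program = bytes([LOAD, 40, ADD, 2, HALT])
print(run(program))  # prints 42
```

The whole "program" is five bytes in a row, which is all machine code is in this model.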
The CPU fetches instructions from RAM, executes them, and then moves on to fetch and execute more instructions. While it does that, it is in one of two modes: User or Kernel.
"In kernel mode, anything goes: the CPU is allowed to execute any supported instruction and access any memory. In user mode, only a subset of instructions is allowed, I/O and memory access is limited, and many CPU settings are locked."
Processors start in kernel mode and switch to user mode when executing non-OS code. User code can reach privileged instructions and protected memory only by switching back to kernel mode in a controlled way, through a defined set of entry points: system calls.
"Programs typically use these syscalls by calling shared library functions. These wrap machine code [...] that transfers control to the OS kernel and switches rings. The kernel does its business and switches back to user mode and returns to the program code."
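As a small illustration of the quote above, Python's os module exposes thin wrappers that, on Unix-like systems, end up in the pipe, write, and read system calls. The sketch below is not from the article, and the roundtrip helper is a hypothetical name. It pushes bytes through a kernel-managed pipe, so each call hands control to the kernel and then returns to user code.

```python
import os


def roundtrip(message: bytes) -> bytes:
    """Send `message` through a kernel pipe using thin syscall wrappers."""
    read_fd, write_fd = os.pipe()              # the kernel creates the pipe buffer
    try:
        written = os.write(write_fd, message)  # user code asks the kernel to copy bytes in
        return os.read(read_fd, written)       # and asks the kernel to copy them back out
    finally:
        os.close(read_fd)                      # closing descriptors is also a kernel request
        os.close(write_fd)


print(roundtrip(b"hello from user mode"))
```

The program never touches the pipe's memory directly. It can only ask the kernel to do so on its behalf.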
📗 Lexi Mattick's Putting the You in CPU is a long-form read on how computers run code, starting from the lowest system abstraction. What's unique about it is how clearly the author describes the different layers, and how, step-by-step, they build your understanding of computers' internals.
Slack’s resilient networking infrastructure
Maintaining high-availability SLAs for a real-time product with millions of customers globally is a complex networking challenge. Slack is an excellent example: a real-time product that needs relatively low latency everywhere. The team has designed a redundant networking architecture that can sustain quite a lot of failure. And lucky us, they have written about it!
Slack hosts their central application services and databases in the us-east-1 AWS region. They're resilient to failure thanks to a multi-availability zone setup.
That means all requests, like new messages, flow through the US, wherever you are in the world. Not only does this create latency problems, but it can also harm availability if your network can't route requests to the US. So Slack has edge points of presence (PoPs) close to users, so that requests enter Slack's network as soon as possible and cross the world over that faster, more resilient network.
"These edge PoPs sit closer to our users to reduce latency, improve performance, and connect them back to Slack's main region in the AWS us-east-1."
Even though they don't host the primary Slack services, edge PoPs each have a collection of caches, image proxies, and services to ensure all requests are handled as close to the user as possible.
How is traffic routed to and through these edge PoPs?
Everything happens at the DNS level, through a collection of primary and backup domains, each with different rules for selecting the IP (and therefore the data center) its DNS records point to.
Primary domains route to the geographically closest edge PoP that is up. Backup domains distribute the load between the three nearest PoPs to ensure clients can skip an unhealthy PoP after a few retries.
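The two selection rules can be sketched roughly as follows. This is a simplified model and not Slack's actual DNS configuration. The PoP class, the names, the distances, and the documentation-range IPs are all hypothetical, and the random pick among the three nearest PoPs is one assumed way of distributing load.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PoP:
    name: str            # hypothetical identifier
    ip: str              # documentation-range address, not a real Slack IP
    distance_km: float   # assumed distance from the client
    healthy: bool = True


def resolve_primary(pops):
    """Primary-domain rule: the closest PoP that is up."""
    candidates = [p for p in pops if p.healthy]
    if not candidates:
        raise LookupError("no healthy PoP available")
    return min(candidates, key=lambda p: p.distance_km).ip


def resolve_backup(pops, rng=random):
    """Backup-domain rule (assumed): a random pick among the three nearest PoPs."""
    nearest = sorted(pops, key=lambda p: p.distance_km)[:3]
    return rng.choice(nearest).ip


pops = [
    PoP("paris", "192.0.2.10", 300, healthy=False),
    PoP("frankfurt", "192.0.2.20", 600),
    PoP("london", "192.0.2.30", 700),
    PoP("new-york", "192.0.2.40", 5800),
]
print(resolve_primary(pops))  # prints 192.0.2.20, because the closest PoP is down
print(resolve_backup(pops))   # one of the three nearest; a retry may land elsewhere
```

Because the backup rule in this sketch ignores health, a client that hits the unhealthy PoP simply retries and is likely to get a different one, which is the behavior the text describes.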
📗 Rafael Elvira's Traffic 101: Packets Mostly Flow describes Slack's distributed networking architecture that enables it to handle traffic gracefully during partial outage events. It's an interesting insight into how such a large company plans for the unexpected!
Design communication is a critical skill
The best designers excel at communication. And failing to clearly communicate the "why" is probably designers' most common issue.
It makes sense in your head; your solution is the best. How can you communicate that effectively to stakeholders?
"One of the most important skills for designers is the ability to communicate design intent. We need to help stakeholders understand why a design decision provides the best possible solution to a problem."
So, as with all crucial skills, it's good to go back to basics periodically. Here are the four communication foundations:
Product over process. Focus your communication on the product; show how you're changing it.
Explain the problems.
"Set the stage for what you're sharing by starting with the problems. Never assume that stakeholders have as much context as you do."
Link solutions to problems.
Explain why solutions matter.
"Stakeholders often need to understand the why so that they understand the value of what you're proposing. [...] With every decision you make, ask yourself — "how will this affect the user?""
Beyond these general principles, the golden rule is to tailor your communication to your stakeholders. People have very different interests and value different angles based on their roles and positions in the company.
📗 Kazdem Cattapan's Design communication is a critical skill is a very actionable read on communicating design decisions. Communication is essential to build trust with all stakeholders, and understanding how to do it effectively can 10x your impact.