Learning from Giants #51
Dynamic partitioning with Consistent Hash Rings, Ben Horowitz's take on Titles and Promotions, and Git's database internals.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies. Guaranteed 100% GPT-free content.
Consistent Hashing Explained
What do Discord, Amazon DynamoDB, Vimeo, and Netflix have in common?
Dynamic partitioning with Consistent Hashing.
Consistent hashing is a central part of many distributed systems. While it comes in many flavors, the problem to solve and the main idea are always the same.
1. The system must scale horizontally to meet increased load: distribute the data or compute over multiple nodes.
2. Load is dynamic, and it's too costly to over-provision for peaks: the number of servers will change.
3. Clients need a deterministic way to identify which server to ask for data.
While many solutions can solve two of these constraints, getting all three requires a more complex setup.
Random assignment fails at 3: "The client cannot easily identify the node to retrieve the data due to random distribution."
A simple hash + modulo fails at 2: "The removal of a node breaks the existing mappings between the keys and nodes. The keys must be rehashed to restore mapping between keys and nodes."
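To make the cost of hash + modulo concrete, here is a minimal Python sketch. The key names, the node counts (4 shrinking to 3), and the helpers stable_hash and modulo_node are illustrative assumptions, not something taken from the article.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to a 64-bit integer that is stable across runs and machines."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_node(key: str, node_count: int) -> int:
    """Pick a node index with the naive hash + modulo scheme."""
    return stable_hash(key) % node_count


# Hypothetical workload: 10,000 keys, and a cluster shrinking from 4 to 3 nodes.
keys = [f"user-{i}" for i in range(10_000)]
moved = sum(modulo_node(k, 4) != modulo_node(k, 3) for k in keys)
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys changed node")
```

With a uniform hash, a key keeps its node only when its hash has the same remainder modulo 4 and modulo 3, which holds for 3 out of every 12 hash values. So roughly three quarters of the keys end up on a different node after a single removal.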
A better solution is an extension of the hash technique that minimizes the amount of data moved around when a server is added or removed—a more consistent hash-based strategy.
💡 Consistent Hashing.
"Consistent hashing is a distributed systems technique that operates by assigning the data objects and nodes a position on a virtual ring structure (hash ring)."
"The key of the data object is hashed using the same hash function to locate the position of the key on the hash ring. The hash ring is traversed in the clockwise direction starting from the position of the key until a node is found. The data object is stored on the node that was found."
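The lookup quoted above can be sketched in a few lines of Python. This is a bare-bones illustration under my own assumptions: the HashRing class, the node names, and the choice of SHA-256 are hypothetical, and each node gets a single position on the ring. Production systems adapt the idea further, for example by giving each node several positions to even out the load.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a node name or a key to a position on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy hash ring: one position per node, no replication."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted positions of the nodes on the ring
        self._owners = {}     # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        if pos in self._owners:
            raise ValueError(f"position collision for {node!r}")
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_position(node)
        if self._owners.get(pos) != node:
            raise KeyError(node)
        del self._owners[pos]
        self._positions.remove(pos)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node found."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0  # past the last node: wrap around to the start of the ring
        return self._owners[self._positions[idx]]


# Hypothetical cluster and keys, to observe what happens when a node leaves.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user-{i}" for i in range(1_000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The property to notice: the only keys that move are the ones that lived on the removed node. Every other key keeps its mapping, which is exactly what hash + modulo could not offer.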
📗 NK's Consistent Hashing Explained is a clearly illustrated introduction to consistent hashing. As you probably guessed, the technique must be adapted to specific situations. The article links to articles from Netflix, Discord, and Vimeo that explain how they use consistent hashing in real-world production systems.
Titles and Promotions
Titles and Promotions are a very cheap currency with immense perceived value, and thus a powerful tool. But they can also cause chaos if not managed carefully.
So as a leader, it's important to understand titles and promotions and the associated social dynamics.
Titles and promotions do not naturally converge toward the best situation for the business.
"One challenge is the Peter Principle. [...] It holds that in a hierarchy, members are promoted so long as they work competently. Sooner or later they are promoted to a position at which they are no longer competent (their "level of incompetence"), and there they remain being unable to earn further promotions."
"Another challenge is a phenomenon that I call The Law of Crappy People. For any title level in a large organization, the talent on that level will eventually converge to the crappiest person with the title."
Your job as a leader is to constantly swim against the current of crappiness. Ben recommends doing so with a "properly constructed and disciplined promotion process": a company-wide process that levels titles across the organization and creates trust.
"You might think that so much time spent on promotions and titles places too much importance and focus on silly formalisms. The opposite is actually true. Without a well-thought out, disciplined process for titles and promotions, your employees will become obsessed with the resulting inequities."
📗 Ben Horowitz needs no introduction. As a world-renowned business leader and advisor, he describes the social dynamics around titles and how best to use them in a business. As always, there is no silver bullet, which he shows by describing two opposite schools of thought on titles, from Andreessen and Zuckerberg.
Git’s database internals
Git: the distributed database hidden in plain sight.
Most software engineers use Git to manage everything from the smallest to the largest projects. And for most of us, it really is a black box with a set of commands.
Yet behind those commands is a fascinating, highly specialized storage system that keeps performing well up to remarkable repository sizes.
Unlike traditional databases, Git isn't a long-lived server-side process. It also manages only a minimal set of object types alongside unstructured plaintext content blobs, something your standard PostgreSQL instance would struggle with.
"The most fundamental concepts in Git are Git objects. These are the "atoms" of your Git repository."
"The .git/objects directory is called the object store. It is a content-addressable data store, meaning that we can retrieve the contents of an object by providing a hash of those contents."
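Here is a small Python sketch of what "content-addressable" means in practice. It assumes Git's default SHA-1 object format, where a blob's ID is the hash of a "blob <size>" header, a NUL byte, and the content. The in-memory ObjectStore class is a hypothetical stand-in for the .git/objects directory, not Git's actual implementation.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Compute a blob's object ID, assuming the default SHA-1 object format."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory stand-in for the .git/objects directory."""

    def __init__(self):
        self._objects = {}  # object ID -> zlib-compressed header + content

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        header = b"blob %d\x00" % len(content)
        # Identical content always hashes to the same ID, so it is stored once.
        self._objects[oid] = zlib.compress(header + content)
        return oid

    def get(self, oid: str) -> bytes:
        raw = zlib.decompress(self._objects[oid])
        # The header ends at the first NUL byte; everything after is content.
        _header, _sep, content = raw.partition(b"\x00")
        return content


store = ObjectStore()
oid = store.put(b"hello world\n")
print(oid, store.get(oid))
```

Because the key is derived from the content, the store needs no separate index to find a blob, and storing the same file twice costs nothing extra.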
The content itself lives in these objects, only identified by their hash. Another layer gives access to that content: references.
"Git has references that allow you to create named pointers to keys in the object database."
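As a toy model of that second layer, the Python sketch below uses a plain dict in place of the reference files Git keeps on disk. The "ref: " prefix mirrors how Git writes a symbolic reference such as HEAD. The function name resolve_ref, the depth limit, and the object ID are made up for illustration.

```python
def resolve_ref(refs: dict, name: str, max_depth: int = 5) -> str:
    """Follow a reference, including symbolic ones, down to an object ID."""
    value = refs[name]
    for _ in range(max_depth):
        if not value.startswith("ref: "):
            return value  # a plain reference: the value is the object ID
        value = refs[value[len("ref: "):]]  # symbolic: follow the pointer
    raise ValueError("too many levels of symbolic references")


# Hypothetical repository state; the 40-character object ID is made up.
refs = {
    "HEAD": "ref: refs/heads/main",
    "refs/heads/main": "0123456789abcdef0123456789abcdef01234567",
}
head_id = resolve_ref(refs, "HEAD")
print(head_id)
```

Moving a branch then amounts to rewriting one small value, while the objects it points to stay untouched.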
That's it: objects and references are most of Git's storage. Simple? Not really...
"However, it does not take many objects before it is infeasible to store an entire Git repository using only loose objects. Not only does it strain the filesystem to have so many files, it is also inefficient when storing many versions of the same text file."
Neither your remote repository nor your machine will love duplicating your codebase for every commit. So Git compresses content in many different ways.
📗 Git's database internals is a piece-by-piece unpacking of Git's storage. The system is a unique solution to a fairly unique set of problems, making for an interesting case study.
I didn't know about consistent hashing, super interesting!