Learning from Giants #29
An animated guide to Arrays, the Rate limiters protecting Stripe's APIs, and Data structures of databases.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
Did a friend send this to you? Subscribe to get these weekly drops directly in your inbox. Read the archive for even more great content. Also: I share these articles daily on LinkedIn.
How do arrays work?
"The simple array is probably the most popular data structure in programming. It's a straightforward yet powerful tool — it lets you represent an ordered list of items with fast random access."
Basic array operations all map to memory calls and properties.
Memory for the array is first allocated, then array elements are accessed through a deterministic memory address. The complexity lies in making array size and contents dynamic.
An array can grow by filling more of its allocated memory. But once it's full, it must be copied over to a larger space. The first operation is almost free compared to the latter; this is what you must optimize.
"Adding more items to an array might mean we have to resize it under the hood. But as long as we resize correctly, we don't have to resize too often."
📗 Nanda Syahrasyad's How do arrays work? is an exceptionally well-illustrated and animated introduction to software arrays. It's the data structure we all use the most, so the read will be worth your while.
The rate limiters protecting Stripe's API
"Availability and reliability are paramount for all web applications and APIs. At Stripe, we've found that carefully implementing a few rate limiting strategies helps keep the API available for everyone."
What's rate limiting?
"A rate limiter is used to control the rate of traffic sent or received on the network. At Stripe, we operate 4 different types of limiters in production."
Rate limiters are patches to protect an API's availability in specific scenarios. These can be attacks, unwanted user behavior, or simply internal incidents. It's not surprising that Stripe has four solutions to these many scenarios.
First, the user traffic rate limiters; these ensure one user doesn't put pressure on the system that would degrade other users' experience by limiting to N requests per second or N concurrent requests. These rate limiters apply rules that do not depend on the system's status.
Yet during incidents, it may be required to drop some low-priority requests to keep serving 100% of the critical API traffic. This type of system-aware limiter is called a load shedder.
"A load shedder makes its decisions based on the whole state of the system rather than the user who is making the request."
Stripe uses two types of load shedders: fleet usage reservation and worker utilization.
📗 Stripe's Scaling your API with rate limiters is an insightful look into how Stripe solves that problem. Paul Tarjan goes in great depth into the different types of rate limiters and even provides material and sample code to set up your own. Don't feel like you must have all four from day one though!
Data structures of databases
We software engineers often rely on databases to store and retrieve data efficiently. But have you ever stopped to think about the underlying data structures that make this possible?
"Most of us are familiar with the Swiss army knife of data structures, the hash map. It boasts O(1) lookup time which makes it an excellent choice for things like simple caching, de-duplicating records, and just about anything else we want."
Hash maps are a common and versatile data structure known for their fast lookup times. But their main limitation is that the index must be stored in memory. This makes them less suitable for larger datasets.
"SSTable stands for Sorted String Table. This differs from an append only log file by requiring the sequence of key-value pairs to be sorted."
SSTables offer several benefits for database indexing, including the ability to merge segments efficiently, support for reading from disk, and the ability to compress data. They are also well-suited for range queries, as the sorted nature of the data allows for faster scans.
"Similar to SSTables, B-Trees keep key-value pairs sorted by key for fast, efficient lookups, but beyond that they differ greatly. B-Trees break the database down into blocks or pages, reading and writing only one page at a time"
B-Trees are a data structure used in databases to store data on disk in a sorted and self-balancing manner. They allow for fast searches, sequential access, insertions, and deletions, making them a highly efficient choice for large datasets.
📗 In Data Structures of Databases, Jason Adam provides a detailed look at the various data structures used in database indexing. While such a topic requires more than an article to master, it'll introduce you to these critical building blocks of most of our world's databases.