Learning from Giants #36
Caching architectures and maintaining cache consistency, long-term product vision and strategy, and Bloom filters in production at Amplitude.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
Did a friend send this to you? Subscribe to get these weekly drops directly in your inbox. Read the archive for even more great content. Also: I share these articles daily on LinkedIn.
Different ways of caching and maintaining cache consistency
"There are only two hard things in Computer Science: cache invalidation and naming things." Phil Karlton.
And who better to write about caching than an engineer who has worked on Facebook's TAO, one of the largest caches in the world?
First, a definition.
"[A cache is] a separate system, which stores a materialized partial view of the underlying data store."
But why use a cache? There are many reasons, but most use cases fall into a single bucket: performance, defined by latency and/or resource use.
And if we're talking about cache invalidation, another necessary definition is cache consistency:
"[A cache is consistent if] eventually the value of key k should be the same as the underlying data store, if k exits in cache."
Depending on the use case, you may have different consistency and scale requirements, and every combination has a caching architecture solution:
"Look-aside cache: a client will query cache first before querying the data store. If it's a HIT, it will return the value in cache. If it's a MISS, it will return the value from data store."
"Write-through / read-through cache." The cache stands between the client and the database and handles data synchronization to and from the database.
"Write-back / memory-only cache." A performance trade-off for durability, where the cache acknowledges the write before ensuring it's committed to the data store.
📗 Lu Pan's Different ways of caching and maintaining cache consistency is an excellent introduction to caching and its associated consistency problems. The key takeaway: caching can improve performance, but it can also introduce permanent inconsistencies if invalidation isn't handled correctly.
Ants & aliens: long-term product vision & strategy
What's your long-term product vision and strategy?
"There is no point in having a 5-year plan in this industry. With each step forward, the landscape you're walking on changes. So we have a pretty good idea of where we want to be in six months, and where we want to be in thirty years. And every six months we take another look at where we want to be in thirty years to plan out the next six months." Facebook's Little Red Book.
5-year plans are pointless because we always extrapolate on the safe side when thinking about the near future. Will the planet still be viable? If it's viable today, it will still be in five years. Will we ride flying cars? We don't today, so we won't in five years.
"But what if I ask you to imagine your product in thirty years? Something appealing happens when you contemplate that time horizon. It's so far into the future that the little details have to fall away."
The impact of the current state of things on a 30-year plan is almost meaningless. Instead, you must start from the essence of your business, its mission and vision, and consider how global trends impact them.
Need help figuring out where to start? Take these categories:
"Schwartz offers a useful framework for leading the discussion and contemplating the types of forces that will shape your future. He uses the handy acronym STEEP—Social, Technological, Economic, Environmental, and Political."
The toughest category is technological change, because that's where we most easily fall back into the trap of extrapolating from the present. Again, we should look at trends.
"This wide, fast-moving system of technology bends the culture subtly, but steadily, so it amplifies the following forces: Becoming, Cognifying, Flowing, Screening, Accessing, Sharing, Filtering, Remixing, Interacting, Tracking, Questioning, and then Beginning."
How does that impact your business?
"Once you have considered what these possible futures look like, you can form an opinion about where your product should go, which long-term trends you can't ignore, and against which trends you might need to hedge. It should also help you with every product manager's toughest challenge: deciding which things not to do. Which efforts just aren't important to the long-term yet distract you from what does matter?"
📗 Ken Norton's Ants & Aliens: Long-Term Product Vision & Strategy is an excellent read on 30-year plans, why they're essential, and how to write them. The main idea to remember: by thinking far enough into the future, you stop questioning whether significant changes will happen and instead build your plan around them, with them, for your business.
Bloom filters in production at Amplitude
Last week, I shared an excellent visual introduction to probabilistic filters. Learning takes repetition, so let's see Bloom filters in a real-world use case: data deduplication at scale.
Amplitude is a product analytics company. They ingest millions of events per second into their storage and querying infrastructure. Because of network or customer errors, many of these events can be duplicates. That's a problem when your value proposition is an accurate view of what customers do.
"To address this issue we recommended sending a unique insert identifier with each ingested event. We do deduplication of any subsequent events sent (within the last 7 days) with the same insert identifier."
Storing each identifier in a database poses huge operational issues at this scale.
"Bloom filter seems to fit the use-case very well, we want a quick way to get an answer on the presence or absence of specific data item (without even storing the item). We can pick the probability values such that the results remain accurate."
The perfect fit.
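As a sketch of what that check might look like, here's a toy Bloom-filter dedup in Python. The `BloomFilter` class, its parameters, and what happens on a positive answer are illustrative assumptions, not Amplitude's actual implementation.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter for illustration; not Amplitude's implementation."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def is_duplicate(insert_id, bloom):
    """Return True if this insert identifier was (probably) seen before.

    A negative answer is always correct; a positive answer can be a false
    positive with the configured probability, so in practice you might
    confirm positives against an exact store or accept the tiny error rate.
    """
    if bloom.might_contain(insert_id):
        return True          # probably seen before: treat as a duplicate
    bloom.add(insert_id)     # first sighting: record it and accept the event
    return False
```

The appeal is exactly what the quote says: the filter answers "have I seen this insert identifier?" without ever storing the identifiers themselves.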
"Now, if we round this to 20 million events per day per partition, we will need a bloom filter of size ~108Mb per partition with false positive probability of 1 in billion."
📗 Gurminder Singh's Deduplication at Scale introduces a production deployment of a perfect use case for Bloom filters. Deploying Bloom filters at scale requires maintaining the integrity and resilience of the filter itself and planning for quick recovery in case of failure, which the article explains well.