Learning from Giants #8
Security for start-ups, Incident management at Dropbox, Stripe's history, and FoundationDB.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
Did a friend send this to you? Subscribe to get these weekly drops directly in your inbox. Read the archive for even more great content. Also: I share these articles daily on LinkedIn.
Where to Start on Security as a Start-Up
Depending on your experience, industry, and customers, Security will come up at different times in start-ups. But I know one thing: it won't be a pleasure.
And if it is, maybe you've been focusing on the wrong thing 🤔.
"You can't just hire a couple security engineers to shoulder this burden. [...] Security is part of the overall technical challenge you face."
Hiring a security team won't cut it; you need a security culture.
"An engineering organization with a few security engineers tacked on is doomed to failure, while an organization of engineers taught to respect security will be far better off."
When SOC-2, PCI-DSS, and other audits and certifications show up on your to-do list, you will need to upgrade your security culture with a lot of documentation.
Yes. Expect disappointment. The auditor will not care about how shiny your Security is. They will ask how often you review your Baseline Device Configuration, and whether you have written the 100 other policies they require.
Where to start?
📗 Ryan McGeehan's Starting Up Security is a gentle introduction to early-stage Security. It lists a few points to keep in mind when thinking about the Security of your products, infrastructure, and employees. But above all, it introduces hundreds of essays Ryan has written over the past years about Security, ranging from templates for writing these policy documents to advanced risk management theory and analysis, plus links to post-mortems of the web's significant incidents. He has it all!
Lessons learned in Incident Management at Dropbox
Ever been informed of an outage by your customers or operations team?
Not something anyone is proud of, yet it happens at most companies. And when incidents start this way, what follows is probably not great.
Once you hit a few of these, it's time to consider setting up an incident management process.
"To simplify, we break this [incident management process] up into three overall phases:
Detection: The time it takes to identify an issue and alert a responder
Diagnosis: The time it takes for responders to root-cause an issue, and/or identify a resolution approach
Recovery: The time it takes to mitigate an issue for users once we have a resolution approach"
And if your contracts have SLAs, the clock starts ticking before detection!
"To stay within our SLA of 99.9% uptime, we must limit any down periods to roughly 43 minutes total per month. We set the bar even higher for ourselves internally, targeting 99.95% (21 minutes per month)."
21 minutes means there are some recoveries you won't have time to perform anyway. Any incident whose remediation takes longer than that must be prevented. The remaining ones require automation, training, clear ownership, and playbooks.
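To make the arithmetic concrete, here is a minimal Python sketch of the downtime budget behind those numbers, combined with the three phases quoted above. The 30-day month, the `Incident` class, and the sample durations are my own assumptions for illustration, not something taken from Dropbox's article.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime allowed over `days` for a given availability target."""
    return (1.0 - availability) * days * MINUTES_PER_DAY


@dataclass
class Incident:
    """Hypothetical incident, split into the three phases (durations in minutes)."""
    detection_min: float
    diagnosis_min: float
    recovery_min: float

    @property
    def total_min(self) -> float:
        # User-facing downtime is the sum of all three phases.
        return self.detection_min + self.diagnosis_min + self.recovery_min


def fits_budget(incident: Incident, availability: float, days: int = 30) -> bool:
    """True if this single incident stays within the whole period's budget."""
    return incident.total_min <= downtime_budget_minutes(availability, days)


if __name__ == "__main__":
    for target in (0.999, 0.9995):
        budget = downtime_budget_minutes(target)
        print(f"{target:.2%} uptime allows about {budget:.1f} minutes per 30 days")

    # Made-up durations: 5 min to detect, 10 min to diagnose, 10 min to recover.
    slow = Incident(detection_min=5, diagnosis_min=10, recovery_min=10)
    print("fits the 99.95% budget:", fits_budget(slow, 0.9995))
```

With a 30-day month, the budgets come out at about 43.2 and 21.6 minutes, in line with the figures in the quote. A single 25-minute incident already exceeds the internal target for the entire month.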
📗 Dropbox's Lessons learned in incident management details the current process at Dropbox, how it evolved, and its main tools. Three 9s of reliability are nothing to laugh at! Joey Beyda and Ross Delinger explain that 21 minutes per month means you cannot leave anything to improvisation.
A Deep Dive on Stripe’s history
/dev/payments
That's the initial name of one of our time's biggest and most inspiring private companies.
Hint: its founders are siblings and probably two of the most brilliant business people on the planet.
"In Patrick and John Collison, the payments company has a founding pair unusual in their intellect, rarer still in their ego-less vision. From that inspiration, the brothers have manifested an entire culture, characterized by long-term thinking, rapid execution, and the desire to build something meaningful. They've done so while constructing a remarkable business capable of dominating the internet's financial infrastructure even as it empowers millions of other merchants."
"All of which is to say that Stripe thinks like a civilization. It is well on its way to becoming one."
📗 The Generalist's Stripe: Thinking Like a Civilization may be a slight stretch from what I usually share, but I believe as a product or engineering leader, there are some stories you must know. Moreover, the essay is packed with anecdotes and details about Stripe's culture and how the Collison brothers see the world.
FoundationDB: A Distributed Unbundled Transactional Key-Value Store
"NewSQL: (Databases that) combine the flexibility and scalability of NoSQL architectures with the power of ACID transactions."
These constraints make them fascinating distributed systems.
One such system, FoundationDB, has seen a new wave of adoption in recent years (following its re-open-sourcing in 2018).
Most database companies build bundled systems that can speak and execute SQL, sometimes forcing an ACID model on databases that don't initially offer such guarantees. The FoundationDB team saw the lack of distributed transactional foundations as an opportunity.
"FoundationDB (FDB) was created in 2009 and gets its name from the focus on providing what we saw as the foundational set of building blocks required to build higher-level distributed systems.
"It is an ordered, transactional, key-value store natively supporting multi-key strictly serializable transactions across its entire keyspace."
It's the foundation for data storage systems, allowing such systems to get scalability and ACID transactions by default. FoundationDB does not force any other model on top.
"It (FDB) provides no structured semantics, no query language, data model or schema management, secondary indices or many other features one normally finds in a transactional database."
Instead, it leaves it up to the user to build stateless data-management "layers" on top. Apple built the "Record Layer" powering CloudKit, Snowflake its metadata storage, and Datadog its Event Store (Husky).
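To give a feel for what a "layer" means, here is a toy Python sketch. It is not the FoundationDB API: `ToyOrderedKV`, `UserLayer`, and the key layout are all hypothetical, and the in-memory store only imitates the "ordered, transactional key-value" contract (no distribution, no concurrency control). The point is that the layer itself holds no data. It only encodes records and a secondary index as keys, and relies on the store's transaction to keep both in sync.

```python
import bisect


class ToyOrderedKV:
    """In-memory stand-in for an ordered, transactional key-value store."""

    def __init__(self):
        self._data = {}

    def transact(self, fn):
        # Run `fn` against a staged copy; commit only if it returns normally.
        staged = dict(self._data)
        result = fn(staged)
        self._data = staged
        return result

    def range(self, prefix):
        # Return all (key, value) pairs whose tuple key starts with `prefix`,
        # in key order.
        keys = sorted(self._data)
        start = bisect.bisect_left(keys, prefix)
        out = []
        for key in keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            out.append((key, self._data[key]))
        return out


class UserLayer:
    """Hypothetical layer: user records plus a by-city secondary index."""

    def __init__(self, kv):
        self.kv = kv

    def put_user(self, user_id, city):
        def txn(data):
            # Remove the stale index entry, then write record and index together.
            old_city = data.get(("user", user_id))
            if old_city is not None:
                data.pop(("idx_city", old_city, user_id), None)
            data[("user", user_id)] = city
            data[("idx_city", city, user_id)] = ""

        self.kv.transact(txn)

    def users_in_city(self, city):
        # A prefix scan over the index keys yields the matching user ids.
        return [key[2] for key, _ in self.kv.range(("idx_city", city))]


if __name__ == "__main__":
    layer = UserLayer(ToyOrderedKV())
    layer.put_user("u1", "Paris")
    layer.put_user("u2", "Paris")
    layer.put_user("u1", "Lyon")  # moving u1 also updates the index
    print(layer.users_in_city("Paris"))
    print(layer.users_in_city("Lyon"))
```

Because the record and its index entry are written in the same transaction, a reader never sees one without the other. That is the kind of guarantee a layer gets for free from the store underneath.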
📗 The FoundationDB: A Distributed Unbundled Transactional Key Value Store paper written by the contributors is an excellent introduction to this fascinating system. While I introduced a few specifics above, there is much more to discover in the paper, starting with the deterministic simulation framework that the team built to test the system with a level of rigor that has rarely been matched.