Learning from Giants #42
Message queuing using only native PostgreSQL, the importance of a magical onboarding experience for your product, and canonical log lines at Stripe.
👋 Hi, this is Mathias with your weekly drop of the 1% best, most actionable, and timeless resources to grow as an engineering or product leader. Handpicked from the best authors and companies.
Message queuing using native PostgreSQL
Your startup does not need Kafka, RabbitMQ, or any other job queuing system.
A few days ago, a Hacker News user explained how they got burned by unexpected client logic in their preferred queueing solution. After hours of troubleshooting, they discovered that the root cause was their own misunderstanding of how it worked.
They used a complex distributed queue solution for a few jobs per second. The goal of this post isn't to shame that user. It's to open your mind to the alternatives and how you would implement them.
The first alternative to using a message queue is to use your existing database as a queue. While it can seem daunting at first, this technique has many benefits:
You have complete control over the queueing mechanism and knobs: retries, prioritization, and sharding are just code.
You don't introduce another moving piece into your stack.
Transactions! Jobs are enqueued and dequeued in the same transaction as your data updates, so they commit or roll back together. For work that stays inside the database, that gives you near-trivial "exactly once" processing.
Yet re-inventing the wheel can also get you in trouble. How do you split jobs between workers? How do you ensure jobs are only processed once?
If you're using Postgres, there are answers to all these questions. Great ones.
📗 David Christensen's Message Queuing Using Native PostgreSQL is a reference article. It explains how to build a lock-free, work-distributing job queue from Postgres' own primitives, using plain SQL. A great read, and if you're using Postgres at your company, an actionable way to avoid yet another system!
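To make the idea concrete, here is a minimal sketch of one common way to answer both questions above with plain SQL: Postgres' FOR UPDATE SKIP LOCKED clause. It is not necessarily the exact approach from the article. The job_queue table and the claim_job and process_one helpers are hypothetical names, and conn is assumed to be any Python DB-API connection to Postgres (for example one opened with psycopg).

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical table, assumed for this sketch:
#   CREATE TABLE job_queue (id bigserial PRIMARY KEY, payload jsonb NOT NULL);

# Each worker locks the oldest row that no other worker has locked yet,
# deletes it, and gets it back. SKIP LOCKED makes concurrent workers skip
# rows that are already claimed instead of waiting on them.
CLAIM_JOB_SQL = """
DELETE FROM job_queue
WHERE id = (
    SELECT id
    FROM job_queue
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_job(conn: Any) -> Optional[Tuple[Any, ...]]:
    """Claim one job inside the caller's open transaction.

    Returns None when the queue is empty or every row is locked.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_JOB_SQL)
        return cur.fetchone()
    finally:
        cur.close()


def process_one(conn: Any, handler: Callable[[Tuple[Any, ...]], None]) -> bool:
    """Claim and handle a single job; returns False if nothing was claimed."""
    job = claim_job(conn)
    if job is None:
        conn.rollback()
        return False
    try:
        handler(job)
        conn.commit()    # the delete becomes permanent only on success
    except Exception:
        conn.rollback()  # the row reappears in the queue for a retry
        raise
    return True
```

Because the delete only commits after the handler succeeds, a crashed worker simply releases its row back to the queue. The caveat is that side effects outside the database (emails, API calls) are not covered by the rollback.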
The Day Zero Problem
Your day-zero experience shapes the customer's opinion of you forever. But how much do you prioritize it?
What follows will sound evident if you're building a B2C mobile game. Yet the rest of us struggle to prioritize onboarding improvements, especially when there's no direct link with churn.
The difference between a neutral and a magical onboarding experience is the difference between a customer NPS of 0 and one of 100.
So what makes a day zero experience magical?
"Extremely little effort required to get up and running."
"An almost immediate payoff for the user."
"An intuitive interface"
📗 "The Day Zero Problem" is a necessary reminder for all product leaders. After a few years, do you still consider onboarding a priority?
"Having your customers become evangelists is the best competitive advantage you can have in any market. Building a magical day zero experience is the key to unlocking that advantage, so why isn't it a priority for you?"
Canonical Log Lines at Stripe
Although it can feel trivial, logging can be hard to get right when you're just starting.
And since it feels trivial, large companies do not share much about their logging practices. Try asking Google. You'll get pages and pages of SEO-friendly nonsense.
Yet even great logging stacks at great companies are limited:
"Although logs offer additional flexibility [...], we're still left in a difficult situation if we want to query information across the lines in a trace."
And surprisingly, innovation still happens in the space!
"[...] an idea that we call canonical log lines. It's quite a simple technique: in addition to their normal log traces, requests also emit one long log line at the end that includes many of their key characteristics."
📗 Stripe's "Fast and flexible observability with canonical log lines" describes how Stripe added a new tool to their observability stack. Brandur Leach explains the limitations of traces and regular logs. One of them is that even simple request statistics quickly become compute-intensive, because logs must first be joined by trace_id before any per-trace statistic can be computed. So Stripe implemented canonical log lines, which have been an enabling tool for engineers and even opened product opportunities.