The next big thing will start out looking like a transcription tool
We’re quickly transitioning to a world where every professional conversation is transcribed. Context that once disappeared into thin air now gets captured. Every word, every thought, every insight - recorded and stored, to be queried later. Simply put, we’re beginning to outsource our long-term memory to machines.
We can thank the proliferation of transcription tools for this shift. From general-purpose products like Granola to vertically focused ones like Heidi Health and Abridge in healthcare, RemyAI and EliseAI in real estate, Attio and Claap in sales, and Model ML in financial services - these products help professionals seamlessly capture the full context of their day-to-day meetings.
The immediate value is clear: capture everything without losing focus on the person in front of you. But the real promise lies in what can - and inevitably will - be built on top of all this captured context: deep workflow automation, with transcription as the wedge. To borrow from Chris Dixon’s seminal post in 2010: I believe the next big thing will start out looking like a transcription tool.
After all, transcription models paired with LLMs form one of the most magical technological combinations of our time - one that will profoundly reshape how we work, and frankly, how we live.
So much of professional life follows the same pattern: we talk to others, capture what was said, and then act on it. A product manager reviews progress with engineers, then updates Jira. A sales rep pitches a customer, then logs the call in Salesforce. A recruiter speaks with a candidate, then updates the ATS.
These aren’t separate actions - they’re one continuous workstream. At its simplest, knowledge work comes down to two things: 1) transmitting and extracting information to and from others, and 2) taking action based on it.
Now, with real-time transcription that’s both accurate and inexpensive, we no longer have to guess what matters in the moment. We can capture everything passively, without distraction. That frees us to stay fully present in conversations, giving us the latitude to steer them where we want them to go. Transcription lets us outsource memory to machines, while machines rely on us for context. It’s a trade that works. Humans are becoming the interface through which machines understand the world.
This will reshape knowledge work. Increasingly, the core of our jobs will center around context capture. We’ll speak, and software will do the heavy lifting - updating systems, drafting and sending emails, scheduling follow-ups, preparing documents, and more. Some may see this as dystopian: a future where humans lose agency over action. But by offloading the repetitive and mechanical tasks to machines, we create space to focus on the uniquely human ones: building trust, demonstrating empathy, understanding people’s desires, and shaping outcomes.
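The capture-then-act loop described above can be sketched in a few lines. This is a toy illustration only: the `extract_actions` function below uses a trivial keyword rule as a stand-in for what would in practice be an LLM extracting commitments from a transcript, and the speaker names and data structures are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ActionItem:
    owner: str  # who committed to the task
    task: str   # what they committed to do

def extract_actions(transcript: list[tuple[str, str]]) -> list[ActionItem]:
    """Toy stand-in for the LLM step: flag utterances that start
    with a first-person commitment. A real system would use a model
    to understand commitments phrased in any form."""
    actions = []
    for speaker, utterance in transcript:
        if utterance.lower().startswith("i'll "):
            actions.append(ActionItem(owner=speaker,
                                      task=utterance[5:].rstrip(".")))
    return actions

# A captured conversation: (speaker, utterance) pairs.
transcript = [
    ("PM", "Thanks for the update on the login bug."),
    ("Engineer", "I'll ship the fix by Friday."),
    ("PM", "I'll update the Jira ticket after this call."),
]

for item in extract_actions(transcript):
    print(f"{item.owner}: {item.task}")
```

The point of the sketch is the shape of the loop, not the extraction rule: the human supplies context by speaking, and downstream software turns the structured output into updates to systems of record.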
As these tools become part of our daily rhythm at work, they’ll inevitably extend into our personal lives: doctor visits, therapy sessions, lectures, conversations with grandparents, partners… even ourselves. I believe any conversation worth remembering, personal or professional, will soon be fully indexed and searchable.
A few years ago, there was a surge of excitement around the idea of second brains: tools like Roam Research, Obsidian, and Logseq that aimed to help us organise our thoughts. Clear in purpose, they struggled to reach a broad audience. The truth is, they were too complex and too reliant on constant user upkeep to deliver lasting value. Their intuition for the need was right, but the timing was too early.
Transcription models paired with LLMs represent what those second brains should have been, had the technology existed. With that foundation now in place, we’re finally entering the true era of the second brain we were promised.
Like most disruptive innovations, this one starts off looking simple - rudimentary, even. Transcription, while valuable to many, may seem like a commoditised utility. But I believe these utilities, cropping up everywhere, will form the foundation of entirely new operating systems that shape how we live and work, built on the bulk of context in the world: spoken language, captured at scale for the first time.