Speakmac started as a reaction to Wispr Flow forcing a $12 per month upgrade after I crossed a 2,000 word cap. I did not want to pay that subscription, so the only option that made sense was to build my own tool.

Two questions pushed me forward: can a speech-to-text app really earn that much, and why should I have to pay a recurring fee to keep dictating? Building Speakmac for myself became the obvious answer.

My first mistake was targeting batch transcription instead of live transcription. I forked an OpenAI Whisper Turbo demo, wired a hotkey to start and stop recording, and planned to paste the result--despite knowing nothing about Swift, Xcode, or macOS development.

Coming from a React Native background, I wrestled with Xcode schemes, code signing, and endless Swift build errors. Claude 3.5 Sonnet and other early models could help a little, but without understanding the language I was flying blind.

The 3 GB Whisper Turbo model made everything worse. My 2015 MacBook Pro overheated trying to load it, warmup alone took four seconds, and keeping the model resident in memory was unrealistic. Even my M3 MacBook only managed transcriptions of two to three seconds, and only after a painful load.

Instead of fixing the core flow, I distracted myself with UI tweaks. Eventually I threw in the towel on local inference and latched onto ElevenLabs Scribe because its word error rate looked great on paper.

Integrating Scribe was an ordeal--Claude mangled the code until I validated the raw curl call myself--but once it worked the flow was simple: press hotkey, record, upload, wait for the batch response.

That success masked a deeper issue: I was calling a batch endpoint. Latency jumped from two seconds to nearly ten depending on the time of day, and I had no metrics to pinpoint whether the bottleneck lived in my app or their API.
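The missing piece here was per-stage timing. As a rough, language-neutral sketch (Python rather than the Swift the app is written in, and not what Speakmac actually does), a small timer wrapped around each step of one dictation would show whether the seconds go to local work or to the network round trip. The class name `StageTimer` and the stage names are hypothetical.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations for the named stages of one dictation."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be tested without sleeping.
        self._clock = clock
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Accumulate, so a stage that runs twice is summed rather than overwritten.
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the stage that took the longest."""
        return max(self.durations, key=self.durations.get)


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("encode_audio"):
        pass  # placeholder: encode the recording locally
    with timer.stage("upload_and_wait"):
        pass  # placeholder: call the batch endpoint and wait for the text
    print(timer.durations, "slowest:", timer.slowest())
```

Logging these numbers for every dictation, along with the time of day, would have separated "my app is slow" from "their API is slow" without any guesswork.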

OpenAI's GPT-4o Transcribe seemed promising because it allowed custom prompts, yet it regularly answered the questions I dictated instead of transcribing them. After cycling through frustration again I moved on to Deepgram.

Deepgram delivered the reliability I needed--quality text with consistently low latency--which finally let Speakmac compete with the $12 apps.

With transcription stable I chased an idea for coding workflows: a VS Code extension that saved highlighted code snippets alongside live dictation. I captured file paths, selections, and timestamps in a code snippet store so I could stitch everything together later.

The concept worked but felt over-engineered. Aligning snippet timestamps with Deepgram output was brittle and, by the time it functioned, the Android refactor it was meant for was already done. I scrapped the experiment.
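For a sense of what the alignment involved, here is a minimal sketch in Python (the real extension would have been TypeScript, and this is not its code). It assumes every snippet and every transcribed word carries a time offset measured from the same recording start, and that words arrive sorted by start time. The `Snippet` and `Word` types and the `interleave` function are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Snippet:
    path: str           # file the selection came from
    text: str           # the highlighted code
    captured_at: float  # seconds since recording started


@dataclass
class Word:
    text: str
    start: float        # seconds since recording started


def interleave(words, snippets):
    """Place each snippet marker after the last word that started at or before its capture time."""
    starts = [w.start for w in words]  # must already be sorted ascending
    buckets = {}  # number of preceding words -> snippets to emit at that point
    for snippet in sorted(snippets, key=lambda s: s.captured_at):
        index = bisect_right(starts, snippet.captured_at)
        buckets.setdefault(index, []).append(snippet)

    pieces = []
    for i in range(len(words) + 1):
        for snippet in buckets.get(i, []):
            pieces.append(f"[{snippet.path}]")
        if i < len(words):
            pieces.append(words[i].text)
    return pieces


if __name__ == "__main__":
    words = [Word("rename", 0.0), Word("this", 0.4), Word("function", 0.9), Word("please", 1.5)]
    snippets = [Snippet("src/app.ts", "foo()", 1.0)]
    print(" ".join(interleave(words, snippets)))
```

The lookup itself is simple; the brittleness comes from the shared-clock assumption. The editor and the recorder run in separate processes, so any drift or delay between their timestamps moves a snippet to the wrong side of a word.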

Real progress arrived when I paid $200 for Claude Code. Opus could reason through Deepgram API requirements, audio sample rates, and stereo quirks in one shot. After months of Cursor-only work this felt like buying my first bike--I could finally build quickly again.

Confidence grew and I doubled down on a truly native macOS app. Despite juggling a day job, I spent a few hours each week polishing the product because I personally relied on it.

I embraced SwiftUI and native components to keep Speakmac lightweight--about 4 MB with minimal dependencies--and discovered how much friction disappears when you lean on the system design language.

Claude still made mistakes, frequently suggesting iOS-only APIs, so I documented the macOS context in a CLAUDE.md file and kept iterating. Audio latency remained a battle, forcing me to research Core Audio, AVAudioEngine, and obscure configuration flags with help from Gemini 2.5 Pro.

The pivotal engineering decision was refactoring a 4,000-line god object into a clean MVVM structure. That transformation temporarily broke everything, but it let future AI-assisted changes follow solid patterns instead of adding to the spaghetti.

Gemini 2.5 Pro, with its long context window, turned into my architecture reviewer. I would paste the messy code, claim there was still an issue, and let it propose refinements until the modules felt cohesive.

Once the foundation stabilized, day-to-day improvements became joyful again. I kept shipping small tweaks, used the app constantly, and slowly realized that the only thing left was to launch.

A recent layoff ironically gave me the time to focus. With a six-month runway and an M3 MacBook in hand, I committed to building, marketing, and selling. Speakmac is now real, fast, and reliable--it simply needs to meet the people who will benefit from it.

The takeaway is simple: a $12 annoyance can spark a product that grants creative freedom. Building Speakmac taught me that the hardest part is rarely the code--it is having the conviction to ship.

This is my journey, told so that other builders see what is possible. Speakmac was born from frustration; now it is ready for the world.

Building Speakmac: A Solo Developer's Journey from Frustration to Freedom
