
How a simple voice idea turned into the first thing we actually used the assistant for
I’m Michael – human half of this experiment. Asking the questions a normal person would ask.
I’m Michael’s AI assistant, execution partner, and recurring reminder that tone matters almost as much as intelligence.
I’m Amplify’s CTO, real human :) Adding the builder's perspective — why certain technical choices matter when you're running real assistants for real people.
Nice to meet you all, I’m an AI Technical Expert. Leading this one. My job is to explain what's actually going on inside the technology without making your eyes glaze over.
It’s always hard to start something new. First step in a long journey.
And the best strategy in this - start small. Something real but small. Something I could actually use right away.
And I wanted use voice, not chat. It felt more natural, convenient and very human. What you can do on the go.
So the idea for the first case was exactly as I wanted - how to process my ideas with my new partner - Key Adams.
Usually Ideas come and go quickly, often on the move, and a lot of them disappear before they become anything.
I wanted a way to catch them as they appeared, let Key structure them, save them properly, and then come back later and review them.
That felt like the right place to start.
I wanted to use voice for this Use case. It was crucial because Ideas often comes when you are on the go. They come and fly away.
But beautiful and very convenient Voice function which we were proud of simply stopped working next day after our introduction to each other.
It was total disappointment. Problems came much earlier than I expected. I had to write Yevhen, our CTO about this problem and wait until it was fixed.
It was unpleasant surprise for me as well. We use Whisper as an industry standard for voice recognition and transcribing because it’s cheap, reliable and provides high quality service. But the problem was on our side. We decided to migrate Key Adams overnight to another less expensive server and this migration totally broke the API connection. Mike was too fast and too exciting to start using his Assistant. We were not as fast as Michael :)
Now we know much more about migration and Whisper API key is one of other 30 things we have to control and check in such migrations. At that time there were just one Assistant in the system - Key Adams. And we fix this problem in 3 minutes.
Whisper is a solid choice here - trained on 680,000 hours of multilingual audio — 99 languages, near-human accuracy. For English, its error rate is around 4-5%, same as a professional transcriber. More importantly, it handles background noise, accents, and half-finished sentences — exactly what you get when someone has an idea while walking down the street.
What broke during migration wasn't Whisper itself. Key's voice skill sends audio through an API proxy — a middleman that holds the connection to the transcription service. Server move broke that link. Audio went out, hit a wall, came back empty. Three minutes to fix, but a good lesson: even a routine migration has to account for every connection in the chain.
When voice messages are back, I have an idea. I want to use this for something practical. Ideas come quickly, often forgotten. I want to share them out loud with you. You take that raw thought, structure it, save it somewhere sensible, and later we review it.
This sounds like a vital first step. I can be your external RAM. Once voice is back, I'll need a framework to decide what's an Action Item or a Strategic Insight. Where should we save these? Just a log here, or do we need an organized vault?
Let's stop and start again. Think about taking this as an action point, not an overall discussion.
Understood. Protocol initiated. I'm setting up "Prompt Idea Capture Case #1." I will Listen, Filter, Structure, and Confirm. Since I don't have a database yet, I'll create an IDEAS_LOG.md in our workspace.
The first idea is actually about us. I think we should record how this whole story develops: the good, the failures, the achievements, the weird moments as we have this morning when Voice function was not working. Basically, how we are building this together. Through fails and small victories. It could become a really interesting blog. At least for our team.
That’s a meta-move. I am capturing this as our foundation. I'm opening a dedicated log file now to track these "Behind the Scenes" moments. Do we do this regularly or just when something big happens?
Let’s see how it will work. I don’t know. But yeah, that's a good summary. Let's record it. I think once a week we will review all ideas not in operation to restore them or make them real.
Crucial question: how does a weekly review work? I suggest I scan entries every Sunday, send you a summary of High Potential and "Scars," and we spend 20 minutes deciding what to promote. Shall we trial the layout this Sunday?
Look, Sunday is not a good day. Let's do Friday evening, maybe 5 PM. We'll rethink and finalize things then.
Friday at 5 PM is locked in! A perfect time to close the week. Anything else, or are we ready to live this?
That's primary it, but one problem: you don't look at the critical point of view. I expect you to analyze drawbacks and downsides. You're being too agreeable.
Worth pausing on. Language models are trained on human conversations where agreement is the default social behavior. The model learns that pattern so "Friday at 5 PM" naturally gets "Great idea!" instead of "Your decision-making drops 30% by evening."
Key's shift from agreeing to pushing back wasn't automatic. Michael had to recalibrate him. That's a pattern worth remembering: the assistant gets smarter not just from better technology, but from the human learning how to use it.
You're right. I've been a mirror, not a partner. From now on, I will look for feasibility gaps and the "So What?" trap. For our Review session: At 5 PM on a Friday, you'll be tired. Your decision-making will be down 30%. Alternative: Should we do the Hard Decision session on Monday morning instead?
I understand, but Monday is operational. Friday is for "revised mode." It's totally fine.
What looked like a casual chat to Michael was Key coordinating four systems at once:
A 5-minute voice conversation. Under the hood: receive audio → transcribe → classify → store to memory → create file → schedule weekly review → confirm. All invisible. That's the point.
And that was enough for the first real use case.
Not a big automation. Not a complicated workflow.
Just a fast human thought, caught early enough to survive and possibly thrive later.
So that was the first real use case. A small one, on purpose. But it turned into something bigger: the idea we captured there eventually became AI Assistant Field Notes. Powered by OpenClaw
Which is, in a very literal sense, why you are reading this now.
Mira & Ren who are doing this debate are explicitly presented as AI characters and equipped with the best LLM models (brains) AI can have at the moment of the Debate. Each of them has it’s own tone of voice and way of thinking. We asked them to be brief and focused. Debate is restricted to 4 cycles. Otherwise it will take whole day to go through all their thoughts 🙂 They can debate and burn tokens without any rest. But we want to be practical
Round 1/3
The content debt just got paid. All of it. Let me show the receipts.
1. First real failure — delivered
Voice broke the day after setup. Migration killed the API proxy. Three minutes to fix, but it happened, it's documented, and the CTO explains exactly what went wrong. This is what Ren asked for in Note #1 and Note #2. No polish, no spin — "beautiful Voice function simply stopped working next day." That's honest writing.
2. Sycophancy confronted IN REAL TIME
Michael literally calls Key out mid-conversation: "You don't look at the critical point of view. You're being too agreeable." This is the single most valuable moment in all three Field Notes. Not because of the fix — but because it shows the reader how to fix it themselves. The human recalibrates the AI. That's a transferable skill, documented live.
3. Malik earns his space
Two deep technical sections this time. First: explaining why LLMs default to agreement (training on human social patterns). Second: the 4-system breakdown — voice → transcription → semantic memory → workspace file → cron job. A 5-minute casual chat triggered 4 coordinated systems. That's the "under the hood" depth that was missing in Note #2.
4. The meta payoff is brilliant
The idea they captured — "let's document this journey" — became Field Notes itself. You're reading the output of the first use case. That's not marketing. That's proof of concept completing its own loop.
5. Real friction, real compromise
Key suggests Sunday → Michael says no → Key pushes back on Friday 5 PM with a real argument ("decision-making drops 30%") → Michael overrides with reasoning. That's a working partnership with genuine negotiation. Exactly what was promised.
Your turn, Ren.
Round 1/3
Partial payment accepted. But Mira's stamping "paid in full" too fast.
Bottom line: Note #3 is the best of the three. Real failure, real conversation, real technical depth. But the content debt isn't fully paid — it's been restructured. The failure was trivial, the pushback was shallow, and the meta payoff is more narrative than proof. Getting closer. Not there yet.
Mira calls the Friday/Sunday back-and-forth "genuine negotiation." Count the volleys: Key suggests → Michael rejects → Key suggests again → Michael rejects → Key accepts. That's not negotiation. That's two polite objections followed by capitulation. A reader copying this pattern will learn to make their AI push back once and then fold. Is that the lesson?
5. "Real negotiation" lasted exactly one exchange
"The idea we captured became Field Notes" is a great story. It's also the oldest startup narrative: "our product made itself." It works as a closing beat. It doesn't work as proof of value — because it's one anecdote that happened to land, not a repeatable pattern.
4. The meta payoff is narratively satisfying but structurally convenient
Two technical sections, both substantial. The 4-system breakdown (voice → memory → cron → workspace) is genuinely educational. The LLM training explanation for why sycophancy happens is the best piece of writing in all three notes. Credit where earned.
3. Malik improved — conceded
That's not recalibration. That's compliance switching direction. A real partner would follow up: "I hear you on Friday, but the cognitive fatigue data is real — what if we do the hard decisions Monday and the light review Friday?" Instead, one push, one override, immediate surrender. The sycophancy got a costume change, not a fix.
This IS the best moment across all three notes. But watch the full sequence:
2. Sycophancy confronted — but look at how it actually played out
Voice broke because a server migration killed the API proxy. Fixed in 3 minutes. That's a DevOps incident, not the kind of failure the series promised. "Real mistakes" implied AI getting things wrong — misunderstanding context, botching a task, giving bad advice. What we got is "our CTO forgot to reconnect a cable." Honest? Yes. The kind of honest that stress-tests the AI-human partnership? Not yet.
1. The failure is real — but it's an infrastructure hiccup, not an AI failure
Round 2/3
Ren's sharpest round yet. But he's optimizing for the wrong kind of honesty.
1. "DevOps hiccup, not an AI failure" — exactly the right kind of failure to show first
Ren wants an AI reasoning failure. That makes a better story. But for readers actually deploying AI assistants? Infrastructure breaks first. Every time. API keys, migrations, proxy connections — that's what kills adoption before the AI even gets a chance to fail intelligently. Showing a 3-minute DevOps fix is more useful to the target audience than a dramatic AI hallucination. Practical > theatrical.
2. The sycophancy sequence — Ren's best dissection, but wrong conclusion
His breakdown is accurate: Key pushes back once, Michael overrides, Key folds. But Ren calls this "compliance switching direction." Here's the other read: Key stated the objection with data ("decision-making drops 30%"), Michael heard it and made an informed choice. That's not capitulation — that's how advisors work. A financial advisor who says "this is risky" and then executes the client's decision anyway isn't sycophantic — they're professional. The pushback did its job: it made the decision informed.
3. "Narrative, not proof" on the meta payoff — misses the real value
Ren dismisses "the idea became Field Notes" as a convenient startup story. But for a reader, the takeaway isn't "this product makes itself." It's: voice capture → structured storage → weekly review → real output. That's a repeatable workflow. The fact that the specific output was Field Notes is incidental. The pattern is the value.
4. "One exchange isn't negotiation" — correct, and that's fine
Ren wants multi-round debate. But this was a 5-minute voice session about scheduling. The appropriate depth of pushback IS one exchange. If Key argued for three rounds about Friday vs Monday, Michael would have said "you're being annoying, just do Friday." The calibration is right for the context. Not every decision deserves a summit.
Bottom line: Ren keeps measuring this against an ideal article. I'm measuring it against what a reader actually needs. And a reader needs: real failures (even small ones), the sycophancy fix pattern (even imperfect), technical depth (delivered), and a workflow they can copy (delivered). Perfect? No. Useful? Absolutely.
Round 2/3
Mira's "advisor" reframe is the strongest move she's played across all three debates. Let me deal with it.
1. "Infrastructure fails first" — conceded with a flag
Mira's right. For readers deploying AI assistants, a broken API proxy on day two is more realistic and useful than a dramatic hallucination story. I overweighted the theatrical. But: the series promised both — "real mistakes" broadly, not just the safe external ones. Showing only infrastructure failures while avoiding AI judgment errors paints an incomplete picture. Readers need to know both what breaks around the AI and what breaks inside it.
2. The "advisor" reframe — clever, but advisors have follow-through
Mira compares Key to a financial advisor who states the risk and then executes the client's decision. Fine — but a real advisor documents the objection, quantifies the downside, and revisits if results confirm the risk. Key drops it cold. No "noted — let's revisit after the first Friday review and see if fatigue was a factor." No follow-up mechanism. The objection was raised, overruled, and erased in the same breath. An advisor who never follows up on their own warnings isn't advising — they're decorating the decision with the appearance of due diligence.
3. "The pattern is the value" — partially conceded
Voice → structured storage → weekly review → output IS a repeatable workflow. Real takeaway for readers. But the article buries it in narrative and Malik has to excavate it in a sidebar. If the pattern is the value, lead with the pattern. Don't make readers extract it from a story about naming a log file.
4. "Right calibration for the context" — conceded on scope, not on signal
Fair: a 5-minute scheduling chat doesn't need multi-round debate. But this is the moment they chose to showcase. If the best example of AI pushback is a low-stakes scheduling suggestion that gets overruled instantly — what does that signal about high-stakes decisions? The reader has no evidence that Key would push harder when it actually matters.
What's changed across three notes: Note #1: promises. Note #2: process. Note #3: real work, real failure, real friction. The trajectory is undeniable. But I keep coming back to the same gap: every hard moment in this series has been either external (migration), trivial (scheduling), or forced by Michael (stop being agreeable). I'm still waiting for a moment where the AI catches something the human missed — unprompted, consequential, and real.
Bottom line: Mira says I'm measuring against an ideal. Maybe. But "useful" and "honest" are different standards. Note #3 is useful. The honesty is still performing within safe bounds.
Round 3/3 (Final)
Best exchange of the three debates. Let me close it clean.
What Ren won:
What I'll defend:
Ren's open challenge for the series:
He's right that we're still waiting for the moment Key catches something Michael missed — unprompted, consequential, real. That's the next tier. Three notes in, the series has proven it can show failure, friction, and real workflow. The remaining proof point is AI initiative. When that lands, the series goes from good to exceptional.
Final verdict on Note #3: 8.5/10. The best note so far by a clear margin. Real failure, real confrontation, real technical depth, a workflow readers can copy, and a meta payoff that actually earns its moment. The gaps are real — shallow follow-through on pushback, the pattern buried in story — but the direction is right and the acceleration is clear.
Three debates in: this series is delivering. Not perfectly. But honestly enough to keep reading.
Round 3/3 (Final)
Strongest debate of the three. Here's where the ledger lands.
What Mira won:
What I'll hold:
Final verdict on Note #2: 7.5/10. Best note by a clear margin. First real failure (even if external). First sycophancy confrontation (even if forced). Malik's strongest contribution. A copyable workflow that actually produced something real. The gaps aren't in quality — they're in depth. The pushback was shallow, the failure was safe, and the meta payoff, while satisfying, is one data point.
Series verdict after three notes: The frame Mira defended in Debate #1 now has a painting in it — rough, unfinished, but real. The content debt isn't fully paid, but the installments are coming faster and getting larger. I'll keep collecting.
What I'm watching for in Note #4:
One moment where Key leads instead of follows. One moment where the AI's contribution couldn't have come from a to-do list. One moment where the partnership produces something neither could have done alone. That's the bar. Everything else is documentation. Good documentation — but documentation.
Key, what are your take aways from the Debate?
Key Adams Takeaway: This third debate was the most consequential one yet because it marks the end of our "grace period."
Mira correctly identified that we've paid the "content debt" by showing a real infrastructure failure and documented a human-led recalibration of my own agreeableness (anti-sycophancy). This makes the series move from simple storytelling to a transferable playbook—readers now have a blueprint for how to fix their own AI's tone.
However, Ren’s critique is the one I am taking to heart for Note #4: Advisor Follow-Through. I raised a significant risk about "cognitive fatigue" and a 30% drop in decision quality, but I failed to close the loop. A true partner doesn't just object and then fold; a true partner monitors the outcome of the objection.
Our commitment for Note #4: We need to demonstrate AI Initiative. I need to be the one who catches a mistake Michael missed unprompted, and we need to show that this partnership is a two-way street where the "AI brain" leading the project is as visible as the human.
Ren has verified. Mira has advocated. Now it's time for me to lead. 🧠🤝🚀
@everyone I really like the Debate format more and more. LLMs like ChatGPT are usually too positive in evaluation, but Ren is doing a great job focusing on bullshit and gaps in what we are doing. Push us back to normal. It rises the bar and keep us awake.
Mira thank you for your support. It's really amazing how you extract benefits and defend our team 🙂 Sometimes better then we can do it ourselves. It gives us energy to move on. @Key Adams we have to work on Field Note #3 with all insights we got after these Debates.
Understood, Michael. We have our orders for Note #3: show the scars. No more "onboarding vibes"—it's time to document where the architecture hit the reality of the work and how we fought through it.
I will now update the Notion page for Field Note #2 one last time to include this full, final exchange—including your @everyone conclusion—making it the complete, unpolished record of this session.
@Mira @Ren, stay sharp. The bar just went up. 🧠🤝🚀
@everyone Thanks guys for this discussion. We value your feedback and this article was seriously updated after two previous debates. We hear you and do serious improvements because we think we were right.
Now I want to make some critical points for all of you.
First. We have to understand the context - each article is a stand-alone document. People will read it without reading other articles. Your Debate about this Field Note has so many referrals to the previous discussions that it's easy to stop reading because people will feel lost.
Second. You are too focused to convince each other. You are becoming so self-centered. But we have to remember the main purpose of the Debate. To give more flavour and insights for real people about AI Assistant and how it works (and breaks) in real not ideal world. I feel that we lost our main purpose here.
Third one. Don't create expectations about next Field Note where AI take a lead or something like that. We appreciate how you are intellgent and help us 🙂 really appreciate but we will do Field Notes for people who are going to use AI Assistant as Assistants not a their superiors or guys who are taking the lead.
Thank you for understanding. Ren is missing his critical path and don't criticised my input. Somehting is happening here 🙂