FAQ

How accurate is the transcription?

Whisper typically reaches 95% or better accuracy on clear audio in major languages. Accuracy drops with background noise, overlapping speakers, or heavy accents.
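Accuracy figures like this are usually quoted as word accuracy, roughly 1 minus the word error rate (WER). To check results on your own recordings, you can compare a transcript against a hand-corrected reference. The sketch below computes WER with standard word-level edit distance. The function name is illustrative, and it only lowercases and splits on whitespace, so punctuation counts as part of a word.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("send revised pricing by thursday", "send the revised price by thursday")` returns 0.4: one inserted word and one substitution over five reference words.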

Can it identify different speakers?

Basic speaker detection is available; for advanced diarization (labeling who spoke when), use the Transcribee skill (Pro).

Is my audio data stored?

Audio is sent to OpenAI's API for processing, and this skill deletes it immediately afterward; OpenAI's own API data-retention policy also applies. Transcripts are cached for 24 hours unless you save them.

What languages are supported?

Over 50 languages including English, Spanish, French, German, Chinese, Japanese, Arabic, and more.

Can it transcribe live calls?

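Yes, if you can record the call and pipe the audio to the skill. The Whisper API transcribes audio files rather than a continuous stream, so live audio is sent in chunks. Direct Zoom/Meet integration is coming soon.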
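In practice, piping a live call means buffering the stream and uploading it in pieces. A minimal sketch, assuming the call audio arrives as raw 16-bit mono PCM at 16 kHz on a blocking, buffered stream (such as `sys.stdin.buffer` fed by a recorder); the function name and the 30-second chunk length are illustrative choices, not part of the skill.

```python
import io
import wave
from typing import BinaryIO, Iterator

def pcm_stream_to_wav_chunks(
    stream: BinaryIO,
    chunk_seconds: int = 30,
    sample_rate: int = 16000,
    channels: int = 1,
    sample_width: int = 2,  # bytes per sample, i.e. 16-bit audio
) -> Iterator[bytes]:
    """Cut a raw PCM byte stream into standalone WAV files of about chunk_seconds each."""
    frame_size = channels * sample_width
    bytes_per_chunk = chunk_seconds * sample_rate * frame_size
    while True:
        pcm = stream.read(bytes_per_chunk)
        pcm = pcm[: len(pcm) - len(pcm) % frame_size]  # drop any trailing partial frame
        if not pcm:
            break
        buf = io.BytesIO()
        # Wrap the raw samples in a WAV header so each chunk is a valid upload.
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(channels)
            wav.setsampwidth(sample_width)
            wav.setframerate(sample_rate)
            wav.writeframes(pcm)
        yield buf.getvalue()
```

Each yielded chunk can then be transcribed as its own file and the texts concatenated in order.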

How does it handle poor audio quality?

Whisper is fairly robust to noise but can still mishear words in poor recordings. Pre-processing with noise reduction usually improves results.


Cloud Transcription (Whisper API)

Convert calls and voice recordings to text for instant meeting notes and searchable records

AI Service · Active

What It Does

Cloud Transcription uses OpenAI's Whisper API to convert audio from meetings, calls, and voice memos into accurate text transcripts. Whether you're recording a Zoom call or dictating notes, this skill delivers high-quality transcriptions in over 50 languages.

Voice to Text · Multi-language · Meeting Notes · Call Recording · Whisper API

In a Nutshell

🎤 Audio Upload: accept audio files up to 25 MB

🌍 Multi-language: detect and transcribe 50+ languages automatically

⚡ Fast Processing: get transcripts of short recordings in seconds

📝 Meeting Notes: convert calls to searchable text

🔍 Speaker Detection: identify different speakers (when available)

Use Cases

Meeting Documentation

Record and transcribe team meetings for instant searchable notes

Interview Analysis

Transcribe customer interviews and research calls

Voice Memos

Dictate ideas and get them transcribed on the spot

Podcast Processing

Create show notes and transcripts for audio content

Real-world examples

Rough incident notes → clean technical handoff in minutes

Server acting weird at 2am. Sent a rough description. Clean technical handoff in minutes.

Say it while driving; the task is structured and running by the time you arrive

Say it while driving. It structures the task, confirms back, starts executing.

Log your meals by voice, get nutrition fixes for tomorrow

Voice message about what I ate. Back comes 1-3 things to fix tomorrow. Tracks patterns.

Forward raw chaos, get a prioritized action list

Voice notes, screenshots, texts. Forward raw. Back comes a prioritized action list.

Dump voice notes at night, wake up to a clean morning brief

Dump voice notes about yesterday. Wake up to a clean brief: what matters, what's first.

Voice recap after the call: tasks with owners and deadlines

Quick voice recap after the call. Back comes: who does what, by when, success metrics.

How to Use

Step 1

Upload or record audio

Send an audio file or start a live recording through your preferred channel.

Supported formats: MP3, M4A, WAV, WEBM, MP4
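You can catch the two most common rejections before uploading. A minimal sketch, assuming the five formats above are the complete allow-list and treating 25 MB as 25 × 1024 × 1024 bytes (an assumption; use a slightly lower threshold to be safe). The function name is hypothetical; the returned codes reuse the names from the Troubleshooting table.

```python
import os
from typing import Optional

# Formats from Step 1 and the file-size limit from "Limits & Behavior".
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".webm", ".mp4"}
MAX_BYTES = 25 * 1024 * 1024

def check_upload(path: str) -> Optional[str]:
    """Return a Troubleshooting error code if the file would be rejected, else None."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return "UNSUPPORTED_FORMAT"
    if os.path.getsize(path) > MAX_BYTES:
        return "FILE_TOO_LARGE"
    return None
```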

Step 2

Automatic processing

The assistant sends the audio to the Whisper API and retrieves the transcript automatically.
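Under the hood, this step amounts to a multipart upload to OpenAI's transcription endpoint. A stdlib-only sketch of that request, assuming the `whisper-1` model and the `verbose_json` response format (which returns timestamped segments); the skill's actual client, key handling, and retries are not documented here, and the function name is hypothetical. It builds the request without sending it.

```python
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_transcription_request(
    audio: bytes,
    filename: str,
    api_key: str,
    model: str = "whisper-1",
    response_format: str = "verbose_json",
) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST for one audio file."""
    boundary = uuid.uuid4().hex
    body = b""
    # Plain text form fields.
    for name, value in (("model", model), ("response_format", response_format)):
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode()
    # The audio file itself.
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it is `urllib.request.urlopen(req)` followed by `json.load` on the response. The official `openai` Python package wraps the same call as `client.audio.transcriptions.create(model="whisper-1", file=...)`.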

Step 3

Receive formatted transcript

Get clean text output with timestamps and optional speaker labels.
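The timestamped layout in the Command Examples below can be produced from Whisper-style segments (dictionaries with `start` in seconds and `text`). A minimal sketch; the function names are hypothetical, and the optional `speaker` key is an assumption, since Whisper's own segments carry no speaker names and labels would have to come from a separate diarization step.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as [MM:SS], or [H:MM:SS] once past one hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"[{hours}:{minutes:02d}:{secs:02d}]"
    return f"[{minutes:02d}:{secs:02d}]"

def format_transcript(segments: list) -> str:
    """Turn Whisper-style segments into one '[MM:SS] Speaker: text' line each."""
    lines = []
    for seg in segments:
        # "speaker" is a hypothetical key added by a diarization step;
        # Whisper's own segments do not include speaker names.
        speaker = seg.get("speaker")
        label = f"{speaker}: " if speaker else ""
        lines.append(f"{format_timestamp(seg['start'])} {label}{seg['text'].strip()}")
    return "\n".join(lines)
```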

Step 4

Store or share

Save to Notion, send via email, or export to your preferred format.
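For subtitle-style exports, the Whisper API can return SRT directly (`response_format="srt"`). If you already hold timestamped segments, for example from a `verbose_json` response, converting them yourself is short. A sketch with hypothetical function names, assuming each segment has `start` and `end` in seconds:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments: list) -> str:
    """Number each segment and emit index, time range, and text, separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```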

Command Examples

You say:

Transcribe this meeting recording [audio file]

Assistant responds:

[00:00] John: Let's start with Q4 goals. [00:15] Sarah: We need to focus on retention...

You say:

Convert my voice memo to text

Assistant responds:

Transcribed: "Remember to follow up with the client about the proposal. Send revised pricing by Thursday..."

You say:

Transcribe the last 10 minutes of this call

Assistant responds:

Transcript ready (10:34 duration): Discussed action items, assigned owners, set next meeting for Dec 15.
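A request like "the last 10 minutes" implies trimming the audio before upload, which also reduces cost. A minimal stdlib sketch, assuming the call was recorded as uncompressed WAV; compressed formats such as MP3 or M4A would need decoding first (for example with ffmpeg), and the function name is hypothetical.

```python
import io
import wave

def last_minutes(wav_bytes: bytes, minutes: float) -> bytes:
    """Return a new WAV file holding only the final `minutes` of the input."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        keep = min(params.nframes, int(minutes * 60 * params.framerate))
        src.setpos(params.nframes - keep)  # jump to the first frame we keep
        frames = src.readframes(keep)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)  # the header's frame count is patched on close
        dst.writeframes(frames)
    return out.getvalue()
```

If the recording is shorter than requested, the whole file is returned unchanged in length.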

Limits & Behavior

Parameter	Limit	Notes
File size	25 MB	compress large files before upload
Duration	3 hours	split longer recordings
Daily requests	500 files	unlimited with subscription
Concurrent jobs	5 at once	queues additional files
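To stay under both the size and duration limits, a client can cut a long recording into equal time windows sized from its average bitrate, then submit them in batches of five. A sketch under stated assumptions: roughly constant bitrate, 25 MB read as 25 × 1024 × 1024 bytes, and a 5% safety margin. The actual cutting would be done by an audio tool, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

MAX_BYTES = 25 * 1024 * 1024  # "File size" limit, read as MiB (assumption)
MAX_SECONDS = 3 * 60 * 60     # "Duration" limit
MAX_CONCURRENT = 5            # "Concurrent jobs" limit

def plan_chunks(size_bytes: int, duration_s: float, margin: float = 0.95) -> List[Tuple[float, float]]:
    """Split a recording into equal (start, end) windows, in seconds, that fit both limits.

    Assumes a roughly constant bitrate, so a window's size is proportional to its length.
    """
    by_size = math.ceil(size_bytes / (MAX_BYTES * margin))
    by_time = math.ceil(duration_s / (MAX_SECONDS * margin))
    n = max(1, by_size, by_time)
    step = duration_s / n
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(n)]

def batches(chunks: list, size: int = MAX_CONCURRENT) -> list:
    """Group chunks into batches of at most `size` jobs submitted together."""
    return [chunks[i:i + size] for i in range(0, len(chunks), size)]
```

With these assumptions, a 4-hour, 100 MB recording is split into five 48-minute windows, which fit in a single batch of five jobs.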

Models & Modes

Model	Speed	Accuracy	Best For
Whisper Large	Medium	Highest	critical transcripts, multi-speaker
Whisper Medium	Fast	High	general meetings and calls
Whisper Small	Very Fast	Good	quick voice memos

Estimated Cost

~$0.007 per minute of audio

A 10-minute meeting costs ~$0.07 in transcription. Longer recordings cost proportionally more.
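The estimate is simple per-minute arithmetic. A small sketch using `Decimal` to avoid floating-point noise in currency amounts; the function name is hypothetical, and the rate covers transcription only, not LLM processing.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MINUTE = Decimal("0.007")  # USD, transcription only, from "Estimated Cost"

def transcription_cost(minutes: float) -> Decimal:
    """Estimate transcription cost in USD, rounded to a tenth of a cent.

    Excludes LLM processing, which is billed separately.
    """
    cost = Decimal(str(minutes)) * RATE_PER_MINUTE
    return cost.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
```

By the same rate, a one-hour call comes to about $0.42 before any LLM processing.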

LLM processing costs are additional and depend on conversation complexity. BYOK (bring-your-own-key) users pay LLM costs directly to their provider.

* Prices include a platform service fee. Actual costs may vary.


Setup Requirements

OpenAI API key configured

Audio recording capability

Internet connection for API calls

File upload permissions

Troubleshooting

Error	Meaning	Action
FILE_TOO_LARGE	Exceeds 25 MB limit	Compress audio or split file
UNSUPPORTED_FORMAT	Audio format not recognized	Convert to MP3, M4A, or WAV
API_TIMEOUT	Processing took too long	Retry with shorter clip
LOW_QUALITY	Poor transcription result	Re-record with better audio quality
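For API_TIMEOUT, a brief retry with exponential backoff often resolves transient timeouts before you need to shorten the clip. A minimal sketch, assuming your HTTP client raises Python's built-in `TimeoutError` on timeout (this depends on the library, so adjust the caught exception to match); the helper name is hypothetical, and `sleep` is injectable so the delays can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on TimeoutError, waiting base_delay * 2**attempt between tries."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

If every attempt times out, the final `TimeoutError` propagates, which is the point at which splitting into a shorter clip makes sense.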