How many speakers can it identify?

Up to 10 distinct speakers in a single recording. Accuracy improves with clear audio and minimal crosstalk. Best results are with 2-6 speakers.

How is this different from Cloud Transcription?

Cloud Transcription (Whisper) produces fast, accurate text from audio but offers only basic speaker detection. Transcribee specializes in speaker diarization: identifying who said what, with timestamps.

Can I rename the speaker labels?

Yes, after transcription you can ask the assistant to replace generic labels (Speaker A, Speaker B) with real names.

Does it work with phone call recordings?

Yes, phone recordings work well since they typically have clear speaker separation. Mobile recordings with speakerphone may have lower accuracy.

Transcribee

Speaker-separated transcripts with precise timestamps for interviews, meetings, and multi-speaker recordings

Communication & Meetings · Coming Soon

What It Does

Transcribee produces speaker-separated transcripts with precise timestamps from audio recordings. Unlike basic transcription that outputs a single text block, Transcribee identifies individual speakers and labels their contributions, making it easy to review who said what and when.

Speaker Diarization · Timestamped Output · Up to 10 Speakers · Interview Analysis

In a Nutshell

🎧 Speaker Diarization: identify and label individual speakers automatically

⏱️ Timestamped Output: precise timestamps for every spoken segment

👥 Multi-Speaker: handles up to 10 distinct speakers per recording

📋 Structured Output: clean, searchable transcript format

Use Cases

Interview Analysis

Get speaker-separated transcripts from interviews and research calls with clear attribution

Meeting Minutes

Automatically generate meeting notes with who said what and when

Podcast Production

Transcribe episodes with host and guest labels for show notes and editing

Research Documentation

Create searchable archives of multi-participant research sessions

How to Use

Step 1

Upload audio with multiple speakers

Send a recording containing two or more speakers. Supported formats include MP3, WAV, M4A, and OGG.

For best results, use clear audio with minimal background noise or crosstalk.

Step 2

Review speaker-labeled transcript

The assistant identifies each speaker and produces a timestamped transcript with labels like Speaker A, Speaker B.

Step 3

Refine speaker names

Ask the assistant to rename speakers if you know who they are — e.g., 'Speaker A is Sarah, Speaker B is John'.
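
Renaming is normally done by asking the assistant, as described above. If you would rather relabel an exported transcript yourself, the sketch below shows one way to do it. It assumes transcript lines of the form `[MM:SS] Speaker A: text`, as in the example output on this page; the function name `rename_speakers` is hypothetical and not part of Transcribee.

```python
import re

# Assumed line format (taken from the example output on this page):
#   [MM:SS] Speaker A: text      (HH:MM:SS is also accepted)
LABEL_RE = re.compile(
    r"^(\[\d+(?::\d{2}){1,2}\]) (Speaker [A-Z]):",
    re.MULTILINE,
)


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace generic labels with real names; unknown labels stay unchanged."""

    def swap(match: re.Match) -> str:
        timestamp, label = match.group(1), match.group(2)
        return f"{timestamp} {names.get(label, label)}:"

    # Only the label directly after the timestamp is replaced, so a phrase
    # like "Speaker A" inside the spoken text is left alone.
    return LABEL_RE.sub(swap, transcript)


transcript = (
    "[00:00] Speaker A: Welcome to the session.\n"
    "[00:05] Speaker B: Thanks for having me.\n"
)
print(rename_speakers(transcript, {"Speaker A": "Sarah", "Speaker B": "John"}))
```

Matching on the timestamp prefix, rather than doing a plain text replace, keeps mentions of a label inside someone's speech intact.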

Command Examples

You say:

Transcribe this interview with speaker labels

Assistant responds:

[00:00] Speaker A: Welcome to the session.
[00:05] Speaker B: Thanks for having me. Let me start with...
[00:12] Speaker A: Great, so tell us about...

You say:

How many speakers are in this recording?

Assistant responds:

I detected 3 distinct speakers in this 22-minute recording. Generating labeled transcript now...

You say:

Summarize what Speaker B said in the first 10 minutes

Assistant responds:

Speaker B made 4 key points in the first 10 minutes: discussed project timeline, raised budget concerns, suggested alternative vendor, and agreed to follow-up meeting.
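
Requests like the ones above are answered by the assistant. For readers who want to work with a saved transcript directly, here is a minimal sketch that parses it into segments and selects what one speaker said within a time window. It assumes one segment per line in the `[MM:SS] Speaker A: text` format shown in the first example; `Segment`, `parse_transcript`, and `segments_for` are hypothetical names, not a Transcribee API.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: int
    speaker: str
    text: str


# Assumed line format: [MM:SS] Speaker A: text   (HH:MM:SS is also accepted)
LINE_RE = re.compile(r"^\[(\d+(?::\d{2}){1,2})\] ([^:]+): (.*)$")


def to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_transcript(transcript: str) -> list[Segment]:
    segments = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not look like segments are skipped
            stamp, speaker, text = match.groups()
            segments.append(Segment(to_seconds(stamp), speaker, text))
    return segments


def segments_for(segments: list[Segment], speaker: str, before_seconds: int) -> list[Segment]:
    """Segments by `speaker` that start before `before_seconds`."""
    return [
        s for s in segments
        if s.speaker == speaker and s.start_seconds < before_seconds
    ]


sample = """[00:00] Speaker A: Welcome to the session.
[00:05] Speaker B: Thanks for having me. Let me start with...
[00:12] Speaker A: Great, so tell us about..."""

for segment in segments_for(parse_transcript(sample), "Speaker B", 10 * 60):
    print(segment.start_seconds, segment.text)
```

The filtered segments could then be passed to any summarizer; the summary itself is outside the scope of this sketch.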

Limits & Behavior

Parameter	Limit	Notes
Max speakers	10 per recording	accuracy decreases above 6
Audio length	Up to 2 hours	longer files processed in chunks
Supported formats	MP3, WAV, M4A, OGG	auto-converted if needed
Minimum segment	2 seconds	very short utterances may merge
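
The limits in the table can be checked before uploading. The sketch below is one possible pre-flight check, with the thresholds copied from the table. The function name `check_recording` is hypothetical, and the duration and expected speaker count are passed in by the caller because reading them from the audio file would need a third-party library.

```python
from pathlib import Path

# Values copied from the limits table above.
SUPPORTED_FORMATS = {".mp3", ".wav", ".m4a", ".ogg"}
MAX_SPEAKERS = 10
RECOMMENDED_MAX_SPEAKERS = 6
MAX_DURATION_SECONDS = 2 * 60 * 60  # 2 hours


def check_recording(filename: str, duration_seconds: float, expected_speakers: int) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []

    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        warnings.append(f"'{suffix or 'no extension'}' is not one of the listed formats")

    if duration_seconds > MAX_DURATION_SECONDS:
        warnings.append("recording is longer than 2 hours and may be processed in chunks")

    if expected_speakers > MAX_SPEAKERS:
        warnings.append("more than 10 speakers; extra speakers may not be separated")
    elif expected_speakers > RECOMMENDED_MAX_SPEAKERS:
        warnings.append("accuracy may decrease above 6 speakers")

    return warnings


print(check_recording("interview.mp3", 22 * 60, 3))
print(check_recording("panel.flac", 3 * 60 * 60, 8))
```

The check returns warnings instead of raising errors because the table describes soft behavior (chunking, auto-conversion, reduced accuracy), not hard rejections.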

Estimated Cost

~$0.007 per minute of audio, per transcription

Similar to Cloud Transcription, plus additional LLM processing for speaker diarization.

LLM processing cost is additional and depends on conversation complexity. BYOK users pay LLM costs directly to their provider.

* Prices include platform service fee. Actual costs may vary.
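
As a rough worked example of the estimate above, the sketch below multiplies the recording length by the quoted per-minute rate. It covers the transcription portion only: LLM processing cost is excluded because it depends on conversation complexity and is not quantified here. The function name is hypothetical, and the rate is the approximate figure quoted on this page.

```python
RATE_PER_MINUTE_USD = 0.007  # approximate rate quoted above


def estimate_transcription_cost(duration_minutes: float, rate: float = RATE_PER_MINUTE_USD) -> float:
    """Approximate transcription cost in USD, excluding LLM processing."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    return round(duration_minutes * rate, 4)


print(estimate_transcription_cost(22))   # a 22-minute recording
print(estimate_transcription_cost(120))  # a 2-hour recording
```

At this rate a 22-minute recording comes to roughly $0.15 and a 2-hour recording to roughly $0.84, before any LLM costs.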

FAQ

Setup Requirements

Audio recording with multiple speakers

Supported audio format (MP3, WAV, M4A, OGG)

Clear audio with minimal background noise for best results