Audio transcripts are prepared for court proceedings, investigations, speeches, press conferences, and movies, and they are heavily relied on by lawyers, judges, the media, governments, and people with hearing impairments. Even so, the concept of audio transcription does not seem to be broadly understood.
In fact, if you search “transcription” on Google, the entire first page is dedicated to DNA transcription (not that there’s anything wrong with that!).
Based purely on my own experience, I’ll explain what audio transcription is along with the transcription process.
In plain English, transcription is the process of typing what you hear accurately, and ideally, in a user-friendly format. Many industries, such as auto insurance, require strict verbatim transcription, which means you type everything, including every “uh” and “hmm.” The style I’m used to and prefer is clean verbatim. Different audio transcription services interpret “clean verbatim” differently, but the way I learned omits non-words (“um,” “uh,” etc.), stuttering (“But — but — but”), and filler phrases (“you know”).
Clean verbatim also breaks up crosstalk into readable and logical chunks. For example, if Person A interjected right before Person B finished the last two words of his sentence, you would probably not reflect that interruption in the transcript. I say “probably” because there are always exceptions. Organizing crosstalk is often not this straightforward, but you get the idea.
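To make the clean verbatim omissions described above (non-words, stuttering, and filler phrases) a little more concrete, here is a minimal Python sketch. It is only a rough approximation of my own devising: the word lists and the function name `clean_verbatim` are assumptions for illustration, not any service's actual rules, and it does not attempt to organize crosstalk or repair punctuation and capitalization.

```python
import re

# Hypothetical word lists for illustration; each service defines its own.
NON_WORDS = ["um", "uh", "hmm"]
FILLER_PHRASES = ["you know"]

def clean_verbatim(text: str) -> str:
    """Roughly approximate clean verbatim omissions on a strict verbatim line."""
    # Drop non-words and filler phrases, plus the commas that set them off.
    # Limitation: this also removes a meaningful "you know" ("Do you know...?").
    for phrase in NON_WORDS + FILLER_PHRASES:
        pattern = rf",?\s*\b{re.escape(phrase)}\b,?"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse stutters: a word repeated back to back, separated by spaces,
    # commas, hyphens, or em dashes (\u2014), is kept once.
    # Limitation: this also collapses legitimate repeats such as "had had".
    text = re.sub(r"\b(\w+)(?:[\s,\u2014\-]+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Tidy any leftover runs of whitespace.
    return re.sub(r"\s{2,}", " ", text).strip()

print(clean_verbatim("Um, I think that, uh, we should, you know, go."))
```

The limitations noted in the comments are the point: a script can strip obvious fillers, but deciding whether a "you know" or a repeated word carries meaning still takes a person listening to the context.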
Transcribers, or transcriptionists, also structure the dialogue so it makes sense to the reader.
In other words, transcribers process what they hear so that the transcript is broken up logically, both in its paragraphs and in its sentence structure.
If you’re on autopilot or not paying attention to the context, the transcript may not make as much sense, or worse, convey the wrong message to the reader.
For example, if the ambassador of Israel said the acronym “AIPAC” (the American Israel Public Affairs Committee), you’d better not confuse it with APEC (Asia-Pacific Economic Cooperation) or OPEC (Organization of the Petroleum Exporting Countries). I almost always look up terms and consider their context, even if they sound familiar.
Even discounting that specific scenario, mixing up similar-sounding words is easier than you would think, especially with fast speakers or difficult subject matter.
Once you’ve typed your transcript, you’ll probably need to review it from the top.
I always proofread my work, typically at around 117% playback speed in Express Scribe (faster or slower depending on audio quality). Use whatever speed works best for you. Horrible audio may need a second or third review.
Effective proofreading is perhaps the most important part of the transcription process. It requires good judgment and second-guessing your work. How much time you dedicate to this step depends on many factors: Was the audio bad? Was there crosstalk? Was there a fast talker? Did someone have a thick accent? Were you tired?
If so, it’s best to take your time and accept that your first draft very likely contains errors. In my prior job, I proofread the same people’s transcripts for three years, and it was usually obvious when they were struggling with an accent or not feeling as alert on a particular day. It happens to everyone.
One piece of advice that can’t be overstated:
If you’re not sure about something, don’t guess, especially if it’s controversial. Flag it for the editor if possible. If you don’t have the luxury of passing the buck, you’ll have to figure it out on your own (slowing down or speeding up Express Scribe can help). And if you just can’t get it, flag it for the client or replace it with “(inaudible).”
After proofreading a transcript, it’s a good idea to run a spelling and grammar check, look over the formatting, and make sure the colloquy is correct.
Once that’s done, you are finished!
From start to finish, producing an accurate transcript usually takes between four and six times the length of the audio, but there will be outliers on both sides.
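As a quick illustration of that rule of thumb, here is a small Python sketch of the arithmetic. The function names are hypothetical, the 4x to 6x range and the 117% review speed are simply the figures from my own experience above, and the review estimate assumes one uninterrupted pass with no pausing or rewinding, which is optimistic.

```python
def estimate_turnaround(audio_minutes: float, low: float = 4.0, high: float = 6.0) -> tuple[float, float]:
    """Return the (shortest, longest) expected working time in minutes."""
    return audio_minutes * low, audio_minutes * high

def review_pass_minutes(audio_minutes: float, speed_percent: float = 117.0) -> float:
    """Time for one straight proofreading pass at the given playback speed."""
    return audio_minutes / (speed_percent / 100.0)

shortest, longest = estimate_turnaround(30)
print(f"30 minutes of audio: roughly {shortest:.0f} to {longest:.0f} minutes of work")
print(f"One review pass of 60 minutes of audio: about {review_pass_minutes(60):.0f} minutes")
```

For a 30-minute recording, this works out to roughly two to three hours of total work, before allowing for the outliers that bad audio or heavy crosstalk can cause.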