Curated by: Luigi Canali De Rossi

Wednesday, January 11, 2006

Audio Transcription For Podcasts: JC Human vs. CastingWords

CastingWords is a nifty new online service which allows anyone to submit an audio podcast for immediate transcription into text.

Photo credit: Kryptos

The key innovations brought in by this very affordable new service (see further down my comparison with a traditional established broadcast transcriber) are the easy and efficient automated submission process, the integrated cost estimating process and the RSS and email based workflow notification facility.

Podcasters only need to have the URL of their podcast RSS feed and a PayPal account to pay the transcription fee.

The transcription is done in a very short turnaround time, and the service notifies you as soon as the transcription is completed, delivering your formatted transcription in three different output formats. CastingWords does not employ automated transcription technology to transcribe podcasts but uses a fully human network of supporting editors.

What may not appear as exciting to the conscious and ethical online publisher is that, under this promising surface, CastingWords is also a beta podcast search engine which may eventually derive its key strength from the very service it provides for a fee.

That is: CastingWords asks me for money to transcribe my podcast, but then it uses those text transcriptions as the core index database of its podcast search engine, which is in turn ad-supported.

If in the near future the CastingWords search engine is free to all users, and it shares some of its possible future search ad revenues with the podcasters who have paid for transcriptions, then this is a great approach to making audio content extend its reach while enabling more podcasters to do a better content publishing job. But if CastingWords wants to make extra money by leveraging user-paid-for content to create an ad-based search engine, then maybe someone should step up and say something to these guys.

For now, it is difficult to judge, as the CastingWords search facility is not even able to find my own podcast content when searching for Rome, and Italy, two pretty unique words I always use at the beginning and end of my audio interviews.

To help everyone get an idea of how useful this service can be when compared to using a traditional editor to transcribe your audio podcasts, I have taken the time to test it and compare it with my established human transcription resource, a professional TV broadcast transcriber.

Without telling either one that I was doing a comparison review, I sent both a 30-minute audio interview with Eric Goldstein of Clipmarks that I very recently published.

Here is what I, with the help of assistant editor Matthew Guschwan, have been able to find out:

Photo credit: Jake Levin

Transcription Comparison

JG Human vs. CW CastingWords

This summary evaluation is based upon a few key comparison areas:

1) Basic formatting

2) Use of punctuation

3) Spelling errors

4) Fidelity to original recording

5) Output formats

6) Costs

7) Turnaround time

1) Basic Formatting

CastingWords clearly indicated the change of speaker in the interview by using "headings" containing the name of the person speaking such as:
Robin Good:
Eric Goldstein:

JC indicated a change in speaker by simply starting a new paragraph.

The additional use of clearly labeled headings did save us time in getting the transcription online.

2) Use of Punctuation

There were several major differences in punctuation between the two services tested.

For lists of items, CastingWords used commas (father, mother, nurse, babysitter) to separate the words while JG used a series of slashes (father/mother/nurse/babysitter).

For our purposes, we prefer the more traditional use of punctuation, which for us is commas.

On the other hand, CastingWords did not use commas nearly as often as JG did. In transcribing spoken words, commas are very important for recreating the conversation and for clarifying points. In this regard, JG's transcription was clearly superior.

3) Spelling Errors

Both services made several mistakes in spelling. It must be said, though, that audio podcasts are often not of the highest possible audio quality, and many of my interviewees, myself included, are not native English speakers, so our pronunciation of certain words may mislead the transcriber. Also, transcribers are rarely familiar with the topic and technologies you may be covering, which causes a multitude of misinterpretations that at times may be very difficult to catch. One example that sums them all up is "creative comments" transcribed in place of Creative Commons. It is evident that an editor unaware of CC could easily fall into this type of trap.

CW in particular passed along one word (maticion) that should have been easily caught in any standard English spell checker.

There were also instances in both transcriptions wherein words were misused, or spelled inconsistently.

For example, the spelling of homepage as 'home page' or bookmark as 'book mark' will not set off a basic spell checker, therefore the human element becomes more important.

These types of insidious errors can be more difficult to detect. As mentioned above, it is uncommon to find an editor who is up to date with new media and who knows how to write its terms properly. CW, for example, did not capitalize "Flash."

Other differences were that CW used "bookmark" throughout, yet used the terms "book marking" for "bookmarking" twice and "book marked" for "bookmarked" once. JG used "bookmarking" and "bookmarked" throughout.

4) Fidelity to original recording

There are subtle differences between the transcripts, such as a change of phrasing and/or a slight change of word order. These changes are harmless if they preserve the meaning of the statement, and can even be helpful if they improve the flow or the clarity of the original spoken statement.

This is not the case when even one or two words are unjustifiably dropped, as here, where JG wrote "Clipmarks is about allowing you to easily save the specific information" while CW omitted "easily" altogether from this sentence.

In writing instructions, it is useful to have the instruction within quotation marks. E.g.: "click here" vs. click here. JG always included quotation marks while CW did not.

A more serious error is when the transcription is too literal to the recording, and hesitations or repetitions by the original speaker are not automatically deleted.

Other instances showed quite different transcriptions of the same short passage:
CW "You've been continuously you've been adding in the recent weeks"
JG "You have been continuously adding in the recent week?"
In other cases the essential meaning of the phrase suddenly changes because a verb or adjective is rendered incorrectly. Here is an example where a 'do' in place of a 'don't' creates a serious problem in the final transcription.

JG wrote: "If you don't want your clip marks to be seen by other people, you simply don't check the box..."

CW wrote: "If you don't want your Clipmarks to be seen by other people you just click the checkbox..."

5) Output formats

JC has been providing me with a text-only output, which gets posted to a private wiki workspace to which we both have access.

CastingWords automatically provides the transcribed output in three different formats, which are accessible online via standard URLs:
1) RTF
3) Plain Text

6) Costs

CastingWords costs 42 cents a minute. Period.

JC, my fully human transcriber, charges me $20 for anything under one hour. No matter if it is 25 minutes or 55, she bills me $20 for each one.

So the cost difference for this specific article was the following:
CastingWords: $14.28
JC: $20.00

Both accept direct electronic payments via PayPal.
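
The arithmetic behind these two figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, billing is assumed to be per whole minute, and the 34-minute length used below is inferred from the $14.28 total at 42 cents a minute rather than stated anywhere above. Amounts are kept in cents to avoid rounding surprises.

```python
# Rates quoted in this article; everything else here is an assumption.
CW_CENTS_PER_MINUTE = 42   # CastingWords: 42 cents a minute
JC_FLAT_FEE_CENTS = 2000   # JC: $20 for anything under one hour


def castingwords_cost_cents(minutes: int) -> int:
    """Per-minute pricing, assuming whole-minute billing."""
    return CW_CENTS_PER_MINUTE * minutes


def jc_cost_cents(minutes: int) -> int:
    """Flat fee; the article only gives a price for audio under one hour."""
    if not 0 < minutes < 60:
        raise ValueError("JC's price is only known for audio under one hour")
    return JC_FLAT_FEE_CENTS


def cheaper_service(minutes: int) -> str:
    """Return the name of the cheaper service for a given audio length."""
    cw = castingwords_cost_cents(minutes)
    jc = jc_cost_cents(minutes)
    if cw < jc:
        return "CastingWords"
    if jc < cw:
        return "JC"
    return "tie"


if __name__ == "__main__":
    for minutes in (25, 34, 47, 48, 55):
        cw = castingwords_cost_cents(minutes) / 100
        jc = jc_cost_cents(minutes) / 100
        print(f"{minutes} min: CastingWords ${cw:.2f}, JC ${jc:.2f}, "
              f"cheaper: {cheaper_service(minutes)}")
```

Under these assumptions the per-minute rate stays cheaper up to 47 minutes of audio, while from 48 minutes up to the one-hour limit the flat $20 fee comes out ahead.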

7) Turnaround time

In terms of turnaround time, CastingWords has a clear edge. This test was purposely run on a weekend during the Christmas week, and CastingWords reacted in record time, making the full transcript ready for pick-up less than 48 hours after submission.

The fully human solution with JC took its usual time, which, unless you request something urgent in writing, is generally between 3 and 5 days (excluding weekend days).

Summary Evaluation

Overall, there were errors from both the automated CastingWords and the fully human JG. As a matter of fact, CastingWords too is essentially a human transcription service, only with the addition of a set of automated facilities which greatly enhance the speed and transparency of the overall transcription process.

As far as the transcription itself went, JC seemed to us to have an edge over CastingWords. So, from a writing standpoint, JG was better than CastingWords because of a generally better use of punctuation and because of better construction of some phrases.

Both services made some annoying spelling mistakes, and both transcripts had to be checked closely for the proper use of technical terms and proper names. Neither one provided a transcript that could have been published right away as it was.

Which one is best?

That will certainly depend on what you need to do, what your time frame is, what the topic of your audio is, and how important precision and transcription quality are to you versus faster and more immediately presentable results.

Reference transcriptions

Here are the links to the two transcriptions as I received them. This is the text-only format, which you can download and check yourself to see what you would get.

CastingWords: text-only transcription of Robin Good interview with Eric Goldstein (I don't know for how long this will remain accessible online)

JC: text-only transcription of Robin Good interview with Eric Goldstein saved in RTF format to maintain original paragraph spacing provided by human transcriber.

Audio MP3 of original interview

For a fully illustrated walk-through of the CastingWords podcast transcription service please see:
Walking Through the Casting Words Store

Robin Good and Matthew Guschwan -
Readers' Comments    
2012-04-19 04:12:32


The exceptional method used by CastingWords, Amazon's Mechanical Turk workers, makes them popular in the transcription world.

2008-10-27 20:39:51

Audio Transcription

That is a big difference between the two. I would be interested to see what the difference would be between those and our own service. We pride ourselves on accuracy and customer service. We also do audio transcription for legal dictation, medical dictation, and financial dictation.

2007-01-20 15:50:45


CastingWords uses Amazon's Mechanical Turk workers to do its transcriptions for them.
Basically they pay their workers on average about $2 (this includes a bonus) per 9 minute podcast, less if errors are made.
The pittance they pay their outworkers enables them to deliver their service for such a low fee.
Visit and you'll see lists and lists of the stuff they want transcribed.
I've just done a very technical transcription that has taken me the better part of 2 hours to complete, for a grand total of $1.27 plus a small bonus.

posted by Matthew Guschwan on Wednesday, January 11 2006, updated on Tuesday, May 5 2015
