Radio Free Internet, Part I: How Much of the Web Hears You?

by bpick

As we all know, the spoken word conveys more than typed text.  Where audio gives us a person’s inflection, tone and laughter, web chat requires smileys, sarcasm tags and clumsy acronyms like LOL.

But the spoken word is not yet thoroughly integrated with the rest of the web.  Between video, talk radio and podcasts, a massive volume of spoken words is recorded for wider consumption each day, yet it is not processed into something that can be easily searched and manipulated on the web.

So each day, despite tens of millions of regular listeners, there are vanishingly few references to talk radio in the blogosphere and social networks.  That’s a shame: regular talk-radio listeners are overwhelmingly more conservative than liberal, and yet some of the Right’s most effective rhetoric and interesting discussions immediately go static after broadcast.

On a broader level, this is a big gap in organizing the world’s information, and yet where is Google Speech Search?  Why aren’t people sharing bits of audio like they share small bites of text, pictures and (increasingly) short-form video?

The spoken word loses much of its potential audience because it is (1) hard to search, (2) somewhat hard to consume and (3) hard to share.

Part I covers the first problem.  Part II will explore the other two problems.  Part III will suggest a solution.


The spoken word is hard to find because we search using text, and most recorded audio and video has no transcript attached to it.

Right now, if audio creators want to signal the spoken-word content of a piece of media, they can add some descriptive tags and/or describe the audio in a related blog post.  But the actual words will not show up in a search of either webpages or RSS feeds unless they’re in a connected transcript.

Transcription takes time, so it usually isn’t done.  I would know: years ago I transcribed several of the QandO blog’s Observations podcasts (see this one for example).  I wanted people to be able to read instead of listening, I wanted links to tell people what we were referring to, and I wanted the words to be searchable.  But it took hours for me to properly transcribe a half-hour podcast.

And even for the audio that has been transcribed, it takes extra effort to get to the specific words of interest.

  • YouTube has automatic captions, but you can’t search videos by that text or skip to the words you want (unless someone in the comments has linked to that particular time in the video).
  • C-SPAN, which keeps transcripts of its programming, allows you to search for specific words by specific people, but even that is closed off from the rest of the web and still pretty clumsy to work with.
  • The WB does allow searches for dialogue in its video clips, but only in a limited way.
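
To make "skip to the words you want" concrete, here is a minimal Python sketch of what a search over a time-stamped transcript could look like.  It assumes a transcript stored as a list of (start time in seconds, text) segments; that format, the function names and the sample lines are all made up for illustration and are not how YouTube, C-SPAN or the WB actually store or search their transcripts.

```python
from typing import List, Tuple

# Hypothetical transcript format: (start time in seconds, words spoken in that segment).
Segment = Tuple[float, str]


def find_phrase(segments: List[Segment], phrase: str) -> List[float]:
    """Return the start time of every segment whose text contains the phrase."""
    needle = phrase.lower()
    return [start for start, text in segments if needle in text.lower()]


def as_timestamp(seconds: float) -> str:
    """Format seconds as m:ss, the usual way of pointing at a moment in a recording."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Made-up sample data, for illustration only.
transcript = [
    (0.0, "Welcome back to the show."),
    (12.5, "Today we are talking about the federal budget."),
    (95.0, "Let's take a caller on the budget question."),
]

for start in find_phrase(transcript, "budget"):
    print(as_timestamp(start))  # prints 0:12, then 1:35
```

A simple substring match like this misses phrases that straddle two segments, and it only works at all once someone (or some software) has produced the time-stamped text in the first place, which is exactly the step that usually doesn’t happen.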

This alone is a big obstacle to integrating the spoken word into much of the web. But the problem is compounded by how difficult it is to consume and spread the spoken word even after you’ve gone to the trouble of finding it.  That’s the subject of Part II.