The internet is a quiet place.

Search engines today primarily search text, ignoring rich media like audio and video. So National Public Radio (NPR) has a plan to convert their broadcasts into text so that they can be searched. Unfortunately, the current speech-to-text technology is only about 80% accurate, but this would still allow a functional search of the audio content. NPR also labels their audio files with a great deal of relevant metadata.

I wonder if it would be possible to index phonemes in audio the same way that words are indexed in text. A text-to-speech engine could render the search text as a string of phonemes, and the search engine could compare the strings. This should also work across multiple languages. It is similar to the concept of Latent Semantic Indexing, where the relationships between different words are mathematically derived to allow searching later. That technology is essentially language-independent. Could audio indexing work the same way, with phonemes being grouped into words?
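To make the idea concrete, here is a minimal sketch of how a phoneme index might work. Everything in it is hypothetical: the tiny `G2P` pronunciation table stands in for a real grapheme-to-phoneme front end (such as the one inside a text-to-speech engine), and the "audio segments" are just phoneme sequences a recognizer might have emitted. The index maps phoneme n-grams to segments, and a query word is converted to phonemes and matched the same way.

```python
from collections import defaultdict

# Toy pronunciation table -- a stand-in for a real grapheme-to-phoneme
# model or pronunciation dictionary (hypothetical, illustration only).
G2P = {
    "radio": ["R", "EY", "D", "IY", "OW"],
    "public": ["P", "AH", "B", "L", "IH", "K"],
}

def phoneme_ngrams(phonemes, n=3):
    """Break a phoneme sequence into overlapping n-grams for indexing."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def build_index(segments):
    """Map each phoneme n-gram to the audio segments that contain it.

    `segments` maps a segment id to the phoneme sequence a recognizer
    produced for that stretch of audio (invented data for this sketch).
    """
    index = defaultdict(set)
    for seg_id, phonemes in segments.items():
        for gram in phoneme_ngrams(phonemes):
            index[gram].add(seg_id)
    return index

def search(index, query_word):
    """Render the query as phonemes, then look up its n-grams."""
    hits = set()
    for gram in phoneme_ngrams(G2P[query_word]):
        hits |= index.get(gram, set())
    return hits
```

Because the index stores phonemes rather than words, nothing about it is tied to English; any language with a grapheme-to-phoneme front end could feed the same structure, which is what makes the comparison to language-independent techniques like Latent Semantic Indexing tempting.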