In my twenty years of using speech recognition, I’ve had a number
of setups. I began at MIT’s Accessibility
Lab with discrete speech: articulating every … single … word …
discretely. IBM’s ViaVoice was the first that allowed me to
dictate in phrases—much less of a strain—and ran on Linux for a brief
time. Sadly, IBM handed this off to Scansoft, which buried it, and they
were then bought by Nuance. This meant that at the start of the aughts,
Nuance’s NaturallySpeaking was the only game in town. In 2003,
I used NaturallySpeaking alongside Linux by running it on a
headless or virtual Windows machine, an approach I stuck with for the
next 13 years.
In the past year, I’ve been impressed by Google’s speech recognition.
Big data and new machine learning techniques have advanced the state of
the art. Sadly, you can’t customize Google speech for your particular
vocabulary, nor can you use it to control your desktop. Still,
mainstream mobile applications (and voice assistants) have revitalized
the speech recognition field. This doesn’t immediately serve the
accessibility market, but it gives me hope that there will be
spillover.
The need to eventually upgrade my OS (from Kubuntu 14.04) and the
news that Simon/KDESpeech was
discontinued convinced me that it was time for a change.
I want simple, native desktop dictation. As intriguing as Windows
Bash is, I decided MacOS offered the best potential for a Unix
desktop with speech recognition. Apple is behind others in the accuracy
of its speech recognition (not nearly as good as Google’s), but its
Enhanced Dictation provides useful control of the desktop, and Nuance’s
Dictate runs on MacOS as well. Unfortunately, Dictate
5 is a disaster on El Capitan: it crashes right out of the box.
So, I’m still using NaturallySpeaking in a virtual machine.
But I have two hopes. First, I hope Nuance’s Dictate, which is
very accurate and permits custom vocabularies, will eventually run well
on MacOS. Second, I hope Apple’s Enhanced Dictation will permit
customization and improve in quality. Both of these are much more likely
than seeing speech recognition on a Linux desktop.
Which brings me to my current
setup.
I continue to use the Kinesis
Advantage keyboard; I’d be screwed without it. I also still use the
Plantronics
Savi W440 wireless headset, seen in the back, for most of my
dictation in the Windows virtual machine. If I need to transcribe notes
or interviews, or intensively write or edit, this provides the best
recognition. The new bit of hardware is the Buddy 7G FlamingoMic,
which I use as the Mac’s microphone for desktop control and
dictating short emails.
The first thing I did upon getting the iMac was cover the webcam and
microphone with electrical tape; masking tape won’t completely silence the
microphone. As this was my first PC that’s fully USB3, I also learned
that USB3 and wireless headsets don’t work well together. So the webcam,
headset transmitter, and Flamingo mic are all plugged into a small USB2 hub with
individual power switches and LEDs for each device, making it easy to
disable each. The Buddy 7G microphones, like the SpeechWare
ones, are not general purpose mics: they won’t be good for music, for
instance. They have circuitry built in for filtering out noise and
picking up voices. I bought the Buddy 7G because I suspect it is
nearly as good as the SpeechWares. I bought the Flamingo because it is
portable and much cheaper than the version with the built-in
base. The USB2 hub I’m using cost $6 and is easily mounted to the desk
using double-sided tape; the desktop units cost hundreds more, though they
offer no more functionality than a hub.
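
If you want to confirm which devices ended up behind the hub, MacOS’s `system_profiler` can dump the USB tree; here’s a rough sketch in Python (the device-name strings are only examples, so substitute whatever your hardware reports):

```python
import subprocess

# Dump MacOS's USB device tree and pull out the lines mentioning the
# devices of interest. The keywords below are only examples; use whatever
# your webcam, headset transmitter, hub, and mic report as product names.
tree = subprocess.run(
    ["system_profiler", "SPUSBDataType"],
    capture_output=True, text=True, check=True,
).stdout

for keyword in ("hub", "buddy", "savi", "camera"):
    for line in tree.splitlines():
        if keyword in line.lower():
            print(line.rstrip())
```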
Finally, here’s the accuracy of dictating the Rainbow Passage using the
two microphones with the Mac’s Enhanced Dictation and Nuance’s
NaturallySpeaking.
|                                 | Buddy 7G Flamingo (desk) | Savi W440 (headset) |
|---------------------------------|--------------------------|---------------------|
| NaturallySpeaking 13            | 99%                      | 100%                |
| Enhanced Dictation (El Capitan) | 96%                      | 96%                 |
You can see that the Buddy 7G desk mic is quite good, but not as good
as a headset, and that El Capitan’s Enhanced Dictation is okay but
frustrating for serious use.
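
For anyone who wants to run the same comparison, here is a rough way to score a dictated transcript against the reference passage; the `word_accuracy` helper and the sample transcript below are just illustrative, and since punctuation is treated as part of each word the result is only an approximation:

```python
import difflib

def word_accuracy(reference: str, transcript: str) -> float:
    """Word-level accuracy: aligned matching words / words in the reference."""
    ref_words = reference.lower().split()
    hyp_words = transcript.lower().split()
    # SequenceMatcher aligns the two word lists and reports matching runs.
    matcher = difflib.SequenceMatcher(None, ref_words, hyp_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)

# The opening sentence of the Rainbow Passage against a made-up transcript
# with one recognition error ("rain drops" for "raindrops").
reference = ("When the sunlight strikes raindrops in the air, "
             "they act as a prism and form a rainbow.")
transcript = ("When the sunlight strikes rain drops in the air, "
              "they act as a prism and form a rainbow.")
print(f"{word_accuracy(reference, transcript):.0%}")
```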