In my twenty years of using speech recognition, I’ve had a number
of setups. I began at MIT’s Accessibility
Lab with discrete speech: articulating every … single … word …
discretely. IBM’s ViaVoice was the first that allowed me to
dictate in phrases—much less of a strain—and ran on Linux for a brief
time. Sadly, IBM handed this off to Scansoft, which buried it, and they
were then bought by Nuance. This meant that at the start of the aughts,
Nuance’s NaturallySpeaking was the only game in town. In 2003,
I used NaturallySpeaking alongside Linux by running it on a
headless or virtual Windows machine, an approach I stuck with for the
next 13 years.
In the past year, I’ve been impressed by Google’s speech recognition.
Big data and new machine learning techniques have advanced the state of
the art. Sadly, you can’t customize Google speech for your particular
vocabulary, nor can you use it to control your desktop. Still,
mainstream mobile applications (and voice assistants) have revitalized
the speech recognition field. This doesn’t immediately serve the
accessibility market, but it gives me hope that there will be
spillover.
The need to eventually upgrade my OS (from Kubuntu 14.04) and the
news that Simon/KDESpeech was
discontinued convinced me that it was time for a change.
I want simple, native desktop dictation. As intriguing as Windows
Bash is, I decided MacOS offered the best potential for a Unix
desktop with speech recognition. Apple is behind others in the accuracy
of its speech recognition (not nearly as good as Google’s), but its
Enhanced Dictation provides useful control of the desktop, and Nuance’s
Dictate runs on MacOS as well. Unfortunately, Dictate
5 is a disaster on El Capitan: it crashes right out of the box.
So, I’m still using NaturallySpeaking in a virtual machine.
But I have two hopes. First, I hope Nuance’s Dictate, which is
very accurate and permits custom vocabularies, will eventually run well
on MacOS. Second, I hope Apple’s Enhanced Dictation will permit
customization and improve in quality. Both of these are much more likely
than seeing speech recognition on a Linux desktop.
Which brings me to my current
setup.
I continue to use the Kinesis
Advantage keyboard; I’d be screwed without it. I also still use the
Plantronics
Savi W440 wireless headset, seen in the back, for most of my
dictation in the Windows virtual machine. If I need to transcribe notes
or interviews, or intensively write or edit, this provides the best
recognition. The new bit of hardware is the Buddy 7G FlamingoMic,
which I use as the Mac’s microphone for desktop control and
dictating short emails.
The first thing I did upon getting the iMac was cover the webcam and
microphone with electrical tape; masking tape won’t completely silence the
microphone. As this was my first PC that’s fully USB3, I also learned
that USB3 and wireless headsets don’t work well together. So the webcam,
headset transmitter, and Flamingo mic are all plugged into a small USB2 hub with
individual power switches and LEDs for each device, making it easy to
disable each. The Buddy 7G microphones, like the SpeechWare
ones, are not general purpose mics: they won’t be good for music, for
instance. They have circuitry built in for filtering out noise and
picking up voices. I bought the Buddy 7G because I suspect it is
nearly as good as the SpeechWares. I bought the Flamingo because it is
portable and much cheaper than the version with the built-in
base. The USB2 hub I’m using cost $6 and is easily mounted to the desk
using double-sided tape; the desktop units cost hundreds more, though they
offer no more functionality than a hub.
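
If you want to confirm which devices ended up behind the hub, MacOS’s `system_profiler` can dump the USB tree; here’s a rough sketch in Python (the device-name strings are only examples, so substitute whatever your hardware reports):

```python
import subprocess

# Dump MacOS's USB device tree and pull out the lines mentioning the
# devices of interest. The keywords below are only examples; use whatever
# your webcam, headset transmitter, hub, and mic report as product names.
tree = subprocess.run(
    ["system_profiler", "SPUSBDataType"],
    capture_output=True, text=True, check=True,
).stdout

for keyword in ("hub", "buddy", "savi", "camera"):
    for line in tree.splitlines():
        if keyword in line.lower():
            print(line.rstrip())
```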
Finally, here’s the accuracy of dictating the Rainbow Passage using the
two microphones with the Mac’s Enhanced Dictation and Nuance’s
NaturallySpeaking.
|                                 | Buddy 7G Flamingo (desk) | Savi W440 (headset) |
|---------------------------------|--------------------------|---------------------|
| NaturallySpeaking 13            | 99%                      | 100%                |
| Enhanced Dictation (El Capitan) | 96%                      | 96%                 |
You can see that the Buddy 7G desk mic is quite good, but not as good
as a headset, and that El Capitan’s Enhanced Dictation is okay but
frustrating for serious use.
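
For anyone who wants to run the same comparison, here is a rough way to score a dictated transcript against the reference passage; the `word_accuracy` helper and the sample transcript below are just illustrative, and since punctuation is treated as part of each word the result is only an approximation:

```python
import difflib

def word_accuracy(reference: str, transcript: str) -> float:
    """Word-level accuracy: aligned matching words / words in the reference."""
    ref_words = reference.lower().split()
    hyp_words = transcript.lower().split()
    # SequenceMatcher aligns the two word lists and reports matching runs.
    matcher = difflib.SequenceMatcher(None, ref_words, hyp_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)

# The opening sentence of the Rainbow Passage against a made-up transcript
# with one recognition error ("rain drops" for "raindrops").
reference = ("When the sunlight strikes raindrops in the air, "
             "they act as a prism and form a rainbow.")
transcript = ("When the sunlight strikes rain drops in the air, "
              "they act as a prism and form a rainbow.")
print(f"{word_accuracy(reference, transcript):.0%}")
```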