Making Word Useful

Because I use speech recognition software (SR), I'm forced to tangle with proprietary software and formats; this provides a continuous reminder of the benefits and joys of Free Software. However, I have learned a few things about maintaining a Windows system for SR over the past five years.

In 2004 I began using continuous SR with ViaVoice on a headless Shuttle box accessed over VNC. (This was a big improvement over the discrete speech system I had used 10 years before.) Even with the relief provided by imaging the OS partition (PING is great for this), Windows was still a dreadful thing to maintain; the advent of virtualization has been a blessing. Up until the beginning of this year, I relied upon Win2K to keep a lean and portable OS. However, security and software support for Win2K are ending, and the excellent VirtualBox 2.* software permits one to emulate a consistent hardware profile (including the BIOS); this allows me to placate XP's annoying validation system.

I presently use NaturallySpeaking 10.1. While the underlying recognition is often remarkable, the user experience and Nuance's support are dreadful. To have useful macro support one must pay hundreds of dollars more for a "professional" version, and pay it to a company that charges its users for tech support on problems caused by its own breakage, then ignores that breakage when it is reported as a bug. Fortunately, there is a friendly FOSS community, and DragonFly is an amazing (Python-based) macro application that helps me get around the worst annoyances in NaturallySpeaking.

Then there is the matter of application support. While coders might be content with Emacs or UltraEdit, I dictate prose and want a visually meaningful processor: paragraph/heading styles, a spelling and grammar checker, word counter, etc. LyX, Amaya, OpenOffice, and AbiWord are not "Select-and-Say" capable applications (i.e., not useful with NaturallySpeaking). That leaves Microsoft Word and its loathsome ".doc" binary format. These binary files are impervious to the more useful features of versioning systems and to simple scripting. If I need to fix the capitalization of a term in my manuscript, I have to manually open each chapter and do "find/replace" rather than fix it with a simple one-line command (or with KFileReplace). While I had some hope the new ".docx" format would be useful (it is easy enough to unzip and parse), making sense of its contents is an outrageously difficult task (particularly the lists). So, for years now I've been writing pseudo-LaTeX in doc files, converting them to text via antiword, and processing the text from there.
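To make the "fix the capitalization of a term" chore concrete, here is a minimal sketch of what it can look like once the chapters are plain text. The function names (fix_term, fix_term_in_files) and the UTF-8 default are my own assumptions for illustration; this is not the one-line command alluded to above, just one way to get the same effect in Python.

```python
import re
from pathlib import Path


def fix_term(text, wrong, right):
    """Replace whole-word occurrences of `wrong`, in any capitalization, with `right`."""
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
    # A function is used as the replacement so that backslashes in `right`
    # are inserted literally rather than read as regex escapes.
    return pattern.sub(lambda match: right, text)


def fix_term_in_files(paths, wrong, right, encoding="utf-8"):
    """Rewrite each file in place; return the list of paths that actually changed."""
    changed = []
    for path in map(Path, paths):
        original = path.read_text(encoding=encoding)
        updated = fix_term(original, wrong, right)
        if updated != original:
            path.write_text(updated, encoding=encoding)
            changed.append(path)
    return changed
```

Something like fix_term_in_files(Path("manuscript").glob("*.txt"), "naturallyspeaking", "NaturallySpeaking") would then cover every chapter in one pass (the "manuscript" directory is hypothetical). Because the files are rewritten in place, it is safest to run this on files that are under version control.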

However, I recently accumulated enough Microsoft Word hacks to turn it into a decent text editor.

  1. Set the default save format as plain text and its default font to something nice like Andale Mono.
  2. Bind {control-v} to this PasteUnformattedText() macro.
  3. Bind {control-s} to this FileSave() macro to get rid of the annoying "you will lose your formatting saving to text" dialog.
  4. Office XP doesn't use UTF-8 encoding by default and nags you with a dialog every time you open such a file, even though UTF-8 is the encoding used by every other sensible application of late. Make it the default with this registry edit, but realize that Word then writes a byte order mark (BOM) at the start of the file, which confuses even otherwise sensible applications. When processing the text, you can remove it in Python with: line = line.lstrip(unicode(codecs.BOM_UTF8, "utf8")).
  5. You can even "syntax highlight" your text with VBA: this AutoOpen() macro shows how. Editing markdown and LaTeX now looks much like what I was seeing before, but the file is an open-format, UTF-8 encoded text file!
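The BOM-stripping line in step 4 is written for Python 2 (it relies on the unicode() built-in). A rough Python 3 equivalent is sketched below; the helper names strip_bom and read_text_without_bom are hypothetical, and I am assuming the files really are UTF-8 as saved by Word.

```python
import codecs

# The UTF-8 BOM bytes decode to the single character U+FEFF.
BOM = codecs.BOM_UTF8.decode("utf-8")


def strip_bom(text):
    """Drop one leading BOM character from already-decoded text, if present."""
    return text[len(BOM):] if text.startswith(BOM) else text


def read_text_without_bom(path):
    """Read a UTF-8 file; the utf-8-sig codec skips a leading BOM when there is one."""
    with open(path, encoding="utf-8-sig") as handle:
        return handle.read()
```

Opening the file with the "utf-8-sig" codec is probably the simpler of the two, since it handles files with and without a BOM alike; strip_bom is there for text that has already been decoded elsewhere.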
