Open Codex

2010 Feb 16 | Diffing Word Files

For the most part, I wrote my dissertation and book manuscript using a simplified version of Markdown complemented with biblatex citations. Because the manuscript was a simple text file, managing edits to it was very easy. I could make global textual replacements trivially, and generating PDF, HTML, and other formats was straightforward. Using Mercurial, I could also take advantage of nice features like the "attic" extension, which lets me keep changesets on the side to be applied only when appropriate. For example, the changes necessary to generate HTML were kept in the attic and applied only when I wanted them.

Unfortunately, once the manuscript went into the MIT Press system, I had to use Microsoft Word. As much as the Word document format annoys me, I understand it is widely used, and I can't think of an easy alternative that also supports editorial annotations. Nonetheless, I had a difficult time seeing changes in Microsoft Word and wanted to backport them into my source files. And there does not appear to be a good textual difference tool for Word documents.

I have posted a small Python script that uses antiword (to extract the text of each Word document) and dwdiff (to compare the texts word by word) and also gives me context on either side of each change. It, of course, doesn't work well with formatting, but it is useful and generates output like the following:

   reflects {-the-} [+a+] stabilization
   a {-number of pragmatic questions: it-} 
     [+project was conceived. It+] would
   there {-will-} [+would+] be
   article {-will-} [+would+] be
   linked {-to from-} [+via+] a
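What follows is a rough sketch of the same approach, not the posted script itself. It assumes antiword and dwdiff are installed and on the PATH. Each .doc file is converted to plain text with antiword. dwdiff is then run with its delete/insert delimiter options (`-w`, `-x`, `-y`, `-z`) set so that changes appear as `{-deleted-}` and `[+inserted+]`, as in the sample above. Finally, each change is printed with a word of context on either side. The file name `diff_word_files.py` and the functions `doc_to_text`, `word_diff`, and `changes_with_context` are hypothetical.

```python
"""Word-level diff of two Word (.doc) files, with context around each change.

A sketch, assuming the antiword and dwdiff command-line tools are on PATH.
The file and function names are hypothetical, not the original script's.
"""
import os
import re
import subprocess
import sys
import tempfile

# Delimiters that reproduce the sample output: {-deleted-} [+inserted+]
DEL_START, DEL_END = "{-", "-}"
INS_START, INS_END = "[+", "+]"

# One marked-up span; a "change" is one or more adjacent spans separated
# only by whitespace (e.g. a deletion immediately followed by an insertion).
_MARK = r"(?:\{-.*?-\}|\[\+.*?\+\])"
CHANGE_RE = re.compile(_MARK + r"(?:\s*" + _MARK + r")*", re.DOTALL)


def doc_to_text(path):
    """Extract plain text from a .doc file using antiword."""
    return subprocess.run(["antiword", path], capture_output=True,
                          text=True, check=True).stdout


def word_diff(old_doc, new_doc):
    """Return dwdiff's word-level diff of the two documents' text."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, doc in (("old.txt", old_doc), ("new.txt", new_doc)):
            path = os.path.join(tmp, name)
            with open(path, "w") as fh:
                fh.write(doc_to_text(doc))
            paths.append(path)
        result = subprocess.run(
            ["dwdiff", "-w", DEL_START, "-x", DEL_END,
             "-y", INS_START, "-z", INS_END, *paths],
            capture_output=True, text=True)
    # Like diff(1), dwdiff exits with 1 when the inputs differ; >1 is an error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def changes_with_context(diff_text, context=1):
    """Return one line per change with `context` words on either side."""
    matches = list(CHANGE_RE.finditer(diff_text))
    lines = []
    for i, m in enumerate(matches):
        # Draw context only from unchanged text, stopping at neighboring changes.
        prev_end = matches[i - 1].end() if i > 0 else 0
        next_start = (matches[i + 1].start() if i + 1 < len(matches)
                      else len(diff_text))
        before = diff_text[prev_end:m.start()].split()
        after = diff_text[m.end():next_start].split()
        change = " ".join(m.group(0).split())  # collapse line breaks
        lines.append(" ".join(before[max(0, len(before) - context):]
                              + [change] + after[:context]))
    return lines


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for line in changes_with_context(word_diff(sys.argv[1], sys.argv[2])):
            print(line)
    else:
        print("usage: diff_word_files.py OLD.doc NEW.doc", file=sys.stderr)
```

Run it as `python diff_word_files.py old.doc new.doc`. Context words are taken only from the unchanged text between matches, so when two edits sit close together, neither edit's markup leaks into the other's context. Passing a larger `context` shows more of the surrounding sentence.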

Open Communities, Media, Source, and Standards

by Joseph Reagle

reagle.org
