As a PhD student, one of the first bibliographic annoyances I encountered was having to format a paper in APA style, which requires titles to be in sentence case. This means only the first word of a phrase and proper nouns are capitalized. Previously, I had kept the titles of my citations in title case. Switching to APA therefore meant going through my bibliographic database and manually lowercasing every word that was not a proper noun. However, once this work was done, I realized that keeping my data in sentence case was preferable, as title case essentially loses information: it no longer tells you which words are proper nouns. Yet, this still requires me to manually lowercase some words for automatically captured sources. I am not aware of any bibliographic software that handles this issue well, and the good folks at Zotero have an interesting bug ticket open on the issue.
On Friday, while I was doing my weekly fixes to the automatically captured sources in my field notes/mindmap/bibliography, I thought to myself that there are plenty of word lists around, such as those used by spellcheckers, so couldn’t I finally automate this menial task? However, I knew that I use lots of proper nouns that probably do not appear in common dictionaries. Therefore, I applied the tokenizer and part-of-speech tagger from Python’s Natural Language Toolkit to the text of my dissertation to create a custom list of the proper nouns I use. This list is used together with the dictionary found on my system at /usr/share/dict/american-english to transform a title-cased sentence into a sentence-cased one. Basically, if a word is in my custom list, appears in the dictionary only in capitalized form, or does not appear in the dictionary at all, it keeps its capital; otherwise it is lowercased. The code is available as change_case.py within the Thunderdell bibliographic tools. It works fairly well and will certainly make that end-of-week menial task all the easier.
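To make the rule concrete, here is a minimal sketch of how it might look in Python. This is not the actual change_case.py; the function names (load_word_list, sentence_case), the tiny in-memory dictionary, and the choice to always leave the first word untouched are my own assumptions for illustration. It also does nothing special for the word following a colon, which a fuller version would probably want to handle.

```python
import string


def load_word_list(path):
    """Read a one-word-per-line file (e.g. a system dictionary) into a set.

    Hypothetical helper: the path and file format are assumptions.
    """
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def sentence_case(title, dictionary, proper_nouns):
    """Convert a title-cased string to sentence case.

    A word keeps its capitalization if it is in `proper_nouns`, or if its
    lowercase form is absent from `dictionary` (which covers both
    "capitalized only" and "not listed at all"). Everything else is
    lowercased. The first word is always left as-is.
    """
    result = []
    for index, word in enumerate(title.split()):
        if index == 0:
            result.append(word)
            continue
        # Ignore surrounding punctuation when looking the word up.
        core = word.strip(string.punctuation)
        keep = core in proper_nouns or core.lower() not in dictionary
        result.append(word if keep else word.lower())
    return " ".join(result)


if __name__ == "__main__":
    # A toy dictionary standing in for /usr/share/dict/american-english.
    dictionary = {"good", "faith", "collaboration", "the", "culture", "of"}
    proper_nouns = {"Wikipedia"}
    print(sentence_case(
        "Good Faith Collaboration: The Culture of Wikipedia",
        dictionary,
        proper_nouns,
    ))
    # prints: Good faith collaboration: the culture of Wikipedia
```

Checking the custom list first means a proper noun that also happens to be an ordinary dictionary word still keeps its capital, which is the main reason for having the custom list at all.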