Open Codex technology :: python

2007 Jun 20 | Creating a Semester's Class Schedule

I just discovered the awesome dateutil package for Python, whose rrule module implements the recurrence rules of the iCalendar standard (RFC 2445)! Consequently, it's trivial to generate a calendar of the days a class meets. I expect that with a little more work one could even handle holidays. In any case, here's an example:

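#!/usr/bin/env python

# dateutil is a third-party package; its rrule module implements iCalendar recurrences.
from dateutil.rrule import rrule, WEEKLY, MO, WE, SU
from dateutil.parser import parse

sem_start = '20070903T140000'   # first class: 3 Sep 2007 at 2 p.m.
sem_end = '20071212T140000'     # upper bound (inclusive): 12 Dec 2007 at 2 p.m.
days = (MO, WE)                 # the class meets Mondays and Wednesdays

# Every Monday and Wednesday from sem_start up to and including sem_end.
meetings = list(rrule(WEEKLY, wkst=SU, byweekday=days,
    dtstart=parse(sem_start), until=parse(sem_end)))
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))   # e.g. "Sep 03 Mon"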
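As for holidays, here is a minimal sketch of one way to skip them. It uses only the standard library's datetime rather than dateutil, so it runs anywhere; the function name class_meetings and the two holiday dates (Labor Day and Columbus Day 2007) are illustrative assumptions, not a real academic calendar.

```python
from datetime import date, timedelta

MONDAY, WEDNESDAY = 0, 2  # date.weekday() numbering: Monday is 0

def class_meetings(start, end, weekdays, holidays=()):
    """Yield each date from start through end (inclusive) that falls on one
    of the given weekdays, skipping any date listed in holidays."""
    skip = set(holidays)
    day = start
    while day <= end:
        if day.weekday() in weekdays and day not in skip:
            yield day
        day += timedelta(days=1)

# Hypothetical holidays, for illustration only: Labor Day and Columbus Day 2007.
holidays = [date(2007, 9, 3), date(2007, 10, 8)]
meetings = list(class_meetings(date(2007, 9, 3), date(2007, 12, 12),
                               {MONDAY, WEDNESDAY}, holidays))
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))
```

With dateutil installed, the same effect should be achievable by adding the rrule to an rruleset and calling its exdate() method once per holiday, passing datetimes that match the 2 p.m. start time.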

this entry posted to technology/python;
comments (0)

2007 Feb 08 | Zotero and BusySponge

I have been reading about Zotero in Josh's blog and am hopeful that it will help bridge the gap between the dynamic and informal life of the Web (e.g., reading, blogging, bookmarks, RSS) and the seemingly lifeless task of bibliography. Wouldn't it be nice if citing something was as easy as bookmarking it? Or if you could read what your colleagues were reading via an RSS feed?

While I haven't played with Zotero yet (and I use the Konqueror browser, not Firefox), I share this vision and hope to see it become a reality. And since I recently posted about my Freemind Extract tool (for transforming a mindmap into a bibliography), I realize I haven't written about the flip side in a couple of years: absorbing information. But first, a historical digression.

The way I note and annotate resources and tasks evolved out of two practices at the W3C. The first was a decree by Timbl that I objected to strongly at the time: the great datespace shift of 1999. Because the W3C's root file/name space was getting too crowded, Tim's new policy forbade new top-level spaces like www.w3.org/Signature or www.w3.org/Encryption. There were too many already, and who were we to lay claim to such spaces for all time? There might be a new digital signature activity 10 years from now, so where would it live? (Consequently, the subsequent key management working group received www.w3.org/2001/XKMS.) I appreciated this concern at the root level, but cringed at only being able to organize other files by date of creation. Try finding a document you wrote a couple of years ago in a space no more structured than /2001/{01,..,12} that is shared by 50+ other people. It's not easy. I realized the only way I could keep track of things I had worked on was to keep a log of the events and documents I cared about. (This shift also affected how we collaborated in our shared space, given issues of ownership, access control, and version management, but perhaps more on that another time.)

The second W3C practice was that each of its hosts (worksites) held a weekly meeting at which we shared the important events of the past week and raised agenda items for common discussion. To make it easier for the minute takers, we each e-mailed our "two minutes" to a mailing list, and a bot collected them into draft minutes, which were then augmented with the IRC log.

Preparing my two minutes before 10 a.m. on Tuesday always seemed more frantic than it needed to be. But once I started keeping a log of what I had done (a result of the datespace shift), it became trivial. (In fact, I wrote a script to grab the past week automatically, and even generated an RSS feed from the work log so that one could "subscribe" to my work log by keyword/task, anticipating RSS feeds of tagged bookmarks.)

By 2002 I had tired of manually logging events, via an HTML editor, to my personal blog and work log, so I wrote a specification for a dream tool: Busy Sponge. It would soak up everything of importance I touched and send it to the right place. I opted for a command-line tool, which I named b.py.

Returning to today, a challenge I'm sure I share with the Zotero folks is how to automatically scrape as much metadata as possible from a Web resource. Busy Sponge continues to be the primary way I input data into my work log and mind maps. Because metadata is no more common or standard on the Web than it was five years ago, I depend on screen-scraping heuristics. For example, the following code lets me easily capture and cite messages from Wikipedia mailing lists archived at MARC, which is why it was such a hassle when the archives broke:

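elif url.startswith("http://marc.theaimsgroup.com/"):
    # Author: try the first From: pattern; if it finds nothing, .group()
    # raises AttributeError and the looser pattern is used instead.
    try:
        author = re.search('''From: *(.*?)''', html).group(1)
    except AttributeError:
        author = re.search('''From: *(.*)''', html).group(1)
    # Undo the archive's address obfuscation (' () ' for '@', ' ! ' for '.')
    # and unescape the angle brackets around the address.
    author = author.replace(' () ','@').replace(' ! ','.')\
        .replace('&lt;', '<').replace('&gt;', '>')
    # Keep only the display name that precedes <address>, minus quotes.
    author = author.split(' <')[0]
    author = author.replace('"','')

    mlist = re.search('''List: *(.*?)''', html).group(1)

    mdate = re.search('''Date: *(.*?)''', html).group(1)
    ...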
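A self-contained variation on the same heuristics, offered as a sketch only: it assumes the archive page's header has already been reduced to plain text with one "Field: value" per line (the real MARC markup isn't reproduced here), anchors each pattern to the end of its line, and looks scrapers up by URL prefix. The names scrape_marc, SCRAPERS, and scrape, and the sample message, are hypothetical; the lookup table is just one way the elif chain could grow into the kind of shareable screen-scraping "plug-ins" this post mentions.

```python
import re

def scrape_marc(text):
    """Pull author, list, and date out of a plain-text 'Field: value' header."""
    def field(name):
        # '.' stops at a newline, so (.*) captures the rest of that header line.
        match = re.search(r'^%s: *(.*)$' % name, text, re.MULTILINE)
        return match.group(1).strip() if match else None

    author = field('From') or ''
    # Undo the archive's address obfuscation: ' () ' for '@', ' ! ' for '.'.
    author = author.replace(' () ', '@').replace(' ! ', '.')
    author = author.replace('&lt;', '<').replace('&gt;', '>')
    # Keep only the display name before the <address>, minus quotes.
    author = author.split(' <')[0].replace('"', '')
    return {'author': author, 'list': field('List'), 'date': field('Date')}

# Hypothetical plug-in table: URL prefix -> scraper function.
SCRAPERS = {'http://marc.theaimsgroup.com/': scrape_marc}

def scrape(url, text):
    for prefix, scraper in SCRAPERS.items():
        if url.startswith(prefix):
            return scraper(text)
    return {}  # no heuristic for this site; metadata must be entered by hand

# A made-up header, for illustration only.
sample = ('List:       wikipedia-l\n'
          'Subject:    Re: citations\n'
          'From:       "Jane Doe" &lt;jane () example ! org&gt;\n'
          'Date:       2005-03-01 12:00:00\n')
meta = scrape('http://marc.theaimsgroup.com/?l=wikipedia-l&m=1', sample)
print(meta)
```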

Unfortunately, beyond a couple of mailing list archives and wikis (which, fortunately, are the majority of what I grab), I have to manually edit my sponges with proper meta/bibliographic data. And curses upon those bloggers who make it difficult to determine the author of an article or even of the whole blog; even a pseudonym will do! Beyond my own tool, I can imagine much value in a social tool that allows users to share annotations, or even screen-scraping "plug-ins." One can hope!

this entry posted to technology/python;
comments (0)

2007 Jan 26 | Freemind Bibliography Extract 0.6

I am releasing version 0.6 of the fe mindmapping bibliographic tools. As explained in Extracting Bibliographies from Freemind, these are Python scripts that convert between Freemind mindmaps (using a few simple conventions) and bibliographic formats (i.e., OO.org CSV and BibTeX). They also make it very easy for me to search my notes and quote authors (e.g., "Giddens"). There are no massive changes, just the usual tweaks and bug fixes. One notable change is that the regular expressions in pe.py are much improved; pe.py is now quite uncanny at extracting bibliographic keys of the form 'Snide and Smith (2003)' or '(Snide, Smith and Smittie 2004)' from natural language text.
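pe.py's actual expressions aren't shown in this post, so as a rough illustration only, here is a simplified, hypothetical pattern pair that recognizes the two citation forms quoted above. It assumes capitalized surnames joined by commas or "and" and a four-digit year with an optional letter; "et al." and other variants are not handled.

```python
import re

# A capitalized surname such as "Smith", "McDonald", or "Berners-Lee".
NAME = r"[A-Z][A-Za-z'-]+"
# One or more surnames joined by ", ", " and ", or ", and ".
NAMES = NAME + r"(?:(?:,? and |, )" + NAME + r")*"
YEAR = r"\d{4}[a-z]?"  # 2003, or 2003a for a second work that year

# Narrative form: Snide and Smith (2003)
NARRATIVE = re.compile(r"(%s) \((%s)\)" % (NAMES, YEAR))
# Parenthetical form: (Snide, Smith and Smittie 2004)
PARENTHETICAL = re.compile(r"\((%s),? (%s)\)" % (NAMES, YEAR))

def citation_keys(text):
    """Return (names, year) pairs for both citation forms found in text."""
    return NARRATIVE.findall(text) + PARENTHETICAL.findall(text)

text = ("As Snide and Smith (2003) argue, norms matter "
        "(Snide, Smith and Smittie 2004).")
print(citation_keys(text))
```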

this entry posted to technology/python;
comments (0)

2005 Jun 10 | Mindmapping Bibliographies

I am releasing a new zipfile of the fe mindmapping bibliographic tools. As explained in Extracting Bibliographies from Freemind, these are Python scripts that convert between Freemind mindmaps (using a few simple conventions) and bibliographic formats (i.e., OO.org CSV and BibTeX). This approach is preferable to bibliographic tools that constrain text entry to limited forms. With fe one has a complete outline/map of texts, with figures, images, tables, links to sites, etc.; one can easily organize texts by topic or in separate mindmap files; and one can run queries (e.g., "Giddens") in which each matching line is returned with its citation, including year and page number. Unlike many bibliographic tools, it does not query online databases, but one can use such tools (e.g., Tellico or RefWorks) to query and generate BibTeX bibliographies and then use be.py to convert them to a mindmap.

this entry posted to technology/python;
comments (0)

2005 Feb 14 | XML ElementTree Data Model

I've been playing with Fredrik Lundh's ElementTree as an intuitive/pythonic way of processing XML. (While I like Aaron Swartz's XML Tramp, it is presently unsupported; ElementTree is fast and supports a subset of XPath.)
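As a generic illustration of the ElementTree data model (not the specific conventions this entry refers to), here is a small made-up Freemind-style document processed with xml.etree.ElementTree, the version of Lundh's library bundled with Python since 2.5: elements index like lists, attributes read like dicts, and findall() takes a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# A tiny made-up document in the style of a Freemind map.
doc = ET.fromstring(
    '<map>'
    '<node TEXT="Lessig 2000">'
    '<node TEXT="y=2000 p=Basic Books a=New York, NY"/>'
    '<node TEXT="Code is law"/>'
    '</node>'
    '</map>')

book = doc[0]                  # child elements index like a list
print(book.tag)                # the element name: node
print(book.get('TEXT'))        # attributes read like a dict: Lessig 2000
print(len(book))               # number of children: 2

# findall() accepts a limited XPath subset; './/node' finds every descendant <node>.
texts = [n.get('TEXT') for n in doc.findall('.//node')]
# Attribute predicates such as [@TEXT] need ElementTree 1.3 (Python 2.7 or 3.x).
with_text = book.findall('node[@TEXT]')
```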

ElementTree Conventions

this entry posted to technology/python;
comments (1)

2004 May 28 | Upgrading to Pyblosxom 1.0

Upgrading my pyblosxom install can be a bit tricky:

  1. `kdiff3 pyblosxom.cgi ~/data/2web/reagle/joseph/pyblosxom.cgi`
  2. `kdiff3 config.py ~/data/2web/pyblosxom/web/config.py`
  3. Copy files from the new version into the existing install, removing files that are no longer necessary and keeping my own plugins, Lucene install, etc.
  4. Get the flavourdir tweak from CVS and copy to pyblosxom/Pyblosxom/renderers/blosxom.py
  5. I used to be able to set the default py['parser'] = 'textile' and it would work with ".txt" files; that no longer works, and since there are only a few, I renamed them to ".txtl".
  6. How do I get rid of the automatically generated "<h2>Thu, 27 May 2004</h2>"?

    Create a blank date_head.html: `echo >> date_head.html`

  7. When I create a comment, the comment is created but I get an error:

    "2004-05-28 13:55:35,982 INFO Couldn't open latest comment pickle for writing"

    At least the error is now only logged rather than raised as an exception.

  8. Trackback has changed! Remove the trackback.cgi file and rely upon the pyblosxom.cgi (renamed to blog) handler by moving to a trackback URI of the form http://reagle.org/joseph/blog/trackback/...

this entry posted to technology/python;
comments (0)

2004 May 14 | Mailbox pretty print

This small script [mbx-pp.py] will take the plain-text portions of messages in a Unix mailbox and turn them into a pretty HTML document.
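The mbx-pp.py script itself isn't reproduced here. As a minimal sketch of the same idea with Python 3's standard mailbox, email, and html modules: walk each message, keep only its text/plain parts, and escape them into <pre> blocks under a heading for each message. The function name mbox_to_html and the HTML layout are assumptions.

```python
import html
import mailbox

def mbox_to_html(path):
    """Render the text/plain parts of each message in a Unix mbox as HTML."""
    out = ['<html><body>']
    box = mailbox.mbox(path)
    try:
        for msg in box:
            subject = html.escape(str(msg.get('Subject', '(no subject)')))
            sender = html.escape(str(msg.get('From', '')))
            date = html.escape(str(msg.get('Date', '')))
            out.append('<h2>%s</h2>\n<p><em>%s, %s</em></p>' % (subject, sender, date))
            # walk() yields the message itself, then any sub-parts.
            for part in msg.walk():
                if part.get_content_type() != 'text/plain':
                    continue
                payload = part.get_payload(decode=True) or b''
                charset = part.get_content_charset() or 'utf-8'
                text = payload.decode(charset, errors='replace')
                out.append('<pre>%s</pre>' % html.escape(text))
    finally:
        box.close()
    out.append('</body></html>')
    return '\n'.join(out)
```

Usage would be along the lines of print(mbox_to_html('/path/to/mbox')); HTML-only parts and attachments are simply skipped.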

this entry posted to technology/python;
comments (0)

2004 Feb 13 | Pyblosxom Autoping

I recently rediscovered Sam Ruby's automatic trackback script for pyblosxom entries and made some tweaks for my own purposes:

  1. Supports paths/categories; the original did not make use of them.
  2. Grabs configuration information from the pyblosxom config.py file.
  3. Works with my htmlentry plugin.

Unfortunately, I noticed too late that an autoping.py also comes with pyblosxom, which includes some of Wari's tweaks for caching, so that's three variants! Oh well, perhaps someone will have some time to refactor the best of each.
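For readers unfamiliar with what such a script does: a TrackBack ping is an HTTP POST of form-encoded url, title, excerpt, and blog_name fields to the target entry's TrackBack URL, and the server answers with a small XML <response> whose <error> element is 0 on success. The sketch below uses only Python 3's standard library; the function names are hypothetical, and it is not Sam Ruby's, Wari's, or pyblosxom's autoping code. It builds the request and checks a reply without touching the network.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def trackback_request(tb_url, url, title='', excerpt='', blog_name=''):
    """Build (but do not send) a TrackBack ping announcing the entry at url."""
    data = urllib.parse.urlencode({'url': url, 'title': title,
                                   'excerpt': excerpt, 'blog_name': blog_name})
    # Supplying data makes urllib issue a POST.
    return urllib.request.Request(
        tb_url, data=data.encode('utf-8'),
        headers={'Content-Type':
                 'application/x-www-form-urlencoded; charset=utf-8'})

def trackback_ok(response_xml):
    """Return (ok, message) from the <response> document the server sends back."""
    root = ET.fromstring(response_xml)
    ok = root.findtext('error', '1').strip() == '0'
    return ok, root.findtext('message', '')
```

Actually sending a ping would then be a matter of passing urllib.request.urlopen(req).read() to trackback_ok().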

this entry posted to technology/python;
comments (1)

2003 Dec 12 | Extracting Bibliographies From Freemind

I've been using the Freemind mind-mapper to represent my readings. While I'm not terribly fond of Java (startup and exit are very slow, and I obviously prefer Python), the application is very nice. Fortunately, the data format is XML, albeit with a rather odd schema, so I can easily go at it with Python and xmltramp.

Freemind extract relies upon a particular patterns.xml (in the tarball) and certain conventions in the mind-map to create BibTeX or OpenOffice.org CSV files.

A citation node looks like "y=2000 p=Basic Books a=New York, NY".

The resulting OpenOffice.org semi-colon delimited file will have this entry (many thanks to David Wilson for answering my many questions):

"Lessig 2000";"1";"New York, NY";;"Lessig, Lawrence";;;;;;;;;;;;;"Basic Books";;;"Code: And Other Laws of Cyberspace";;;"2000";;;;;;;;

The BibTeX entry will look like:

@book{Lessig2000,
   address = {New York, NY},
   author = {Lawrence Lessig},
   publisher = {Basic Books},
   title = {Code: And Other Laws of Cyberspace},
   year = {2000},
}
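To make the node convention concrete, here is a hypothetical sketch (not the fe code itself) that splits a citation node into fields and prints the BibTeX entry above. The mapping of y, p, and a to year, publisher, and address is read off the example; the author and title are passed in directly as an assumption, since where fe finds them in the map isn't described here.

```python
import re

# Single-letter keys from the citation node; mapping inferred from the example.
FIELDS = {'y': 'year', 'p': 'publisher', 'a': 'address'}

def parse_citation_node(text):
    """Split 'y=2000 p=Basic Books a=New York, NY' into BibTeX fields.
    Each value runs until the next ' <letter>=' or the end of the string."""
    pairs = re.findall(r'(\w)=(.*?)(?= \w=|$)', text)
    return {FIELDS.get(key, key): value for key, value in pairs}

def to_bibtex(author_last, author_full, title, node_text):
    fields = parse_citation_node(node_text)
    fields.update(author=author_full, title=title)
    lines = ['@book{%s%s,' % (author_last, fields['year'])]
    for name in sorted(fields):  # alphabetical, as in the entry above
        lines.append('   %s = {%s},' % (name, fields[name]))
    lines.append('}')
    return '\n'.join(lines)

entry = to_bibtex('Lessig', 'Lawrence Lessig',
                  'Code: And Other Laws of Cyberspace',
                  'y=2000 p=Basic Books a=New York, NY')
print(entry)
```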

The tar file includes these utilities:

this entry posted to technology/python;
comments (0)

2003 Nov 14 | XML Tramp Data Model

I've been playing with Aaron Swartz's XML Tramp as an intuitive/pythonic way of processing XML. The model/syntax isn't explicitly documented, but from the source and (mostly) examples, this is what I've figured out:

XML Tramp Conventions

this entry posted to technology/python;
comments (3)

Open Communities, Media, Source, and Standards

by Joseph Reagle

powered by pyblosxom


reagle.org
