2007 Jun 20 | Creating a Semester's Class Schedule
I just discovered Python's awesome dateutil package which
implements much of the iCalendar standard, including recurrences!
Consequently, it's trivial to generate a calendar for the days classes
meet. I assume with a little work one could even handle the holidays.
In any case, here's an example:
    #!/usr/bin/python2.5
    from dateutil.rrule import rrule, WEEKLY, MO, WE, SU
    from dateutil.parser import parse

    sem_start = '20070903T140000'
    sem_end = '20071212T140000'
    days = (MO, WE)  # weekdays on which the class meets

    # One occurrence per class day, from the first meeting to the last.
    meetings = list(rrule(WEEKLY, wkst=SU, byweekday=days,
                          dtstart=parse(sem_start), until=parse(sem_end)))
    for meeting in meetings:
        print(meeting.strftime("%b %d %a"))
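As for the holidays, a minimal sketch of one way to do it is dateutil's rruleset, which can combine a recurrence rule with excluded dates. The two days off below are made up for illustration; substitute the real academic calendar.

```python
from datetime import datetime

from dateutil.rrule import rrule, rruleset, WEEKLY, MO, WE

sem_start = datetime(2007, 9, 3, 14, 0)
sem_end = datetime(2007, 12, 12, 14, 0)

# Hypothetical days off; each must match a meeting's date *and* time exactly.
holidays = [datetime(2007, 9, 3, 14, 0), datetime(2007, 11, 21, 14, 0)]

schedule = rruleset()
schedule.rrule(rrule(WEEKLY, byweekday=(MO, WE),
                     dtstart=sem_start, until=sem_end))
for holiday in holidays:
    schedule.exdate(holiday)  # drop this occurrence from the set

meetings = list(schedule)
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))
```

Note that an excluded date only takes effect if it equals an occurrence down to the time of day, which is why the holidays carry the 14:00 class time.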
this entry posted to
technology/python;
comments (0)
2007 Feb 08 | ZotZero and BusySponge
I have been reading of ZotZero in Josh's
blog and am hopeful that it will help bridge the gap between the
dynamic and informal life of the Web (e.g., reading, blogging,
bookmarks, RSS, etc.) and the seemingly lifeless task of bibliography.
Wouldn't it be nice if citing something was as easy as bookmarking it?
Or, if you could read what your colleagues were reading via an RSS feed?
While I haven't played with ZotZero
yet -- and I use the Konqueror browser not Firefox -- I share this
vision and hope to see it become a reality. And since I recently posted
about my Freemind Extract tool (for transforming a mindmap into a
bibliography), I realize I haven't spoken of the flipside in a couple of
years: absorbing information. But first, a historical digression.
The way I make note of and annotate resources and tasks evolved out
of two practices at the W3C. The first was a decree by Timbl
which I objected to strongly at the time: the great datespace shift of
1999. Because the W3C's root file/name space was getting too crowded,
Tim's new policy forbade new top-level spaces like www.w3.org/Signature or www.w3.org/Encryption.
There were too many already and who were we to lay claim to such spaces
for all time? There might be a new digital signature activity 10 years
from now, so where would they live? (Consequently, the subsequent key
management working group received www.w3.org/2001/XKMS.)
I appreciated this concern at the root level, but cringed at only being
able to organize other files by date of creation. Try finding a
document you wrote a couple of years ago in a space that is no more structured
than /2001/{01,..,12} and is shared by 50+ other
people. It's not easy. I realized the only way I could keep track of
things I had worked on was to keep a log of events and documents I
cared about. (This shift also affected how we collaborated in our
shared space given issues of ownership, access controls, and version
management -- but perhaps more on that another time.)
The second W3C practice was that each of its hosts (worksites) had a
weekly meeting at which we shared the important events of the past week
and raised agenda issues for common discussion. To make it easier for
the minute takers, we each e-mailed our two minutes to a mailing list, and a bot
would collect them into draft minutes, which would be augmented with the
IRC log.
Preparing my two minutes before 10 a.m. Tuesday morning always
seemed more frantic than it needed to be. But, once I started keeping a log
of what I had done as a result of the datespace shift, it became
trivial. (In fact, I wrote a script to grab the past week
automatically, and even generated an RSS feed from the work log so that
one could "subscribe" to my work log by keyword/task -- anticipating
RSS feeds of tagged bookmarks.)
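Grabbing the past week from a dated log can be quite short. The sketch below is not the original script: it assumes, purely for illustration, that each log entry starts with a heading of the form "YYYY Mon DD | title" (the style used for the entries on this page), and the function name past_week is hypothetical.

```python
from datetime import datetime, timedelta

def past_week(lines, today):
    """Return (date, title) pairs for 'YYYY Mon DD | title' lines from the last 7 days."""
    cutoff = today - timedelta(days=7)
    recent = []
    for line in lines:
        stamp, sep, title = line.partition(" | ")
        if not sep:
            continue  # no ' | ' separator, so not an entry heading
        try:
            when = datetime.strptime(stamp.strip(), "%Y %b %d")
        except ValueError:
            continue  # the text before ' | ' is not a date
        if cutoff <= when <= today:
            recent.append((when.date(), title.strip()))
    return recent

log = [
    "2007 Feb 08 | ZotZero and BusySponge",
    "2007 Jan 02 | An older entry",
    "a line that is not a heading",
]
for day, title in past_week(log, datetime(2007, 2, 13)):
    print(day, title)
```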
By 2002 I had tired of manually logging events, via an HTML editor,
to my personal blog and work log, so I wrote a specification for a
dream tool: Busy Sponge. It would soak up everything of importance that I touched and send it to the right place. I opted for a command-line tool I named b.py.
Returning to today, a challenge I'm sure I share with
the ZotZero folks is how to automatically scrape as much metadata
as possible from a Web resource. Busy Sponge continues to be the primary
way I input data into my work log and mind maps. Because metadata is no
more common or standard on the Web than it was five years ago, I am
dependent on screen-scraping heuristics. For example, the following
code allows me to easily capture and cite messages from Wikipedia mailing
lists -- and that is why it was such a hassle when the archives broke:
    elif url.startswith("http://marc.theaimsgroup.com/"):
        # Fall back to a looser pattern if the first one does not match.
        try:
            author = re.search('''From: *(.*?)''', html).group(1)
        except AttributeError:
            author = re.search('''From: *(.*)''', html).group(1)
        # The archive obfuscates addresses as "user () host ! tld".
        author = author.replace(' () ','@').replace(' ! ','.')\
            .replace('&lt;', '<').replace('&gt;', '>')
        author = author.split(' <')[0]  # keep only the display name
        author = author.replace('"','')
        mlist = re.search('''List: *(.*?)''', html).group(1)
        mdate = re.search('''Date: *(.*?)''', html).group(1)
        ...
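To make the heuristic concrete, here is a self-contained sketch of the same idea. It is not the code from b.py: the function name scrape_message, the line-anchored patterns, and the sample page are assumptions for illustration, and a real archive page would need patterns tuned to its actual markup.

```python
import html
import re

def scrape_message(page):
    """Pull author, list, and date from 'Header: value' lines in a message page."""
    fields = {}
    for name in ("From", "List", "Date"):
        match = re.search(r"^%s: *(.*)$" % name, page, re.MULTILINE)
        fields[name.lower()] = match.group(1).strip() if match else None
    author = fields["from"]
    if author:
        # Decode HTML entities, then undo the "user () host ! tld" obfuscation.
        author = html.unescape(author).replace(" () ", "@").replace(" ! ", ".")
        # Keep only the display name that precedes the address.
        fields["from"] = author.split(" <")[0].replace('"', "")
    return fields

# A made-up page in the general shape of a mailing list archive message.
sample = '''List: wikipedia-l
From: "Jane Doe" &lt;jane () example ! org&gt;
Date: 2007-02-08 12:00:00
'''
print(scrape_message(sample))
```

Returning None for a missing header, rather than raising, lets the caller decide which fields to fill in by hand.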
Unfortunately, beyond a couple of mailing list archives and wikis --
which, fortunately, are the majority of what I grab -- I have to
manually edit my sponges with proper meta/bibliographic data. And
curses upon those bloggers who make it difficult to determine the
author of an article or even the whole blog -- even a pseudonym will
do! Beyond the usage of my tool, I can imagine much value in a social
tool that allows users to share annotations, or even screen-scraping
"plug-ins." One can hope!
this entry posted to
technology/python;
comments (0)
I am releasing version 0.6 of
the fe mindmapping bibliographic tools. As
explained in Extracting
Bibliographies from Freemind, these are python scripts that are able to
convert between Freemind
mindmaps (using a few simple conventions) and bibliographic formats (i.e.,
OO.org CSV and bibtex). They also make it very easy for me to search my notes and quote authors
(e.g., "Giddens").
There are no massive changes, just the usual tweaks and bug fixes. One
notable change is the regular expressions in pe.py are much improved,
and it's quite uncanny at extracting bibliographic keys of the form
'Snide and Smith (2003)' or '(Snide, Smith and Smittie 2004)' from
natural language text.
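To give a flavor of the problem, here is a rough sketch of how the two key forms might be matched. These are not the expressions used in pe.py; the patterns and the function name find_citations are assumptions for illustration, and they only handle simple capitalized surnames.

```python
import re

NAME = r"[A-Z][A-Za-z'-]+"
# One surname, optionally followed by ', Name' repeats and a final 'and Name'.
AUTHORS = r"%s(?:, %s)*(?:,? and %s)?" % (NAME, NAME, NAME)
# 'Snide and Smith (2003)': authors, then a parenthesized year.
NARRATIVE = re.compile(r"(%s) \((\d{4})\)" % AUTHORS)
# '(Snide, Smith and Smittie 2004)': authors and year inside parentheses.
PARENTHETICAL = re.compile(r"\((%s) (\d{4})\)" % AUTHORS)

def find_citations(text):
    """Return (authors, year) pairs in the order they appear in the text."""
    found = []
    for pattern in (NARRATIVE, PARENTHETICAL):
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(1), match.group(2)))
    return [(authors, year) for _, authors, year in sorted(found)]

text = ("As Snide and Smith (2003) argue, this is contested "
        "(Snide, Smith and Smittie 2004).")
for authors, year in find_citations(text):
    print(authors, year)
```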
this entry posted to
technology/python;
comments (0)
I am releasing a new zipfile of
the fe mindmapping bibliographic tools. As
explained in Extracting
Bibliographies from Freemind, these are python scripts that are able to
convert between Freemind
mindmaps (using a few simple conventions) and bibliographic formats (i.e.,
OO.org CSV and bibtex). This approach is preferable to other bibliographic
tools with limited/constrained forms for text entry. With
fe one has a complete outline/map of texts, with
figures, images, tables, links to sites, etc.; one can easily organize texts
by topic or in separate mindmap files; and one can generate queries where
each matching line has its appropriate citation with year and page number
(e.g., "Giddens").
Unlike many bibliographic tools, it does not query on-line databases, but one
can use such tools (e.g., tellico or refworks) to query and generate bibtex
bibliographies and then use be.py to convert them to a mindmap.
- fe.py: extract bibliographic data from
bibliographic MM (dependent on XML ElementTree and
optionally bibtex2html)
  - this version is faster since it uses XML ElementTree
    instead of XML Tramp.
  - given a list of authors cited (*.rl, such as that generated by
    pe.py or pyblink) bibtex2html will
    generate a bibliography of only those authors.
  - bibliographic maps are searchable from the command-line or via the
    Web (e.g., search
    results for "Giddens" in my mindmap [java|flash]).
  - a Web of mindmaps can be searched for essential entries
    (the title is bold) and placed in a new mindmap for studying.

      fe.py -h (help)
            -v (output csv)
            -c (chase links between MMs)
            -w (output bibtex & html file)
            -a (include abstracts)
            -s (use bibtex style)
            -q (query)
            -e (create new MM of essential works)
- be.py: extract a MM from a bibtex file (dependent on bibstuff)
- de.py: extract a MM from a dictated text file
- ff.py: fix the case of titles of a bibliographic MM
- pe.py: extract the bibliographic keys of the form 'Snide and Smith
(2003)' or '(Snide, Smith and Smittie 2004)' from natural language
text
- te.py: parse inconsistently formatted textual bibliographies into
bibliographic MM (e.g., from syllabi, cb2Bib is cool too)
2005 Feb 14 | XML ElementTree Data Model
I've been playing with Fredrik Lundh's ElementTree as an
intuitive/pythonic way of processing XML. (While I like Aaron Swartz's XML Tramp, it
is presently unsupported; ElementTree is fast and has XPath
support.)
ElementTree Conventions
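A minimal sketch of the basic operations, using the xml.etree.ElementTree module that later shipped in Python's standard library. The sample document and the namespace URI are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"  # hypothetical namespace URI
doc = ET.fromstring(
    '<map xmlns:x="%s">'
    '<book COLOR="green"><title>Code</title></book>'
    '<x:book/>'
    '</map>' % NS
)

children = list(doc)                 # an element behaves like a list of children
first_two = doc[0:2]                 # slicing returns a list of child elements
book = doc.find("book")              # first child element named 'book'
has_color = "COLOR" in book.attrib   # attributes live in a dictionary
color = book.get("COLOR")            # read an attribute value
book.set("COLOR", "blue")            # assign an attribute value
ns_book = doc.find("{%s}book" % NS)  # qualified names are written {uri}local
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)
```

Namespaces are spelled out as `{uri}local` in tag names, so no prefix bookkeeping is needed when searching.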
2004 May 28 | Upgrading to Pyblosxom 1.0
Upgrading my
pyblosxom install can be a bit tricky:
- `kdiff3 pyblosxom.cgi ~/data/2web/reagle/joseph/pyblosxom.cgi`
- `kdiff3 config.py ~/data/2web/pyblosxom/web/config.py`
- Copy files from the new version into the existing install, making sure
to remove files that are no longer necessary and to keep my own plugins,
lucene install, etc.
- Get the flavourdir tweak from CVS and copy to
pyblosxom/Pyblosxom/renderers/blosxom.py
- I used to be able to set the default py['parser'] = 'textile' and it
would work with ".txt" files; this no longer works, and since there are only a
few I renamed them to ".txtl"
- How do I get rid of the automatically generated "<h2>Thu, 27 May
2004</h2>"?
Generate an empty file: `echo >> date_head.html`
- When I create a comment, the comment is created but I get an error:
"2004-05-28 13:55:35,982 INFO Couldn't open latest comment pickle for
writing"
but it's now logged and doesn't throw an exception.
- Trackback has changed! Remove the trackback.cgi file and rely upon the
pyblosxom.cgi (renamed to blog) handler by moving to a trackback URI of
the form http://reagle.org/joseph/blog/trackback/...
2004 May 14 | Mailbox pretty print
This small script [mbx-pp.py]
will take the plain-text portions of messages in a Unix mailbox and turn them
into a pretty HTML document.
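The general approach can be sketched with the standard library's mailbox module. This is not the code in mbx-pp.py; the function names and the bare-bones HTML layout are assumptions for illustration.

```python
import html
import mailbox

def plain_text_parts(message):
    """Yield the decoded text/plain bodies of a (possibly multipart) message."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            yield payload.decode(charset, errors="replace")

def mbox_to_html(path):
    """Render every message in the mbox file at `path` as one simple HTML page."""
    sections = []
    for message in mailbox.mbox(path):
        subject = html.escape(str(message.get("Subject", "(no subject)")))
        sender = html.escape(str(message.get("From", "(unknown)")))
        body = html.escape("\n".join(plain_text_parts(message)))
        sections.append("<h2>%s</h2>\n<p>From: %s</p>\n<pre>%s</pre>"
                        % (subject, sender, body))
    return "<html><body>\n%s\n</body></html>" % "\n".join(sections)
```

Escaping the headers and bodies before wrapping them in markup keeps stray angle brackets in a message from breaking the page.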
2004 Feb 13 | Pyblosxom Autoping
I recently re-discovered Sam Ruby's automatic trackback
script for pyblosxom entries and made some tweaks for my own purposes:
- Permits paths/categories, the original did not make use of them.
- Grabs configuration information from the pyblosxom config.py file.
- Works with my htmlentry
plugin.
Unfortunately, I noted — too late — that an autoping.py also
comes with pyblosxom, which includes some of Wari's
tweaks for caching, so that's three variants! Oh well, perhaps someone
will have some time to re-factor the best of each.
this entry posted to
technology/python;
comments (1)
I've been using the Freemind mind-mapper to represent my
readings. While I'm not terribly fond of Java (startup/exit are
very slow and I obviously prefer Python), the application is very nice.
Fortunately, the data format is XML, though with a rather odd schema, so I can
easily go at it with Python and xmltramp
regardless.
Freemind extract relies upon a particular patterns.xml (in tar ball below)
and certain conventions in the mind-map to create bibtex or OpenOffice.org
CSV files.
- Authors are green and bound to F3.
- Titles are navy blue and bound to F4.
- Excerpts are blue and bound to F5.
- Annotations are purple and bound to F6.
- Abstracts are gray and bound to F7.
- Citations are magenta and bound to F8.
- My comments are left black, or F1
A citation node looks like "y=2000 p=Basic Books a=New York,
NY".
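A citation node of this shape is easy to split into fields. The sketch below is not the code in fe.py; the function name parse_citation is hypothetical, and the mapping of y, p, and a to year, publisher, and address is inferred from the Lessig example in this entry.

```python
import re

# Single-letter keys seen in the example, mapped to BibTeX-style field names.
FIELD_NAMES = {"y": "year", "p": "publisher", "a": "address"}

def parse_citation(text):
    """Split 'y=2000 p=Basic Books a=New York, NY' into a field dictionary."""
    # A key is one character followed by '='; its value runs until the next
    # ' k=' or the end of the string, so values may contain spaces and commas.
    pairs = re.findall(r"(?<!\S)(\w)=(.*?)(?= \w=|$)", text)
    return {FIELD_NAMES.get(key, key): value.strip() for key, value in pairs}

print(parse_citation("y=2000 p=Basic Books a=New York, NY"))
```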
The resulting OpenOffice.org semi-colon delimited file will have this
entry (many thanks to David
Wilson for answering my many questions):
"Lessig 2000";"1";"New York, NY";;"Lessig, Lawrence";;;;;;;;;;;;;"Basic Books";;;"Code: And Other Laws of Cyberspace";;;"2000";;;;;;;;
The BibTeX entry will look like:

    @book{Lessig2000,
      address = {New York, NY},
      author = {Lawrence Lessig},
      publisher = {Basic Books},
      title = {Code: And Other Laws of Cyberspace},
      year = {2000},
    }
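Producing such an entry from the collected fields takes only a few lines. This is a sketch, not the code in fe.py: the function name bibtex_entry is hypothetical, and building the key from the last word of the author's name plus the year is an assumption based on the single example above.

```python
def bibtex_entry(author, title, fields):
    """Format a @book entry; `author` is 'First Last', `fields` holds year etc."""
    key = author.split()[-1] + fields["year"]
    entry = dict(fields, author=author, title=title)
    lines = ["@book{%s," % key]
    for name in sorted(entry):  # alphabetical field order, as in the example
        lines.append("  %s = {%s}," % (name, entry[name]))
    lines.append("}")
    return "\n".join(lines)

print(bibtex_entry(
    "Lawrence Lessig",
    "Code: And Other Laws of Cyberspace",
    {"year": "2000", "publisher": "Basic Books", "address": "New York, NY"},
))
```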
The tar file includes these
utilities:
- be.py: extract a MM from a bibtex file (dependent on bibstuff)
- de.py: extract a MM from a dictated text file
- fe.py: extract bibliographic data from bibliographic
MM (dependent on xmltramp)
- ff.py: fix the case of titles of a bibliographic MM
- te.py: parse inconsistently formatted textual bibliographies into
bibliographic MM (e.g., from syllabi, cb2Bib is cool too)
2003 Nov 14 | XML Tramp Data Model
I've been playing with Aaron Swartz's XML Tramp as
an intuitive/pythonic way of processing XML. The model/syntax isn't
explicitly documented, but from the source and (mostly) examples, this is
what I've figured out:
XML Tramp Conventions
- Elements have a list of children '[]'
  - iterate over children: `for child in doc`
  - get first and second (slice) children: `doc[0:2]`
  - get named child element book: `doc.book` or `doc['book']`
- Elements have a dict of attributes '()'
  - test for an attribute COLOR: `if 'COLOR' in doc.book()`
  - get attribute COLOR value: `doc.book('COLOR')`
  - assign attribute value: `doc.book(COLOR='blue')`
- Namespaces (NS) are indicated with a period; the period indicates it's
  not a literal value (as with quotes), but a namespace corresponding to an
  arbitrary but specified prefix
  - NS qualified elements now appear unquoted within brackets: `doc[ns.book]`
  - NS qualified attributes now appear unquoted within parentheses: `doc(ns.COLOR)`
- To reserialize an object use `print doc.__repr__(True)`