Open Codex technology

2007 Jun 20 | Creating a Semester's Class Schedule

I just discovered the awesome dateutil package for Python, which implements much of the iCalendar standard, including recurrence rules! Consequently, it's trivial to generate a calendar of the days a class meets. I assume that with a little work one could even handle holidays. In any case, here's an example:

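#!/usr/bin/env python
# Print every Monday/Wednesday class meeting in the semester.
from dateutil.parser import parse
from dateutil.rrule import rrule, WEEKLY, MO, WE, SU

sem_start = '20070903T140000'   # first class: 3 Sep 2007, 2:00 p.m.
sem_end = '20071212T140000'     # last possible meeting (inclusive)
days = (MO, WE)

meetings = list(rrule(WEEKLY, wkst=SU, byweekday=days,
                      dtstart=parse(sem_start), until=parse(sem_end)))
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))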
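Handling holidays does turn out to need only a little more work. The sketch below assumes the days off are known in advance. It uses dateutil's `rruleset` to combine the weekly rule with `exdate()` exclusions. The `HOLIDAYS` dates and the `class_meetings()` helper are hypothetical examples, not an actual academic calendar.

```python
from datetime import date, datetime

from dateutil.parser import parse
from dateutil.rrule import rrule, rruleset, WEEKLY, MO, WE, SU

# Hypothetical days off; substitute the real academic calendar.
HOLIDAYS = [date(2007, 9, 3), date(2007, 11, 21)]


def class_meetings(sem_start, sem_end, days, holidays, wkst=SU):
    """Return meeting datetimes for the weekly rule, minus any holidays."""
    start = parse(sem_start)
    schedule = rruleset()
    schedule.rrule(rrule(WEEKLY, wkst=wkst, byweekday=days,
                         dtstart=start, until=parse(sem_end)))
    # exdate() removes exact datetime matches only, so each holiday is
    # pinned to the class's start time (14:00 here).
    for day in holidays:
        schedule.exdate(datetime.combine(day, start.time()))
    return list(schedule)


meetings = class_meetings('20070903T140000', '20071212T140000',
                          (MO, WE), HOLIDAYS)
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))
```

Filtering the plain `rrule` list with a list comprehension would also work. `rruleset` has the advantage of keeping the exclusions in the same iCalendar-style model as the recurrence itself.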

this entry posted to technology/python;
comments (0)

2007 Feb 08 | Zotero and BusySponge

I have been reading about Zotero in Josh's blog and am hopeful that it will help bridge the gap between the dynamic and informal life of the Web (e.g., reading, blogging, bookmarks, RSS, etc.) and the seemingly lifeless task of bibliography. Wouldn't it be nice if citing something were as easy as bookmarking it? Or if you could read what your colleagues were reading via an RSS feed?

While I haven't played with Zotero yet -- and I use the Konqueror browser, not Firefox -- I share this vision and hope to see it become a reality. And since I recently posted about my Freemind Extract tool (for transforming a mindmap into a bibliography), I realize I haven't written about the flip side, absorbing information, in a couple of years. But first, a historical digression.

The way I make note of and annotate resources and tasks evolved out of two practices at the W3C. The first was a decree by Timbl that I objected to strongly at the time: the great datespace shift of 1999. Because the W3C's root file/name space was getting too crowded, Tim's new policy forbade new top-level spaces like www.w3.org/Signature or www.w3.org/Encryption. There were too many already, and who were we to lay claim to such spaces for all time? There might be a new digital signature activity 10 years from now, so where would it live? (Consequently, the subsequent key management working group received www.w3.org/2001/XKMS.) I appreciated this concern at the root level, but cringed at only being able to organize other files by date of creation. Try finding a document you wrote a couple of years ago in a space that is no more structured than /2001/{01,..,12} and is shared by 50+ other people. It's not easy. I realized the only way I could keep track of things I had worked on was to keep a log of events and documents I cared about. (This shift also affected how we collaborated in our shared space, given issues of ownership, access controls, and version management -- but perhaps more on that another time.)

The second W3C practice was that each of its hosts (worksites) had a weekly meeting at which we shared the important events of the past week and raised agenda issues for common discussion. To make it easier for the minute takers, we e-mailed our "two minutes" (a brief summary of our week) to a mailing list, and a bot would collect them into draft minutes, which were then augmented with the IRC log.

Preparing my two minutes before 10 a.m. on Tuesday always seemed more frantic than it needed to be. But once I started keeping a log of what I had done as a result of the datespace shift, it became trivial. (In fact, I wrote a script to grab the past week automatically, and even generated an RSS feed from the work log so that one could "subscribe" to my work log by keyword/task -- anticipating RSS feeds of tagged bookmarks.)
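Neither that script nor the work log's format is shown here. The following is therefore only a sketch, and it assumes a hypothetical format of one entry per line, `YYYY-MM-DD tag1,tag2 text`. It selects the past week's entries and can optionally narrow them to a single keyword/task tag, which is the kind of selection a per-keyword feed could be built from.

```python
from datetime import date, timedelta


def recent_entries(lines, today, days=7, keyword=None):
    """Return entries dated within the `days` days ending on `today`.

    Assumes the hypothetical line format "YYYY-MM-DD tag1,tag2 free text".
    If `keyword` is given, only entries carrying that tag are kept.
    """
    cutoff = today - timedelta(days=days)
    selected = []
    for line in lines:
        stamp, tags, _text = line.split(" ", 2)
        when = date.fromisoformat(stamp)
        if cutoff < when <= today and (keyword is None or keyword in tags.split(",")):
            selected.append(line)
    return selected


LOG = [  # hypothetical entries
    "2002-03-05 xkms,meeting Drafted the XKMS teleconference agenda",
    "2002-03-07 signature Reviewed signature test cases",
    "2002-02-20 xkms Older entry outside the one-week window",
]

for entry in recent_entries(LOG, date(2002, 3, 11), keyword="xkms"):
    print(entry)
```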

By 2002 I had tired of manually logging events, via an HTML editor, to my personal blog and work log, so I wrote a specification for a dream tool: Busy Sponge. It would soak up everything of importance I touched and send it to the right place. I opted for a command-line tool I named b.py.

Returning to today, a challenge I'm sure I share with the Zotero folks is how to automatically scrape as much metadata as possible from a Web resource. Busy Sponge continues to be the primary way I input data into my work log and mind maps. Because metadata is no more common or standard on the Web than it was five years ago, I depend on screen-scraping heuristics. For example, the following code allows me to easily capture and cite messages from Wikipedia mailing lists -- and that is why it was such a hassle when the archives broke:

elif url.startswith("http://marc.theaimsgroup.com/"):
    # Capture up to the end of the line or the next HTML tag; a bare lazy
    # (.*?) with nothing after it always matches the empty string.
    author = re.search(r'From: *([^<\n]*)', html).group(1)
    # MARC obfuscates addresses ("user () example ! org"); undo that and
    # the HTML-escaped angle brackets.
    author = (author.replace(' () ', '@').replace(' ! ', '.')
                    .replace('&lt;', '<').replace('&gt;', '>'))
    author = author.split(' <')[0]    # keep the display name only
    author = author.replace('"', '')

    mlist = re.search(r'List: *([^<\n]*)', html).group(1)

    mdate = re.search(r'Date: *([^<\n]*)', html).group(1)
    ...

Unfortunately, beyond a couple of mailing list archives and wikis -- which, fortunately, are the majority of what I grab -- I have to manually edit my sponges with proper meta/bibliographic data. And curses upon those bloggers who make it difficult to determine the author of an article or even of the whole blog -- even a pseudonym will do! Beyond my own use of the tool, I can imagine much value in a social tool that allows users to share annotations, or even screen-scraping "plug-ins." One can hope!
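The sketch below shows what shareable screen-scraping plug-ins might look like. It assumes each plug-in is simply a function registered against a URL prefix, which replaces a long `if/elif url.startswith(...)` chain. The `scraper` decorator, the `SCRAPERS` registry, and the field patterns are hypothetical and do not reflect b.py's actual structure.

```python
import re

# Hypothetical plug-in registry: URL prefix -> function(html) -> metadata dict.
SCRAPERS = {}


def scraper(prefix):
    """Decorator that registers a metadata scraper for URLs starting with prefix."""
    def register(func):
        SCRAPERS[prefix] = func
        return func
    return register


@scraper("http://marc.theaimsgroup.com/")
def scrape_marc(html):
    def field(name):
        # Capture up to the end of the line or the next HTML tag.
        match = re.search(name + r": *([^<\n]*)", html)
        return match.group(1).strip() if match else None
    return {"author": field("From"), "list": field("List"), "date": field("Date")}


def scrape(url, html):
    """Run the scraper with the longest matching prefix; {} if none matches."""
    matches = [prefix for prefix in SCRAPERS if url.startswith(prefix)]
    if not matches:
        return {}
    return SCRAPERS[max(matches, key=len)](html)


page = "From: Jane Doe\nList: wikien-l\nDate: 2007-02-01"
print(scrape("http://marc.theaimsgroup.com/?l=wikien-l", page))
```

Under this design, sharing a plug-in means sharing one decorated function. Because dispatch picks the longest matching prefix, a site-specific scraper can override a more general one.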

this entry posted to technology/python;
comments (0)

2007 Jan 26 | Freemind Bibliography Extract 0.6

I am releasing version 0.6 of the fe mindmapping bibliographic tools. As explained in Extracting Bibliographies from Freemind, these are Python scripts that convert between Freemind mindmaps (using a few simple conventions) and bibliographic formats (i.e., OO.org CSV and BibTeX). They also make it very easy for me to search my notes and quote authors (e.g., "Giddens"). There are no massive changes, just the usual tweaks and bug fixes. One notable change is that the regular expressions in pe.py are much improved; they are uncannily good at extracting bibliographic keys of the form 'Snide and Smith (2003)' or '(Snide, Smith and Smittie 2004)' from natural language text.
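pe.py's regular expressions aren't reproduced in this post. The patterns below are a hypothetical sketch of the technique, not pe.py's code. They match the two key forms quoted above: a narrative form with the year in parentheses, and a fully parenthetical form. They assume simple capitalised ASCII surnames, so names such as "van der Berg" would be missed.

```python
import re

# A capitalised surname, optionally hyphenated ("Smith", "Lee-Jones").
NAME = r"[A-Z][a-z]+(?:-[A-Z][a-z]+)?"
# Surnames joined by ", and ", ", " or " and ", optionally ending "et al.".
NAMES = rf"{NAME}(?:(?:, and |, | and ){NAME})*(?: et al\.)?"

NARRATIVE = re.compile(rf"({NAMES}) \((\d{{4}})\)")        # Snide and Smith (2003)
PARENTHETICAL = re.compile(rf"\(({NAMES}),? (\d{{4}})\)")  # (Snide, Smith and Smittie 2004)


def citation_keys(text):
    """Return (authors, year) pairs in the order they occur in text."""
    found = []
    for pattern in (NARRATIVE, PARENTHETICAL):
        found.extend((m.start(), m.group(1), m.group(2)) for m in pattern.finditer(text))
    return [(authors, year) for _, authors, year in sorted(found)]


sample = "As Snide and Smith (2003) argue, the effect is small (Snide, Smith and Smittie 2004)."
print(citation_keys(sample))
```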

this entry posted to technology/python;
comments (0)

2005 Jun 10 | Mindmapping Bibliographies

I am releasing a new zipfile of the fe mindmapping bibliographic tools. As explained in Extracting Bibliographies from Freemind, these are Python scripts that convert between Freemind mindmaps (using a few simple conventions) and bibliographic formats (i.e., OO.org CSV and BibTeX). I prefer this approach to bibliographic tools with limited, constrained forms for text entry. With fe one has a complete outline/map of texts, with figures, images, tables, links to sites, etc.; one can easily organize texts by topic or in separate mindmap files; and one can run queries where each matching line is shown with its citation, year, and page number (e.g., "Giddens"). Unlike many bibliographic tools, it does not query online databases, but one can use such tools (e.g., Tellico or RefWorks) to query and generate BibTeX bibliographies and then use be.py to convert them to a mindmap.
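Freemind saves a map as XML made of nested `<node TEXT="...">` elements. This makes a citation-aware query easy to sketch: walk the tree, and report each matching node together with its ancestors. The convention used here, with page-numbered notes nested under a citation node, is a hypothetical stand-in because fe's own conventions aren't described in this post. The sample map is made up.

```python
import xml.etree.ElementTree as ET


def search_mindmap(mm_xml, query):
    """Return (ancestor path, text) for each node whose TEXT contains query."""
    root = ET.fromstring(mm_xml)
    hits = []

    def walk(node, path):
        text = node.get("TEXT", "")
        if query.lower() in text.lower():
            hits.append((path, text))
        for child in node.findall("node"):
            walk(child, path + [text])

    for top in root.findall("node"):
        walk(top, [])
    return hits


SAMPLE = """<map version="0.8.0">
  <node TEXT="Reading notes">
    <node TEXT="Snide and Smith (2003)">
      <node TEXT="p. 12: mindmaps as outlines"/>
      <node TEXT="p. 40: citation keys"/>
    </node>
  </node>
</map>"""

for path, text in search_mindmap(SAMPLE, "citation"):
    print(" > ".join(path), "|", text)
```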

this entry posted to technology/python;
comments (0)

2005 Apr 16 | Encrypted File Systems

In moving to Kubuntu 5.04 from a Knoppix install, my loop-AES partitions are no longer readable. Since crypto-loop is being deprecated anyway, I thought I would try dm-crypt. However, because I would have to employ that on top of a file loop, it's a hassle. Fortunately, I bumped into EncFS. Generally, I like it a lot, and it is comparable to crypto-loop except when it comes to a USB drive. A copy of a 2GB file to a

Ouch!

Interestingly, an ext3-formatted loop device (no encryption) on the external drive runs at ~17MB/s, and with crypto-loop at ~10MB/s. Now, here's the real kicker: an ext3 loop partition sitting on the vfat external drive and hosting an encfs directory runs at ~14MB/s! So vfat sucks -- though encfs on vfat probably doesn't have to do quite so poorly. To put it another way, it's faster to put a 3GB file on the external vfat drive (vfat is compatible with many computers), mount it as an ext3 loop device, and run encfs on top of that than it is to access the plain old vfat file system. (This is even slightly faster than running encfs on the local IDE drive!)
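The post doesn't say how the MB/s figures were obtained. One way to take comparable measurements, offered here as an assumption rather than as the method behind the numbers above, is to time a large sequential write into each mount point. The write is forced to disk with `os.fsync()` so that the page cache doesn't flatter the result.

```python
import os
import tempfile
import time


def write_throughput(directory, size_mb=64, block_kb=1024):
    """Write about size_mb MiB of random data into directory; return MiB/s.

    os.fsync() forces the data out to the device so that the page cache
    does not inflate the result. The temporary file is deleted afterwards.
    """
    block = os.urandom(block_kb * 1024)        # incompressible test data
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as out:
            for _ in range(blocks):
                out.write(block)
            out.flush()
            os.fsync(out.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_kb / 1024) / elapsed
```

For example, you could compare `write_throughput("/mnt/usb/crypt")` with the same call on the raw vfat mount. Both mount points are hypothetical.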

this entry posted to technology;
comments (0)

2005 Mar 11 | Mythbusters and Buttered Toast

A segment on tonight's MythBusters addressed the question of "whether buttered toast falls buttered side up or down more often." This is one of my favorite everyday puzzles, one that can be addressed with a basic understanding of experimentation and statistics. My own curiosity on this question was satisfied by a segment of Newton's Apple -- if my memory is correct -- which found that the typical height of a table is what causes the originally upward-facing side to land on the floor. Pushing the toast from a ladder completely reversed this trend, as the toast could tumble a full 360° and land in its original orientation: buttered side up.

Yet notice the MythBusters question: it asks whether buttering toast affects how it lands, regardless of its original orientation (even though toast starts buttered side up in almost all everyday cases). So first they had to find a way to drop toast without bias, independent of its original orientation. Not surprisingly, Adam found that pushing it off the table was not satisfactory on this count. Eventually they built a machine that dropped unbuttered toast, which landed X side up 11 times and down 13 times; orientation was determined by a magic-marker X, which we must assume introduces no bias. It is reasonable to conclude that 11 up and 13 down is indicative of a "fair" mechanism. Then, when they buttered one side of 24 slices of toast, they found 12 up and 12 down. These sample sizes are too small, but roughly, it does not appear that the butter had any effect!

However, when they dropped the toast from a two-story building (27'5"), the dry toast landed X side up in 26 of 48 drops (54%) and the buttered toast landed X side up in 29 of 48 drops (60%). Jamie posits that the 6-point discrepancy arises because the buttered side had a concave impression: like a leaf, the toast tended to fall with its convex, non-buttered side face down. Adam concludes, "if you really want to ensure, in general, your toast landing buttered side up or down, we can tell you, you should butter with a good vigor and that the resultant bowl will make your toast generally fall butter side up." Though he qualified his statement with "generally," strictly speaking it is not statistically supported; and when Jamie offers a mechanism for a perceived statistical finding, he is premature. (If he is merely offering an observation, that's all it is.)

In this case, the null hypothesis is that the difference between the dry 54% and the buttered 60% is due to chance alone. (That is, if we were to repeat the experiment, a skew of similar size could readily turn up again by chance.) The alternative hypothesis is that some causal mechanism (i.e., the bowl-shaped impression) affects the outcome. If we can show that a difference this large (6 points) would rarely occur by chance, that supports the alternative hypothesis. Unfortunately, neither test alone is statistically significant. For example, even with a fair, coin-like mechanism, the one-tailed probability of getting 29 or more of 48 drops buttered side up is about 7.4%, above the conventional 5% threshold:

$$z = \frac{\text{observed} - \text{expected}}{SE} = \frac{29 - 24}{\sqrt{48}\,\sqrt{0.5 \times 0.5}} \approx 1.443 \quad\Rightarrow\quad P \approx 7.4\%$$

By the same calculation, the chance of the dry toast landing X side up 26 or more times is about 28% (z = 2/√12 ≈ 0.577).

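Nor is the difference between the 26 X-side-up landings in the dry control case and the 29 in the buttered case significant. Its one-tailed probability under the null hypothesis is about 27%, where each standard error is $\sqrt{p(1-p)/48}$:

$$z = \frac{\text{observed} - \text{expected}}{SE_{\text{diff}}} = \frac{(29/48 - 26/48) - 0}{\sqrt{SE_{\text{dry}}^2 + SE_{\text{buttered}}^2}} = \frac{6.25\%}{\sqrt{7.19^2 + 7.06^2}\,\%} \approx 0.620 \quad\Rightarrow\quad P \approx 27\%$$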
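These three calculations can be checked with a few lines of standard-library Python. This sketch uses the normal approximation without a continuity correction, along with unpooled standard errors. `upper_tail()` returns the one-tailed probability via `math.erfc`.

```python
from math import erfc, sqrt


def upper_tail(z):
    """One-tailed P(Z >= z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))


def one_sample_z(successes, n, p0=0.5):
    """Normal-approximation z for `successes` out of `n` against rate p0."""
    expected = n * p0
    se = sqrt(n * p0 * (1 - p0))
    return (successes - expected) / se


def two_proportion_z(x1, n1, x2, n2):
    """Unpooled z for the difference x2/n2 - x1/n1."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se


buttered = one_sample_z(29, 48)          # buttered drops vs. a fair mechanism
dry = one_sample_z(26, 48)               # dry drops vs. a fair mechanism
diff = two_proportion_z(26, 48, 29, 48)  # buttered vs. dry

for label, z in [("buttered vs. fair", buttered),
                 ("dry vs. fair", dry),
                 ("buttered vs. dry", diff)]:
    print(f"{label}: z = {z:.3f}, one-tailed P = {upper_tail(z):.1%}")
```

An exact binomial test would give somewhat larger probabilities than this approximation, so the conclusion that none of these results is significant would not change.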

this entry posted to technology;
comments (0)

2005 Feb 14 | XML ElementTree Data Model

I've been playing with Fredrik Lundh's ElementTree as an intuitive, Pythonic way of processing XML. (While I like Aaron Swartz's XML Tramp, it is presently unsupported; ElementTree is fast and has limited XPath support.)
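ElementTree later joined Python's standard library as `xml.etree.ElementTree`, starting with Python 2.5. Here is a small sketch of the style it encourages: elements act like lists of their children, attributes are read like a dict, and `findall()` accepts a subset of XPath. The feed is made-up sample data.

```python
import xml.etree.ElementTree as ET

# A hypothetical two-item RSS feed used only for illustration.
FEED = """<rss version="2.0"><channel>
  <title>Open Codex</title>
  <item><title>Klipper DCOP Trick</title><category>technology</category></item>
  <item><title>Mailbox pretty print</title><category>technology/python</category></item>
</channel></rss>"""

root = ET.fromstring(FEED)
version = root.get("version")          # attributes are read like a dict
print(version)

# findall() understands a subset of XPath, including [tag='text'] predicates.
python_titles = [item.findtext("title")
                 for item in root.findall("./channel/item[category='technology/python']")]
print(python_titles)
```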

ElementTree Conventions

this entry posted to technology/python;
comments (1)

2004 Oct 08 | Klipper DCOP Trick

I seemingly spend a lot of time running web pages on my local file system through lynx for the textual information, and then cutting and pasting it into an email message. The following function will add the text dump of an HTML file to the KDE clipboard automatically.

lyd() { dcop klipper klipper setClipboardContents "$(lynx -dump "$@")"; }

this entry posted to technology;
comments (0)

2004 May 28 | Upgrading to Pyblosxom 1.0

Upgrading my pyblosxom install can be a bit tricky:

  1. `kdiff3 pyblosxom.cgi ~/data/2web/reagle/joseph/pyblosxom.cgi`
  2. `kdiff3 config.py ~/data/2web/pyblosxom/web/config.py`
  3. Copy files from the new version into the existing install, making sure to remove files that are no longer necessary and to keep my own plugins, Lucene install, etc.
  4. Get the flavourdir tweak from CVS and copy to pyblosxom/Pyblosxom/renderers/blosxom.py
  5. I used to be able to set the default py['parser'] = 'textile' and have it work with ".txt" files. That no longer works, and since there are only a few, I renamed them to ".txtl".
  6. How do I get rid of the automatically generated "<h2>Thu, 27 May 2004</h2>"?

    Generate an empty file: `echo >> date_head.html`

  7. When I create a comment, the comment is created but I get an error:

    "2004-05-28 13:55:35,982 INFO Couldn't open latest comment pickle for writing"

    but it's now logged and doesn't throw an exception.

  8. Trackback has changed! Remove the trackback.cgi file and rely upon the pyblosxom.cgi handler (renamed to blog) by moving to a trackback URI of the form http://reagle.org/joseph/blog/trackback/...

this entry posted to technology/python;
comments (0)

2004 May 14 | Mailbox pretty print

This small script [mbx-pp.py] will take the plain-text portions of messages in a Unix mailbox and turn them into a pretty HTML document.
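The script itself isn't reproduced here. The following is a minimal sketch of the same idea using the standard `mailbox`, `email`, and `html` modules. It is not mbx-pp.py. It assumes undeclared charsets are UTF-8, and it renders each plain-text part as preformatted text under its message's subject and sender.

```python
import html
import mailbox


def mbox_to_html(path):
    """Render the text/plain parts of every message in a Unix mbox as HTML."""
    box = mailbox.mbox(path)
    parts = ["<html><body>"]
    try:
        for msg in box:
            subject = html.escape(str(msg.get("Subject", "(no subject)")))
            sender = html.escape(str(msg.get("From", "")))
            parts.append(f"<h2>{subject}</h2>\n<p><em>{sender}</em></p>")
            # walk() visits the message itself and, for multipart mail, each part.
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    payload = part.get_payload(decode=True) or b""
                    charset = part.get_content_charset() or "utf-8"
                    text = payload.decode(charset, errors="replace")
                    parts.append(f"<pre>{html.escape(text)}</pre>")
    finally:
        box.close()
    parts.append("</body></html>")
    return "\n".join(parts)
```

For example, `mbox_to_html("/var/mail/joseph")` returns one HTML page for a whole mailbox (the path is hypothetical).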

this entry posted to technology/python;
comments (0)

Open Communities, Media, Source, and Standards

by Joseph Reagle
