2007 Jun 20 | Creating a Semester's Class Schedule
I just discovered Python's awesome dateutil package which
implements much of the iCalendar standard, including recurrences!
Consequently, it's trivial to generate a calendar for the days classes
meet. I assume with a little work one could even handle the holidays.
In any case, here's an example:
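#!/usr/bin/python2.5
from dateutil.rrule import rrule, WEEKLY, MO, WE, SU
from dateutil.parser import parse

sem_start = '20070903T140000'   # first meeting: Mon, Sep 3, 2007, 2 p.m.
sem_end = '20071212T140000'     # no meetings after this date and time
days = (MO, WE)                 # class meets Mondays and Wednesdays

# one datetime per class meeting between sem_start and sem_end
meetings = list(rrule(WEEKLY, wkst=SU, byweekday=days,
                      dtstart=parse(sem_start), until=parse(sem_end)))
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))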
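Holidays can be handled with dateutil's rruleset, which lets one subtract specific datetimes from a recurrence. The following is a minimal sketch, assuming the same semester as above; the two holiday dates (Labor Day and Columbus Day 2007, both Mondays) are chosen purely for illustration and are not from any actual school calendar.

```python
from datetime import datetime
from dateutil.rrule import rrule, rruleset, WEEKLY, MO, WE, SU

sem_start = datetime(2007, 9, 3, 14, 0)
sem_end = datetime(2007, 12, 12, 14, 0)

# Hypothetical holidays, for illustration only. Each carries the same time
# of day as the class meeting because exdate() matches exact datetimes.
holidays = [datetime(2007, 9, 3, 14, 0), datetime(2007, 10, 8, 14, 0)]

schedule = rruleset()
schedule.rrule(rrule(WEEKLY, wkst=SU, byweekday=(MO, WE),
                     dtstart=sem_start, until=sem_end))
for holiday in holidays:
    schedule.exdate(holiday)    # drop this occurrence from the set

meetings = list(schedule)
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))
```

A holiday that does not coincide with a meeting is simply ignored, so the list of holidays can be the full calendar for the year.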
this entry posted to
technology/python;
comments (0)
2007 Feb 08 | Zotero and BusySponge
I have been reading about Zotero in Josh's
blog and am hopeful that it will help bridge the gap between the
dynamic and informal life of the Web (e.g., reading, blogging,
bookmarks, RSS, etc.) and the seemingly lifeless task of bibliography.
Wouldn't it be nice if citing something was as easy as bookmarking it?
Or, if you could read what your colleagues were reading via an RSS feed?
While I haven't played with Zotero
yet -- and I use the Konqueror browser, not Firefox -- I share this
vision and hope to see it become a reality. And since I recently posted
about my Freemind Extract tool (for transforming a mindmap into a
bibliography), I realize I haven't spoken of the flipside in a couple of
years: absorbing information. But first, a historical digression.
The way I make note of and annotate resources and tasks evolved out
of two practices at the W3C. The first was a decree by Timbl, which I
objected to strongly at the time: the great datespace shift of
1999. Because the W3C's root file/name space was getting too crowded,
Tim's new policy forbade new top-level spaces like www.w3.org/Signature or www.w3.org/Encryption.
There were too many already and who were we to lay claim to such spaces
for all time? There might be a new digital signature activity 10 years
from now, so where would they live? (Consequently, the subsequent key
management working group received www.w3.org/2001/XKMS.)
I appreciated this concern at the root level, but cringed at only being
able to organize other files by date of creation. Try finding a
document you wrote a couple of years ago in a space that is no more
structured than /2001/{01,..,12} and is shared by 50+ other
people. It's not easy. I realized the only way I could keep track of
things I had worked on was to have a log of events and documents I
cared about. (This shift also affected how we collaborated in our
shared space given issues of ownership, access controls, and version
management -- but perhaps more on that another time.)
The second W3C practice was that each of its hosts (worksites) had a
weekly meeting at which we shared the important events of the past week
and raised agenda issues for common discussion. To make it easier for
the minute takers we e-mailed two minutes to an e-mail list and a bot
would collect them into draft minutes which would be augmented with the
IRC log.
Preparing my two minutes before 10 a.m. Tuesday morning always
seemed more frantic than it need be. But, once I started keeping a log
of what I had done as a result of the datespace shift, it became
trivial. (In fact, I wrote a script to grab the past week
automatically, and even generated an RSS feed from the work log so that
one could "subscribe" to my work log by keyword/task -- anticipating
RSS feeds of tagged bookmarks.)
By 2002 I had tired of manually logging events, via an HTML editor,
to my personal blog and work log, so I wrote a specification for a
dream tool: Busy Sponge. It would soak up everything I touched of importance and send it to the right place. I opted for a commandline tool I named b.py.
Returning to today: a challenge I'm sure I share with the Zotero folks
is how to automatically scrape as much metadata as possible from a Web
resource. Busy Sponge continues to be the primary way I input data into
my work log and mind maps. Because metadata is no more common or
standard on the Web than it was five years ago, I am dependent on
screen-scraping heuristics. For example, the following code allows me to
easily capture and cite messages from Wikipedia mailing lists -- and
that is why it was such a hassle when the archives broke:
elif url.startswith("http://marc.theaimsgroup.com/"):
    try:
        author = re.search('''From: *(.*?)''', html).group(1)
    except AttributeError:
        author = re.search('''From: *(.*)''', html).group(1)
    # undo the archive's address obfuscation and entity escaping
    author = author.replace(' () ','@').replace(' ! ','.')\
        .replace('&lt;', '<').replace('&gt;', '>')
    author = author.split(' <')[0]      # keep the name, drop the address
    author = author.replace('"','')
    mlist = re.search('''List: *(.*?)''', html).group(1)
    mdate = re.search('''Date: *(.*?)''', html).group(1)
    ...
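The author clean-up steps can be tried in isolation. This is a minimal sketch, assuming the archive obfuscates addresses with " () " and " ! " and escapes angle brackets as HTML entities; the function name clean_author and the sample header value are hypothetical and not taken from b.py.

```python
def clean_author(raw):
    """Undo archive address obfuscation and keep only the display name."""
    author = raw.replace(' () ', '@').replace(' ! ', '.')
    author = author.replace('&lt;', '<').replace('&gt;', '>')
    author = author.split(' <')[0]      # drop a trailing <address>, if any
    return author.replace('"', '').strip()

# Hypothetical header value in the obfuscated style assumed above.
sample = '"Jane Doe" &lt;jane () example ! org&gt;'
print(clean_author(sample))
```

When the header holds only an address, there is no display name to split off and the de-obfuscated address itself is returned.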
Unfortunately, beyond a couple of mailing list archives and wikis --
which, fortunately, are the majority of what I grab -- I have to
manually edit my sponges with proper meta/bibliographic data. And
curses upon those bloggers who make it difficult to determine the
author of an article or even the whole blog -- even a pseudonym will
do! Beyond the usage of my tool, I can imagine much value in a social
tool that allows users to share annotations, or even screen-scraping
"plug-ins." One can hope!
this entry posted to
technology/python;
comments (0)
I am releasing version 0.6 of the fe mindmapping bibliographic tools. As
explained in Extracting Bibliographies from Freemind, these are Python
scripts that convert between Freemind mindmaps (using a few simple
conventions) and bibliographic formats (i.e., OO.org CSV and bibtex). They
also make it very easy for me to search my notes and quote authors
(e.g., "Giddens").
There are no massive changes, just the usual tweaks and bug fixes. One
notable change is the regular expressions in pe.py are much improved,
and it's quite uncanny at extracting bibliographic keys of the form
'Snide and Smith (2003)' or '(Snide, Smith and Smittie 2004)' from
natural language text.
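To give a flavor of this kind of matching, here is a minimal sketch of a regular expression for the two citation forms mentioned above. It is not the expression used in pe.py; the names AUTHORS, CITATION, and find_citations are hypothetical, and the pattern assumes plain capitalized surnames (no hyphens, accents, or "et al.").

```python
import re

# One or more capitalized surnames joined by ", ", " and ", or ", and ".
AUTHORS = r"[A-Z][a-z]+(?:(?:, | and |, and )[A-Z][a-z]+)*"
YEAR = r"\d{4}"

# Matches 'Snide and Smith (2003)' or '(Snide, Smith and Smittie 2004)'.
CITATION = re.compile(
    r"(?P<a1>{a}) \((?P<y1>{y})\)|\((?P<a2>{a}) (?P<y2>{y})\)".format(
        a=AUTHORS, y=YEAR))

def find_citations(text):
    """Return (authors, year) pairs in order of appearance."""
    found = []
    for m in CITATION.finditer(text):
        authors = m.group("a1") or m.group("a2")
        year = m.group("y1") or m.group("y2")
        found.append((authors, year))
    return found

text = ("As Snide and Smith (2003) argue, trust matters "
        "(Snide, Smith and Smittie 2004).")
for authors, year in find_citations(text):
    print("%s %s" % (authors, year))
```

A sentence-initial word such as "As" is not swallowed into the author list because it is not joined to the surname by a comma or "and".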
this entry posted to
technology/python;
comments (0)
I am releasing a new zipfile of the fe mindmapping bibliographic tools. As
explained in Extracting Bibliographies from Freemind, these are Python
scripts that convert between Freemind mindmaps (using a few simple
conventions) and bibliographic formats (i.e., OO.org CSV and bibtex). This
approach is preferable to other bibliographic tools with limited/constrained
forms for text entry. With fe one has a complete outline/map of texts, with
figures, images, tables, links to sites, etc.; one can easily organize texts
by topic or in separate mindmap files; and one can generate queries where
each matching line has its appropriate citation with year and page number
(e.g., "Giddens").
Unlike many bibliographic tools, it does not query on-line databases, but one
can use such tools (e.g., tellico or refworks) to query and generate bibtex
bibliographies and then use be.py to convert them to a mindmap.
- fe.py: extract bibliographic data from a bibliographic MM (dependent on
  XML ElementTree and optionally bibtex2html)
  - this version is faster since it uses XML ElementTree instead of XML
    Tramp.
  - given a list of authors cited (*.rl, such as that generated by pe.py
    or pyblink) bibtex2html will generate a bibliography of only those
    authors.
  - bibliographic maps are searchable from the command-line or via the
    Web (e.g., search results for "Giddens" in my mindmap [java|flash]).
  - a Web of mindmaps can be searched for essential entries (the title
    is bold) and placed in a new mindmap for studying.
  fe.py -h (help)
        -v (output csv)
        -c (chase links between MMs)
        -w (output bibtex & html file)
        -a (include abstracts)
        -s (use bibtex style)
        -q (query)
        -e (create new MM of essential works)
- be.py: extract a MM from a bibtex file (dependent on bibstuff)
- de.py: extract a MM from a dictated text file
- ff.py: fix the case of titles of a bibliographic MM
- pe.py: extract the bibliographic keys of the form 'Snide and Smith
(2003)' or '(Snide, Smith and Smittie 2004)' from natural language
text
- te.py: parse inconsistently formatted textual bibliographies into
bibliographic MM (e.g., from syllabi, cb2Bib is cool too)
this entry posted to
technology/python;
comments (0)
2005 Apr 16 | Encrypted File Systems
In moving to Kubuntu 5.04 from a
Knoppix install, my loop-aes partitions are no longer readable. Since crypto-loop is
being deprecated anyway, I thought I would try dm-crypt. However,
because I would have to employ that on top of a file loop, it's a hassle.
Fortunately, I bumped into EncFS. Generally, I like it a lot, and it is
comparable to crypto-loop except when it comes to a USB drive. Copying a
2GB file to:
- a normal IDE partition: ~17MB/s,
- an external vfat USB2 drive (ehci_hcd): ~11MB/s,
- an encfs directory on the IDE drive: ~7MB/s,
- an encrypted directory on the external drive: 64KB/s.
Ouch!
Interestingly, an ext3-formatted loop device (no encryption) on the
external drive is ~17MB/s and with crypto-loop it is ~10MB/s. Now, here's
the real kicker: an ext3 loop partition sitting on the vfat external drive,
hosting an encfs directory, is ~14MB/s! So, vfat sucks -- though encfs on vfat
probably doesn't have to do quite so poorly. Or to put it another way, it's
faster to put a 3GB file on the external vfat drive (vfat is very compatible
with many computers), mount it as an ext3 loop device, and run encfs on top
of that than it is to access the plain old vfat file system. (This is even
slightly faster than running encfs on the local IDE drive!)
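Anyone wanting to make similar comparisons could time a copy and divide the size by the elapsed time. The following is a minimal sketch of one way to do so in Python, not a record of how the figures above were measured; the function name copy_throughput is hypothetical.

```python
import os
import shutil
import time

def copy_throughput(src, dst_dir):
    """Copy src into dst_dir and return the observed rate in MB/s."""
    dst = os.path.join(dst_dir, os.path.basename(src))
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Force the data out to the device so that the page cache does not
    # flatter the result.
    with open(dst, "r+b") as f:
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)   # avoid dividing by 0
    return os.path.getsize(dst) / (1024.0 * 1024.0) / elapsed
```

Calling copy_throughput with the same large file and each mounted target directory in turn gives one rate per configuration; repeating each run a few times would smooth out noise.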
this entry posted to
technology;
comments (0)
2005 Mar 11 | Mythbusters and Buttered Toast
A segment on tonight's MythBusters
addressed the question of "whether buttered toast falls buttered side up or
down more often?" This is one of my favorite daily puzzles that can be
addressed by a basic understanding of experimentation and statistics. My own
curiosity on this question was satisfied by a segment of Newton's Apple -- if
my memory is correct -- which found that it is the typical height of the
table surface that causes the originally upward-facing side to land on
the floor. Pushing the toast from a ladder completely reversed this trend, as
the toast could tumble a full 360° and land in its original orientation:
buttered side up.
Yet, notice the MythBusters question: it asks if toast being buttered
affects how it ends up -- regardless of its original orientation, even if
that is buttered side up in almost all daily cases. So first they had to find
a way to drop toast in an unbiased way, independent of the original
orientation. Not surprisingly, Adam found that pushing it from the table was
not satisfactory on this note. Eventually they developed a machine that
dropped unbuttered toast, which landed up 11 times and down 13 times --
orientation was determined by a magic marker X, which we must assume is
unbiasing. It is reasonable to conclude that 11 up and 13 down is indicative
of a "fair" mechanism. Now when they buttered a side of 24 slices of toast
they found 12 up and 12 down. These sample sizes are too small, but roughly,
it does not appear that the butter had any effect!
However, when they drop the toast from a two-story building (27'5") and
find that the dry toast side X lands up 26 out of 48 drops (54%) and the
buttered side X lands up 29 out of 48 drops (60%), Jamie posits that the 6%
discrepancy is because he could see that the buttered side had a concave
impression, and like a leaf, the convex non-buttered side tended to fall face
down. Adam concludes, "if you really want to ensure, in general, your toast
landing buttered side up or down, we can tell you, you should butter with a
good vigor and that the resultant bowl will make your toast generally fall
butter side up." However, though he "generally" qualified his statement,
strictly speaking, it is not statistically supported and when Jamie is
offering a mechanism for a perceived statistical finding, he is premature.
(However, if he is offering a simple observation, that's all it is.)
In this case, the null hypothesis is that the difference between the dry
54% and the buttered 60% is just due to chance. (That is, if we were to
repeat the experiment, a similar skew could easily arise again without any
cause.) The alternative hypothesis is that there is some causal mechanism
(i.e., the bowl-shaped impression) that affects the outcome. If we can show
that, under the null hypothesis, there is only a low probability of observing
a difference this large (6%), that implies support for the alternative
hypothesis. Unfortunately, neither test alone is statistically significant.
For example, the probability of getting at least 29 out of 48 drops side X up,
even with a fair coin, is about 7.4%.
    z = (observed - expected) / StandardError
    z = (29 - 24) / (Sqrt(48) * Sqrt(.5 * .5)) = 1.443
    => P = 7.4%
The random chance of getting at least 26 of the dry slices side X up is
about 28%.
The probability of a difference as large as that between 26 in the "dry"
control case and 29 in the buttered case is also about 28%, and so not
significant.
    z = (observed - expected) / StandardErrorOfDifference
    z = ((60% - 54%) - 0%) / Sqrt(SEdry^2 + SEbuttered^2)
    z = 6% / Sqrt(7.19^2 + 7.07^2)% = .595
    => P = 28%
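These z-scores and one-sided tail probabilities can be checked with a few lines of Python. This is a minimal sketch using the normal approximation; the function names are hypothetical. It works from the exact proportions 26/48 and 29/48 rather than the rounded 54% and 60%, so the z for the difference comes out slightly larger than in the hand calculation.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

def z_one_sample(successes, n, p=0.5):
    """z-score of an observed count against a binomial with chance p."""
    return (successes - n * p) / sqrt(n * p * (1 - p))

def z_two_sample(k1, n1, k2, n2):
    """z-score of the difference between two observed proportions."""
    p1, p2 = float(k1) / n1, float(k2) / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se

n = 48
for label, z in [("buttered, 29 up", z_one_sample(29, n)),
                 ("dry, 26 up", z_one_sample(26, n)),
                 ("buttered vs. dry", z_two_sample(26, n, 29, n))]:
    print("%-17s z = %.3f  P = %.1f%%" % (label, z, 100 * upper_tail(z)))
```

None of the three probabilities falls below the conventional 5% threshold, which is the sense in which the results are not significant.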
this entry posted to
technology;
comments (0)
2005 Feb 14 | XML ElementTree Data Model
I've been playing with Fredrik Lundh's ElementTree as an
intuitive/pythonic way of processing XML. (While I like Aaron Swartz's XML Tramp, it
is presently unsupported; ElementTree is fast and has XPath
support.)
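As a taste of the data model, here is a minimal sketch. It assumes the module as it later shipped in the Python standard library (xml.etree.ElementTree, from Python 2.5 on) rather than the standalone elementtree package, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
doc = """<library>
  <book year="1984"><title>The Constitution of Society</title>
    <author>Giddens</author></book>
  <book year="1990"><title>The Consequences of Modernity</title>
    <author>Giddens</author></book>
</library>"""

root = ET.fromstring(doc)

# An element behaves like a list of its children and carries its
# attributes in a dictionary-like interface.
for book in root:
    print("%s %s" % (book.get("year"), book.findtext("title")))

# findall() accepts a limited subset of XPath.
titles = [t.text for t in root.findall("./book/title")]
```

Treating elements as plain sequences with attribute lookups is what makes the API feel pythonic: no node-type constants or factory objects are needed for simple traversal.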
ElementTree Conventions
this entry posted to
technology/python;
comments (1)
2004 Oct 08 | Klipper DCOP Trick
I seemingly spend a lot of time running web pages on my local file system
through lynx for the textual information, and then cutting-and-pasting it
into an email message. The following function will add the text dump of an
html file to the KDE clipboard automatically.
function lyd { dcop klipper klipper setClipboardContents "$(lynx -dump "$@")" ; }
this entry posted to
technology;
comments (0)
2004 May 28 | Upgrading to Pyblosxom 1.0
Upgrading my
pyblosxom install can be a bit tricky:
- `kdiff3 pyblosxom.cgi ~/data/2web/reagle/joseph/pyblosxom.cgi`
- `kdiff3 config.py ~/data/2web/pyblosxom/web/config.py`
- copy files from the new version into the existing install, making sure
to remove files that are no longer necessary and to keep my own plugins,
lucene install, etc.
- Get the flavourdir tweak from CVS and copy to
pyblosxom/Pyblosxom/renderers/blosxom.py
- I used to be able to set the default py['parser'] = 'textile' and it
would work with ".txt" files; this no longer works, and since there are only
a few I renamed them to ".txtl"
- How do I get rid of the automatically generated "<h2>Thu, 27 May
2004</h2>"?
Generate an empty file: `echo >> date_head.html`
- When I create a comment, the comment is created but I get an error:
"2004-05-28 13:55:35,982 INFO Couldn't open latest comment pickle for
writing"
but it's now logged and doesn't throw an exception.
- Trackback has changed! Remove the trackback.cgi file and rely upon the
pyblosxom.cgi (renamed to blog) handler by moving to a trackback URI of
the form http://reagle.org/joseph/blog/trackback/...
this entry posted to
technology/python;
comments (0)
2004 May 14 | Mailbox pretty print
This small script [mbx-pp.py]
will take the plain-text portions of messages in a Unix mailbox and turn them
into a pretty HTML document.
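The general approach can be sketched with the standard library alone. This is not the code of mbx-pp.py; it is a minimal illustration with a hypothetical function name, written for Python 3, that keeps only the text/plain parts and escapes them for HTML.

```python
import html
import mailbox

def mbox_to_html(path):
    """Render the text/plain parts of every message in an mbox as HTML."""
    sections = []
    for msg in mailbox.mbox(path):
        body = []
        for part in msg.walk():
            if part.get_content_type() != "text/plain":
                continue                    # skip HTML parts and attachments
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                body.append(payload.decode(charset, errors="replace"))
            except LookupError:             # unknown charset label
                body.append(payload.decode("utf-8", errors="replace"))
        sections.append("<h2>%s</h2>\n<p>From: %s</p>\n<pre>%s</pre>" % (
            html.escape(str(msg.get("Subject", ""))),
            html.escape(str(msg.get("From", ""))),
            html.escape("".join(body))))
    return "<html><body>\n%s\n</body></html>" % "\n".join(sections)
```

Writing the returned string to a file gives a single page with one heading per message.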
this entry posted to
technology/python;
comments (0)