I’ve been playing with Fredrik Lundh’s ElementTree as an intuitive, Pythonic way of processing XML. (While I like Aaron Swartz’s XML Tramp, it is presently unsupported; ElementTree is fast and has XPath support.)
ElementTree Conventions
Parsing an XML document:
from elementtree.ElementTree import parse
tree = parse(filename)
doc = tree.getroot()
Element type (name):
print doc.tag
Element text:
print doc.text
Elements have a list of children
- iterate over children:
for child in doc: print child.tag
- get the first and second child (slice):
doc[0:2]
- get the first child of element type book:
doc.find('book')
- append(), insert() and remove() are also supported
- getiterator(tag) returns a list (or another iterable object) of all descendant subelements that have the given tag, in document order
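The conventions above can be sketched as follows. This is only an illustration: it assumes Python 3 and the standard-library `xml.etree.ElementTree` module rather than the standalone `elementtree` package, the sample document is made up, and `iter(tag)` stands in for `getiterator(tag)`, which newer Python releases no longer provide.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """<library>
  <book COLOR="red"><author>Ann</author></book>
  <book><author>Bob</author></book>
  <magazine/>
</library>"""

# parse() accepts a filename or a file-like object.
tree = ET.parse(io.StringIO(SAMPLE))
doc = tree.getroot()

root_tag = doc.tag                          # element type (name)
child_tags = [child.tag for child in doc]   # iterate over children
first_two = doc[0:2]                        # slice: first and second child
first_book = doc.find('book')               # first child of type book
first_author = first_book.find('author').text   # element text

# append(), insert() and remove() edit the child list in place.
note = ET.Element('note')
doc.append(note)
doc.remove(note)

# iter(tag) walks all descendants with that tag in document order.
authors = [a.text for a in doc.iter('author')]
```

Note that `find('book')` looks only at direct children, while `iter('author')` descends through the whole subtree.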
Elements have a dict of attributes
- get the attribute names (the dictionary keys):
book.keys()
- get the attributes as a list of (name, value) pairs:
book.items()
- test for an attribute COLOR:
if book.get('COLOR') is not None, or
if 'COLOR' in book.attrib
- get attribute COLOR value:
book.attrib.get('COLOR'), or
book.get('COLOR')
- assign attribute value:
book.set('COLOR', 'blue')
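Here is a small sketch of the attribute calls, again assuming Python 3 and the standard-library `xml.etree.ElementTree`. The `book` element and its `SIZE` and `WEIGHT` attributes are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with two attributes.
book = ET.fromstring('<book COLOR="red" SIZE="large"/>')

names = sorted(book.keys())       # attribute names
pairs = sorted(book.items())      # (name, value) pairs
has_color = 'COLOR' in book.attrib

missing = book.get('WEIGHT')                # None when the attribute is absent
fallback = book.get('WEIGHT', 'unknown')    # optional default value

book.set('COLOR', 'blue')         # assign (or overwrite) an attribute
color = book.get('COLOR')
```

`get()` never raises for a missing attribute, which is why comparing its result against `None` works as an existence test.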
No data structure is provided for accessing a parent node; however, one can easily create a dictionary that yields the parent for any given node:
parent_map = dict([(c,p) for p in tree.getiterator() for c in p])
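To show the parent map in use, here is a minimal sketch under the same assumptions (Python 3, standard-library `xml.etree.ElementTree`, `iter()` in place of `getiterator()`); the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: library > book > author.
doc = ET.fromstring('<library><book><author>Ann</author></book></library>')

# Map every child element to its parent by visiting each element once.
parent_map = {child: parent for parent in doc.iter() for child in parent}

author = doc.find('book/author')
parent_of_author = parent_map[author]          # the enclosing <book>
grandparent = parent_map[parent_of_author]     # the root <library>
root_has_parent = doc in parent_map            # the root has no entry
```

The map is a snapshot: it has to be rebuilt after elements are added to or removed from the tree.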
- Find all grandchildren of type ‘author’ (find() would return only the first match):
doc.findall('*/author')
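The difference between `find()` and `findall()` on a path expression can be seen in a short sketch; it assumes Python 3 and the standard-library `xml.etree.ElementTree`, with an invented sample document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two author grandchildren.
doc = ET.fromstring(
    '<library>'
    '<book><author>Ann</author></book>'
    '<book><author>Bob</author></book>'
    '</library>'
)

first = doc.find('*/author')            # first matching grandchild only
all_authors = doc.findall('*/author')   # every matching grandchild

first_name = first.text
all_names = [a.text for a in all_authors]
```

`find()` returns a single element (or `None` when nothing matches), whereas `findall()` always returns a list.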
Namespaces (NS) use James Clark notation:
- NS-qualified element:
doc = Element("{http://example.com}doc")
- NS-qualified attribute:
book.set('{http://example.com}COLOR', 'blue')
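A brief sketch of the `{uri}name` notation in practice, assuming Python 3 and the standard-library `xml.etree.ElementTree`. The `NS` constant and the `book` child are illustrative names, not part of the library.

```python
import xml.etree.ElementTree as ET

NS = 'http://example.com'   # illustrative namespace URI

# Build a namespace-qualified element, child and attribute.
doc = ET.Element('{%s}doc' % NS)
book = ET.SubElement(doc, '{%s}book' % NS)
book.set('{%s}COLOR' % NS, 'blue')

# Lookups use the same fully qualified names.
found = doc.find('{http://example.com}book')
color = found.get('{http://example.com}COLOR')

# On output the URI is declared once and a prefix is generated for it.
xml_text = ET.tostring(doc, encoding='unicode')
```

An unqualified lookup such as `doc.find('book')` would not match here, because the qualified name is part of the element's tag.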
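To reserialize the tree, call write() on the ElementTree object (not on an element; write() returns nothing, so there is no need to print it):
tree.write(outfile)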
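Serialization might look like the sketch below, which assumes Python 3 and the standard-library `xml.etree.ElementTree`. An in-memory `BytesIO` buffer stands in for `outfile`, and the document is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document built in memory.
doc = ET.Element('library')
ET.SubElement(doc, 'book', COLOR='blue')

# write() belongs to the ElementTree wrapper, so wrap a bare element first.
tree = ET.ElementTree(doc)
outfile = io.BytesIO()          # stand-in for a real file opened in binary mode
tree.write(outfile)
serialized = outfile.getvalue()

# tostring() serializes a single element without a file object.
as_text = ET.tostring(doc, encoding='unicode')

# Round trip: the serialized bytes parse back to the same structure.
reparsed = ET.fromstring(serialized)
```

`write()` also accepts a filename, in which case it opens and closes the file itself.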
Ported/Archived Responses
Martin Thomas on 2007-04-25
Thanks for this useful pocket reference.
Just a quick note,
“Find all grandchildren of type ‘author’: doc.find('*/author')” is
incorrect.
The .find method only returns the first of any items
found. The .findall method will do as described.
//m