XML ElementTree Data Model

I've been playing with Fredrik Lundh's ElementTree as an intuitive/pythonic way of processing XML. (While I like Aaron Swarz's XML Tramp, it is presently unsupported; ElementTree is fast and has XPath support.)

ElementTree Conventions

  • Parsing an XML document:

    from elementtree.ElementTree import parse

    tree = parse(filename)
    doc = tree.getroot()

  • Element type (name): print doc.tag

  • Element text: print doc.text
  • Elements have a list of children

    • iterate over children: for child in doc
    • get first and second (splice) child: doc[0:2]
    • get the child of element type book: doc.find(``'book')
    • append(), insert() and remove() are also supported
    • getiterator(tag) returns a list (or another iterable object) of all (descendent) subelements that has the given tag in document order
  • Elements have a dict of attributes

    • get the attribute dictionary keys: book.keys()
    • get the attribute dictionary: book.items()
    • test for an attribute COLOR:

      if book.get('COLOR') is not None, or
      if 'COLOR' in book.attrib

    • get attribute COLOR value:

      book.attrib.get('COLOR'), or
      book.get('COLOR')

    • assign attribute value: book.set('COLOR', 'blue')

  • No data structure is provided for accessing a parent node, however one can easily create a dictionary that yields the parent for any given node:

    • parent_map = dict([(c,p) for p in tree.getiterator() for c in p])
  • XPath Support

    • Find all grandchildren of type 'author': doc.find('*/author')
  • Namespaces (NS) uses James Clark notation:

    • NS qualified elment: doc = Element("{http://example.com}``doc")
    • NS qualified attribute: book.set('{http://example.com}COLOR', 'blue')
  • To reserialize an object use print doc.write(outfile)


Ported/Archived Responses

Martin Thomas on 2007-04-25

Thanks for this useful pockert reference.
Just a quick note, "Find all grandchildren of type 'author': doc.find('*/author')" is incorrect..
the .find method only returns the first ofay items found.  The .findall method will do as described.

//m

Comments !

blogroll

social