Open Codex by Joseph Reagle

Open Codex HISTORICAL entry

2005 Feb 14 | XML ElementTree Data Model

I've been playing with Fredrik Lundh's ElementTree as an intuitive/pythonic way of processing XML. (While I like Aaron Swartz's XML Tramp, it is presently unsupported; ElementTree is fast and has XPath support.)

ElementTree Conventions

Parsing an XML document:
from elementtree.ElementTree import parse

tree = parse(filename)
doc = tree.getroot()
Element type (name): print doc.tag
Element text: print doc.text
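
A minimal sketch of the parsing steps above, for readers on a current Python. It assumes Python 3 and the standard-library module xml.etree.ElementTree (the packaged successor to the standalone elementtree import used in this entry); the sample document is hypothetical and only there for illustration.

```python
import io
import xml.etree.ElementTree as ET  # assumption: stdlib module, not the standalone package

# Hypothetical sample document, used only for illustration.
SAMPLE = "<library>intro<book COLOR='red'><author>Ann</author></book></library>"

# parse() accepts a filename or a file-like object; a StringIO stands in for a file here.
tree = ET.parse(io.StringIO(SAMPLE))
doc = tree.getroot()

print(doc.tag)   # element type (name) of the root
print(doc.text)  # text that appears before the first child element
```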
Elements have a list of children
- iterate over children: for child in doc
- get first and second child (slice): doc[0:2]
- get the first child of element type book: doc.find('book')
- append(), insert() and remove() are also supported
- getiterator(tag) returns a list (or another iterable object) of all descendant subelements that have the given tag, in document order
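
The child-list operations above can be sketched as follows. This assumes Python 3's xml.etree.ElementTree, where iter(tag) plays the role of getiterator(tag); the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three children under the root.
doc = ET.fromstring(
    "<library>"
    "<book><author>Ann</author></book>"
    "<book><author>Bob</author></book>"
    "<magazine/>"
    "</library>"
)

child_tags = [child.tag for child in doc]  # iterate over direct children
first_two = doc[0:2]                       # slicing yields a list of child elements
first_book = doc.find("book")              # first direct child with tag 'book'

note = ET.Element("note")
doc.append(note)                           # add as the last child
doc.insert(0, ET.Element("preface"))       # add as the first child
doc.remove(note)                           # remove that same child again

# iter(tag) walks matching descendants in document order
# (older releases spelled this getiterator(tag)).
authors = [a.text for a in doc.iter("author")]
```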
Elements have a dict of attributes
- get the attribute dictionary keys: book.keys()
- get the attributes as (name, value) pairs: book.items() (the dictionary itself is book.attrib)
- test for an attribute COLOR:
  if book.get('COLOR') is not None, or
  if 'COLOR' in book.attrib
- get attribute COLOR value:
  book.attrib.get('COLOR'), or
  book.get('COLOR')
- assign attribute value: book.set('COLOR', 'blue')
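
A short sketch of the attribute calls above, again assuming Python 3's xml.etree.ElementTree; the ISBN and SIZE attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with two attributes.
book = ET.fromstring('<book COLOR="red" ISBN="123"/>')

keys = sorted(book.keys())          # attribute names
items = sorted(book.items())        # (name, value) pairs
has_color = "COLOR" in book.attrib  # membership test on the attribute dict
missing = book.get("SIZE")          # get() returns None for an absent attribute
color = book.get("COLOR")           # same result as book.attrib.get("COLOR")

book.set("COLOR", "blue")           # assign (or overwrite) an attribute value
```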
No data structure is provided for accessing a parent node; however, one can easily create a dictionary that yields the parent for any given node:
- parent_map = dict([(c,p) for p in tree.getiterator() for c in p])
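
Here is the same parent-map idea as a runnable sketch. It assumes Python 3's xml.etree.ElementTree and uses iter() in place of getiterator(); the document is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<library><book><author>Ann</author></book></library>")

# Map every child element to its parent by walking all elements once.
parent_map = {child: parent for parent in doc.iter() for child in parent}

author = doc.find("book/author")
book = parent_map[author]      # parent of <author> is <book>
library = parent_map[book]     # parent of <book> is the root
```

The root never appears as a key, since it has no parent, so parent_map.get(doc) is a safer lookup than indexing when the node might be the root.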
XPath Support
- Find all grandchildren of type 'author': doc.findall('*/author') (doc.find('*/author') returns only the first match)
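
As the comment on this entry points out, find() returns only the first match while findall() returns every match. A small sketch of the difference, assuming Python 3's xml.etree.ElementTree and a hypothetical document:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book><author>Ann</author></book>"
    "<book><author>Bob</author></book>"
    "</library>"
)

first = doc.find("*/author")           # first matching grandchild only
all_authors = doc.findall("*/author")  # list of all matching grandchildren

names = [a.text for a in all_authors]
```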
Namespaces (NS) use James Clark notation:
- NS qualified element: doc = Element("{http://example.com}doc")
- NS qualified attribute: book.set('{http://example.com}COLOR', 'blue')
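
A sketch of the James Clark notation in use, assuming Python 3's xml.etree.ElementTree. The NS constant and the book element are hypothetical names added for the example; note there is no space between the closing brace and the local name.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com"  # hypothetical namespace URI from the entry

doc = ET.Element("{%s}doc" % NS)              # NS qualified element
book = ET.SubElement(doc, "{%s}book" % NS)    # NS qualified child
book.set("{%s}COLOR" % NS, "blue")            # NS qualified attribute

# Lookups use the same {uri}local form.
found = doc.find("{%s}book" % NS)
color = found.get("{%s}COLOR" % NS)
```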
To reserialize a tree, use tree.write(outfile)
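
A sketch of reserializing, assuming Python 3's xml.etree.ElementTree. There, write() is a method of the ElementTree wrapper rather than of an individual Element, so a bare element is wrapped first; tostring() is an alternative when a string is wanted instead of a file. The in-memory buffer stands in for a real output file.

```python
import io
import xml.etree.ElementTree as ET

doc = ET.Element("doc")
ET.SubElement(doc, "book", COLOR="blue")

tree = ET.ElementTree(doc)   # wrap the root element so it can be written

buffer = io.BytesIO()        # stand-in for a file opened in binary mode
tree.write(buffer)           # the default encoding produces bytes
xml_bytes = buffer.getvalue()

xml_text = ET.tostring(doc, encoding="unicode")  # serialize to a str instead
```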

this entry posted to technology/python ;
comments (1)

Posted by Martin Thomas at Wed Apr 25 17:58:46 2007
Thanks for this useful pocket reference.
Just a quick note, "Find all grandchildren of type 'author': doc.find('*/author')" is incorrect.
The .find method only returns the first of any items found. The .findall method will do as described.

//m

