ElementTree, serialization and namespace prefixes

The way ElementTree outputs namespaces in serialized output can be a problem with some applications.

Here is an example of such output:

>>> import cElementTree as etree
>>> stream = """<?xml version="1.0" encoding="UTF-8" ?>
... <doc xmlns="http://bar"
... xmlns:foo="http://foo">
... <foo:sub/>
... </doc>"""
>>>
>>> doc = etree.XML(stream)
>>> print etree.tostring(doc, encoding="UTF-8")
<?xml version="1.0" encoding="UTF-8" ?>
<ns0:doc xmlns:ns0="http://bar">
<ns1:sub xmlns:ns1="http://foo" />
</ns0:doc>
>>>

We can see that each declared namespace has been given a generated alias (ns0, ns1) and that every prefix has been rewritten to use those aliases. This is absolutely correct from an XML point of view, but it can cause trouble when the XML produced by an elementtree-based Python program is consumed by an application that does not handle this properly on its side.
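
To make the "correct from an XML point of view" claim concrete, here is a minimal sketch. It assumes the `xml.etree.ElementTree` module from the Python 3 standard library rather than the standalone cElementTree package used in this post, and the helper name `qualified_names` is made up for the illustration. It parses both forms of the document and compares the element names in the `{uri}local` notation ElementTree uses internally.

```python
import xml.etree.ElementTree as ET

# The document as written by hand, and as re-serialized with generated aliases.
original = '<doc xmlns="http://bar" xmlns:foo="http://foo"><foo:sub/></doc>'
rewritten = ('<ns0:doc xmlns:ns0="http://bar">'
             '<ns1:sub xmlns:ns1="http://foo" /></ns0:doc>')


def qualified_names(xml_text):
    """Return every element tag in document order, in {uri}local form."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


names_original = qualified_names(original)
names_rewritten = qualified_names(rewritten)

print(names_original)                     # ['{http://bar}doc', '{http://foo}sub']
print(names_original == names_rewritten)  # True
```

A namespace-aware consumer sees the same names in both documents; only an application that matches on the literal prefix text is affected.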

Here is a workaround I found; I don't know whether others exist:

>>> import cElementTree
>>> import elementtree.ElementTree
>>>
>>> my_namespaces = {'http://foo': 'foo',
...                  'http://bar': 'bar'}
>>> elementtree.ElementTree._namespace_map.update(my_namespaces)
>>>
>>> stream = """<?xml version="1.0" encoding="UTF-8" ?>
... <doc xmlns="http://bar"
... xmlns:foo="http://foo">
... <foo:sub/>
... </doc>"""
>>>
>>> doc = cElementTree.XML(stream)
>>> print cElementTree.tostring(doc)
<bar:doc xmlns:bar="http://bar">
<foo:sub xmlns:foo="http://foo" />
</bar:doc>

Here, the document has been serialized without replacing the prefixes within qualified names.

The idea is to add our own well-known namespace prefixes to elementtree's default ones.

The default elementtree ones are defined in elementtree/ElementTree.py as follows:

_namespace_map = {
# "well-known" namespace prefixes
"http://www.w3.org/XML/1998/namespace": "xml",
"http://www.w3.org/1999/xhtml": "html",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
"http://schemas.xmlsoap.org/wsdl/": "wsdl",
}

This is not as clean a solution as I had hoped to find, since it relies on a private module variable. Please let me know if you know of any others.
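
For what it's worth, the ElementTree that ships in the Python standard library as `xml.etree.ElementTree` has a public `register_namespace(prefix, uri)` function (Python 2.7 and 3.2 onwards) that does the same job without touching `_namespace_map`. The sketch below assumes that module and Python 3, not the standalone elementtree package used above, and reuses the same sample document.

```python
import xml.etree.ElementTree as ET

# Register the preferred prefixes once; the registry is global to the module.
ET.register_namespace("bar", "http://bar")
ET.register_namespace("foo", "http://foo")

stream = """<doc xmlns="http://bar"
     xmlns:foo="http://foo">
<foo:sub/>
</doc>"""

doc = ET.XML(stream)

# encoding="unicode" returns a str instead of bytes (Python 3 only).
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)
```

The serialized elements should come out as `bar:doc` and `foo:sub` rather than `ns0:doc` and `ns1:sub`. Note that this module may place the `xmlns:` declarations differently from the output shown earlier in this post.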

The problem I ran into recently was with OpenOffice.org 1.1.x (I don't know about version 2, though).

I could parse and serialize OpenOffice.org content XML documents and open them in OpenOffice.org at first. But as soon as I modified the document from within OpenOffice.org, it did not take the namespace prefix aliases into account when inserting new elements. I used this trick and now OpenOffice.org is happy. I'm going to report this to Laurent to see whether the OpenOffice.org developers are aware of the issue.

I fixed the issue as shown below. I used nmspace.mod from the OOo DTD to find the relevant OOo namespaces.

OOo_NS = "http://openoffice.org/2000/"

OFFICE_NS = "%soffice" % OOo_NS
TABLE_NS = "%stable" % OOo_NS
STYLE_NS = "%sstyle" % OOo_NS
TEXT_NS = "%stext" % OOo_NS
META_NS = "%smeta" % OOo_NS
SCRIPT_NS = "%sscript" % OOo_NS
DRAWING_NS = "%sdrawing" % OOo_NS
CHART_NS = "%schart" % OOo_NS
NUMBER_NS = "%snumber" % OOo_NS
DATASTYLE_NS = "%sdatastyle" % OOo_NS
DR3D_NS = "%sdr3d" % OOo_NS
FORM_NS = "%sform" % OOo_NS
CONFIG_NS = "%sconfig" % OOo_NS

FO_NS = "http://www.w3.org/1999/XSL/Format"
XLINK_NS = "http://www.w3.org/1999/xlink"
SVG_NS = "http://www.w3.org/2000/svg"
MATH_NS = "http://www.w3.org/1998/Math/MathML"
# This will be used for the XML serialization and elementtree.
NAMESPACE_MAP = {
OFFICE_NS : 'office',
TABLE_NS : 'table',
STYLE_NS : 'style',
TEXT_NS : 'text',
META_NS : 'meta',
SCRIPT_NS : 'script',
DRAWING_NS : 'drawing',
CHART_NS : 'chart',
NUMBER_NS : 'number',
DATASTYLE_NS : 'datastyle',
DR3D_NS : 'dr3d',
FORM_NS : 'form',
CONFIG_NS : 'config',
MATH_NS : 'math',
SVG_NS : 'svg',
XLINK_NS : 'xlink',
FO_NS : 'fo',
}

import elementtree.ElementTree as etree
etree._namespace_map.update(NAMESPACE_MAP)
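
As a rough check of what this buys you, here is a small sketch that builds a new element by hand and serializes it. It is only an illustration under a few assumptions: it uses the standard library `xml.etree.ElementTree` and its public `register_namespace` function instead of the private `_namespace_map`, it only registers two entries of the map above, and the `office:body` / `text:p` fragment is a toy example, not a complete OOo document.

```python
import xml.etree.ElementTree as ET

OOo_NS = "http://openoffice.org/2000/"

# A two-entry subset of the namespace map, enough for the example.
NAMESPACE_MAP = {
    OOo_NS + "office": "office",
    OOo_NS + "text": "text",
}

# Public equivalent of updating the private _namespace_map dictionary.
for uri, prefix in NAMESPACE_MAP.items():
    ET.register_namespace(prefix, uri)

# Elements are created with {uri}local names; prefixes only appear on output.
body = ET.Element("{%soffice}body" % OOo_NS)
paragraph = ET.SubElement(body, "{%stext}p" % OOo_NS)
paragraph.text = "Hello"

serialized = ET.tostring(body, encoding="unicode")
print(serialized)
```

The new elements should be written with the `office:` and `text:` prefixes, which is what OpenOffice.org 1.1.x expected in my case.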


Posted by Julien Anguenot @ 02/23/2006 03:56 PM. - Categories: coding, openoffice, python -  1 comments
