
Feb 27, 2006

ElementTree, serialization and namespace prefixes

The way ElementTree outputs namespaces in serialized output can be a problem with some applications.

Here is an example of such output:

>>> import cElementTree as etree
>>> stream = """<?xml version="1.0" encoding="UTF-8" ?>
... <doc xmlns="http://bar"
... xmlns:foo="http://foo">
... <foo:sub/>
... </doc>"""
>>>
>>> doc = etree.XML(stream)
>>> print etree.tostring(doc, encoding="UTF-8")
<?xml version="1.0" encoding="UTF-8" ?>
<ns0:doc xmlns:ns0="http://bar">
<ns1:sub xmlns:ns1="http://foo" />
</ns0:doc>
>>>

We can see that each declared namespace has been given a generated alias (ns0, ns1) and that every prefix has been rewritten to use those aliases. This is absolutely correct from an XML point of view, but it can get you into trouble when the application consuming the XML produced by your elementtree-based Python program does not handle arbitrary prefixes properly on its side.
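The reason the original prefixes disappear is that ElementTree does not keep them: after parsing, each tag carries its namespace URI in the form {uri}localname. Here is a minimal sketch of that, assuming the xml.etree.ElementTree module from the Python 3 standard library rather than the standalone cElementTree package used in this post:

```python
import xml.etree.ElementTree as ET

stream = '<doc xmlns="http://bar" xmlns:foo="http://foo"><foo:sub/></doc>'

doc = ET.fromstring(stream)

# Tags are stored with the namespace URI, not with the prefix
# that appeared in the source document.
root_tag = doc.tag
child_tag = doc[0].tag

print(root_tag)   # {http://bar}doc
print(child_tag)  # {http://foo}sub
```

Since only the URI survives parsing, the serializer has to pick a prefix on its own when writing the tree back out.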

Here is a workaround I found, though I don't know whether others exist:

>>> import cElementTree
>>> import elementtree.ElementTree
>>>
>>> my_namespaces = {'http://foo': 'foo',
... 'http://bar': 'bar'}
>>> elementtree.ElementTree._namespace_map.update(my_namespaces)
>>>
>>> stream = """<?xml version="1.0" encoding="UTF-8" ?>
... <doc xmlns="http://bar"
... xmlns:foo="http://foo">
... <foo:sub/>
... </doc>"""
>>>
>>> doc = cElementTree.XML(stream)
>>> print cElementTree.tostring(doc)
<bar:doc xmlns:bar="http://bar">
<foo:sub xmlns:foo="http://foo" />
</bar:doc>
>>>

Here, the document has been serialized with the prefixes from our map instead of generated aliases such as ns0 and ns1.

The idea is that we are adding our own well-known namespace prefixes to the ElementTree default ones.

The default ones are defined in elementtree/ElementTree.py as follows:

_namespace_map = {
    # "well-known" namespace prefixes
    "http://www.w3.org/XML/1998/namespace": "xml",
    "http://www.w3.org/1999/xhtml": "html",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
    "http://schemas.xmlsoap.org/wsdl/": "wsdl",
}

This is not as clean a solution as I had hoped to find, since it relies on a private module-level variable. Please let me know if you know of any others.
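One alternative that was not available in the elementtree release used here: later versions of ElementTree, including the xml.etree.ElementTree module in the Python standard library, expose a public register_namespace(prefix, uri) function that updates the same map. A minimal sketch, assuming Python 3 and the standard library module:

```python
import xml.etree.ElementTree as ET

# Public counterpart of updating _namespace_map by hand.
# Note the argument order: prefix first, then namespace URI.
ET.register_namespace("bar", "http://bar")
ET.register_namespace("foo", "http://foo")

stream = '<doc xmlns="http://bar" xmlns:foo="http://foo"><foo:sub/></doc>'
doc = ET.fromstring(stream)

# encoding="unicode" returns a str instead of bytes (Python 3 only).
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)
```

As with the workaround above, the registration is global to the process, so it affects every document serialized afterwards.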

The problem I ran into recently was with OpenOffice.org 1.1.x (I don't know about version 2, though).

At first, I could parse and serialize OpenOffice.org content XML documents and open them in OpenOffice.org. But as soon as I modified the document from within OpenOffice.org, it did not take the namespace prefix aliases into account when inserting new elements. I used this trick and now OpenOffice.org is happy. I'm going to report this to Laurent to see whether the OpenOffice.org developers are aware of the issue.

I fixed the issue as shown below, using nmspace.mod from the OOo DTD to find the relevant OOo namespaces.

OOo_NS = "http://openoffice.org/2000/"

OFFICE_NS = "%soffice" % OOo_NS
TABLE_NS = "%stable" % OOo_NS
STYLE_NS = "%sstyle" % OOo_NS
TEXT_NS = "%stext" % OOo_NS
META_NS = "%smeta" % OOo_NS
SCRIPT_NS = "%sscript" % OOo_NS
DRAWING_NS = "%sdrawing" % OOo_NS
CHART_NS = "%schart" % OOo_NS
NUMBER_NS = "%snumber" % OOo_NS
DATASTYLE_NS = "%sdatastyle" % OOo_NS
DR3D_NS = "%sdr3d" % OOo_NS
FORM_NS = "%sform" % OOo_NS
CONFIG_NS = "%sconfig" % OOo_NS

FO_NS = "http://www.w3.org/1999/XSL/Format"
XLINK_NS = "http://www.w3.org/1999/xlink"
SVG_NS = "http://www.w3.org/2000/svg"
MATH_NS = "http://www.w3.org/1998/Math/MathML"

# Map each namespace URI to the prefix OpenOffice.org expects.
# This will be used by elementtree for the XML serialization.
NAMESPACE_MAP = {
    OFFICE_NS: 'office',
    TABLE_NS: 'table',
    STYLE_NS: 'style',
    TEXT_NS: 'text',
    META_NS: 'meta',
    SCRIPT_NS: 'script',
    DRAWING_NS: 'drawing',
    CHART_NS: 'chart',
    NUMBER_NS: 'number',
    DATASTYLE_NS: 'datastyle',
    DR3D_NS: 'dr3d',
    FORM_NS: 'form',
    CONFIG_NS: 'config',
    MATH_NS: 'math',
    SVG_NS: 'svg',
    XLINK_NS: 'xlink',
    FO_NS: 'fo',
}

import elementtree.ElementTree as etree
etree._namespace_map.update(NAMESPACE_MAP)
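
To check that a map like this one took effect, one option is to list the namespace declarations actually present in the serialized output. The sketch below assumes Python 3 and the standard library xml.etree.ElementTree module (which still has the private _namespace_map variable); the two-element document and the declared_prefixes helper are made up for illustration and are not part of the original fix:

```python
import io
import xml.etree.ElementTree as ET

OFFICE_NS = "http://openoffice.org/2000/office"
TEXT_NS = "http://openoffice.org/2000/text"

# Same trick as in the post; _namespace_map is private and may change.
ET._namespace_map.update({OFFICE_NS: "office", TEXT_NS: "text"})

# Hypothetical minimal document, only here to have something to serialize.
root = ET.Element("{%s}body" % OFFICE_NS)
ET.SubElement(root, "{%s}p" % TEXT_NS).text = "Hello"
data = ET.tostring(root)  # bytes


def declared_prefixes(xml_bytes):
    """Return {uri: prefix} for every namespace declaration in xml_bytes."""
    found = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        found[uri] = prefix
    return found


prefixes = declared_prefixes(data)
print(prefixes)
```

If a URI shows up with a prefix like ns0, it is missing from the map or was written slightly differently (a trailing slash is enough).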


(Post originally written by Julien Anguenot on the old Nuxeo blogs.)
