ElementTree, serialization and namespace prefixes
Here is an example of such an output:
>>> import cElementTree as etree
>>> stream = """<?xml version="1.0" encoding="UTF-8" ?>
... <doc xmlns="http://bar"
... xmlns:foo="http://foo">
... <foo:sub/>
... </doc>"""
>>>
>>> doc = etree.XML(stream)
>>> print etree.tostring(doc, encoding="UTF-8")
<?xml version="1.0" encoding="UTF-8" ?>
<ns0:doc xmlns:ns0="http://bar">
<ns1:sub xmlns:ns1="http://foo" />
</ns0:doc>
>>>
We can see that each declared namespace is now given an alias, and all prefixes have been changed to use those aliases. This is absolutely correct from an XML point of view, but it can get you into trouble when the XML you output from elementtree-based Python programs is consumed by applications that do not handle this properly on their side.
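To illustrate why the rewritten prefixes are still correct XML, here is a minimal sketch. It assumes the standard-library xml.etree.ElementTree module (not the standalone cElementTree package used in this post) and reuses the example namespaces. Internally, tags are kept as "{uri}local" names, so the prefix chosen at serialization time does not change what the document means.

```python
import xml.etree.ElementTree as ET

original = '<doc xmlns="http://bar" xmlns:foo="http://foo"><foo:sub/></doc>'

doc = ET.fromstring(original)
# Tags are stored as "{namespace-uri}local-name"; prefixes are not kept.
tags_before = [el.tag for el in doc.iter()]

# Serialize (prefixes may be rewritten to generated aliases) and parse again.
roundtrip = ET.fromstring(ET.tostring(doc))
tags_after = [el.tag for el in roundtrip.iter()]

print(tags_before)
print(tags_before == tags_after)
```

The qualified names survive the round trip unchanged, so only consumers that look at the literal prefixes are affected.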
Here is a workaround I found, though I don't know whether others exist:
>>> import cElementTree
>>> import elementtree.ElementTree
>>>
>>> my_namespaces = {'http://foo' : 'foo',
... 'http://bar' : 'bar'}
>>> elementtree.ElementTree._namespace_map.update(my_namespaces)
>>>
>>> stream = """<?xml version="1.0" encoding="UTF-8" ?>
... <doc xmlns="http://bar"
... xmlns:foo="http://foo">
... <foo:sub/>
... </doc>"""
>>>
>>> doc = cElementTree.XML(stream)
>>> print cElementTree.tostring(doc)
<bar:doc xmlns:bar="http://bar">
<foo:sub xmlns:foo="http://foo" />
</bar:doc>
Here, the document has been serialized without replacing the prefixes within qualified names by generated aliases.
The idea is that we are adding well-known namespace prefixes to the elementtree default ones.
The default elementtree ones are defined in elementtree/ElementTree.py as follows:
_namespace_map = {
# "well-known" namespace prefixes
"http://www.w3.org/XML/1998/namespace": "xml",
"http://www.w3.org/1999/xhtml": "html",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
"http://schemas.xmlsoap.org/wsdl/": "wsdl",
}
This is not the solution I had hoped to find. Please let me know if you know of any others.
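One possible alternative, sketched here under the assumption that you can use the standard-library xml.etree.ElementTree module (Python 2.7 or 3.2 and later) rather than the standalone elementtree package used in this post: that module exposes a public register_namespace() function, which avoids touching the private _namespace_map directly. The URIs and prefixes are the example ones from above.

```python
import xml.etree.ElementTree as ET

# Example URIs and prefixes from this post; register_namespace() is
# global and affects every later serialization in the process.
ET.register_namespace("bar", "http://bar")
ET.register_namespace("foo", "http://foo")

doc = ET.fromstring(
    '<doc xmlns="http://bar" xmlns:foo="http://foo"><foo:sub/></doc>'
)

# encoding="unicode" returns a str instead of bytes (Python 3).
result = ET.tostring(doc, encoding="unicode")
print(result)
```

The registered prefixes should then be used in place of the generated ns0 and ns1 aliases. I have not checked this against the old standalone elementtree releases.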
I recently ran into this problem with OpenOffice.org 1.1.x (I don't know about version 2, though).
I fixed the issue as shown below, using nmspace.mod from the OOo DTD to find the relevant OOo namespaces.
OOo_NS = "http://openoffice.org/2000/"
OFFICE_NS = "%soffice" % OOo_NS
TABLE_NS = "%stable" % OOo_NS
STYLE_NS = "%sstyle" % OOo_NS
TEXT_NS = "%stext" % OOo_NS
META_NS = "%smeta" % OOo_NS
SCRIPT_NS = "%sscript" % OOo_NS
DRAWING_NS = "%sdrawing" % OOo_NS
CHART_NS = "%schart" % OOo_NS
NUMBER_NS = "%snumber" % OOo_NS
DATASTYLE_NS = "%sdatastyle" % OOo_NS
DR3D_NS = "%sdr3d" % OOo_NS
FORM_NS = "%sform" % OOo_NS
CONFIG_NS = "%sconfig" % OOo_NS
FO_NS = "http://www.w3.org/1999/XSL/Format"
XLINK_NS = "http://www.w3.org/1999/xlink"
SVG_NS = "http://www.w3.org/2000/svg"
MATH_NS = "http://www.w3.org/1998/Math/MathML"
# This will be used for the XML serialization and elementtree.
NAMESPACE_MAP = {
OFFICE_NS : 'office',
TABLE_NS : 'table',
STYLE_NS : 'style',
TEXT_NS : 'text',
META_NS : 'meta',
SCRIPT_NS : 'script',
DRAWING_NS : 'drawing',
CHART_NS : 'chart',
NUMBER_NS : 'number',
DATASTYLE_NS : 'datastyle',
DR3D_NS : 'dr3d',
FORM_NS : 'form',
CONFIG_NS : 'config',
MATH_NS : 'math',
SVG_NS : 'svg',
XLINK_NS : 'xlink',
FO_NS : 'fo',
}
import elementtree.ElementTree as etree
etree._namespace_map.update(NAMESPACE_MAP)
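As a rough usage sketch, here is how one of these constants might be used to build an element that serializes with the expected prefix. It assumes the standard-library xml.etree.ElementTree module and its register_namespace() function in place of the _namespace_map update above. The qname() helper is hypothetical, and the text:p element is only an illustration.

```python
import xml.etree.ElementTree as ET

OOo_NS = "http://openoffice.org/2000/"
TEXT_NS = "%stext" % OOo_NS

# Stands in for the _namespace_map.update() call when using the
# standard-library module (an assumption, see above).
ET.register_namespace("text", TEXT_NS)


def qname(namespace, local):
    """Hypothetical helper: build a '{uri}local' qualified name."""
    return "{%s}%s" % (namespace, local)


paragraph = ET.Element(qname(TEXT_NS, "p"))
paragraph.text = "Hello"

result = ET.tostring(paragraph, encoding="unicode")
print(result)
```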
(Post originally written by Julien Anguenot on the old Nuxeo blogs.)