How can this function be rewritten to implement OrderedDict?

You could use the OrderedDict dict subclass, which was added to the standard library's collections module in Python 2.7. What you actually need is a combination of OrderedDict and defaultdict, which doesn't exist in the standard library, but you can create one by subclassing OrderedDict as illustrated below:

If your version of Python doesn't have OrderedDict, you should be able to use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as the base class instead.

import collections

class OrderedDefaultdict(collections.OrderedDict):
    """ A defaultdict with OrderedDict as its base class. """

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):  # Optional, for pickle support.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())

    def __repr__(self):  # Optional.
        return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory, self.items())

def simplexml_load_file(file):
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = el.text or None
        child_dicts = OrderedDefaultdict(list)
        for child in el:  # iterate directly; getchildren() is deprecated in lxml
            child_dicts[child.tag].append(xml_to_item(child))
        return collections.OrderedDict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')
print(x)

for y in x['root']:
    print(y)

The output produced from your test XML file looks like this:

{'root':
    OrderedDict(
        [('a', ['1']),
         ('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]),
         ('aaa', ['3']),
         ('aaaa', [OrderedDict([('bb', ['4'])])]),
         ('aaaaa', ['5'])
        ]
    )
}

a
aa
aaa
aaaa
aaaaa

Which I think is close to what you want.

Minor update:

Added a __reduce__() method which will allow the instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but came up in a similar one.
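As a quick check of what __reduce__() buys you, here is a minimal sketch of a pickle round trip. The class is a trimmed copy of the one above (argument validation and __repr__ left out), and the keys and values are made up for illustration:

```python
import collections
import pickle


class OrderedDefaultdict(collections.OrderedDict):
    """Trimmed copy of the class above, keeping only what pickling needs."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # Rebuild as cls(default_factory), then re-insert the items in order.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())


d = OrderedDefaultdict(list)
d['b'].append(1)   # illustrative data, inserted out of alphabetical order
d['a'].append(2)

restored = pickle.loads(pickle.dumps(d))
```

The restored object should keep the insertion order ('b' before 'a') as well as the default_factory, so looking up a missing key on it still creates an empty list.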


The recipe from martineau works for me, but it has a problem with the copy() method inherited from OrderedDict, which does not pass default_factory along to the new instance. The following approach fixes this drawback:

class OrderedDefaultDict(OrderedDict):
    # Implementation as suggested by martineau

    def copy(self):
        return type(self)(self.default_factory, self)

Note that this implementation makes a shallow copy, not a deep copy, which is usually the right thing to do for default dictionaries.
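To make the shallow-copy behaviour concrete, here is a small sketch. The class body is a trimmed stand-in for martineau's implementation, and the keys and values are only for illustration:

```python
import collections


class OrderedDefaultdict(collections.OrderedDict):
    """Trimmed stand-in for martineau's class, plus the copy() override."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def copy(self):
        # Pass the factory explicitly; the inherited copy() would not.
        return type(self)(self.default_factory, self)


original = OrderedDefaultdict(list)
original['x'].append(1)

clone = original.copy()
clone['y'].append(2)    # the new key exists only in the clone
clone['x'].append(99)   # the list under 'x' is shared with the original
```

Keys added to the clone do not show up in the original, but the values themselves are the same objects, so the append to clone['x'] is visible through original['x'] as well. If you need independent values, copy.deepcopy() is the usual tool.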


There are many possible implementations of OrderedDict listed in the answers here: How do you retrieve items from a dictionary in the order that they're inserted?

You can create your own OrderedDict module for use in your own code by copying one of the implementations. I assume you do not have access to the OrderedDict because of the version of Python you are running.

One interesting aspect of your question is the possible need for defaultdict functionality. If you need this, you can implement the __missing__ method to get the desired effect.
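A minimal sketch of the __missing__ idea follows, shown on a plain dict subclass for brevity; the same method should work on whichever OrderedDict implementation you copy. The class name DefaultingDict is made up for this example, and the ordering in the usage relies on Python 3.7+, where plain dicts keep insertion order:

```python
class DefaultingDict(dict):
    """Hypothetical name: a dict that fills in missing keys on lookup."""

    def __init__(self, default_factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = self[key] = self.default_factory()
        return value


counts = DefaultingDict(int)
for word in ['b', 'a', 'b']:
    counts[word] += 1   # a missing key starts from int() == 0
```

Note that __missing__ is only consulted by the d[key] lookup; get() and the in operator are unaffected.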