Access nested children in xml file parsed with ElementTree

Hope it could be useful:

import xml.etree.ElementTree as etree
with open('filename.xml') as tmpfile:
    doc = etree.iterparse(tmpfile, events=("start", "end"))
    doc = iter(doc)
    event, root = doc.next()
    num = 0
    for event, elem in doc:
        print event, elem

Yo have to iter() over your root.

that is root.iter() would do the trick!

import xml.etree.ElementTree as ET
import urllib2
tree =ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
   print child.tag, child.attrib

Output:

FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
  • To get all tags inside EstablishmentDetail you need to find that tag and then loop through its children!

That is, for example.

for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib

Output:

FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
  • To get the score for Hygiene as you've mentioned in comment,

What you have done is, it will get the first Scores tag and that will have Hygiene, ConfidenceInManagement, Structural tags as child when you call for each in root.find('.//Scores'):rating=child.get('Hygiene'). That is, obviously all three child will not have the element!

You need to first - find all Scores tag. - find Hygiene in every tags found!

for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text

Output:

5
5
5
0
5