Working with XML Files in Python

XML (eXtensible Markup Language) is a widely used format for storing and transporting structured data. Python provides several libraries to read, write, modify, and parse XML files.

Why Use XML?

  • Stores structured data in a human-readable format.
  • Used in web services, APIs, and configuration files.
  • Easily integrates with databases and data exchange formats.

Python Libraries for XML Handling:

  1. xml.etree.ElementTree: Built-in, easy-to-use, lightweight.
  2. lxml: Faster, supports XPath & XSLT (requires installation).
  3. xml.dom.minidom: Provides DOM-like XML manipulation.

1. Reading an XML File (xml.etree.ElementTree)

Sample XML File (data.xml):

<employees>
<employee id="101">
<name>John Doe</name>
<position>Software Engineer</position>
<salary>80000</salary>
</employee>
<employee id="102">
<name>Jane Smith</name>
<position>Data Analyst</position>
<salary>75000</salary>
</employee>
</employees>

Reading and Parsing XML:

import xml.etree.ElementTree as ET

# Load XML file
tree = ET.parse("data.xml")
root = tree.getroot()

# Print root tag
print(f"Root element: {root.tag}")

# Iterate through child elements
for employee in root.findall("employee"):
    emp_id = employee.get("id")  # Get attribute value
    name = employee.find("name").text
    position = employee.find("position").text
    salary = employee.find("salary").text
    print(f"ID: {emp_id}, Name: {name}, Position: {position}, Salary: {salary}")

Key Methods:

  • .parse("file.xml") → Loads XML file.
  • .getroot() → Returns the root element.
  • .find("tag") → Finds the first occurrence of a tag.
  • .findall("tag") → Finds all occurrences of a tag.
  • .get("attribute") → Gets an attribute value.
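
One caveat with .find(): it returns None when the tag is absent, so reading .text on the result raises AttributeError. A minimal sketch, assuming a record that lacks a <salary> element (the inline XML string and the "N/A" default are illustrative, not part of the sample file), uses findtext() with a default value instead:

```python
import xml.etree.ElementTree as ET

# Illustrative data: the second employee has no <salary> element.
xml_data = """
<employees>
  <employee id="101"><name>John Doe</name><salary>80000</salary></employee>
  <employee id="102"><name>Jane Smith</name></employee>
</employees>
"""

root = ET.fromstring(xml_data)

rows = []
for employee in root.findall("employee"):
    # findtext() returns the default instead of failing when the tag is missing
    name = employee.findtext("name", default="N/A")
    salary = employee.findtext("salary", default="N/A")
    rows.append((employee.get("id"), name, salary))

for emp_id, name, salary in rows:
    print(f"ID: {emp_id}, Name: {name}, Salary: {salary}")
```

ET.fromstring() is used here only so the example runs without a file on disk; the same calls work on the root returned by ET.parse().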

2. Writing an XML File

We can create an XML structure and save it to a file.

Creating an XML File (employees.xml):

import xml.etree.ElementTree as ET

# Create root element
root = ET.Element("employees")

# Create child elements
emp1 = ET.SubElement(root, "employee", id="101")
ET.SubElement(emp1, "name").text = "John Doe"
ET.SubElement(emp1, "position").text = "Software Engineer"
ET.SubElement(emp1, "salary").text = "80000"

emp2 = ET.SubElement(root, "employee", id="102")
ET.SubElement(emp2, "name").text = "Jane Smith"
ET.SubElement(emp2, "position").text = "Data Analyst"
ET.SubElement(emp2, "salary").text = "75000"

# Convert to XML tree
tree = ET.ElementTree(root)
tree.write("employees.xml")

print("XML file created successfully!")

Key Functions:

  • ET.Element("tag") → Creates an element (used here as the root).
  • ET.SubElement(parent, "tag") → Adds child elements.
  • .text = "value" → Sets element text content.
  • ET.ElementTree(root).write("file.xml") → Saves XML file.
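
By default, write() emits no XML declaration and escapes non-ASCII characters as numeric references. The sketch below, which assumes you want UTF-8 output with a declaration, passes the encoding and xml_declaration arguments; it writes to an in-memory buffer so it runs without touching disk, and the employee name is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("employees")
emp = ET.SubElement(root, "employee", id="101")
ET.SubElement(emp, "name").text = "José Díaz"  # illustrative non-ASCII value

# Write to an in-memory buffer; passing a file path works the same way.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)

output = buffer.getvalue().decode("utf-8")
print(output)
```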

3. Modifying an XML File

Modify an existing XML file by updating element values.

Example: Updating Salary of an Employee

import xml.etree.ElementTree as ET

tree = ET.parse("employees.xml")
root = tree.getroot()

# Find employee with id=101 and update salary
for emp in root.findall("employee"):
    if emp.get("id") == "101":
        emp.find("salary").text = "85000"  # Update salary

# Save changes
tree.write("employees.xml")

print("XML file updated successfully!")

Use .find(tag).text to update element values.
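
Because element text is always a string, a computed update has to convert to a number and back. A minimal sketch, assuming a flat percentage raise for every employee (RAISE_PERCENT and the inline data are hypothetical):

```python
import xml.etree.ElementTree as ET

xml_data = """
<employees>
  <employee id="101"><salary>80000</salary></employee>
  <employee id="102"><salary>75000</salary></employee>
</employees>
"""
root = ET.fromstring(xml_data)

RAISE_PERCENT = 5  # hypothetical value, for illustration only

for emp in root.findall("employee"):
    salary_elem = emp.find("salary")
    # Convert text to int, apply the raise with integer math, store as text again
    new_salary = int(salary_elem.text) * (100 + RAISE_PERCENT) // 100
    salary_elem.text = str(new_salary)

salaries = {emp.get("id"): emp.findtext("salary") for emp in root.findall("employee")}
print(salaries)
```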


4. Deleting an Element

Remove an employee record from XML.

import xml.etree.ElementTree as ET

tree = ET.parse("employees.xml")
root = tree.getroot()

# Find and remove employee with id=102
for emp in root.findall("employee"):
    if emp.get("id") == "102":
        root.remove(emp)

# Save changes
tree.write("employees.xml")

print("Employee removed successfully!")

Use .remove(element) to delete a node.
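
Removing several elements in one pass works because findall() returns a list, so the loop does not iterate over the tree being changed. The sketch below assumes a hypothetical set of IDs to delete and uses inline data:

```python
import xml.etree.ElementTree as ET

xml_data = """
<employees>
  <employee id="101"><name>John Doe</name></employee>
  <employee id="102"><name>Jane Smith</name></employee>
  <employee id="103"><name>Sam Lee</name></employee>
</employees>
"""
root = ET.fromstring(xml_data)

ids_to_remove = {"101", "102"}  # illustrative IDs

# findall() returns a separate list, so removing from root inside the loop is safe.
# Looping over root itself ("for emp in root:") while removing can skip siblings.
for emp in root.findall("employee"):
    if emp.get("id") in ids_to_remove:
        root.remove(emp)

remaining_ids = [emp.get("id") for emp in root.findall("employee")]
print(remaining_ids)
```

Note that remove() only deletes direct children of the element it is called on, so a nested element has to be removed through its own parent.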


5. Searching and Filtering XML Data

Find employees earning more than $75,000.

import xml.etree.ElementTree as ET

tree = ET.parse("employees.xml")
root = tree.getroot()

for emp in root.findall("employee"):
    salary = int(emp.find("salary").text)
    if salary > 75000:
        print(f"Employee: {emp.find('name').text}, Salary: {salary}")

Convert text to integer before numerical comparisons.
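
The conversion fails if a salary is missing or not numeric. A hedged sketch of one way to guard against that, using a hypothetical helper named parse_salary and made-up records with bad values:

```python
import xml.etree.ElementTree as ET

xml_data = """
<employees>
  <employee id="101"><name>John Doe</name><salary>80000</salary></employee>
  <employee id="102"><name>Jane Smith</name><salary>n/a</salary></employee>
  <employee id="103"><name>Sam Lee</name></employee>
  <employee id="104"><name>Ann Ray</name><salary>75000</salary></employee>
</employees>
"""
root = ET.fromstring(xml_data)


def parse_salary(emp):
    """Hypothetical helper: return the salary as int, or None if unusable."""
    text = emp.findtext("salary")  # None when <salary> is missing
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


high_earners = []
for emp in root.findall("employee"):
    salary = parse_salary(emp)
    if salary is not None and salary > 75000:
        high_earners.append((emp.findtext("name"), salary))

print(high_earners)
```

Records with unusable salaries are silently skipped here; logging or raising instead may suit stricter pipelines.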


6. Pretty Printing XML (xml.dom.minidom)

By default, ElementTree writes XML without line breaks or indentation. Use xml.dom.minidom for formatted output.

import xml.etree.ElementTree as ET
import xml.dom.minidom

tree = ET.parse("employees.xml")
xml_str = ET.tostring(tree.getroot(), encoding="utf-8")
pretty_xml = xml.dom.minidom.parseString(xml_str).toprettyxml()

print(pretty_xml)

Use toprettyxml() to format XML for better readability.
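
As an alternative to minidom, Python 3.9 and later provide ET.indent(), which adds indentation to an ElementTree structure in place. A minimal sketch, assuming Python 3.9+ and a two-space indent:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<employees><employee id="101"><name>John Doe</name></employee></employees>'
)

# ET.indent() modifies the tree in place (available from Python 3.9)
ET.indent(root, space="  ")

pretty_xml = ET.tostring(root, encoding="unicode")
print(pretty_xml)
```

This avoids re-parsing the document through a second library, at the cost of requiring a newer Python version.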


7. Parsing Large XML Files (iterparse)

For large XML files, use iterparse() to process data incrementally.

import xml.etree.ElementTree as ET

for event, elem in ET.iterparse("large_file.xml", events=("start", "end")):
    # Act only once the whole <employee> element has been read
    if event == "end" and elem.tag == "employee":
        print(f"Employee: {elem.find('name').text}")
        elem.clear()  # Free memory

Use .clear() to release memory after processing each node.
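
The same pattern can be tried without a large file on disk. The sketch below feeds iterparse() an in-memory byte stream (an assumption made so the example is self-contained) and listens only for "end" events, since an element's children are guaranteed to be complete only at that point:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = (
    b"<employees>"
    b"<employee id='101'><name>John Doe</name></employee>"
    b"<employee id='102'><name>Jane Smith</name></employee>"
    b"</employees>"
)
source = io.BytesIO(xml_bytes)  # stands in for a large file

names = []
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "employee":
        names.append(elem.findtext("name"))
        elem.clear()  # drop the element's children and text once processed

print(names)
```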


8. Using lxml for XPath Queries

lxml supports XPath, allowing advanced XML queries.

from lxml import etree

tree = etree.parse("employees.xml")
employees = tree.xpath("//employee[salary>75000]/name/text()")

print("High salary employees:", employees)

Install lxml:

pip install lxml

XPath Example Queries:

  • //employee/name → Selects all <name> elements.
  • //employee[@id='101'] → Selects employee with id=101.
  • //employee[salary>75000] → Selects employees earning more than 75,000.
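
If lxml is not available, the built-in ElementTree supports a limited subset of XPath in find(), findall(), and findtext(). The sketch below is a standard-library substitute, not lxml: attribute and child-text predicates work, but numeric comparisons such as salary>75000 are not supported, so that filter is done in Python.

```python
import xml.etree.ElementTree as ET

xml_data = """
<employees>
  <employee id="101">
    <name>John Doe</name><position>Software Engineer</position><salary>80000</salary>
  </employee>
  <employee id="102">
    <name>Jane Smith</name><position>Data Analyst</position><salary>75000</salary>
  </employee>
</employees>
"""
root = ET.fromstring(xml_data)

# Attribute predicate: name of the employee with id=101
name_101 = root.findtext("employee[@id='101']/name")

# Child-text predicate: employees whose <position> is exactly "Data Analyst"
analysts = [e.get("id") for e in root.findall("employee[position='Data Analyst']")]

# Descendant search: every <name> element in the tree
all_names = [n.text for n in root.findall(".//name")]

# Numeric comparison is outside ElementTree's XPath subset, so filter in Python
high_earners = [
    e.findtext("name")
    for e in root.findall("employee")
    if int(e.findtext("salary")) > 75000
]

print(name_101, analysts, all_names, high_earners)
```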

9. Handling XML Errors (try-except)

Catch errors when working with XML files.

import xml.etree.ElementTree as ET

try:
    tree = ET.parse("employees.xml")
    root = tree.getroot()
except FileNotFoundError:
    print("Error: XML file not found!")
except ET.ParseError:
    print("Error: XML file is malformed!")

Common Errors:

  • FileNotFoundError → XML file does not exist.
  • ET.ParseError → XML is malformed or corrupted.
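
ET.ParseError also carries a position attribute (line, column) that helps locate the problem. A minimal sketch, using a hypothetical helper named parse_xml_text and inline strings in place of files:

```python
import xml.etree.ElementTree as ET


def parse_xml_text(text):
    """Hypothetical helper: return the root element, or None if parsing fails."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        print(f"Malformed XML at line {line}, column {column}: {err}")
        return None


good = parse_xml_text("<employees><employee id='101'/></employees>")
bad = parse_xml_text("<employees><employee></employees>")  # <employee> never closed

print(good is not None, bad is None)
```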

10. Summary Table

Operation | Method | Example
Read XML File | ET.parse("file.xml") | root = tree.getroot()
Find Element | .find("tag") | name = emp.find("name").text
Find All Elements | .findall("tag") | for emp in root.findall("employee")
Modify Element | .text = "new value" | emp.find("salary").text = "85000"
Delete Element | .remove(element) | root.remove(emp)
Write XML | ET.ElementTree(root).write("file.xml") | tree.write("employees.xml")
Pretty Print | xml.dom.minidom.parseString() | toprettyxml()
Parse Large XML | ET.iterparse("file.xml") | Efficient memory usage
