Working with XML Files in Python

XML (eXtensible Markup Language) is a widely used format for storing and transporting structured data. Python provides several libraries to read, write, modify, and parse XML files.

Why Use XML?

Stores structured data in a human-readable format.
Used in web services, APIs, and configuration files.
Easily integrates with databases and data exchange formats.

Python Libraries for XML Handling:

xml.etree.ElementTree – Built-in, easy-to-use, lightweight.
lxml – Faster, supports XPath & XSLT (Requires installation).
xml.dom.minidom – Provides DOM-like XML manipulation.

1. Reading an XML File (`xml.etree.ElementTree`)

Sample XML File (`data.xml`):

<employees>
    <employee id="101">
        <name>John Doe</name>
        <position>Software Engineer</position>
        <salary>80000</salary>
    </employee>
    <employee id="102">
        <name>Jane Smith</name>
        <position>Data Analyst</position>
        <salary>75000</salary>
    </employee>
</employees>

Reading and Parsing XML:

import xml.etree.ElementTree as ET

# Load XML file
tree = ET.parse("data.xml")
root = tree.getroot()

# Print root tag
print(f"Root element: {root.tag}")

# Iterate through child elements
for employee in root.findall("employee"):
    emp_id = employee.get("id")  # Get attribute value
    name = employee.find("name").text
    position = employee.find("position").text
    salary = employee.find("salary").text
    print(f"ID: {emp_id}, Name: {name}, Position: {position}, Salary: {salary}")

Key Methods:

.parse("file.xml") → Loads XML file.
.getroot() → Returns the root element.
.find("tag") → Finds the first occurrence of a tag.
.findall("tag") → Finds all occurrences of a tag.
.get("attribute") → Gets an attribute value.

2. Writing an XML File

We can create an XML structure and save it to a file.

Creating an XML File (`employees.xml`):

import xml.etree.ElementTree as ET

# Create root element
root = ET.Element("employees")

# Create child elements
emp1 = ET.SubElement(root, "employee", id="101")
ET.SubElement(emp1, "name").text = "John Doe"
ET.SubElement(emp1, "position").text = "Software Engineer"
ET.SubElement(emp1, "salary").text = "80000"

emp2 = ET.SubElement(root, "employee", id="102")
ET.SubElement(emp2, "name").text = "Jane Smith"
ET.SubElement(emp2, "position").text = "Data Analyst"
ET.SubElement(emp2, "salary").text = "75000"

# Convert to XML tree
tree = ET.ElementTree(root)
tree.write("employees.xml")

print("XML file created successfully!")

Key Functions:

ET.Element("tag") → Creates a root element.
ET.SubElement(parent, "tag") → Adds child elements.
.text = "value" → Sets element text content.
ET.ElementTree(root).write("file.xml") → Saves XML file.

3. Modifying an XML File

Modify an existing XML file by updating element values.

Example: Updating Salary of an Employee

import xml.etree.ElementTree as ET

tree = ET.parse("employees.xml")
root = tree.getroot()

# Find employee with id=101 and update salary
for emp in root.findall("employee"):
    if emp.get("id") == "101":
        emp.find("salary").text = "85000"  # Update salary

# Save changes
tree.write("employees.xml")

print("XML file updated successfully!")

Use .find(tag).text to update element values.

4. Deleting an Element

Remove an employee record from XML.

import xml.etree.ElementTree as ET

tree = ET.parse("employees.xml")
root = tree.getroot()

# Find and remove employee with id=102
for emp in root.findall("employee"):
    if emp.get("id") == "102":
        root.remove(emp)

# Save changes
tree.write("employees.xml")

print("Employee removed successfully!")

Use .remove(element) to delete a node.

5. Searching and Filtering XML Data

Find employees earning more than $75,000.

import xml.etree.ElementTree as ET

tree = ET.parse("employees.xml")
root = tree.getroot()

for emp in root.findall("employee"):
    salary = int(emp.find("salary").text)
    if salary > 75000:
        print(f"Employee: {emp.find('name').text}, Salary: {salary}")

Convert text to integer before numerical comparisons.

6. Pretty Printing XML (`xml.dom.minidom`)

The default XML output is compact. Use xml.dom.minidom for formatted output.

import xml.etree.ElementTree as ET
import xml.dom.minidom

tree = ET.parse("employees.xml")
xml_str = ET.tostring(tree.getroot(), encoding="utf-8")
pretty_xml = xml.dom.minidom.parseString(xml_str).toprettyxml()

print(pretty_xml)

Use toprettyxml() to format XML for better readability.

7. Parsing Large XML Files (`iterparse`)

For large XML files, use iterparse() to process data incrementally.

import xml.etree.ElementTree as ET

for event, elem in ET.iterparse("large_file.xml", events=("start", "end")):
    if event == "end" and elem.tag == "employee":
        print(f"Employee: {elem.find('name').text}")
        elem.clear()  # Free memory

Use .clear() to release memory after processing each node.

8. Using `lxml` for XPath Queries

lxml supports XPath, allowing advanced XML queries.

from lxml import etree

tree = etree.parse("employees.xml")
employees = tree.xpath("//employee[salary>75000]/name/text()")

print("High salary employees:", employees)

Install lxml:

pip install lxml

XPath Example Queries:

//employee/name → Selects all <name> elements.
//employee[@id='101'] → Selects employee with id=101.
//employee[salary>75000] → Selects employees earning more than 75,000.

9. Handling XML Errors (`try-except`)

Catch errors when working with XML files.

import xml.etree.ElementTree as ET

try:
    tree = ET.parse("employees.xml")
    root = tree.getroot()
except FileNotFoundError:
    print("Error: XML file not found!")
except ET.ParseError:
    print("Error: XML file is malformed!")

Common Errors:

FileNotFoundError → XML file does not exist.
ET.ParseError → XML is malformed or corrupted.

10. Summary Table

Operation	Method	Example
Read XML File	`ET.parse("file.xml")`	`root = tree.getroot()`
Find Element	`.find("tag")`	`name = emp.find("name").text`
Find All Elements	`.findall("tag")`	`for emp in root.findall("employee")`
Modify Element	`.text = "new value"`	`emp.find("salary").text = "85000"`
Delete Element	`.remove(element)`	`root.remove(emp)`
Write XML	`ET.ElementTree(root).write("file.xml")`	`tree.write("employees.xml")`
Pretty Print	`xml.dom.minidom.parseString()`	`toprettyxml()`
Parse Large XML	`ET.iterparse("file.xml")`	Efficient memory usage

1. Reading an XML File (xml.etree.ElementTree)

Sample XML File (data.xml):