
Python Parse XML: A Step-by-Step Guide for Scraping and Automation


Whether you are scraping product listings, handling API responses, or parsing sitemap files, XML is one of the most common data formats you’ll run into. While JSON has become the go-to for most APIs, XML still powers a large portion of the web, especially when dealing with enterprise systems, older websites, and RSS feeds.

If you’re building Python-based scraping tools or proxy-powered automation scripts, understanding how to parse XML is essential.

In this guide, we’ll break down multiple ways to parse XML in Python, with working examples, so you can pick the right method for your use case.

What is XML?

XML (eXtensible Markup Language) is a markup language designed to store and transport data. It uses a hierarchical, tag-based structure that resembles HTML, but its purpose is to describe data, not display it.

Here’s a quick example of an XML file:

<products>
  <product>
    <id>1</id>
    <name>Noise Cancelling Headphones</name>
    <price>249.99</price>
  </product>
  <product>
    <id>2</id>
    <name>Wireless Mouse</name>
    <price>29.99</price>
  </product>
</products>

This kind of structure is common in eCommerce sitemaps, product feeds, and even some legacy APIs, all of which are goldmines for web scraping and data extraction.

Python Libraries For Parsing XML

Python offers several built-in and third-party libraries for working with XML.

Here are the three most common:

| Library | Type | Pros | Best For |
| --- | --- | --- | --- |
| xml.etree.ElementTree | Built-in | Lightweight, easy to use | Beginners, simple XML structures |
| xml.dom.minidom | Built-in | Prettier formatting, DOM-style | Pretty-printing, smaller files |
| lxml | Third-party | Fast, supports XPath, robust | Large files, complex queries |

Method 1: Using ElementTree

Importing and Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('products.xml')
root = tree.getroot()

If you already have the XML as a string (e.g., from an HTTP response), use:

root = ET.fromstring(xml_string)

Extracting Data

for product in root.findall('product'):
    name = product.find('name').text
    price = product.find('price').text
    print(f"{name}: ${price}")

Why Use This?

  • It’s built-in and requires no installation.
  • Great for lightweight XML parsing in simple scraping tasks.
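One caveat: the extraction snippet above assumes every <name> and <price> element exists. In real-world feeds, elements are sometimes missing, and calling .text on the None returned by find() raises an AttributeError. A defensive sketch, using findtext() with a default value and a made-up inline feed:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed where the second product is missing its <price>.
xml_string = """
<products>
  <product><name>Headphones</name><price>249.99</price></product>
  <product><name>Mystery Item</name></product>
</products>
"""

root = ET.fromstring(xml_string)
for product in root.findall('product'):
    # findtext() returns the default instead of raising an
    # AttributeError when a child element is missing.
    name = product.findtext('name', default='unknown')
    price = product.findtext('price', default='n/a')
    print(f"{name}: {price}")
```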

Method 2: Using minidom 

Minidom provides a DOM-like interface for working with XML documents.

Example

from xml.dom import minidom
dom = minidom.parse('products.xml')
products = dom.getElementsByTagName('product')
for product in products:
    name = product.getElementsByTagName('name')[0].firstChild.nodeValue
    print("Product:", name)

Prettify XML Output

pretty_xml = dom.toprettyxml()
print(pretty_xml)

Best For:

  • Pretty-printing.
  • Smaller XML files.

Note that minidom is not ideal for performance-heavy tasks.

Method 3: Using lxml

Lxml is a third-party library known for its speed and XPath support.

Installation 

pip install lxml

Parsing and Querying with XPath

from lxml import etree

tree = etree.parse('products.xml')
products = tree.xpath('//product')
for product in products:
    name = product.xpath('name/text()')[0]
    price = product.xpath('price/text()')[0]
    print(f"{name} — ${price}")

Why use lxml?

  • Handles large files efficiently.
  • Ideal for scraping at scale using proxies or automation tools.
  • XPath makes it easier to target complex elements, especially those deeply nested.
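As a rough illustration of the large-file point, lxml's iterparse() can stream elements one at a time instead of loading the whole tree into memory. This sketch uses a small in-memory stand-in for what would normally be a multi-gigabyte feed file:

```python
from io import BytesIO
from lxml import etree

# In-memory stand-in for a large product feed on disk.
xml_bytes = b"""<products>
  <product><name>Headphones</name></product>
  <product><name>Mouse</name></product>
</products>"""

names = []
# iterparse yields each <product> as its closing tag is read,
# so the full document never has to fit in memory at once.
for _, elem in etree.iterparse(BytesIO(xml_bytes), tag='product'):
    names.append(elem.findtext('name'))
    elem.clear()  # free the element once processed

print(names)
```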

Parsing XML from a Proxy API Request

Let’s say you are scraping a proxy-enabled API that returns an XML response. Here’s how you could handle it:

import requests
from lxml import etree
proxy_url = "http://your-proxy-url:port"
response = requests.get("https://example.com/data.xml", proxies={"http": proxy_url, "https": proxy_url})
tree = etree.fromstring(response.content)
items = tree.xpath('//item')
for item in items:
    title = item.findtext('title')
    print("Title:", title)

Proxying.io users often scrape public sitemaps, product feeds, or search engine data that comes in XML format; this approach fits perfectly.

Tips for Working with XML in Python

  • Use XPath in lxml when working with deeply nested structures.
  • Always validate your XML source; malformed XML can crash your script.
  • Convert XML to JSON if your pipeline expects JSON format.
  • Use proxies when scraping rate-limited XML endpoints, such as sitemap.xml or feed.xml.
  • Handle encoding (UTF-8, ISO-8859-1, etc.) to avoid UnicodeDecodeError exceptions.
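For the XML-to-JSON tip above, one minimal approach (shown here with a hypothetical product feed) is to flatten each element's children into a dict before serializing:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical product feed, matching the structure used earlier.
xml_string = """
<products>
  <product><id>1</id><name>Headphones</name><price>249.99</price></product>
  <product><id>2</id><name>Mouse</name><price>29.99</price></product>
</products>
"""

root = ET.fromstring(xml_string)
# Flatten each <product> into a dict of tag -> text.
products = [
    {child.tag: child.text for child in product}
    for product in root.findall('product')
]
print(json.dumps(products, indent=2))
```

This simple flattening only works for shallow structures; deeply nested or attribute-heavy XML needs a recursive converter or a library built for the job.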

Conclusion

Knowing how to parse XML in Python using ElementTree, minidom, or lxml gives you a serious edge when building robust scripts that consume structured data. Whether you’re scraping search engine results, parsing sitemaps, or reading product feeds, being comfortable with XML makes you more adaptable and effective.

And if you want your scripts to scale without interruption, pair your XML scraper with residential or datacenter proxies from Proxying.io to bypass geo-blocks, rate limits, and firewalls.

Frequently Asked Questions (FAQs)

Can I read tag attributes as well as text content?

Yes, most Python XML libraries allow you to read tag attributes (like href, id, or type) in addition to the text content. This is useful when scraping links, metadata, or API identifiers.
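For example, here is how attribute access might look with ElementTree, assuming a hypothetical feed where each product carries an id attribute and a link child with an href:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with attributes on <product> and <link>.
xml_string = """
<products>
  <product id="1">
    <name>Noise Cancelling Headphones</name>
    <link href="https://example.com/p/1"/>
  </product>
</products>
"""

root = ET.fromstring(xml_string)
for product in root.findall('product'):
    # .get() reads a single attribute; .attrib exposes them all as a dict.
    product_id = product.get('id')
    href = product.find('link').get('href')
    print(product_id, href)
```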

What is the difference between fromstring() and parse()?

fromstring() is used when the XML is in string form, such as data returned from a web request. parse() is used when loading XML directly from a file stored on disk.

How do I handle malformed XML?

Use lxml’s recovery mode or sanitize your input before parsing.
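A minimal sketch of lxml's recovery mode, using a deliberately broken snippet (the closing product tag is missing):

```python
from lxml import etree

# Malformed XML: the closing </product> tag is missing.
broken = b"<products><product><name>Mouse</name></products>"

# recover=True tells libxml2 to repair what it can instead of raising.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(broken, parser)
print(root.findtext('product/name'))
```

Recovery is best-effort: for badly mangled input, validate or sanitize the source first rather than relying on the parser to guess the structure.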
