Whether you are scraping product listings, handling API responses, or parsing sitemap files, XML is one of the most common data formats you’ll run into. While JSON has become the go-to for most APIs, XML still powers a large portion of the web, especially when dealing with enterprise systems, older websites, and RSS feeds.
If you’re building Python-based scraping tools or proxy-powered automation scripts, understanding how to parse XML is essential.
In this guide, we’ll break down multiple ways to parse XML in Python, with working examples, so you can pick the right method for your use case.
What is XML?
XML (eXtensible Markup Language) is a markup language designed to store and transport data. It uses a hierarchical, tag-based structure that resembles HTML, but its purpose is to describe data, not display it.
Here’s a quick example of an XML file:
```xml
<products>
  <product>
    <id>1</id>
    <name>Noise Cancelling Headphones</name>
    <price>249.99</price>
  </product>
  <product>
    <id>2</id>
    <name>Wireless Mouse</name>
    <price>29.99</price>
  </product>
</products>
```

This kind of structure is common in eCommerce sitemaps, product feeds, and even some legacy APIs, all of which are goldmines for web scraping and data extraction.
Python Libraries For Parsing XML
Python offers several built-in and third-party libraries for working with XML.
Here are the three most common:
| Library | Type | Pros | Best For |
| --- | --- | --- | --- |
| xml.etree.ElementTree | Built-in | Lightweight, easy to use | Beginners, simple XML structures |
| xml.dom.minidom | Built-in | Prettier formatting, DOM-style | Pretty-printing, smaller files |
| lxml | Third-party | Fast, supports XPath, robust | Large files, complex queries |
Method 1: Using ElementTree
Importing and Parsing
```python
import xml.etree.ElementTree as ET

tree = ET.parse('products.xml')
root = tree.getroot()
```
If you already have the XML as a string (e.g., from an HTTP response), use:
```python
root = ET.fromstring(xml_string)
```

Extracting Data
```python
for product in root.findall('product'):
    name = product.find('name').text
    price = product.find('price').text
    print(f"{name}: ${price}")
```

Why Use This?
- It’s built-in and requires no installation.
- Great for lightweight XML parsing in simple scraping tasks.
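Printing values is fine for a quick check, but most scrapers need structured records. As a minimal sketch using the same ElementTree calls, here is one way to turn the sample feed into a list of dicts (the XML literal mirrors the example from earlier, and products_to_dicts is an illustrative helper, not part of the standard library):

```python
import xml.etree.ElementTree as ET

xml_string = """
<products>
  <product>
    <id>1</id>
    <name>Noise Cancelling Headphones</name>
    <price>249.99</price>
  </product>
  <product>
    <id>2</id>
    <name>Wireless Mouse</name>
    <price>29.99</price>
  </product>
</products>
"""

def products_to_dicts(xml_text):
    """Parse a <products> feed into a list of dicts with typed fields."""
    root = ET.fromstring(xml_text)
    items = []
    for product in root.findall('product'):
        items.append({
            'id': int(product.findtext('id')),
            'name': product.findtext('name'),
            'price': float(product.findtext('price')),
        })
    return items

items = products_to_dicts(xml_string)
print(items[0]['name'])  # Noise Cancelling Headphones
```

From here the records drop straight into a CSV writer, a database insert, or a pandas DataFrame.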
Method 2: Using minidom
Minidom provides a DOM-like interface for working with XML documents.
Example
```python
from xml.dom import minidom

dom = minidom.parse('products.xml')
products = dom.getElementsByTagName('product')

for product in products:
    name = product.getElementsByTagName('name')[0].firstChild.nodeValue
    print("Product:", name)
```

Prettify XML Output
```python
pretty_xml = dom.toprettyxml()
print(pretty_xml)
```

Best For:
- Pretty-printing.
- Smaller XML files.
- Not ideal for performance-heavy tasks.
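When the XML arrives as a string rather than a file, which is the usual case when scraping, minidom.parseString does the same job. A small sketch (the one-line response body here is illustrative):

```python
from xml.dom import minidom

# A compact, single-line response body, as a scraper might receive it
raw = "<products><product><name>Wireless Mouse</name></product></products>"

dom = minidom.parseString(raw)         # parse from a string instead of a file
pretty = dom.toprettyxml(indent="  ")  # re-indent with two spaces per level

print(pretty)
```

This is handy for eyeballing a compressed feed or logging a readable copy of a response during debugging.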
Method 3: Using lxml
lxml is a third-party library known for its speed and XPath support.
Installation
```shell
pip install lxml
```

Parsing and Querying with XPath
```python
from lxml import etree

tree = etree.parse('products.xml')
products = tree.xpath('//product')

for product in products:
    name = product.xpath('name/text()')[0]
    price = product.xpath('price/text()')[0]
    print(f"{name} — ${price}")
```

Why use lxml?
- Handles large files efficiently.
- Ideal for scraping at scale using proxies or automation tools.
- XPath makes it easier to target complex elements, especially those deeply nested.
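XPath predicates are where lxml pays off most. As a sketch, the query below filters the sample feed down to products above a price threshold in a single expression (the XML literal is illustrative):

```python
from lxml import etree

xml_string = b"""
<products>
  <product><name>Noise Cancelling Headphones</name><price>249.99</price></product>
  <product><name>Wireless Mouse</name><price>29.99</price></product>
</products>
"""

root = etree.fromstring(xml_string)

# Predicate: keep only products whose <price> exceeds 100
expensive = root.xpath('//product[number(price) > 100]/name/text()')
print(expensive)  # ['Noise Cancelling Headphones']
```

Doing the same filter in ElementTree would mean looping over every product and comparing prices in Python; with lxml the predicate runs inside the XPath engine.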
Parsing XML from a Proxy API Request
Let’s say you are scraping a proxy-enabled API that returns an XML response. Here’s how you could handle it:
```python
import requests
from lxml import etree

proxy_url = "http://your-proxy-url:port"
response = requests.get(
    "https://example.com/data.xml",
    proxies={"http": proxy_url, "https": proxy_url},
)

tree = etree.fromstring(response.content)
items = tree.xpath('//item')

for item in items:
    title = item.findtext('title')
    print("Title:", title)
```

Proxying.io users often scrape public sitemaps, product feeds, or search engine data that comes in XML format; this approach fits perfectly.
Tips for Working with XML in Python
- Use XPath in lxml when working with deeply nested structures.
- Always validate your XML source; malformed XML can crash your script.
- Convert XML to JSON if your pipeline expects JSON format.
- Use proxies if scraping rate-limited XML endpoints, such as sitemap.xml or feed.xml.
- Handle encoding (UTF-8, ISO-8859-1, etc.) to avoid UnicodeDecodeError exceptions.
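The validation and encoding tips can be combined into one defensive wrapper. A sketch, assuming ISO-8859-1 as the fallback encoding (safe_parse is a hypothetical helper, not a standard library function):

```python
import xml.etree.ElementTree as ET

def safe_parse(raw_bytes):
    """Parse XML bytes defensively: retry with a legacy encoding,
    and return None instead of crashing on malformed markup."""
    try:
        return ET.fromstring(raw_bytes)
    except ET.ParseError:
        try:
            # Invalid UTF-8 also surfaces as ParseError; retry as ISO-8859-1
            return ET.fromstring(raw_bytes.decode("iso-8859-1"))
        except ET.ParseError:
            return None

print(safe_parse(b"<feed><title>ok</title></feed>").findtext("title"))  # ok
print(safe_parse(b"<feed><title>broken</feed>"))                        # None
```

Returning None (and logging the bad payload) keeps a long scraping run alive when one endpoint serves a truncated or mis-encoded response.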
Conclusion
Knowing how to parse XML in Python using ElementTree, minidom, or lxml gives you a serious edge when building robust scripts that consume structured data. Whether you’re scraping search engine results, parsing sitemaps, or reading product feeds, being comfortable with XML makes you more adaptable and effective.
And if you want your scripts to scale without interruption, pair your XML scraper with residential or datacenter proxies from Proxying.io to bypass geo-blocks, rate limits, and firewalls.
