How to parse XML with Python and lxml

XML documents are often used to exchange data between a client and a server, for example as the response from a REST or SOAP service. In this example, we will parse and extract data from an XML document with Python and the lxml library.

lxml is a Python library that provides a Pythonic interface to the C libraries libxml2 and libxslt. It's fast, reliable, and easy to use. If you need to deal with XML documents in Python, you should check it out.

Here is an example XML document, test.xml:

 
<?xml version="1.0" encoding="utf-8"?>
<response version="1.0">
  <code>200</code>
  <message>Hello</message>
</response>
 

To parse the file directly:

 
from lxml import etree
 
doc = etree.parse("test.xml")
print(doc.findtext('code'))
 
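Beyond a single findtext call, the parsed tree also gives access to the root element, its attributes, and its children. Here is a minimal sketch, assuming the same content as test.xml. To keep it runnable on its own, the document is held in an in-memory bytes object (the name XML_BYTES is ours) instead of a file, and the import falls back to the standard library's ElementTree if lxml is not installed; that fallback is an assumption for this sketch, not something lxml requires.

```python
import io

try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

# Same content as test.xml, kept in memory so the sketch is self-contained.
XML_BYTES = b"""<?xml version="1.0" encoding="utf-8"?>
<response version="1.0">
  <code>200</code>
  <message>Hello</message>
</response>
"""

# etree.parse accepts a file name or a file-like object.
doc = etree.parse(io.BytesIO(XML_BYTES))
root = doc.getroot()

status = int(root.findtext("code"))       # text of <code>, converted to int
message = root.findtext("message")        # text of <message>
version = root.get("version")             # attribute on <response>
children = [child.tag for child in root]  # tags of the direct children

print(status, message, version, children)
```

With a real file, etree.parse("test.xml") can replace the io.BytesIO call and the rest stays the same.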

Or read the file first and then build the XML document with etree.XML:

 
from lxml import etree
 
# Read the file as text, then encode it to bytes for etree.XML
with open("test.xml", encoding="utf-8") as f:
    xml = f.read().encode("utf-8")
doc = etree.XML(xml)
 

Note that lxml doesn't support Unicode strings that contain an encoding declaration, so we can't do this:

 
from lxml import etree
 
xml = open("test.xml").read()
print(xml)
doc = etree.XML(xml)
print(doc.findtext('code'))
 

This raises an error:

 
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
 
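The error message also points at another way out: a Unicode string is accepted as long as it carries no encoding declaration. The following is a small sketch of that case, assuming a hand-written fragment with the same elements as test.xml. The variable name fragment and the fallback import of the standard library's ElementTree (used only if lxml is missing) are assumptions for illustration.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

# A str fragment with no <?xml ... encoding="..."?> line at the top.
fragment = (
    '<response version="1.0">'
    "<code>200</code>"
    "<message>Hello</message>"
    "</response>"
)

# etree.XML returns the root element of the parsed fragment.
doc = etree.XML(fragment)
print(doc.findtext("code"))
print(doc.findtext("message"))
```

This only helps when we control the text ourselves. A file that starts with an encoding declaration still needs to be passed as bytes.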

We first need to convert it to bytes:

 
from lxml import etree
 
# Read the file as text, then encode it to bytes for etree.XML
with open("test.xml", encoding="utf-8") as f:
    xml = f.read().encode("utf-8")
doc = etree.XML(xml)
 

Or just read the file in binary mode, so the content is bytes from the start:

 
from lxml import etree

# "rb" opens the file in binary mode, so read() returns bytes
xml = open("test.xml", "rb").read()
doc = etree.XML(xml)
print(doc.findtext('code'))
 
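If the XML can arrive either as text or as bytes, the two approaches can be folded into one small function. This is only a sketch: parse_xml is a hypothetical helper of ours, not part of lxml, and it assumes the text should be encoded with the same encoding that its declaration names (UTF-8 here). The import falls back to the standard library's ElementTree if lxml is not installed, which is also an assumption for this example.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree


def parse_xml(data, encoding="utf-8"):
    """Return the root element for XML given as str or bytes.

    Hypothetical helper for illustration; not part of lxml.
    """
    if isinstance(data, str):
        # Encode first so an encoding declaration in the text is accepted.
        data = data.encode(encoding)
    return etree.XML(data)


text = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<response version="1.0"><code>200</code></response>'
)

root_from_str = parse_xml(text)
root_from_bytes = parse_xml(text.encode("utf-8"))
print(root_from_str.findtext("code"), root_from_bytes.findtext("code"))
```

The encoding argument should match the declaration in the document. If the two disagree, the parser may misread or reject the input.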

I hope this post helps you get started with Python and lxml. To install lxml on Windows, see How to install lxml for Python 3.4.3 on Windows.