Python lxml example: find a link by text content

lxml is a Pythonic binding for the C libraries libxml2 and libxslt, and it is quite easy to use. Combined with XPath, it can run almost any query against an XML document.

This example shows how to find an element by its text content and print that element, using lxml and XPath in Python.

If you haven't installed lxml yet, see How to install lxml for Python 3.4.3 on Windows for instructions on Windows.

Suppose we have an HTML file that contains a list of hyperlinks.

The file, xml-xpath-example-test.html, contains the content below:

 
<div class="list">
  <ul>
    <li><a href="hello.html">hello</a></li>
    <li><a href="world.html">world</a></li>
  </ul>
</div>  
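
Note that this file is only a fragment: it has no html or body element of its own. lxml's HTML parser supplies them while parsing, which matters for any absolute XPath run against the parsed document. Below is a minimal sketch to check this, with the markup inlined as a string instead of read from the file; the variable names are only for illustration.

```python
import lxml.html

# The same fragment as in xml-xpath-example-test.html, inlined for illustration.
fragment = """
<div class="list">
  <ul>
    <li><a href="hello.html">hello</a></li>
    <li><a href="world.html">world</a></li>
  </ul>
</div>
"""

# document_fromstring() returns a full document rooted at <html>,
# even though the input contains no <html> or <body> tag.
root = lxml.html.document_fromstring(fragment)
wrapper_tags = [root.tag] + [child.tag for child in root]
print(wrapper_tags)

# The <div> is reachable through the wrapper elements the parser added.
divs = root.xpath("/html/body/div[@class='list']")
print(len(divs))
```

If you would rather not depend on the wrapper elements, a relative expression such as //div[@class='list'] finds the same element.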
 

Here is the Python code to locate the link with the text content "hello" and then print the URL and text of the link:

 
import lxml.html
 
# Parse the file; the HTML parser wraps the fragment in <html> and <body>.
html = lxml.html.parse("xml-xpath-example-test.html")
# xpath() returns a list of the matching <a> elements.
link = html.xpath("/html/body/div[@class='list']//li/a[text()='hello']")
print("link text: " + link[0].text)
print("link url: " + link[0].get("href"))
 
#output
link text: hello
link url: hello.html
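
The code above indexes link[0] directly, so it raises an IndexError when nothing matches, and text()='hello' does not match a link whose text has surrounding whitespace. Here is a minimal sketch of a more forgiving variant. It is only one possible approach: find_links_by_text is a hypothetical helper name, the markup is inlined as a string, and the extra spaces around "world" are added for illustration.

```python
import lxml.html

# Inlined sample markup (an assumption for this sketch); note the
# padded text in the second link.
SAMPLE_HTML = """
<div class="list">
  <ul>
    <li><a href="hello.html">hello</a></li>
    <li><a href="world.html"> world </a></li>
  </ul>
</div>
"""


def find_links_by_text(root, text):
    """Return every <a> under root whose whitespace-normalized text equals text."""
    # $wanted is an XPath variable; lxml fills it from the keyword argument,
    # so the search text never has to be spliced into the expression.
    return root.xpath(".//a[normalize-space(.) = $wanted]", wanted=text)


root = lxml.html.fromstring(SAMPLE_HTML)

for wanted in ("hello", "world", "missing"):
    matches = find_links_by_text(root, wanted)
    if not matches:
        # An empty list means no link matched; nothing to index.
        print("no link with text: " + wanted)
        continue
    for a in matches:
        print("link text: " + a.text_content().strip())
        print("link url: " + a.get("href"))
```

Passing the text as an XPath variable also avoids quoting problems when the text you search for contains a quote character.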