Python lxml example: find a link by its text content
lxml is a Pythonic binding for the C libraries libxml2 and libxslt, and it is quite easy to use. Combined with XPath, it can run almost any query against an XML or HTML document.
This example shows how to find an element with specific text content and print it, using lxml and XPath in Python.
If you have not installed lxml yet and you are on Windows, see How to install lxml for Python 3.4.3 on Windows.
Suppose we have an HTML file that contains a list of hyperlinks.
The file, xml-xpath-example-test.html, has the following content:
<div class="list">
  <ul>
    <li><a href="hello.html">hello</a></li>
    <li><a href="world.html">world</a></li>
  </ul>
</div>
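The file above is only a fragment, with no html or body element. libxml2's HTML parser adds those wrappers while parsing, which is why an absolute XPath into this document can start at /html/body. The short sketch below illustrates this by parsing a shortened copy of the markup from a string; the variable names are only illustrative.

```python
import lxml.html

# A shortened copy of the fragment, held in a string instead of a file
snippet = '<div class="list"><ul><li><a href="hello.html">hello</a></li></ul></div>'

# document_fromstring() always returns a full document rooted at <html>
doc = lxml.html.document_fromstring(snippet)
root_tag = doc.tag
body = doc.find("body")
first_child = body[0]

# Show the wrapper elements the parser added around the <div>
print(root_tag, body.tag, first_child.tag, first_child.get("class"))
```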
Here is the Python code to locate the link with the text content "hello" and then print the URL and text of that link:
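import lxml.html

# parse() returns an ElementTree; the HTML parser wraps the fragment in <html><body>
html = lxml.html.parse("xml-xpath-example-test.html")

# xpath() returns a list of matching elements, so take the first match
links = html.xpath("/html/body/div[@class='list']//li/a[text()='hello']")
link = links[0]

print("link text: " + link.text)
print("link url: " + link.get("href"))

# output
# link text: hello
# link url: hello.html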
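Because xpath() returns a list, it can help to wrap the lookup in a small function that copes with a missing link. The following is a minimal sketch under a few assumptions: the HTML is held in a string rather than read from the file, find_link_by_text is a hypothetical helper name, and the search text is passed as an XPath variable ($wanted) so that quotes in the text do not break the expression. It also uses normalize-space() so that surrounding whitespace in the link text is ignored, which is a slightly looser match than text()='hello'.

```python
import lxml.html

# Same markup as the example file, kept inline so the sketch is self-contained
HTML = """
<div class="list">
  <ul>
    <li><a href="hello.html">hello</a></li>
    <li><a href="world.html">world</a></li>
  </ul>
</div>
"""


def find_link_by_text(root, wanted):
    """Hypothetical helper: return the first <a> whose trimmed text equals
    `wanted`, or None if there is no such link."""
    # Keyword arguments to xpath() become XPath variables ($wanted here)
    matches = root.xpath("//a[normalize-space()=$wanted]", wanted=wanted)
    return matches[0] if matches else None


root = lxml.html.fromstring(HTML)
link = find_link_by_text(root, "hello")
missing = find_link_by_text(root, "nope")

if link is not None:
    print("link text: " + link.text)
    print("link url: " + link.get("href"))
```

Returning None instead of indexing the list directly avoids an IndexError when no link matches, so the caller can decide how to handle that case.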