Convert XML to HTML with lxml XSLT in Python
The source document is a table of contents written in XML, and we want to display it as HTML. We will use the XSLT support in the lxml library for Python. Here is the source document:
<contents>
  <part title="Lucene Basics(or Fundamentals)">
    <chapter title="Lucene Searching">
      <section type="internal" title="Lucene Scoring">
        <leaf title="How Lucene scoring works" seotitle="how-lucene-scoring-works"></leaf>
      </section>
      <section type="terminal" title="" seotitle="">
        <leaf title="hello world" seotitle="how-lucene-scoring-works"></leaf>
      </section>
    </chapter>
  </part>
  <part title="Lucene Index">
    <chapter title="Lucene Searching">
      <section type="internal" title="Lucene Scoring">
        <leaf title="How Lucene indexing works" seotitle="how-lucene-indexing-works"></leaf>
        <leaf title="Lucene Index tutorial" seotitle="lucene-index-tutorial"></leaf>
      </section>
      <section type="terminal" title="" seotitle="">
      </section>
    </chapter>
  </part>
</contents>
And here is the XSLT stylesheet used to transform it to HTML:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <div class="toc-contents">
      <ul>
        <xsl:apply-templates/>
      </ul>
    </div>
  </xsl:template>

  <xsl:template match="part">
    <li>
      <div class="toc-part">
        <h1><xsl:value-of select="@title"/></h1>
        <ul>
          <xsl:apply-templates select="chapter"/>
        </ul>
      </div>
    </li>
  </xsl:template>

  <xsl:template match="chapter">
    <li>
      <div class="toc-chapter">
        <h2><xsl:value-of select="@title"/></h2>
        <ul>
          <xsl:apply-templates select=".//leaf"/>
        </ul>
      </div>
    </li>
  </xsl:template>

  <!-- The {...} attribute value template builds the href without stray whitespace -->
  <xsl:template match="leaf">
    <li>
      <a href="http://makble.com/{@seotitle}"><xsl:value-of select="@title"/></a>
    </li>
  </xsl:template>

</xsl:stylesheet>
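The chapter template selects `.//leaf`, which means every leaf descendant of the chapter, whichever section it sits in. As a rough sketch, the same selection can be tried directly with lxml's `xpath()` method. The inline document is a trimmed copy of the table of contents, and the variable names are only illustrative:

```python
from lxml import etree

# Trimmed inline copy of the table of contents (illustrative only).
xml = b"""<contents>
  <part title="Lucene Index">
    <chapter title="Lucene Searching">
      <section type="internal" title="Lucene Scoring">
        <leaf title="How Lucene indexing works" seotitle="how-lucene-indexing-works"/>
        <leaf title="Lucene Index tutorial" seotitle="lucene-index-tutorial"/>
      </section>
      <section type="terminal" title="" seotitle=""/>
    </chapter>
  </part>
</contents>"""

root = etree.fromstring(xml)

# Locate the chapter, then select all leaf descendants below it,
# just as select=".//leaf" does inside the chapter template.
chapter = root.find("part/chapter")
leaf_titles = [leaf.get("title") for leaf in chapter.xpath(".//leaf")]
print(leaf_titles)
```

Because the path starts with `.//`, the section elements are skipped over and only the leaves end up in the generated list.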
The Python code:
from lxml import etree

# Load the stylesheet and compile it into a reusable transformer
xslt_doc = etree.parse("test-xslt.xslt")
xslt_transformer = etree.XSLT(xslt_doc)

# Load the source XML and apply the transformation
source_doc = etree.parse("toc-test.xml")
output_doc = xslt_transformer(source_doc)

# Print the result and save it to a file
print(str(output_doc))
output_doc.write("output-toc.html", pretty_print=True)
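For quick experiments it can be handy to run the same kind of transformation without any files on disk. The following is a minimal sketch under that assumption: it uses a deliberately reduced stylesheet and document, both given inline, so they are stand-ins for the full files above rather than exact copies.

```python
from lxml import etree

# Reduced stylesheet: one list item with a link per leaf (illustrative only).
xslt_root = etree.XML(b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <ul><xsl:apply-templates select="//leaf"/></ul>
  </xsl:template>
  <xsl:template match="leaf">
    <li><a href="http://makble.com/{@seotitle}"><xsl:value-of select="@title"/></a></li>
  </xsl:template>
</xsl:stylesheet>""")

# Reduced source document with a single leaf.
source_root = etree.XML(b"""<contents>
  <part title="Lucene Basics(or Fundamentals)">
    <chapter title="Lucene Searching">
      <section type="internal" title="Lucene Scoring">
        <leaf title="How Lucene scoring works" seotitle="how-lucene-scoring-works"/>
      </section>
    </chapter>
  </part>
</contents>""")

# Compile the stylesheet, apply it, and serialize the result to text.
transform = etree.XSLT(xslt_root)
result = transform(source_root)
html_text = str(result)
print(html_text)
```

Keeping both inputs in memory makes it easy to tweak a template and rerun. For the real table of contents, the file-based version is likely the more practical choice.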
In a browser, the resulting HTML may look like this: