Python minidom XML parsing example

Python has no shortage of libraries for working with XML. For complex jobs there are third-party libraries such as lxml, but for simple tasks it is better to start with a built-in module such as xml.dom.minidom.

This post illustrates how to use minidom with some code examples.

The test XML document

Suppose we are working with an XML document like the one below, saved as test.xml:

<?xml version="1.0"?>
  <part title="Emacs">
    <chapter title="Search in Emacs">
      <section type="internal" title="Advanced Search">
        <leaf title="Emacs multi-occur Search All Occurrences and List Search Results" seotitle="emacs-multioccur-search-all-occurrences-and-list-search-results"></leaf>
        <leaf title="Emacs Evil mode regular expression search and replace" seotitle="emacs-evil-mode-regular-expression-search-and-replace"></leaf>
      </section>
    </chapter>
    <chapter title="Configuring Emacs">
      <section type="internal" title="Basic Configuring">
        <leaf title="How to toggle evil mode in Emacs" seotitle="how-to-toggle-evil-mode-in-emacs"></leaf>
        <leaf title="How to set Emacs as editor of Git in Windows" seotitle="how-to-set-emacs-as-editor-of-git-in-windows"></leaf>
        <leaf title="Implementing Goto Anything in Emacs" seotitle="implementing-goto-anything-in-emacs"></leaf>
      </section>
    </chapter>
    <chapter title="Editing in Emacs">
      <section type="internal" title="Basic Editings">
        <leaf title="3 Ways to Replace string in Emacs" seotitle="x-ways-to-replace-string-in-emacs"></leaf>
      </section>
    </chapter>
    <chapter title="Others">
      <section type="internal" title="Fix Problems">
        <leaf title="Warning: Desktop file appears to be in use by PID how to disable" seotitle="warning-desktop-file-appears-to-be-in-use-by-pid-how-to-disable"></leaf>
        <leaf title="Upgrade to Emacs 25 on Windows 64 bit" seotitle="upgrade-to-emacs-25-on-windows-64-bit"></leaf>
      </section>
    </chapter>
  </part>
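
For quick experiments without a file on disk, the same kind of document can be parsed from a string. The snippet below is only a minimal sketch: XML_TEXT is an abbreviated stand-in for the test document with closing tags written out, the leaf titles in it are placeholders, and the variable names are made up for illustration.

```python
from xml.dom import minidom

# Abbreviated stand-in for the test document; leaf titles are placeholders.
XML_TEXT = """<?xml version="1.0"?>
<part title="Emacs">
  <chapter title="Search in Emacs">
    <section type="internal" title="Advanced Search">
      <leaf title="First leaf" seotitle="first-leaf"></leaf>
      <leaf title="Second leaf" seotitle="second-leaf"></leaf>
    </section>
  </chapter>
</part>
"""

# parseString builds the same kind of DOM document as minidom.parse does for a file.
document = minidom.parseString(XML_TEXT)

# documentElement is the root element of the document.
root = document.documentElement
root_tag = root.tagName
root_title = root.getAttribute("title")
print(root_tag, root_title)
```

If the text is not well-formed, for example because a closing tag is missing, parseString raises an exception instead of returning a document.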

Parsing and navigating XML elements

Import the minidom module and parse the document, then query it with the DOM API. If you have worked with the HTML DOM in JavaScript, the API will look familiar: the method names are almost identical.

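from xml.dom import minidom

# Parse the file into a DOM document
xmlDocument = minidom.parse("test.xml")

# Collect every <section> element, at any depth
allSections = xmlDocument.getElementsByTagName("section")
print("there are " + str(allSections.length) + " sections")
for section in allSections:
    # childNodes also counts the whitespace text nodes between child elements
    print("title: " + section.getAttribute("title") + ". contains " + str(section.childNodes.length) + " items")

The script prints:
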
there are 4 sections
title: Advanced Search. contains 5 items
title: Basic Configuring. contains 7 items
title: Basic Editings. contains 3 items
title: Fix Problems. contains 5 items
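
The item counts are higher than the number of leaf elements because childNodes also includes the whitespace between tags as text nodes. One way to count only the elements is to filter on nodeType. The following is a minimal sketch under a few assumptions: it parses a single section from an inline string rather than test.xml, the leaf titles are placeholders, and the variable names are illustrative.

```python
from xml.dom import minidom, Node

# A single section with two leaves; titles are placeholders for illustration.
XML_TEXT = """<section type="internal" title="Advanced Search">
  <leaf title="First leaf" seotitle="first-leaf"></leaf>
  <leaf title="Second leaf" seotitle="second-leaf"></leaf>
</section>"""

section = minidom.parseString(XML_TEXT).documentElement

# childNodes holds the leaf elements plus the whitespace text nodes around them.
all_children = section.childNodes

# Keep only element nodes, dropping the whitespace text nodes.
leaves = [node for node in all_children if node.nodeType == Node.ELEMENT_NODE]

print(str(all_children.length) + " child nodes, " + str(len(leaves)) + " of them elements")
for leaf in leaves:
    print("leaf: " + leaf.getAttribute("title"))
```

With two leaves this reports 5 child nodes and 2 elements, which matches the count shown above for the "Advanced Search" section.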