Reading Open Document with Python

July 6, 2007

An OpenDocument file is technically a ZIP archive, much like a JAR file [1], that encapsulates (mostly) XML files. The file content, such as headings, paragraphs, and table cells, can be easily accessed using Python’s built-in zipfile and xml libraries (e.g. DOM, or the limited XPath support in ElementTree). Imagine all the possibilities!
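
To see the container structure for yourself, you can simply list the archive members with zipfile. This is only a minimal sketch: the helper names list_package_members and build_stand_in_package are made up for illustration, and the in-memory archive is a stand-in for a real .ods file, not a valid document.

```python
import io
import zipfile


def list_package_members(source):
    """Return the member names of an OpenDocument package.

    `source` may be a file name or a file-like object.
    """
    with zipfile.ZipFile(source, 'r') as package:
        return package.namelist()


def build_stand_in_package():
    """Build a tiny in-memory ZIP that mimics the layout of an .ods file.

    Assumption: this is a placeholder for demonstration only; a real
    document contains more members than these two.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as package:
        package.writestr('mimetype',
                         'application/vnd.oasis.opendocument.spreadsheet')
        package.writestr('content.xml', '<placeholder/>')
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    # Print each member name of the stand-in package
    for name in list_package_members(build_stand_in_package()):
        print(name)
```

With a real file you would pass its name instead, e.g. list_package_members('FileName.ods'). The content.xml member is the one that holds the actual document content.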

In the example below I open an ODS (spreadsheet) file and print the sheet name and the content of the first cell of every sheet. It uses Python’s XML minidom implementation.

import zipfile
import xml.dom.minidom

# XML namespace for OpenDocument table tags
OD_TABLE_NS = 'urn:oasis:names:tc:opendocument:xmlns:table:1.0'

def get_text(node):
    text = ''
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            text = text+get_text(child)
        elif child.nodeType == child.TEXT_NODE:
            text = text+child.nodeValue

    return text

zip_data = zipfile.ZipFile('FileName.ods', 'r')
content = zip_data.read('content.xml')
content = xml.dom.minidom.parseString(content)

# It is important to search using the namespace
sheet_list = content.getElementsByTagNameNS(OD_TABLE_NS, 'table')

for sheet in sheet_list:
    sheet_name = sheet.getAttributeNS(OD_TABLE_NS, 'name')
    cells = sheet.getElementsByTagNameNS(OD_TABLE_NS, 'table-cell')

    if len(cells) > 0:
        print("%s: %s" % (sheet_name, get_text(cells[0])))
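
The same lookup can probably be written with the XPath-like search expressions of xml.etree.ElementTree instead of minidom. Treat the following as a sketch under assumptions rather than a definitive version: the function names first_cells and build_sample_package are made up, and SAMPLE_CONTENT is a hand-written, heavily simplified stand-in for a real content.xml so that the snippet runs without an actual spreadsheet.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefixes used in the search expressions below
NS = {
    'office': 'urn:oasis:names:tc:opendocument:xmlns:office:1.0',
    'table': 'urn:oasis:names:tc:opendocument:xmlns:table:1.0',
    'text': 'urn:oasis:names:tc:opendocument:xmlns:text:1.0',
}

# Assumption: a simplified stand-in for content.xml, not a complete document
SAMPLE_CONTENT = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:spreadsheet>'
    '<table:table table:name="Sheet1"><table:table-row>'
    '<table:table-cell><text:p>Hello</text:p></table:table-cell>'
    '<table:table-cell><text:p>ignored</text:p></table:table-cell>'
    '</table:table-row></table:table>'
    '<table:table table:name="Sheet2"><table:table-row>'
    '<table:table-cell><text:p>World</text:p></table:table-cell>'
    '</table:table-row></table:table>'
    '</office:spreadsheet></office:body>'
    '</office:document-content>'
)


def first_cells(source):
    """Return a list of (sheet name, text of the first cell) pairs."""
    with zipfile.ZipFile(source, 'r') as package:
        root = ET.fromstring(package.read('content.xml'))

    # Attributes are looked up by their fully qualified name
    name_attr = '{%s}name' % NS['table']
    result = []
    for sheet in root.iterfind('.//table:table', NS):
        cell = sheet.find('.//table:table-cell', NS)
        if cell is not None:
            result.append((sheet.get(name_attr), ''.join(cell.itertext())))
    return result


def build_sample_package():
    """Wrap SAMPLE_CONTENT in an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as package:
        package.writestr('content.xml', SAMPLE_CONTENT)
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    for sheet_name, text in first_cells(build_sample_package()):
        print('%s: %s' % (sheet_name, text))
```

As with minidom, the search has to be namespace-aware: the 'table:' prefix in the expressions is resolved through the NS dictionary, not through the prefixes used inside the file.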


So here is one more reason to use the OpenDocument format. Go download OpenOffice ;)

[1] Seriously, I don’t actually understand the difference between a ZIP and a JAR file. Anybody care to explain…

  1. As far as I know, jar=zip; for more detail, just read the wiki:

  2. Hey, could I ask for other Python tutorials? :)
    Or do you have particular sources on the web? Could you share the links? Thanks

  3. In line 30 you are missing the % after print “%s: %s” :)

  4. And thanks! This was very helpful for me!

