Reading Open Document with Python

July 6, 2007

An OpenDocument file is technically a ZIP archive, much like a JAR file [1], that encapsulates (mostly) XML files. Content such as headings, paragraphs and table cells can be easily accessed using Python’s built-in zipfile and xml libraries (e.g. DOM, or ElementTree’s XPath subset). Imagine all the possibilities!
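To see that packaging for yourself, a minimal sketch along these lines lists what is inside a package. The helper name list_package_members is made up for illustration, and the tiny in-memory archive only stands in for a real .ods file; a real document holds more members (styles.xml, meta.xml, META-INF/manifest.xml and so on).

```python
import io
import zipfile


def list_package_members(source):
    """Return the member names of an OpenDocument package.

    `source` may be a file path or a file-like object, as accepted
    by zipfile.ZipFile. (Hypothetical helper, for illustration only.)
    """
    with zipfile.ZipFile(source, 'r') as package:
        return package.namelist()


# Stand-in for a real .ods file: a tiny ZIP archive built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as package:
    package.writestr('mimetype',
                     'application/vnd.oasis.opendocument.spreadsheet')
    package.writestr('content.xml', '<placeholder/>')

members = list_package_members(buffer)
print(members)
```

With a real file you would pass its path instead, e.g. list_package_members('FileName.ods').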

In the example below I open an ODS (spreadsheet) file and print the sheet name and the content of the first cell of every sheet. It uses Python’s XML minidom implementation.

import zipfile
import xml.dom.minidom

# XML namespace for OpenDocument table tags
OD_TABLE_NS = 'urn:oasis:names:tc:opendocument:xmlns:table:1.0'

def get_text(node):
    # Recursively concatenate the text of every text node under `node`
    text = ''
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            text = text+get_text(child)
        elif child.nodeType == child.TEXT_NODE:
            text = text+child.nodeValue

    return text

zip_data = zipfile.ZipFile('FileName.ods', 'r')
content = zip_data.read('content.xml')
content = xml.dom.minidom.parseString(content)

# Search by namespace URI rather than by prefix, since the prefix
# (usually 'table:') is only an alias and may differ between documents
sheet_list = content.getElementsByTagNameNS(OD_TABLE_NS, 'table')

for sheet in sheet_list:
    sheet_name = sheet.getAttributeNS(OD_TABLE_NS, 'name')
    cells = sheet.getElementsByTagNameNS(OD_TABLE_NS,
        'table-cell')

    if len(cells) > 0:
        print("%s: %s" % (sheet_name, get_text(cells[0])))

zip_data.close()
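The intro mentions XPath, so here is a rough sketch of the same lookup with ElementTree, which supports a limited XPath subset. It is an alternative to the minidom code, not a replacement for it. The function name first_cells and the hand-written SAMPLE document are assumptions for illustration; with a real file you would pass in the bytes read from content.xml.

```python
import xml.etree.ElementTree as ET

OD_TABLE_NS = 'urn:oasis:names:tc:opendocument:xmlns:table:1.0'


def first_cells(content_xml):
    """Return (sheet name, text of first cell) for each sheet with cells."""
    root = ET.fromstring(content_xml)
    results = []
    # ElementTree spells qualified names as '{namespace-uri}local-name'.
    for sheet in root.iter('{%s}table' % OD_TABLE_NS):
        name = sheet.get('{%s}name' % OD_TABLE_NS)
        # './/' is the XPath step for "any descendant of this element".
        cell = sheet.find('.//{%s}table-cell' % OD_TABLE_NS)
        if cell is not None:
            results.append((name, ''.join(cell.itertext())))
    return results


# Hand-written sample, far smaller and simpler than a real content.xml
# (real cells wrap their text in namespaced text:p elements).
SAMPLE = (
    '<root xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">'
    '<table:table table:name="Sheet1">'
    '<table:table-row>'
    '<table:table-cell><p>Hello</p></table:table-cell>'
    '<table:table-cell><p>World</p></table:table-cell>'
    '</table:table-row>'
    '</table:table>'
    '<table:table table:name="Empty"/>'
    '</root>'
)

print(first_cells(SAMPLE))
```

As in the minidom version, sheets without any cells are skipped.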

So here is one more reason to use the OpenDocument format. Go download OpenOffice ;)

[1] Seriously, I don’t actually understand the difference between a ZIP and a JAR file. Anybody care to explain?

4 Comments
  1. As far as I know, jar = zip. For more detail, read the wiki:
    http://en.wikipedia.org/wiki/JAR_%28file_format%29

  2. Could I ask for some other Python tutorials? :)
    Or do you have particular sources on the web? Could you share the links? Thanks

  3. In line 30 you are missing the % after print “%s: %s” :)

  4. And thanks! This was very helpful for me!
