Org to XML

Volume 4, Issue 5; 20 Jan 2020

Of course I want XML out! I always want XML out.

A while back, I mentioned in passing that I had worked out how to convert my Org files into XML. One of the reasons I only mentioned it in passing was that I didn’t think I had done it very well.

There were two issues: first, my Emacs Lisp coding skills are rusty. There’s a lot of richness in the Org mode data structures and I wasn’t getting information out of them very efficiently or effectively. Second, I realized that what I wanted was an API that made the data structures more regular and easier to access. I wasn’t really in a position to write that because, well, see the first issue.

Time passed and Nate Dwarshuis published a fabulous functional API for org-mode: om.el. That totally resolved the second issue.

I spent a few hours on the plane yesterday rewriting my org-to-xml code on top of that library. The new library, om-to-xml, supersedes org-to-xml, though I’ve left that in the package for the time being. My Emacs Lisp may still be a bit amateurish, but it’s getting better. Leveraging Nate’s work also made everything easier.

The goal, as before, is not to have an XML export of the Org mode file (like the HTML or TeX exports), but rather to have an XML version of the Org mode data structures. What this means is that todo items, deadlines, links, macros, property drawers, etc. are all preserved in the output. That makes the Org mode artifact (more) amenable to downstream processing with XML tools.

Some features:

  • Better handling of subscripts and superscripts.
  • Support for macros and other Org mode features I had previously missed.
  • Better handling of source blocks.
  • A small amount of extensibility.

The extensibility hooks consist of some simple configurations (ignored properties, support for insignificant newlines in the XML, etc.) and the ability to write your own function for handling specific node types.

For example, I use a couple of different kinds of blocks in my weblog posts. One of them is for marginal notes:

Like this one.

To Org mode, that’s just a “special block” with a particular type. It’s more convenient for me if it becomes its own element. To achieve that, I add my own handler:

(add-to-list 'om-to-xml-element-handlers
             '(special-block . ndw/om-special-block-to-xml))

That handler deals with a couple of different special block types, outputting my own custom XML markup.
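To give a flavor of what such a handler might do, here is a minimal sketch. It is not the code from my configuration, and it makes some loud assumptions: the function name my/special-block-to-xml, the block type "marginnote", and the output element names are all hypothetical, and so is the calling convention (that a handler receives the Org element node and returns a string of XML). Check the om-to-xml source for what it actually expects. The sketch only uses the built-in org-element functions.

```elisp
(require 'org)
(require 'org-element)
(require 'subr-x) ; for string-trim on older Emacs versions

(defun my/xml-escape (text)
  "Escape the XML special characters in TEXT."
  (let ((result (replace-regexp-in-string "&" "&amp;" text t t)))
    (setq result (replace-regexp-in-string "<" "&lt;" result t t))
    (setq result (replace-regexp-in-string ">" "&gt;" result t t))
    (replace-regexp-in-string "\"" "&quot;" result t t)))

(defun my/special-block-to-xml (block)
  "Return an XML string for the special block BLOCK.
BLOCK is an org-element `special-block' node taken from the
current buffer.  ASSUMPTION: the handler signature and the
markup produced here are hypothetical."
  (let* ((type (downcase (org-element-property :type block)))
         (begin (org-element-property :contents-begin block))
         (end (org-element-property :contents-end block))
         ;; An empty block has no contents positions at all.
         (text (if (and begin end)
                   (string-trim (buffer-substring-no-properties begin end))
                 "")))
    (if (string= type "marginnote")
        ;; My own (hypothetical) block type gets its own element.
        (format "<marginnote>%s</marginnote>" (my/xml-escape text))
      ;; Anything else falls back to a generic wrapper.
      (format "<special-block type=\"%s\">%s</special-block>"
              (my/xml-escape type) (my/xml-escape text)))))

;; With om-to-xml loaded, registration would look like the call above:
;; (add-to-list 'om-to-xml-element-handlers
;;              '(special-block . my/special-block-to-xml))
```

To try it without om-to-xml, parse a block in a temporary buffer and pass the element to the function:

```elisp
(with-temp-buffer
  (insert "#+begin_marginnote\nLike this one.\n#+end_marginnote\n")
  (org-mode)
  (goto-char (point-min))
  (my/special-block-to-xml (org-element-at-point)))
```

The sketch flattens the block contents to plain text; a real handler would more likely walk the child nodes so that inline markup survives.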

It may still be a niche project, but it’s a better implementation of a niche project!