
Org to XML

Volume 4, Issue 5; 20 Jan 2020

Of course I want XML out! I always want XML out.

A while back, I mentioned in passing that I had worked out how to convert my Org files into XML. One of the reasons I only mentioned it in passing was that I didn’t think I had done it very well.

There were two issues: first, my Emacs Lisp coding skills are rusty. There’s a lot of richness in the Org mode data structures and I wasn’t getting information out of them very efficiently or effectively. Second, I realized that what I wanted was an API that made the data structures more regular and easier to access. I wasn’t really in a position to write that because, well, see the first issue.

Time passed and Nate Dwarshuis published a fabulous functional API for org-mode: om.el. That totally resolved the second issue.

I spent a few hours on the plane yesterday rewriting my org-to-xml code on top of that library. The new library, om-to-xml, supersedes org-to-xml, though I’ve left that in the package for the time being. My Emacs Lisp may still be a bit amateurish, but it’s getting better. Leveraging Nate’s work also made everything easier.

The goal, as before, is not to have an XML export of the Org mode file (like the HTML or TeX exports), but rather to have an XML version of the Org mode data structures. What this means is that todo items, deadlines, links, macros, property drawers, etc. are all preserved in the output. That makes the Org mode artifact (more) amenable to downstream processing with XML tools.
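As a rough illustration of the kind of information that lives in those data structures, here is a minimal sketch that uses only Org's built-in org-element parser. It is not om.el and not om-to-xml, and the function name my/headline-summaries is made up for this example. It just shows that a headline's TODO state is available as data, separate from the headline text.

```elisp
(require 'org)
(require 'org-element)

;; Hypothetical helper for illustration only; not part of om-to-xml.
(defun my/headline-summaries (org-text)
  "Return a list of (TITLE . TODO-KEYWORD) pairs, one per headline in ORG-TEXT.
TODO-KEYWORD is nil for headlines that have no TODO state."
  (with-temp-buffer
    (insert org-text)
    ;; Enable Org mode so the parser sees the usual TODO keywords.
    (org-mode)
    ;; Walk the parse tree and collect one pair per headline node.
    (org-element-map (org-element-parse-buffer) 'headline
      (lambda (headline)
        (cons (org-element-property :raw-value headline)
              (org-element-property :todo-keyword headline))))))
```

Calling it on a string such as "* TODO Write post\n* Notes\n" should pair the first title with its TODO keyword and the second with nil, assuming the default TODO keywords. An XML version of the data structures keeps that distinction; a rendered export generally flattens it into presentation.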

Some features:

  • Better handling of subscripts and superscripts.
  • Support for macros and other Org mode features I had previously missed.
  • Better handling of source blocks.
  • A small amount of extensibility.

The extensibility hooks consist of some simple configurations (ignored properties, support for insignificant newlines in the XML, etc.) and the ability to write your own function for handling specific node types.

For example, I use a couple of different kinds of blocks in my weblog posts. One of them is for marginal notes:

#+BEGIN_MARGINNOTE
Like this one.
#+END_MARGINNOTE

To Org mode, that’s just a “special block” with a particular type. It’s more convenient for me if it becomes its own element. To achieve that, I add my own handler:

;; Route every special-block node to my own handler function.
(add-to-list 'om-to-xml-element-handlers
             '(special-block . ndw/om-special-block-to-xml))

That handler deals with a couple of different special block types, outputting my own custom XML markup.
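To see what a “special block” with a particular type looks like from Emacs Lisp, here is a small sketch that again uses only the built-in org-element parser. The function my/special-block-types is hypothetical and is not part of om-to-xml. The actual signature that om-to-xml expects of a handler is defined by that library and is not shown here.

```elisp
(require 'org)
(require 'org-element)

;; Hypothetical helper for illustration only; not part of om-to-xml.
(defun my/special-block-types (org-text)
  "Return the type string of every special block found in ORG-TEXT."
  (with-temp-buffer
    (insert org-text)
    (org-mode)
    ;; A #+BEGIN_FOO ... #+END_FOO pair that Org does not otherwise
    ;; recognize is parsed as a `special-block' node; its :type
    ;; property holds the FOO part.
    (org-element-map (org-element-parse-buffer) 'special-block
      (lambda (block)
        (org-element-property :type block)))))
```

Given the margin note block shown earlier, this should return a one-element list containing the block's type string. A handler that deals with several special block types can presumably branch on that same property to decide which custom element to emit.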

It may still be a niche project, but it’s a better implementation of a niche project!

Comments

Came across http://ergoemacs.org/emacs/elisp_parse_org_mode.html Thought you may be interested Norm.

—Posted by Dave Pawson on 23 Jan 2020 @ 01:51 UTC #

Very interesting! Thanks, Dave!
