Reproducible research

Volume 3, Issue 7; 12 Mar 2019 (updated 07 Jan 2023)

I guess you’re getting a series of posts about Emacs, whether you want them or not. This one is about “reproducible research” or, more specifically, about making prose about code easier to write.

Research is the process of going up alleys to see if they are blind.

Marston Bates

I’m not a researcher in any official capacity, but I am one in the casual sense that I write a lot of documents that have data embedded in them, often snippets of code.

Most recently, I was explaining how to set up authentication with externally signed certificates. There’s a lot of explanation about what has to be set up and how, interspersed regularly with code snippets that the reader is expected to run to get answers that will be used in subsequent code snippets. (Eventually, of course, the whole thing should be automated to reduce human error, but the first step is explaining the process.)

Imagine how you might do this. Open up your editor window and, next to it, open a window with a REPL. Type in the first code snippet, run it until you’ve fixed all the typos, copy and paste it into your document, run it, copy and paste the results into your document. Edit the code. Repeat ad nauseam.

The techniques of reproducible research (which will look familiar to anyone who knows Literate Programming) offer a much easier and better approach.

Babel is part of Org. It allows you to embed source code of many different kinds directly in your document (with syntax highlighting), run it from your document, and automatically insert the results into your document. I’ve got mine configured for more than thirty languages, including Emacs Lisp, Python, Perl, LaTeX, Org, Ruby, Scala, and the shell.
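Babel itself is Emacs Lisp and does far more than this, but the core loop is easy to sketch. Here’s a minimal Python sketch, assuming a simplified Org syntax (only `#+BEGIN_SRC python` blocks, with printed output captured roughly the way Babel’s `:results output` does). The helpers `run_block` and `evaluate_document` are hypothetical names for illustration. Each block is run, a `#+RESULTS:` section is inserted after it, and any results from an earlier run are replaced rather than appended.

```python
import contextlib
import io
import re

# A source block in a simplified Org syntax, optionally followed by the
# "#+RESULTS:" section left behind by an earlier run. Group 1 is the block
# itself, group 2 is the code inside it.
SRC_BLOCK = re.compile(
    r"^(#\+BEGIN_SRC python\n(.*?)^#\+END_SRC\n)"
    r"(?:\n#\+RESULTS:\n(?:: [^\n]*\n)*)?",
    re.MULTILINE | re.DOTALL,
)


def run_block(code: str) -> str:
    """Execute code and return what it printed (similar to ':results output')."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


def evaluate_document(doc: str) -> str:
    """Run each Python block and put its output in a #+RESULTS: section."""

    def replace(match: re.Match) -> str:
        output = run_block(match.group(2))
        results = "".join(f": {line}\n" for line in output.splitlines())
        # Rebuilding the results (rather than appending) keeps reruns stable.
        return match.group(1) + "\n#+RESULTS:\n" + results

    return SRC_BLOCK.sub(replace, doc)


doc = "#+BEGIN_SRC python\nprint(6 * 7)\n#+END_SRC\n"
once = evaluate_document(doc)  # the block, then "#+RESULTS:" and ": 42"
```

Running `evaluate_document` on its own output gives back the same text. That’s the whole point: the results come from the code, not from copy and paste.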

I wrote ob-ml-marklogic as a plugin for Babel to support embedded MarkLogic XQuery, JavaScript, and SPARQL. Although I implemented it a couple of years ago, I didn’t pick it up again in earnest until I turned my attention back to org-mode.

The externally signed certificates example is too complicated for a weblog posting, and also, I will be quite content if I die without ever having to think about it again.

Consider instead the Fibonacci sequence. I might write a little XQuery code to compute the “nth” number with Binet’s closed-form formula, like this:

declare variable $n as xs:integer external;
(: Binet's formula: F(n) is phi^n / sqrt(5), rounded to the nearest integer :)
let $phi := (1 + math:sqrt(5)) div 2.0
return
  round(math:pow($phi, $n) div math:sqrt(5))
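If you’d like to check the formula outside MarkLogic, here’s a rough Python translation. It’s a sketch, not part of ob-ml-marklogic, and the function names are made up for illustration. It also compares the result against a plain integer loop, because the `round(phi^n / sqrt(5))` trick depends on floating-point precision and stops being exact once n gets large.

```python
import math


def fib_binet(n: int) -> int:
    """nth Fibonacci number via Binet's formula, mirroring the XQuery above."""
    phi = (1 + math.sqrt(5)) / 2.0
    return round(math.pow(phi, n) / math.sqrt(5))


def fib_iterative(n: int) -> int:
    """Exact nth Fibonacci number (F(0) = 0, F(1) = 1) using integer arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The floating-point version eventually drifts away from the exact answer;
# find the first n where the two disagree.
first_mismatch = next(n for n in range(200) if fib_binet(n) != fib_iterative(n))
```

For small n, like the 10th number used here, the two agree. For production code the integer loop is the safer choice.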

Except that’s not very interesting, because in a weblog post it’s just a code example. Here’s what it looks like in Emacs:

[Figure: Org Babel example]

Every time I hit C-c C-c in a source block, it’s evaluated and the results are inserted into my document. If computing the 10th Fibonacci number (yes, that HEADER line defines a variable; I can change it and hit C-c C-c to get different results) doesn’t seem like a very exciting thing to do, consider embedding the results of shell scripts, or Python, or probably your favorite language.
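Conceptually, a header variable is just a binding made before the code runs. Here’s a minimal Python sketch of that idea, assuming one `:var name=value` pair per `:var` keyword (real Babel accepts more forms). The helpers `parse_vars` and `evaluate`, and the convention of reading back a variable called `result`, are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical header syntax for this sketch: one ":var name=value" per
# binding, e.g. "#+HEADER: :var n=10". Real Babel accepts more forms.
VAR_ARG = re.compile(r":var\s+(\w+)=(\S+)")


def parse_vars(header: str) -> dict:
    """Collect :var bindings from a header line, turning integers into ints."""
    bindings = {}
    for name, value in VAR_ARG.findall(header):
        bindings[name] = int(value) if re.fullmatch(r"-?\d+", value) else value
    return bindings


def evaluate(header: str, code: str):
    """Run code with the header's variables predefined; return its 'result'."""
    namespace = parse_vars(header)
    exec(code, namespace)
    return namespace["result"]


FIB = """
a, b = 0, 1
for _ in range(n):
    a, b = b, a + b
result = a
"""

print(evaluate("#+HEADER: :var n=10", FIB))  # 55
print(evaluate("#+HEADER: :var n=20", FIB))  # 6765
```

Only the header changes between the two runs; the code stays exactly as written.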

It’s a huge step forward in writing prose that leverages code examples, if you ask me.