Make it easy to use

Volume 2, Issue 8; 05 Mar 2018

Very few things are easy to use in absolute terms, but relative improvements can have value as well. Relatively speaking, I think I’ve made processing DocBook documents easier.

Recently, I attempted to reformat some documents that I hadn’t touched in a while. What I discovered was that they were relying on out-of-date stylesheets and a build environment that no longer “just worked” on my system. I decided the quickest thing to do would be to copy a more recent build system from another project and tweak it.

My build system of choice these days is Gradle. Now don’t start, I’m sure there are lots of reasons why you might argue that that isn’t “easy”. I’m a pragmatist:

  • Most of my XML work that isn’t directly in MarkLogic server runs on the JVM. That’s where my XML Calabash implementation(s) are, that’s where Saxon is, etc. Gradle works well with the Java ecosystem.
  • Gradle is the first tool that made it practical for me to use Maven repositories.
  • Maven deals very successfully with managing the software dependencies of a project.
  • Gradle is cross-platform. I recently helped someone get a document build system set up on Windows. Gradle worked flawlessly out of the box except for the bits of my build where I’d lazily left in some make and perl and a stray “cp” command.
  • Gradle is extensible, even if I don’t especially like Groovy.
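As a rough illustration of the cross-platform point, a stray “cp” can usually be replaced with Gradle’s built-in Copy task type, which behaves the same way on every platform. This is only a sketch: the task name and both directory names are made up for the example.

```groovy
// Hypothetical task and directory names; Copy is Gradle's built-in task type.
task copyStyles(type: Copy) {
  // Copy everything under src/css ...
  from 'src/css'
  // ... into build/html/css, creating the directory if necessary.
  into 'build/html/css'
}
```

Running gradle copyStyles then does the copying without depending on a Unix shell being available.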

The build system that I copied was the one for the latest XProc spec. What I noticed as I was tweaking it was that it depended on the DocBook XSLT 2.0 Stylesheets artifact from Maven and it downloaded the stylesheets from https://cdn.docbook.org/ to do the formatting. (A Maven “artifact” is just a jar file. It’s a way of packaging up a software dependency and sticking it on the web where build tools can find it. The details are unimportant to you, the user, if all you care about is formatting documents.) That shouldn’t be necessary, I thought. (I’ve been burned several times recently by this downloading step when attempting to build documents on trains and planes, so I was predisposed to investing a little time in fixing it.)

[ Here we go. If you’re going to fall down a deep rat hole, make sure there’s lots of yak fur at the bottom to pad your fall. —ed]

Indeed, downloading the stylesheet artifact from Maven wasn’t accomplishing very much. It would be possible, I thought, to use the stylesheets directly from the jar file if I could get a catalog set up correctly.

I had written, and was using, an extension task that makes it easier to use XML Calabash in Gradle. Getting the catalog in place meant refactoring that task in significant ways.

But having that task automatically set up an XML Catalog for the DocBook stylesheets seemed wrong. That task is about pipeline processing in general, not DocBook specifically.

So I wrote another extension task for formatting DocBook documents specifically. Logically, that task had to be an extension of the underlying XML Calabash task. That meant a bit more refactoring.

Along the way, I also corrected a problem with the underlying task: the documents it used weren’t automatically counted as inputs and outputs, so Gradle couldn’t work out which tasks needed to be run again when documents changed.
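For readers who haven’t met Gradle’s up-to-date checking, here is a minimal sketch of what declaring inputs and outputs looks like on an ordinary task. It is not the code from the XML Calabash task; the task name and file names are hypothetical, and the task body is just a placeholder that copies the file.

```groovy
// Hypothetical task; this is not the actual XML Calabash task implementation.
task stampDocument {
  def source = file('document.xml')
  def result = file('build/stamped.xml')

  // Declaring inputs and outputs lets Gradle compare them between runs
  // and skip the task when nothing has changed.
  inputs.file source
  outputs.file result

  doLast {
    // Placeholder work: copy the source text to the result file.
    result.parentFile.mkdirs()
    result.text = source.text
  }
}
```

Run the task twice without touching document.xml and Gradle should report it as UP-TO-DATE the second time. Edit the source and the task runs again.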

Now that I had a place to stand, I added a bit of code to the DocBook task so that it would construct a bespoke catalog for the stylesheets in the jar file and insert that into the XML Calabash runtime.
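To give a feel for what such a catalog contains, the sketch below writes a one-entry XML Catalog that rewrites web URIs to jar: URIs. It makes several assumptions: the task name, the jar location, and the path inside the jar are all placeholders, and the real DocBook task builds its catalog internally rather than writing a file like this. Only the catalog namespace and the rewriteURI element come from the OASIS XML Catalogs standard.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical task; the jar location and internal path are placeholders.
task writeCatalog {
  def catalogFile = file('build/catalog.xml')
  outputs.file catalogFile

  doLast {
    catalogFile.parentFile.mkdirs()
    catalogFile.withWriter('UTF-8') { writer ->
      def xml = new MarkupBuilder(writer)
      xml.catalog(xmlns: 'urn:oasis:names:tc:entity:xmlns:xml:catalog') {
        // Any URI starting with the web prefix is resolved inside the jar
        // instead, so no network access is needed.
        rewriteURI(uriStartString: 'https://cdn.docbook.org/',
                   rewritePrefix: 'jar:file:/path/to/docbook-xslt2.jar!/')
      }
    }
  }
}
```

A catalog-aware resolver that loads this file would then fetch stylesheets from the jar whenever a pipeline or stylesheet asks for one of the web URIs.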

Threading all these needles was a little tricky. I ended up putting some debugging code in XML Calabash to help me out. Turns out I had to fix some bugs related to catalog handling. I also applied a bunch of pull requests and fixed a handful of unrelated bugs along the way. (Including the bug that was causing the p:validate-with-relax-ng step to swallow the message that described the actual validation error. I anticipate much rejoicing across the land. Or in my office, anyway.)

After all that, a Gradle script for formatting DocBook documents:

buildscript {
  // Where Gradle looks for the artifacts listed below
  repositories {
    mavenCentral()
    maven { url "http://maven.restlet.org" }
  }

  // The DocBook stylesheets, the XML Calabash plugin, and the XML Catalog resolver
  dependencies {
    classpath group: 'org.docbook', name: 'docbook-xslt2', version: '2.2.0'
    classpath group: 'com.xmlcalabash', name: 'xmlcalabash1-gradle', version: '1.3.2'
    classpath group: 'org.xmlresolver', name: 'xmlresolver', version: '0.13.1'
  }
}

import org.docbook.DocBookTask
import com.xmlcalabash.XMLCalabashTask

// Format document.xml and write the result to output.html
task myDocument(type: DocBookTask) {
  input("source", "document.xml")
  output("result", "output.html")
}

That may not look “easy”, especially if you aren’t a software developer. But if you install Gradle on your platform and run gradle myDocument, you’ll get a formatted document. Assuming, that is, that your document is named document.xml. Replace that with the filename of your actual DocBook document. You can change output.html into something nicer as well while you’re at it. And the name of the task, if you wish.

If you’re curious:

  • The word buildscript and the block enclosed in curly braces that follows is just boilerplate. You don’t have to understand it, but what it says is, this project requires the DocBook XSLT 2.0 stylesheets, the Gradle plugin for running XML Calabash, and my XML Catalog processor.
  • The import statements are also just boilerplate.
  • Finally the task myDocument of type “DocBookTask” transforms the source document document.xml into the output document output.html.

If you’re tempted to say “so what”, at least consider briefly what happens when you run this through Gradle.

  1. It will download the artifacts necessary: the three named explicitly and all of the thirty or so dependencies that you didn’t even know about.
  2. It will cache them locally for you in some location you never have to worry about. And if you have multiple projects that use DocBook, it’ll share them across those projects.
  3. If you end up using different versions of the stylesheets in different projects, that’ll just work as well.
  4. It will arrange for XML Calabash to run with the DocBook pipeline to process your source document.
  5. It will use an XML Catalog that will find the stylesheets directly in the appropriate jar file.
  6. And it will just work on Linux, on the Mac, and on Windows!

If you’ve been processing XML documents since the last millennium or if you’re a software developer, none of those steps will seem difficult [except maybe the Windows thing —ed], but it’s still possible to appreciate that you don’t have to do any of them. If you aren’t a software developer, some of those steps probably read like complete gibberish. That’s OK, because you don’t have to do any of them!

Right. Having got this far, being just about in a position to go back and finish the job I started, it occurred to me that if the stylesheets can be processed this way, shouldn’t it be possible to process the schemas in the same way? [Can you say “displacement activity”? —ed]

Of course, the answer is yes: SMOP (a simple matter of programming). Long story short, I took the DocBook schemas (4.5, 5.0, and 5.1) and packaged them up in a Maven artifact with a little Java shim to construct a bespoke catalog for them as well. Then I went back and extended the DocBookTask so that it will use them if they’re available.

Want to validate your DocBook document before you format it?

buildscript {
  repositories {
    mavenCentral()
    maven { url "http://maven.restlet.org" }
  }

  dependencies {
    classpath group: 'org.docbook', name: 'docbook-xslt2', version: '2.2.0'
    classpath group: 'com.xmlcalabash', name: 'xmlcalabash1-gradle', version: '1.3.2'
    classpath group: 'org.xmlresolver', name: 'xmlresolver', version: '0.13.1'
    // Add this line:
    classpath group: 'org.docbook', name: 'docbook-schemas', version: '5.1-1'
  }
}

import org.docbook.DocBookTask
import com.xmlcalabash.XMLCalabashTask

task myDocument(type: DocBookTask) {
  // And tell the pipeline to validate with the schema
  option("schema", "https://docbook.org/xml/5.1/rng/docbook.rng")
  input("source", "document.xml")
  output("result", "output.html")
}

If you want a custom stylesheet or a custom schema, that’s fine too. Simply import or include the stylesheets or schemas using the standard URIs; they will be resolved by the catalogs and no actual web access will be required.

At the end of the day, whether you consider this easy or difficult is going to depend on a lot of factors. I haven’t taken the time to describe all of the options of the DocBookTask (e.g., how to make PDF instead of HTML), and if you’re doing more than just formatting a single XML file, you will probably need or want to learn a little bit more Gradle.

I’m pleased with the results, however. So what if it consumed most of a weekend and required updates to three projects and the construction of a fourth? I’ve made it easy for you, right? Isn’t that the important thing?
