An XInclude API for Saxon

Volume 4, Issue 36; 09 Jul 2020

Published by Norman Walsh

An XInclude processor for Saxon, including an extension function that can be called from XSLT stylesheets (and XQuery? I haven’t tried).

My upcoming Balisage paper is about the DocBook xslTNG Stylesheets, as I revealed a few days ago. That project includes a reference guide and that reference guide needed an XInclude implementation, so I did the cheap and cheerful thing in XSLT.

A while later, I wanted to include some text documents with callouts and that just didn’t work at all. I looked at my XSLT implementation of XInclude and I thought about how in XSLT 3.0, it might be possible to do quite a lot of the…and then some more rational part of my brain shouted:

Stop it! Just stop it! You have enough things to do, you do not need to go down the rabbit hole of reimplementing XInclude in XSLT 3.0 to satisfy an edge case in the documentation for a project that has already turned into a deep [expletive deleted] rabbit hole! You have a working XInclude implementation! It’s well tested! It’s in Java! For the love of all things, just use that!

So I did. And I’ve packaged it up separately because it seemed like the sort of thing that might be generally useful. (I happen to think DocBook is generally pretty useful, but some of y’all persist in wanting to use other things. Insisting that you use the DocBook stylesheets to get XInclude support for some other vocabulary seemed unnecessary.) You can download it from the releases page or get it from Maven, though it probably won’t turn up on the Maven website for a few hours.

I bumped the version number to 0.9.0 because the implementation was lifted straight out of XML Calabash and is well tested. If no one reports that I messed up the deployment package or something in the next few days, I’ll probably republish it as 1.0.0.

For XML fragment identifiers, you get the xmlns scheme, for namespace bindings, and the element scheme. You also get the xpath scheme, which just evaluates an XPath expression. If interoperability is a concern, make sure you know what your interchange partners can handle. This implementation can handle anything Saxon 10.0 can.
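To make the xpath scheme a little more concrete: conceptually, the fragment identifier is an XPath expression evaluated against the document being included, and whatever it selects is what gets included. The sketch below only illustrates that idea. It uses the JDK's built-in XPath 1.0 engine rather than Saxon or this library's API, and the class name, method name, and sample document are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only: NOT the API of the XInclude processor.
class XPathSchemeSketch {
    /** Returns the text content of every node that expr selects in xml. */
    static List<String> select(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Evaluate the expression with the document node as context.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> selected = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                selected.add(nodes.item(i).getTextContent());
            }
            return selected;
        } catch (Exception e) {
            // Wrap parser and XPath errors so callers need no checked exceptions.
            throw new IllegalArgumentException("Cannot evaluate: " + expr, e);
        }
    }
}
```

With a document like `<doc><p>one</p><p>two</p></doc>`, the expression `/doc/p[2]` selects only the second paragraph. The real implementation evaluates expressions with Saxon, so it can presumably handle much more than this XPath 1.0 stand-in.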

For text fragment identifiers, which are what actually sent me down this rabbit hole, you get three choices:

RFC 5147 line or character ranges,
The search scheme that I invented on an airplane a few years ago,
or Ln-Lm, which seems to be something GitHub invented. (I can’t find a specification for it; pointers welcome.)
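The two line-based schemes count differently, which is easy to trip over. Here is a minimal sketch of the difference, assuming RFC 5147's rule that line positions sit between lines (position 0 is before the first line) and assuming that Ln-Lm is 1-based and inclusive, as GitHub URLs appear to treat it. The class and method names are hypothetical and this is not the library's code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of line-range semantics; NOT the library's implementation.
class TextRangeSketch {
    /** Splits text into lines, dropping the line terminators. */
    static List<String> lines(String text) {
        return Arrays.asList(text.split("\r\n|\r|\n"));
    }

    /** RFC 5147 style (line=start,end): positions fall between lines, 0 precedes line one. */
    static List<String> rfc5147(String text, int start, int end) {
        List<String> all = lines(text);
        // Clamp to the end of the text; start must not exceed end.
        return all.subList(Math.min(start, all.size()), Math.min(end, all.size()));
    }

    /** Ln-Lm style: assumed 1-based with both ends included. */
    static List<String> github(String text, int from, int to) {
        List<String> all = lines(text);
        return all.subList(Math.min(from - 1, all.size()), Math.min(to, all.size()));
    }
}
```

Under those assumptions, `line=1,3` and `L2-L3` select the same two lines: the second and third.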

I tried to design the implementation such that it would be easy to extend it with schemes of your own devising, but it’s fair to say that hasn’t had much practical testing.

Share and enjoy!