
Towards XML Resolver 2.0.0

Volume 5, Issue 4; 03 May 2021

I’ve pushed a snapshot release of XML Resolver 2.0.0.

This post has been replaced.

I’m having one of those rat-hole days that occur in programming every so often. I wanted to upgrade a thing, but that meant I had to upgrade another thing, where I found a bug, which led to another thing that I wanted to upgrade but where I discovered I hadn’t implemented a feature, which led, which led, which led…

Ultimately (I hope!), I wound up looking at XML Resolver where I wanted to add a small feature. And where I found a bunch of unfinished API work started seven months ago. ☹

I finished the API work, added my small feature, looked at the open issues list, and decided to close a couple of those while I was in the neighborhood. (This led me to RFC 2397…where I found a bug because there’s always a deeper rat hole.)

On the plus side, closing one of those issues, supporting classpath: URIs, will (I think!) greatly simplify the next project up the stack and may come in handy in Saxon. (I’m hoping to get the XML Resolver library into the next major release of Saxon in place of the decades-old Apache libraries.)

The API work I started back in October changes some of the public-facing APIs. I’ve added a generic ResolverFeature type to make the API a little more type-safe. In practice, I haven’t changed any APIs that I think are widely used, but in deference to the principle of marking API changes with major version numbers, I’ve decided to call the new version 2.0.0.

Given that I was making one potentially breaking change, I decided this was a good time to make another. The resolver class can be configured with either system properties or a properties file. In version 1.x, if a property file is used, the values specified in that file always take precedence.

This means that you can’t selectively override settings for a single application by specifying a system property. That seems wrong. In 2.x, if a property is specified in both places, the system property wins. (I added a property, prefer-property-file, to preserve the former behavior but the new behavior is the default.)
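The precedence rule can be sketched in a few lines of Java. This is only an illustration of the rule described above, not the resolver’s actual code: the class SettingPrecedence, the method effectiveValue, and the file names are all hypothetical.

```java
public class SettingPrecedence {
    /**
     * Hypothetical helper (not part of the XML Resolver API): choose the value
     * of one setting that may come from a system property, a property file,
     * or both. Either argument may be null if the setting is absent there.
     */
    static String effectiveValue(String systemValue, String fileValue,
                                 boolean preferPropertyFile) {
        if (preferPropertyFile) {
            // 1.x behavior: a value in the property file always wins.
            return fileValue != null ? fileValue : systemValue;
        }
        // 2.x default: a system property overrides the property file.
        return systemValue != null ? systemValue : fileValue;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        String fromSystem = "project-catalog.xml";
        String fromFile = "global-catalog.xml";

        System.out.println(effectiveValue(fromSystem, fromFile, false));
        System.out.println(effectiveValue(fromSystem, fromFile, true));
    }
}
```

With the default (false), the first line printed is the system property’s value; with prefer-property-file behavior (true), the second line is the file’s value.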

Closely related is the question of how to add additional catalogs for a specific project. Just because I want to add the DocBook stylesheets to the catalog for this set of transformations doesn’t mean I want to completely replace all of the catalogs you already have configured!

To make this easier, the 2.0 release adds a new system property, xml.catalog.additions, and a new property file key, catalog-additions. Both properties take a list of catalog files. Those catalog files will be added to the list defined by the normal catalog files properties.
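As a rough sketch of what “added to the list” means, here is how the two lists might be combined. The details are assumptions on my part for the sake of illustration: I’ve assumed the lists are semicolon-separated and that the additions are appended after the normal catalogs. The class and method names are hypothetical, not the resolver’s API.

```java
import java.util.ArrayList;
import java.util.List;

public class CatalogAdditions {
    /** Split a list of catalog files (assumed here to be semicolon-separated). */
    static List<String> splitList(String value) {
        List<String> items = new ArrayList<>();
        if (value == null) {
            return items;
        }
        for (String item : value.split(";")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                items.add(trimmed);
            }
        }
        return items;
    }

    /**
     * Combine the normally configured catalogs with the additions.
     * Assumption: additions are appended, so existing catalogs are kept.
     */
    static List<String> effectiveCatalogs(String catalogs, String additions) {
        List<String> result = splitList(catalogs);
        result.addAll(splitList(additions));
        return result;
    }

    public static void main(String[] args) {
        // Illustrative file names only.
        System.out.println(effectiveCatalogs("catalog.xml;other.xml", "docbook-catalog.xml"));
    }
}
```

The point is simply that the additions extend the configured list rather than replacing it.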

Finally, the two issues I closed were support for data: URIs and support for classpath: URIs.

Data URIs are defined by RFC 2397. They let you “inline” some content directly in the URI. For example, this catalog entry:

  <uri name="http://example.com/example.xml"
       uri="data:application/xml;base64,PGRvYz5JIHdhcyBhIGRhdGEgVVJJPC9kb2M+Cg=="/>

maps the URI http://example.com/example.xml to a short XML document defined by that data URI (it’s <doc>I was a data URI</doc>). I’m not sure this is going to be very useful, but it wasn’t hard to do.
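To see what’s inside a base64 data URI like the one above, you can decode it by hand. The following is a minimal sketch that only handles the ;base64 form and assumes the payload is UTF-8 text; it is not how the resolver itself does it, and the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriDemo {
    /**
     * Decode the payload of a base64-encoded data: URI to a string.
     * Sketch only: non-base64 data URIs and charset parameters are not handled.
     */
    static String decodeBase64DataUri(String uri) {
        if (!uri.startsWith("data:")) {
            throw new IllegalArgumentException("Not a data: URI");
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' before the data");
        }
        // Everything between "data:" and the comma is the media type and flags.
        String header = uri.substring("data:".length(), comma);
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("Only ;base64 data URIs are handled here");
        }
        byte[] bytes = Base64.getDecoder().decode(uri.substring(comma + 1));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String uri = "data:application/xml;base64,PGRvYz5JIHdhcyBhIGRhdGEgVVJJPC9kb2M+Cg==";
        System.out.print(decodeBase64DataUri(uri));
    }
}
```

Running it prints the little document from the catalog entry, followed by a newline.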

Classpath URIs are more interesting. As near as I can tell, they’re defined somewhat informally by the Spring framework. The classpath in Java is a list of directories and/or JAR files. Subject to some constraints imposed by the class loader (which I’m not going to discuss), you can search and load files from those directories or JAR files (a JAR file is just a ZIP file for our purposes here).

What this means in practice is that you can ship a resource in the JAR file with your application or library and then retrieve it with a classpath: URI.

For example, this catalog entry:

  <uri name="http://example.com/example.xml"
       uri="classpath:path/example-doc.xml"/>

maps the URI http://example.com/example.xml to a document with the path path/example-doc.xml on the classpath. Searches always begin at the root of the classpath segments, so path/example-doc.xml and /path/example-doc.xml are equivalent.

The classpath: URI returns the first matching file and you (the application writer) can’t control what else is on the classpath or what order it’s in, so you’d be better off with a longer, more likely unique name such as org/docbook/xslTNG/resources/catalog.xml. (The resolver also supports classpath*: but since it’s defined as concatenating the resources identified, it’s of comparatively little use in the XML case.)
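For the curious, the lookup behind a classpath: URI can be approximated with the standard class loader API. This is a sketch under my own assumptions, not the resolver’s implementation: it uses the system class loader (the resolver may well use a different one), and the names ClasspathUriDemo, resourcePath, and locate are hypothetical.

```java
import java.net.URL;

public class ClasspathUriDemo {
    /** Strip the scheme and any leading slashes, since searches start at the root. */
    static String resourcePath(String uri) {
        if (!uri.startsWith("classpath:")) {
            throw new IllegalArgumentException("Not a classpath: URI");
        }
        String path = uri.substring("classpath:".length());
        while (path.startsWith("/")) {
            path = path.substring(1);
        }
        return path;
    }

    /**
     * Return the location of the first matching resource on the classpath,
     * or null if there is none.
     */
    static URL locate(String uri) {
        return ClassLoader.getSystemResource(resourcePath(uri));
    }

    public static void main(String[] args) {
        // Both forms name the same resource.
        System.out.println(resourcePath("classpath:path/example-doc.xml"));
        System.out.println(resourcePath("classpath:/path/example-doc.xml"));
        // A class file that ships with the JDK, used here only as something findable.
        System.out.println(locate("classpath:java/lang/Object.class"));
    }
}
```

Because getSystemResource returns only the first match, the result depends on classpath order, which is exactly why a long, distinctive path is the safer choice.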

It is now possible to use classpath: URIs in the catalog. It is also possible to use classpath: URIs in the catalog list.

This will, I hope, allow me to put a catalog file in the DocBook xslTNG distribution which maps URIs to other resources in the distribution JAR file. I will then use xml.catalog.additions to point to that catalog file in projects that use DocBook and everything will “just work”.

We shall see, as I climb my way back up out of this maze of rat holes.