XML Resolver 6.0.11

Volume 8, Issue 13; 28 Oct 2024

Still pushing forward. Small fixes and one new feature.

Since I last wrote about the XML Resolver (coincidentally, literally the last thing I posted here), I’ve made a few incremental improvements:

Fixed issue #208. The initialization code wasn’t properly closing the properties file. This could cause “out of filehandle” errors.
Fixed issue #211. The system identifier wasn’t always being properly passed back in the resolved resource. This left the base URI empty.
I merged a PR from @dizzzz that added XML DSig schemas to the built-in schemas. Thanks, Dannes!
I added a new feature to support a custom scheme resolver.

One of these things is not like the others. The idea behind the custom scheme resolver is to make integration with larger projects, like databases, easier. Suppose, for example, that you have a database product that supports a custom URI scheme, db:. In other words, in the context of the database product, a request for db://someID will return a resource.

That’s all well and good, but what about outside that context? In particular, what if you do this:

<public publicId="-//Sample//DTD Sample 1.0//EN"
        uri="db://sample-10-id"/>

Sometimes this is fine. The resolver gets a request for the public identifier, maps it to db://sample-10-id, fails to resolve that URI, and returns a response that says, “I couldn’t load this”. The caller can then figure out what to do with the URI.

But sometimes it’s not fine. There are options, like “always resolve”, that turn a failure to resolve a resource into an error. And relying on fallback behavior in the caller can be problematic. Can you be sure that a Xerces parser, encountering db://sample-10-id, will be wired up just right so that it can resolve that?

The new feature (new, and consequently to be viewed as somewhat experimental) allows the resolver to be configured so that it defers resolution of particular schemes to resolvers that know what to do with them.

For example, with this initialization:

XmlResolverConfiguration config = ...;
SchemeResolver dbSchemeResolver = ...;
config.registerSchemeResolver("db", dbSchemeResolver);

Any resolver with that config will call dbSchemeResolver.getResource() to resolve a db: URI. That code can get it from a database, load it from a resource, or consult a Ouija board. The XML Resolver doesn’t care as long as it returns a resource.

If it fails to return a resource, the resolver will try the next scheme resolver configured for that scheme, and so on until someone resolves it.

If no one resolves it, it returns to the previous behavior: trying to resolve it with Java APIs and then giving up.
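The lookup order described above can be sketched in a few lines. This is a minimal illustration, not the XML Resolver’s actual code: SketchSchemeResolver, SchemeDispatchSketch, and defaultResolve are hypothetical names, and the real SchemeResolver interface may have a different signature and return type.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a scheme resolver; the library's real interface may differ.
interface SketchSchemeResolver {
    // Returns the resource content, or empty if this resolver cannot supply it.
    Optional<String> getResource(URI uri);
}

class SchemeDispatchSketch {
    // Resolvers are kept per scheme, in registration order.
    private final Map<String, List<SketchSchemeResolver>> resolvers = new HashMap<>();

    void registerSchemeResolver(String scheme, SketchSchemeResolver resolver) {
        resolvers.computeIfAbsent(scheme, k -> new ArrayList<>()).add(resolver);
    }

    Optional<String> resolve(URI uri) {
        // Try each resolver registered for the URI's scheme until one succeeds.
        for (SketchSchemeResolver r : resolvers.getOrDefault(uri.getScheme(), List.of())) {
            Optional<String> found = r.getResource(uri);
            if (found.isPresent()) {
                return found;
            }
        }
        // No one resolved it: fall back to the previous behavior.
        return defaultResolve(uri);
    }

    // Placeholder for "try the Java APIs and then give up" (assumption: always gives up here).
    Optional<String> defaultResolve(URI uri) {
        return Optional.empty();
    }

    public static void main(String[] args) {
        SchemeDispatchSketch resolver = new SchemeDispatchSketch();
        // The first resolver declines; the second one answers.
        resolver.registerSchemeResolver("db", uri -> Optional.empty());
        resolver.registerSchemeResolver("db", uri -> Optional.of("stub content for " + uri));
        System.out.println(resolver.resolve(URI.create("db://sample-10-id")).orElse("unresolved"));
        System.out.println(resolver.resolve(URI.create("other://thing")).orElse("unresolved"));
    }
}
```

Keeping a list per scheme, rather than a single resolver, is what lets a later resolver pick up a request that an earlier one declined.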

Rather than being installed programmatically, a scheme resolver can be registered automatically through an SPI:

Create a META-INF/services/org.xmlresolver.spi.SchemeResolverProvider resource that identifies a provider that is an implementation of a SchemeResolverProvider.
The provider will be called to create a SchemeResolverManager.
The manager identifies the schemes supported and provides SchemeResolvers for them.

Implementation experience invited.

Note that all of the providers are instantiated once, when the XML Resolver is initialized. This is a performance compromise: the expectation is that the resolver will be called many times, and instantiating the providers over and over again could be expensive.
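To make the provider, manager, and resolver chain concrete, here is a rough sketch of the load-once pattern using java.util.ServiceLoader. Everything named Sketch… is hypothetical and only mirrors the shape described above; the real interfaces in org.xmlresolver.spi may look different. In this sketch the services file would be named after the sketch interface, META-INF/services/SketchProvider.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical interfaces mirroring the provider -> manager -> resolver chain.
interface SketchResolver {
    String getResource(String uri); // returns null if it cannot resolve the URI
}

interface SketchManager {
    List<String> schemes();                     // the schemes this manager supports
    SketchResolver resolverFor(String scheme);  // a resolver for one of those schemes
}

interface SketchProvider {
    SketchManager create();
}

class SpiLoadOnceSketch {
    private final Map<String, List<SketchResolver>> byScheme = new HashMap<>();

    // Discover providers through the standard Java service loader.
    SpiLoadOnceSketch() {
        this(ServiceLoader.load(SketchProvider.class));
    }

    // Providers are consulted exactly once, here; later lookups only read the map.
    SpiLoadOnceSketch(Iterable<SketchProvider> providers) {
        for (SketchProvider provider : providers) {
            SketchManager manager = provider.create();
            for (String scheme : manager.schemes()) {
                byScheme.computeIfAbsent(scheme, k -> new ArrayList<>())
                        .add(manager.resolverFor(scheme));
            }
        }
    }

    List<SketchResolver> resolversFor(String scheme) {
        return byScheme.getOrDefault(scheme, List.of());
    }

    public static void main(String[] args) {
        // With no services file on the classpath, no providers are found.
        SpiLoadOnceSketch registry = new SpiLoadOnceSketch();
        System.out.println("db resolvers found: " + registry.resolversFor("db").size());
    }
}
```

The trade-off is the one noted above: a provider added to the classpath after initialization is not picked up, but each resolution avoids the cost of rediscovering and instantiating providers.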

(I have no idea what the C# equivalent of SPI is, or if there even is one. And I haven’t republished the C# resolver with the latest DSig schemas, but it’s on the list.)