The ugly, the ugly, and the ugly

Volume 6, Issue 8; 07 Aug 2022

Implementing XProc on top of Saxon 11.x and later is…not pretty.

XSLT imposes a constraint that documents are immutable. One consequence of this constraint is that two different requests for the same absolute URI (for example, two calls to the doc() function) must return the same document.

Another consequence of this constraint is that an XSLT processor must keep track of the documents that it has accessed, so that it can return the same immutable copy each time. (If, for example, you use the doc() function to access a URI that returns a random number, every attempt you make to load that URI must return the same document and never a different one with a different random number.)

It follows that if the processor has an API that lets you identify new documents, it must reject an attempt to add a new, different document that has the same URI as some document that it’s already tracking.
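A toy model of that bookkeeping may make the consequences concrete. This is a minimal sketch in Python, not Saxon's actual implementation: the `DocumentPool` class, its methods, and the example URI are all hypothetical names invented for illustration.

```python
import random


class DocumentPool:
    """Toy URI-to-document bookkeeping (hypothetical; not a Saxon API)."""

    def __init__(self, loader):
        self._loader = loader  # callable that fetches a document for a URI
        self._docs = {}        # absolute URI -> the one document seen for it

    def doc(self, uri):
        # Like doc(): fetch at most once, then always return the same copy.
        if uri not in self._docs:
            self._docs[uri] = self._loader(uri)
        return self._docs[uri]

    def add(self, uri, document):
        # Registering a *different* document under a tracked URI must fail.
        if uri in self._docs and self._docs[uri] is not document:
            raise ValueError(f"a different document is already registered for {uri}")
        self._docs[uri] = document


# A loader that returns a new random number on every fetch.
pool = DocumentPool(lambda uri: f"<n>{random.random()}</n>")
first = pool.doc("http://example.com/random")
second = pool.doc("http://example.com/random")  # same object as first
```

The second call never reaches the loader, so both requests see the same document, and a later `pool.add("http://example.com/random", other_doc)` raises an error instead of silently replacing it.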

Starting in version 11, Saxon is much more aggressive about policing this constraint.

This is unfortunate for an XProc implementation because XProc doesn’t impose the same immutability constraint. Lots of steps change documents without changing their URIs. Also, all <p:inline> documents have the same document URI as the pipeline that contains them. This isn’t a problem for pipelines because most steps get their inputs from ports, so the URIs are less relevant. Given:

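<p:load name="L" href="input.xml"/>
<!-- The attribute name and value here are placeholders. -->
<p:add-attribute name="A" attribute-name="status" attribute-value="draft"/>
<p:xslt name="T">
  <p:with-input port="stylesheet" href="style.xsl"/>
</p:xslt>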

There’s no ambiguity about which “version” of the input document is styled by the XSLT step, even though the outputs of steps “L” and “A” have the same document URI.
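The same idea can be sketched outside XProc. In this minimal Python model, the `Document` class and the three functions are hypothetical stand-ins for the steps, and the attribute name and value are placeholders; the point is only that a step's input is chosen by what is connected to it, not by looking up a URI.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Document:
    uri: str
    content: str


def load(href):
    # Stand-in for p:load; a real step would read and parse the file.
    return Document(uri=href, content="<doc/>")


def add_attribute(source, name, value):
    # Stand-in for p:add-attribute: new content, same document URI.
    return replace(source, content=f'<doc {name}="{value}"/>')


def xslt(source):
    # Stand-in for p:xslt: it transforms whatever arrives on its source port.
    return replace(source, content=f"<styled>{source.content}</styled>")


l_out = load("input.xml")
a_out = add_attribute(l_out, "status", "draft")
t_out = xslt(a_out)  # the connection, not the URI, selects the input
```

Here `l_out` and `a_out` share the URI `input.xml` but hold different content, and the transformation sees only the second one.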

I’m not really sure exactly what to do to address the tighter constraints in Saxon 11.x. On the one hand, it’s tempting to suggest that there should be a configuration flag to relax the constraint. But that would introduce the possibility that a p:xslt step might run XSLT in a non-conformant way. That would be bad. On the other hand, my implementation is built on top of the Saxon APIs, so there’s no practical way to avoid the constraint.

If I can’t remove the constraint, and I can’t avoid the constraint, I’m going to have to figure out how to live with it.

Looking at the results of the test suite, the problem arises in two places:

  1. When a pipeline contains multiple p:inline documents.
  2. When a pipeline attempts to construct a collection of documents. (This can occur in, for example, building the default collection for an XSLT transformation, or in the cx:collection-manager extension step.)

I can’t prevent a pipeline from having multiple inlines, or a collection from having multiple documents, so I have to make sure the URIs are unique.

To see how this works “in real life”, I’ve published XML Calabash 1.5.0a1-110, an experimental release that uses Saxon 11. It makes URIs unique, where it thinks it needs to, by adding a query parameter. If /path/pipe.xpl contains two inlines, they will have the URIs /path/pipe.xpl?xmlcalabash_uniqueid=1 and /path/pipe.xpl?xmlcalabash_uniqueid=2. This satisfies the uniqueness constraint and doesn’t break “normal” operations like resolving a relative URI against the document URI.
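The query-parameter approach can be sketched in a few lines. This is an illustration in Python, not XML Calabash's actual code: the `make_unique` function and its counter are hypothetical, and only the parameter name and the example paths come from the description above.

```python
from itertools import count
from urllib.parse import urljoin

_next_id = count(1)  # process-wide counter; hypothetical detail


def make_unique(uri, param="xmlcalabash_uniqueid"):
    """Append a counter as a query parameter so each call yields a distinct URI."""
    separator = "&" if "?" in uri else "?"
    return f"{uri}{separator}{param}={next(_next_id)}"


first = make_unique("/path/pipe.xpl")   # /path/pipe.xpl?xmlcalabash_uniqueid=1
second = make_unique("/path/pipe.xpl")  # /path/pipe.xpl?xmlcalabash_uniqueid=2

# Resolving a relative reference ignores the base URI's query string.
resolved = urljoin(first, "style.xsl")  # /path/style.xsl
```

Because relative resolution replaces the last path segment and discards the base URI's query, the added parameter doesn't change where relative references point.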

With these changes, the test suite passes. But I have no intuition about how this will affect pipelines in the real world, and I haven’t yet attempted a thorough analysis to see if there are more places where the names have to be coerced to be unique.

If you can try 1.5.0a1-110 and report any problems that arise, I’d appreciate it.