
Improving parallelism

Volume 5, Issue 15; 23 Jul 2021

Exploring the trade-offs between user expectations and performance in XProc pipelines.

Back in September, there was animated discussion about how much freedom an XProc processor should have in determining which steps can run in parallel. The issue has taken some time to resolve.

The heart of the issue is this: given two sequential steps where the first step has a primary output port (and consequently provides a default readable port to the following step), if the processor can determine that the following step does not use the default readable port, can the processor ignore that connection?

This question doesn’t arise in XProc 1.0 because in 1.0 we explicitly did not require an implementation to analyze XPath expressions in order to build the pipeline graph. An XProc 1.0 processor cannot tell if the following step uses the port or not and therefore must conclude that it does.

Candidly, that’s sort of where my head was. The issue was also much more general than it initially appeared to be, so there was lots of opportunity for misunderstanding.

If the processor isn’t given this freedom, then lots of practical parallelism in pipelines can’t be exploited. If the processor is given this freedom, then steps that rely on side effects and don’t have explicit connections may not be executed in the same order they appear in the pipeline document. I think this only arises when otherwise independent steps have side effects that have to occur in a particular order, in which case, I assert, you’d be better off making those dependencies explicit irrespective of this issue.
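To make the trade-off concrete, here is a minimal sketch in Python of how dropping unused implicit connections changes what can run in parallel. It is a toy model, not how any XProc processor is implemented: the `Step` class, the `uses_default_readable_port` flag (standing in for whatever the processor's analysis concluded), the `depends_on` field, and the step names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Toy model of a pipeline step (all field names are hypothetical)."""
    name: str
    has_primary_output: bool = True
    # Result of the processor's analysis: does this step actually read
    # the document on the default readable port?
    uses_default_readable_port: bool = False
    # Names of steps this step is explicitly connected to or depends on.
    depends_on: tuple = ()


def dependencies(steps, prune_unused):
    """Map each step name to the set of step names it must wait for."""
    deps = {}
    previous = None
    for step in steps:
        needed = set(step.depends_on)
        # The preceding step supplies a default readable port only if it
        # has a primary output port.
        if previous is not None and previous.has_primary_output:
            # Keep the implicit connection if it is used, or if the
            # processor is not allowed to discard unused connections.
            if step.uses_default_readable_port or not prune_unused:
                needed.add(previous.name)
        deps[step.name] = needed
        previous = step
    return deps


def waves(deps):
    """Group steps into waves; steps in one wave may run in parallel."""
    done, result = set(), []
    while len(done) < len(deps):
        ready = sorted(n for n, d in deps.items()
                       if n not in done and d <= done)
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        result.append(ready)
        done.update(ready)
    return result


# "load-tmp" reads a file that "store-tmp" wrote (a side effect), but it
# never reads the default readable port. "transform" does read it.
pipeline = [
    Step("store-tmp"),
    Step("load-tmp"),
    Step("transform", uses_default_readable_port=True),
]

print(waves(dependencies(pipeline, prune_unused=False)))
# [['store-tmp'], ['load-tmp'], ['transform']]

print(waves(dependencies(pipeline, prune_unused=True)))
# [['load-tmp', 'store-tmp'], ['transform']]

# Making the side-effect dependency explicit restores the ordering.
pipeline[1].depends_on = ("store-tmp",)
print(waves(dependencies(pipeline, prune_unused=True)))
# [['store-tmp'], ['load-tmp'], ['transform']]
```

With pruning enabled, the two steps that share only a side effect land in the same wave, so their relative order is no longer guaranteed. Declaring the dependency explicitly puts them back in sequence, which is the approach argued for above.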

In any event, the editorial team has decided that the answer is “yes”: the processor can discard implicit connections where it can tell that the document that would be provided is never used.

XProc 3.0 processors already have to analyze XPath expressions in order to determine how variable and option bindings are connected, so any additional burden on implementors is insignificant.

This is all laid out in a new section, Connections and the Default Readable Port, in the current editorial draft. Comments are most welcome.