Pipelineception

Volume 9, Issue 6; 04 May 2025

Pipelines inside pipelines with pipelines using pipelines for pipelines to do pipelines because pipelines. Pipelines!

One of my best friends has often observed that pipelines are constantly being invented and reinvented. They’re just that useful. They’re used at all kinds of different scales from (loosely speaking) what microprocessors do up to orchestrating whole enterprise workflows. (And, outside the domain of software, for moving goods around and running systems on micro, macro, and industrial scales.)

Pipelines have been invented a whole bunch of times in web and markup tools too, and somewhere in that mix sits XProc, an XML standard for describing pipelines. I’ve been working on that for coming up on twenty years. Version 3.1 should come out in time for Markup UK this year and it’s feeling pretty solid. You should check it out, if you haven’t already.

The genesis of this post comes from a different bit of standards work that I’m involved in. I am the chair of “QT4CG”, more officially known as the XQuery and XSLT Extensions Community Group. We’re the W3C community group working on XPath, XQuery, and XSLT version 4.0.

Pipelines come up in that context too, sometimes. In fact, the languages already have a kind of pipeline: the -> operator (and friends), for example. But sometimes we also discuss pipelines at a higher level.

As chair, my job is to manage the committee process, encourage the group to focus on the tasks we need to resolve to get finished, keep the ball rolling, and generally stay out of the way. I try really, really hard not to shout “XProc!” whenever anyone talks about inventing pipelines for managing queries and transformations.

And I’m not saying that just to be funny. XSLT lets you run other transformations from within a stylesheet with fn:transform; I use that in the DocBook xslTNG Stylesheets. It’s reasonable to want to chain those transformations together more elegantly, and it’s reasonable to want to do that without pulling in all of XProc. (In XSLT, there’s also the use case of chaining together streaming transformations, which XProc doesn’t explicitly address. Somewhere a couple of pages down on my to-do list is the idea that two (or more) sequential p:xslt steps could be compiled down into Saxon’s internal pipeline architecture. That should allow XML Calabash to chain streaming transformations.)
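For concreteness, here’s roughly what chaining two transformations with fn:transform looks like today. It’s a sketch: first.xsl, second.xsl, and the $doc variable are hypothetical, and it relies on fn:transform returning the principal result document under the 'output' key of the map it returns.

```xml
<!-- Sketch only: first.xsl and second.xsl are hypothetical stylesheets,
     and $doc is assumed to be bound to a document node elsewhere.
     Unprefixed transform(...) resolves to fn:transform. -->

<!-- Stage 1: the principal result document is in ?output -->
<xsl:variable name="step1"
              select="transform(map {
                        'stylesheet-location': 'first.xsl',
                        'source-node': $doc })?output"/>

<!-- Stage 2: feed the stage 1 result to the second stylesheet -->
<xsl:variable name="step2"
              select="transform(map {
                        'stylesheet-location': 'second.xsl',
                        'source-node': $step1 })?output"/>
```

Each additional stage means another call, another options map, and another ?output lookup.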

But still.

So sometimes I think about what it would look like if we did have XProc in XPath. I think the fn:invisible-xml function gives us a practical model. You’d have an fn:xproc function that took an XProc pipeline and returned a function that you could use to run that pipeline. If you’re not (yet) comfortable with higher-order functions, this might seem a bit odd, but it has two really big benefits: first, you’ve got obvious and automatic caching of the compiled, ready-to-run pipeline, and second, you don’t have to work out how to pass the pipeline, options for compiling the pipeline, all of the pipeline inputs, and all of the pipeline options to a single function.
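A sketch of what that might look like in XSLT follows. None of it is real: fn:xproc is the imagined function, process.xpl is a hypothetical pipeline document, and for simplicity the returned function is assumed to take the source document and return the pipeline’s result.

```xml
<!-- HYPOTHETICAL: fn:xproc does not exist; it is the imagined function.
     Unprefixed function names are in the fn: namespace, so xproc(...)
     here means fn:xproc(...). process.xpl is a hypothetical pipeline. -->

<!-- A global variable is evaluated once, so the pipeline would be
     compiled once and the resulting function item reused. -->
<xsl:variable name="process" select="xproc(doc('process.xpl'))"/>

<xsl:template match="could">
  <!-- Call the compiled pipeline like any other function item -->
  <xsl:sequence select="$process(.)"/>
</xsl:template>
```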

This has all become a bit abstract; let’s consider a toy example. Suppose you had this pipeline:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns"
                type="ex:process" version="3.0">
  <p:input port="source" content-types="xml html"/>
  <p:output port="result" content-types="xml html"/>
  <p:add-attribute attribute-name="do" attribute-value="something"/>
</p:declare-step>

That’s an XProc pipeline that “does something”. (It adds an attribute named “do” with the value “something” to the document element of the XML or HTML document that appears on the source input port. The resulting document appears on the result output port.) It’s not useful or interesting, but you can imagine it’s any pipeline you like with an input, an output, and a body that does something. Doesn’t matter what.
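For example, given an illustrative input document like the first one below on the source port, the second is what should appear on the result port. Only the document element changes, because p:add-attribute’s match option defaults to "/*".

```xml
<!-- Illustrative input on the source port -->
<could>
  <would/>
</could>

<!-- Result port: only the document element gets the new attribute -->
<could do="something">
  <would/>
</could>
```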

Then if you could “compile” that pipeline into a function, you could use it like this in XSLT:

<xsl:template match="could">
  <xsl:sequence select="ex:process(.)"/>
</xsl:template>

That would take the could element, use it as the input to the ex:process pipeline, and then use the result of that pipeline as the result of the xsl:sequence instruction. Cool, right? (Technically, the function lives in XPath, so it would work in plain XPath expressions too, or in XQuery.)

If ex:process had several inputs and some options, the syntax would extend naturally:

ex:process(input1, input2, map { 'option-name': value })

Easy enough. But functions can only return a single result, and pipelines can have multiple output ports. We need some workaround to handle that. Luckily, we have maps (hashes, dictionaries, whatever you like to call them) and they fit the bill. For consistency, let’s say the function always returns a map where the keys of the map are the names of the output ports and the values are the document(s) that appear on those ports. So our example would actually look like this:

<xsl:template match="could">
  <xsl:sequence select="ex:process(.)?result"/>
</xsl:template>

Or, if you prefer, like this: map:get(ex:process(.), 'result').
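The map earns its keep when a pipeline has more than one output port. Here’s a hypothetical sketch: ex:process-and-count and its count port are invented for illustration, namespace declarations are omitted for brevity, and the call follows the map-returning convention just described.

```xml
<!-- Hypothetical step with two output ports -->
<p:declare-step type="ex:process-and-count">
  <p:input port="source"/>
  <!-- The processed document -->
  <p:output port="result" primary="true" pipe="result@add"/>
  <!-- A c:result element holding the number of documents processed -->
  <p:output port="count" pipe="result@count"/>

  <p:add-attribute name="add" attribute-name="do" attribute-value="something"/>
  <p:count name="count"/>
</p:declare-step>

<!-- Calling it from XSLT: one call, one map, one entry per port -->
<xsl:template match="could">
  <xsl:variable name="outputs" select="ex:process-and-count(.)"/>
  <!-- The document from the result port -->
  <xsl:sequence select="$outputs?result"/>
  <!-- The c:result document atomizes to its text content -->
  <xsl:comment select="'documents processed: ' || $outputs?count"/>
</xsl:template>
```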

You could pass the inputs in a map too, but you don’t have to and it seems a bit cumbersome, so I haven’t. The first “n” arguments to the function are the “n” input ports of the pipeline, in the order they’re declared. If the pipeline has options, there’s one more argument: a map of option names to values.
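Under that convention, a pipeline with two inputs and an option takes three arguments. A sketch, assuming a hypothetical ex:combine step whose port and option names are invented for illustration (namespace declarations omitted; $other is assumed to be a document node bound elsewhere):

```xml
<!-- Hypothetical step: two input ports, declared in this order,
     and one option with a default value -->
<p:declare-step type="ex:combine" name="combine">
  <p:input port="source" primary="true"/>
  <p:input port="extra"/>
  <p:output port="result"/>
  <p:option name="label" select="'combined'"/>

  <!-- Insert the "extra" document as the last child of the
       source's document element... -->
  <p:insert match="/*" position="last-child">
    <p:with-input port="insertion" pipe="extra@combine"/>
  </p:insert>
  <!-- ...then record the option's value as an attribute -->
  <p:add-attribute attribute-name="label" attribute-value="{$label}"/>
</p:declare-step>

<!-- Argument 1 is "source", argument 2 is "extra",
     argument 3 is the map of options -->
<xsl:sequence select="ex:combine(., $other, map { 'label': 'merged' })?result"/>
```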

This feels like it could work. We just need the fn:xproc function, I guess.

And then a thought occurred to me. Implementations can define their own functions. Yes, a conformant XPath processor has to implement all of the required fn: functions, but it can also implement more functions in other namespaces. So we don’t actually need an fn:xproc function. (We might still want one, though, to control the compiler or set static, compile-time options. I’m just going to ignore that.)

Starting around XML Calabash alpha29, every pipeline you declare with an explicit type becomes an XPath function with the same name. This “just works”:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ex="http://example.com/ns"
                exclude-inline-prefixes="#all"
                name="main" version="3.0">
  <p:output port="result"/>

  <p:declare-step type="ex:process">
    <p:input port="source" content-types="xml html"/>
    <p:output port="result" content-types="xml html"/>
    <p:add-attribute attribute-name="do" attribute-value="something"/>
  </p:declare-step>

  <p:xslt>
    <p:with-input port="source">
      <doc>
        <could/>
      </doc>
    </p:with-input>
    <p:with-input port="stylesheet" href="stylesheet.xsl"/>
  </p:xslt>

</p:declare-step>

Where stylesheet.xsl is:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ex="http://example.com/ns"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">

<xsl:output method="xml" encoding="utf-8" indent="no"/>

<xsl:mode on-no-match="shallow-copy"/>

<xsl:template match="doc">
  <doc>
    <title>What would I do?</title>
    <xsl:apply-templates/>
  </doc>
</xsl:template>

<xsl:template match="could">
  <xsl:sequence select="ex:process(.)?result"/>
</xsl:template>

</xsl:stylesheet>
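Running that pipeline, the result should look roughly like the following. Treat the layout as approximate: the exact whitespace depends on how the inline source document’s whitespace is carried through.

```xml
<!-- Approximate output of the main pipeline's result port -->
<doc><title>What would I do?</title>
  <could do="something"/>
</doc>
```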

It’s possible to take this one step further and make it work even when you aren’t running a pipeline at all, just running Saxon directly. It only requires a little more setup:

1. You have to hook into Saxon’s startup. Saxon’s -init: option is there for that.
2. You have to be able to point to a library of pipelines. We can use a Java system property for that.
3. Profit.

So, in fact, this also “just works” (assuming the ex:process step is defined in library.xpl):

XLIB=/path/to/library.xpl
CP=...all the jars...
java -Dcom.xmlcalabash.pipelines="$XLIB" -cp "$CP" \
     net.sf.saxon.Transform \
     -init:com.xmlcalabash.api.RegisterSaxonFunctions \
     -s:doc.xml -xsl:stylesheet.xsl
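For completeness, here’s a sketch of a library.xpl that would satisfy that assumption, along with a doc.xml matching the inline document from the earlier pipeline. Only the file names come from the command above; the contents are illustrative.

```xml
<!-- library.xpl: a p:library that declares the ex:process step -->
<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:ex="http://example.com/ns"
           version="3.0">
  <p:declare-step type="ex:process">
    <p:input port="source" content-types="xml html"/>
    <p:output port="result" content-types="xml html"/>
    <p:add-attribute attribute-name="do" attribute-value="something"/>
  </p:declare-step>
</p:library>

<!-- doc.xml: the source document passed to Saxon with -s: -->
<doc>
  <could/>
</doc>
```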

To be fair, “…all the jars…” is doing some work in there. But XML Calabash ships with shell and PowerShell scripts that point the way.

I don’t know how useful or interesting this is going to be in practice, but once I had the idea in my head, I had to see it work.