
Accumulators FTW

Volume 7, Issue 38; 08 Aug 2023

TIL: Accumulators in XSLT make some classes of problems a whole lot easier to solve.

I’ve called this a “today I learned” post mostly because I haven’t had one in a while. It would be fairer to say that I learned it last Thursday at Amanda Galtman’s excellent talk for her excellent Balisage paper, Accumulators in XSLT and XSpec, and I actually implemented it yesterday. But never mind.

I’m working on some documentation where I’m importing markup from Javadoc. It’s an experiment that occurred to me after I got the “XML doclet” working again. The XML doclet parses Javadoc comments and turns them into XML markup instead of HTML. (I say “working again”, though really I reimplemented it from scratch with the new Javadoc APIs. But that’s a story for another day, when it works a little better.)

The problem is that all of the quotes in the Javadoc are "straight" quotes instead of proper “typographic” quotes. It’s not a big deal, but (a) they clash with the typographic quotes used elsewhere in the documentation and (b) they’re ugly.

I’m not going to go through all the Javadoc comments and fix the quotes, so I want to fix them when I’m importing the XML. I’m prepared to assume that the quotes are well balanced within the sections I’m converting, but I know they aren’t all in a single text node. I have to do the right thing with markup like this:

<p>What about "<code>markup</code>" like "this".</p>
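Here’s a quick way to see the problem, sketched in Python rather than XSLT. ElementTree has no separate text-node objects (it keeps text in `.text` and `.tail`), so the `text_nodes` helper below is a hypothetical stand-in for selecting `text()` nodes. The paragraph above splits into three text nodes, and the first quote opens in one of them and closes in another:

```python
import xml.etree.ElementTree as ET


def text_nodes(elem):
    """Yield the text content under elem in document order.

    ElementTree has no text-node objects: text before the first child lives
    in elem.text, and text after a child lives in child.tail.
    """
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:
            yield child.tail


p = ET.fromstring('<p>What about "<code>markup</code>" like "this".</p>')
nodes = list(text_nodes(p))
print(nodes)                          # ['What about "', 'markup', '" like "this".']
print([n.count('"') for n in nodes])  # [1, 0, 3]
```

The quotes balance overall (1 + 0 + 3 = 4), but no rule that only looks at one text node at a time can tell that the quote at the start of the third node is a closing quote.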

I spent about ten minutes dreaming up an elaborate solution involving a recursive function that basically reimplemented xsl:apply-templates. I was pretty confident I could do it, but it sure didn’t feel like a fun exercise for a Monday evening on my mom’s couch. (Mom had gone to bed; I wasn’t being antisocial. Much.)

Then I thought, “what I really want to do is keep track of some things as I’m applying templates. Isn’t that exactly what Amanda said accumulators were for?”

I need to control where the count starts, so I constructed a wrapper around the section I want to process; the count starts over at zero at each wrapper (I throw the wrapper away when I add the results to the actual result document):

<db:the-cake-is-a-lie>
<p>What about "<code>markup</code>" like "this".</p>
</db:the-cake-is-a-lie>

Then I made an accumulator that keeps a running total of the straight quotes, adding the quotes in each text node and starting over at zero at each wrapper:

<xsl:accumulator name="dquote" as="xs:integer" initial-value="0">
  <!-- Start counting again at each wrapper -->
  <xsl:accumulator-rule match="db:the-cake-is-a-lie" select="0"/>
  <!-- Add the straight quotes in this text node to the running total -->
  <xsl:accumulator-rule
      match="text()"
      select="$value + string-length(replace(., '[^&quot;]', ''))"/>
</xsl:accumulator>
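For anyone who thinks better in a general-purpose language, here’s a rough Python model of what the `dquote` accumulator computes. It’s a sketch, not how any XSLT processor implements accumulators: it walks the tree in document order, resets to zero at the wrapper, and records the running total for each text node, including that node’s own quotes. The unprefixed `the-cake-is-a-lie` name (namespaces are ignored here) and the `running_quote_counts` function are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for db:the-cake-is-a-lie; namespaces are ignored.
WRAPPER = "the-cake-is-a-lie"


def local_name(tag):
    # '{uri}name' -> 'name'; an unqualified 'name' is returned unchanged
    return tag.rpartition("}")[2]


def running_quote_counts(root):
    """Return (text, value) pairs, where value is the running count of
    straight double quotes up to and including that text."""
    results = []
    value = 0  # initial-value="0"

    def add(text):
        nonlocal value
        if text:
            value += text.count('"')  # the match="text()" rule
            results.append((text, value))

    def visit(elem):
        nonlocal value
        if local_name(elem.tag) == WRAPPER:
            value = 0  # the match="db:the-cake-is-a-lie" rule
        add(elem.text)
        for child in elem:
            visit(child)
            add(child.tail)

    visit(root)
    return results


doc = ET.fromstring(
    '<doc>'
    '<the-cake-is-a-lie><p>What about "<code>markup</code>" like "this".</p></the-cake-is-a-lie>'
    '<the-cake-is-a-lie><p>Say "hi".</p></the-cake-is-a-lie>'
    '</doc>')
print(running_quote_counts(doc))
# [('What about "', 1), ('markup', 1), ('" like "this".', 4), ('Say "hi".', 2)]
```

Without the reset at the second wrapper, the last text node would see 6 rather than 2, and its quotes would be paired the wrong way round whenever an earlier section had an odd count.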

(I wonder if there’s a better way to count the quotes?)

Then it’s just a short template to get the job done.

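<xsl:template match="text()" mode="patch-dquote">
  <!-- How many straight double quotes are in this text node? -->
  <xsl:variable name="qcount" as="xs:integer"
                select="string-length(replace(., '[^&quot;]', ''))"/>
  <!-- Running total so far, including this node's quotes -->
  <xsl:variable name="tcount" as="xs:integer"
                select="accumulator-before('dquote')"/>
  <!-- Ordinal (1-based) of the first quote in this node -->
  <xsl:variable name="first" select="$tcount - $qcount + 1"/>
  <xsl:choose>
    <xsl:when test="$qcount gt 0">
      <xsl:variable name="bits" as="xs:string*">
        <xsl:for-each select="tokenize(., '&quot;')">
          <xsl:choose>
            <!-- The last piece is not followed by a quote -->
            <xsl:when test="position() eq last()">
              <xsl:sequence select="."/>
            </xsl:when>
            <!-- Odd-numbered quotes open, even-numbered quotes close -->
            <xsl:when test="($first + position() - 1) mod 2 = 1">
              <xsl:sequence select=". || '“'"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:sequence select=". || '”'"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </xsl:variable>
      <!-- Emit a text node rather than a bare string, so adjacent
           results are never joined with a separating space -->
      <xsl:value-of select="$bits" separator=""/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:copy/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>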
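Here’s the same parity trick ported to Python as a sketch. The names `patch_text` and `patch_dquotes` are hypothetical, and the single recursive walk stands in for both the accumulator and apply-templates. Each text node works out the ordinal of its first quote from the running total. From there, odd-numbered quotes open and even-numbered quotes close:

```python
import xml.etree.ElementTree as ET

LDQ, RDQ = "\u201c", "\u201d"  # left and right double quotation marks


def patch_text(text, tcount):
    """Port of the patch-dquote template for one text node.

    tcount is the running total *including* this node's quotes, i.e. what
    accumulator-before('dquote') returns on a text node.
    """
    qcount = text.count('"')
    if qcount == 0:
        return text
    first = tcount - qcount + 1      # ordinal of this node's first quote
    bits = text.split('"')           # like tokenize(., '&quot;')
    out = []
    for pos, bit in enumerate(bits, start=1):
        if pos == len(bits):
            out.append(bit)          # last piece: no quote follows it
        elif (first + pos - 1) % 2 == 1:
            out.append(bit + LDQ)    # odd-numbered quote opens
        else:
            out.append(bit + RDQ)    # even-numbered quote closes
    return "".join(out)


def patch_dquotes(root):
    """Rewrite every text node under root in place; root plays the role of
    the wrapper, so counting starts from zero there."""
    count = 0

    def fix(text):
        nonlocal count
        if text is None:
            return None
        count += text.count('"')
        return patch_text(text, count)

    def visit(elem):
        elem.text = fix(elem.text)
        for child in elem:
            visit(child)
            child.tail = fix(child.tail)

    visit(root)
    return root


p = ET.fromstring('<p>What about "<code>markup</code>" like "this".</p>')
print(ET.tostring(patch_dquotes(p), encoding="unicode"))
# <p>What about “<code>markup</code>” like “this”.</p>
```

Python’s `str.split` behaves like `tokenize()` here: both return empty strings at the ends when the text starts or ends with a quote, which is what keeps the “last piece gets no quote” rule correct.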

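It took me a moment to understand why accumulator-before('dquote') was returning the “already counted” value. An accumulator rule like this one fires at the start of the node it matches, and accumulator-before() returns the value at that point, after the text node’s own rule has already run. Maybe accumulator-after('dquote') would be clearer? It amounts to the same thing, since a text node doesn’t have any descendants.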

There you have my first “production” use of an accumulator. It’s a vastly simpler solution than the one I had in mind. Thank you, Amanda (and Balisage)!

What about straight single quotes, I hear you ask? (I know my audience.) The overwhelming majority (I’m willing to assume “all”) of the uses of the straight single quote are in contractions, so I’m using a much simpler conversion strategy: if it’s preceded by a space, it’s a left single quote; otherwise it’s a right single quote.

That’ll catch unlikely but just marginally possible uses like this:

<p>I don't think "this 'is' very likely".</p>

If I notice places where that does the wrong thing, I’ll revisit.
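The single-quote rule is simple enough to express with one lookbehind. Here’s a Python sketch of it; the function name is hypothetical. XPath regular expressions, as used by replace() in XSLT, don’t support lookbehind, so an XSLT version would need to be phrased differently.

```python
import re

LSQ, RSQ = "\u2018", "\u2019"  # left and right single quotation marks


def patch_squotes(text):
    """Preceded by a space -> left single quote; anything else -> right."""
    text = re.sub(r"(?<= )'", LSQ, text)  # quote right after a space opens
    return text.replace("'", RSQ)         # every other quote closes


print(patch_squotes("I don't think \"this 'is' very likely\"."))
# I don’t think "this ‘is’ very likely".
```

Taking the rule literally, a quote at the very start of a text node isn’t preceded by a space, so it becomes a right single quote. That’s right for a word like “’tis” and wrong for an opening quote that begins a node.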

#Balisage #TIL #XSLT