On Fibonacci and tail recursion (and XSLT)

Volume 4, Issue 42; 09 Oct 2020

Published by Norman Walsh

A few observations about tail recursion and xsl:iterate in XSLT 3.0.

Let me begin by saying that Declarative Amsterdam 2020 was an excellent conference. I enjoyed the tutorials and all of the talks. In a virtual conference world, I think broadcasting prerecorded talks is a fantastic idea. It puts a little extra burden on the speakers, but it means everyone has to practice and they get to do as many takes as they like. It means you can’t interrupt the speaker for questions of clarification, but that’s so difficult in a virtual setting anyway, I don’t think it matters. Added bonus, the speakers can engage with the audience in real time in the session chat. An idea worth stealing, I think.

But that’s not what this post is about.

In one of the talks, Adam Retter spoke about compiling XQuery to machine code. It was full of good ideas and cleverness and insight. I have moved “investigate LLVM” further up my TODO list in response.

The motivating example that Adam used in discussing the compiler was my favorite small example: computing Fibonacci numbers. The function he used was something like⊕If that’s not exactly what Adam used, it’s close enough. It’s the same in the relevant technical detail I’ll discuss below. And while I’m in the marginalia, there’s closed form solution that I’ll also mention below. this:

declare function local:fib($n as xs:integer) {
  if ($n = 0)
  then 0
  else if ($n = 1)
       then 1
       else local:fib($n - 2) + local:fib($n - 1)
};

Adam remarked in passing that this was a tail recursive function.

Thing is, it’s not. It wasn’t really relevant to his talk and I certainly didn’t feel like I needed to call him out on it. Adam’s probably got more programming chops than I do and he certainly knows what a tail recursive function is. It’s an easy mistake to make as I well know having done it more than once myself.

A function is only tail recursive if it satisfies a few very precise constraints. Informally, one of them is that the recursion must occur at the very end of the function. The function above doesn’t end with a recursive call, it ends with a “+” expression. That function isn’t tail recursive.

This is important because tail recursive functions can be optimized into loops and general recursive functions can’t. That has a profound impact on performance and memory usage.

I only mention all of this because at the very end of the conference, after we’d basically wrapped up and were doing the virtual equivalent of standing around chatting, the question of whether that example had been tail recursive or not came up briefly (I don’t recall why) and someone, Liam I think, or it might have been Tom observed that the xsl:iterate instruction prevents you from making this mistake.

This is all very fresh in my mind because I just finished teaching a course on new XSLT 3.0 features and advanced topics. Computing the Fibonacci numbers and making this mistake and avoiding it with xsl:iterate is precisely one of the examples I used.

I setup my example in XSLT, but it’s very nearly the same:

<xsl:function name="f:fibonacci" as="xs:integer">
  <xsl:param name="n" as="xs:integer"/>

  <xsl:choose>
    <xsl:when test="$n = 1 or $n = 2">
      <xsl:sequence select="1"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence
          select="f:fibonacci($n - 2)
                  + f:fibonacci($n - 1)"/>
    </xsl:otherwise>
  </xsl:choose>

</xsl:function>

Go ahead and try that out if you feel like it. For small “n”, it’s fine. It’ll tell you the tenth Fibonacci number is 55 quick enough. But ask it for the 255th and I don’t know how long you’d have to wait. Longer than I ever waited, assuming you don’t crash the JVM first.

Oh, heck, let’s run some unscientific experiments. For the XSLT function above, including JVM startup time and other overhead as measured with “time” (I said “unscientific”!), I get a results like this:

1 = 1, 1.68s user 0.15s system 198% cpu 0.923 total
2 = 1, 1.68s user 0.14s system 200% cpu 0.909 total
3 = 2, 1.72s user 0.15s system 205% cpu 0.907 total
4 = 3, 1.72s user 0.15s system 205% cpu 0.907 total
5 = 5, 1.76s user 0.15s system 195% cpu 0.977 total
6 = 8, 1.69s user 0.15s system 202% cpu 0.908 total
7 = 13, 1.74s user 0.15s system 207% cpu 0.914 total
8 = 21, 1.69s user 0.14s system 206% cpu 0.889 total
9 = 34, 1.71s user 0.15s system 203% cpu 0.909 total
10 = 55, 1.78s user 0.15s system 197% cpu 0.978 total
15 = 610, 1.83s user 0.15s system 195% cpu 1.009 total
20 = 6765, 1.86s user 0.16s system 204% cpu 0.988 total
30 = 832040, 2.87s user 0.25s system 250% cpu 1.247 total
40 = 102334155, 8.87s user 0.83s system 127% cpu 7.608 total
^C385.12s user 3.04s system 104% cpu 6:12.70 total

So I gave up on the 50th number after about 5 minutes of listening to the laptop fans whine. I can’t really explain that performance curve. Somewhere above 40, does the JVM start running out of memory and swapping?

Aaaaannnyway …

Computing Fibonacci numbers can be done with a tail recursive funtion and, as it happens, Saxon will optimize that function. Here’s my tail recursive version:

<xsl:function name="f:tail-recursive-fib"
              as="xs:integer">
  <xsl:param name="n" as="xs:integer"/>
  <xsl:sequence
      select="fp:tail-recursive-fib($n, 0, 1)"/>
</xsl:function>

<xsl:function name="fp:tail-recursive-fib"
              as="xs:integer"
              visibility="private">
  <xsl:param name="count" as="xs:integer"/>
  <xsl:param name="n" as="xs:integer"/>
  <xsl:param name="next" as="xs:integer"/>
  <xsl:choose>
    <xsl:when test="$count = 0">
      <xsl:sequence select="$n"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence
          select="fp:tail-recursive-fib(
                     $count - 1, $next,
                     $n + $next)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

One thing that has to be said about this version is that it’s a whole lot less obvious what’s going on. But, measured in the same unscientific way, it does perform much better:

1 = 1, 1.66s user 0.14s system 202% cpu 0.888 total
2 = 1, 1.77s user 0.14s system 199% cpu 0.959 total
3 = 2, 1.69s user 0.14s system 208% cpu 0.878 total
4 = 3, 1.72s user 0.15s system 208% cpu 0.894 total 
5 = 5, 1.73s user 0.15s system 206% cpu 0.905 total 
6 = 8, 1.72s user 0.15s system 209% cpu 0.893 total 
7 = 13, 1.70s user 0.14s system 208% cpu 0.883 total
8 = 21, 1.75s user 0.14s system 208% cpu 0.907 total
9 = 34, 1.68s user 0.14s system 204% cpu 0.889 total
10 = 55, 1.71s user 0.14s system 206% cpu 0.901 total
15 = 610, 1.70s user 0.15s system 205% cpu 0.896 total                
20 = 6765, 1.72s user 0.14s system 208% cpu 0.895 total               
30 = 832040, 1.71s user 0.14s system 207% cpu 0.893 total             
40 = 102334155, 1.72s user 0.14s system 208% cpu 0.892 total          
50 = 12586269025, 1.72s user 0.15s system 207% cpu 0.898 total        
60 = 1548008755920, 1.72s user 0.15s system 208% cpu 0.895 total      
70 = 190392490709135, 1.72s user 0.15s system 204% cpu 0.912 total    
80 = 23416728348467744, 1.74s user 0.15s system 205% cpu 0.918 total  
90 = 2880067194370825216, 1.79s user 0.15s system 208% cpu 0.932 total

And, just for completeness, if you do it with the closed form expression, you get:

1 = 1, 1.70s user 0.14s system 202% cpu 0.913 total
2 = 1, 1.67s user 0.15s system 206% cpu 0.882 total
3 = 2, 1.69s user 0.15s system 205% cpu 0.892 total
4 = 3, 1.69s user 0.15s system 200% cpu 0.917 total
5 = 5, 1.72s user 0.15s system 205% cpu 0.912 total
6 = 8, 1.68s user 0.15s system 203% cpu 0.898 total
7 = 13, 1.71s user 0.14s system 204% cpu 0.907 total
8 = 21, 1.70s user 0.15s system 203% cpu 0.907 total
9 = 34, 1.64s user 0.14s system 198% cpu 0.895 total
10 = 55, 1.67s user 0.15s system 199% cpu 0.909 total
15 = 610, 1.68s user 0.14s system 202% cpu 0.899 total
20 = 6765, 1.69s user 0.15s system 201% cpu 0.912 total
30 = 832040, 1.69s user 0.15s system 202% cpu 0.910 total
40 = 102334155, 1.67s user 0.14s system 200% cpu 0.905 total
50 = 12586269025, 1.72s user 0.15s system 208% cpu 0.897 total
60 = 1548008755920, 1.75s user 0.15s system 201% cpu 0.943 total
70 = 190392490709135, 1.72s user 0.15s system 206% cpu 0.908 total
80 = 23416728348467685, 1.71s user 0.14s system 200% cpu 0.922 total
90 = 2880067194370816120, 1.70s user 0.15s system 200% cpu 0.926 total

Which is even a little bit better.⊕Sort of. If you ask the tail recursive function for the 255th Fibonacci number, it’ll promptly tell you 87,571,595,343,018,854,458,033,386,304,178,158,174,356,588,264,390,370. If you ask the closed form function, it’ll promptly tell you 87,571,595,343,019,600,000,000,000,000,000,000,000,000,000,000,000,000. But that’s a whole other problem. Maybe. I think what this shows most is that the JVM startup tax and other artifacts are probably dominating the measurements.

But, dragging myself once more back to the point, you can also do this with xsl:iterate.

The advantage of xsl:iterate here isn’t that it’s faster. In fact, I expect it performs pretty much the same as the properly tail recursive function. The real advantage of xsl:iterate is that it’s not possible to write it so that it fails to be optimized into a loop. In my course, I described xsl:iterate as “Recursion. Easier. With guard rails.”.

If you attempt to write a tail recursive function, you can easily fail to get it exactly right. And even if you get it exactly right today, you can really easily come back to it a week later and accidentally break it. But you won’t notice. And everything will be fine. And I bet all of your tests will pass. It will just perform abysmally when you least expect it (and long after you’ve forgotten that it relied on being tail recursive).

If you write it with xsl:iterate, it won’t compile if it can’t be optimized.

Here’s my iterative solution, as I wrote it:

<xsl:function name="f:iterative-fib" as="xs:integer">
  <xsl:param name="n" as="xs:integer"/>
  <xsl:iterate select="1 to $n">
    <xsl:param name="n" select="0"/>
    <xsl:param name="next" select="1"/>
    <xsl:on-completion select="$n"/>
    <xsl:next-iteration>
      <xsl:with-param name="n"
                      select="$next"/>
      <xsl:with-param name="next"
                      select="$n + $next"/>
    </xsl:next-iteration>
  </xsl:iterate>
</xsl:function>

Not only is this guaranteed to be efficient, it’s arguably clearer.

If you’re using XSLT 3.0, learning xsl:iterate is definitely worth the effort.