Alpha6

Volume 8, Issue 18; 23 Dec 2024

The story of XML Calabash 3.x has some twists and turns, but the last few alphas have passed all the tests and it’s finally possible to start thinking about some of the fun things.

My work on XML Calabash is entirely a hobby. I do it in my spare time. (There’s an interesting psycho-social question about “why”, but let’s not look too closely at that. “Never ask a question if you don’t want to know the answer.”)

When I started working on my 3.0 implementation, I decided to do it in Scala. I was looking for some way to make it “more fun” and I was interested in exploring alternative languages on the JVM. Scala seemed like a good choice (it’s still a good choice, I’m not knocking Scala!). I was also interested in making a pipeline engine that was entirely agnostic to the pipeline format; I divided the implementation into two parts, a core pipeline engine and an XProc implementation on top of that. (There’s history here. I worked really hard on a non-XML language, a “compact syntax” for XProc 2.0. That effort failed, but it still lingers in my mind.)

It was interesting, and I got pretty far, but then two things happened: I reached the point where I was passing some large percentage of the test suite and all that was left was the relative tedium of getting across the finish line. And I got interested in Invisible XML.

Many months passed. I published NineML, my iXML implementation on the JVM. I tinkered around with a few other things. (I’ve actually re-implemented a chunk of NineML; that’s waiting in the wings.)

In February or thereabouts, I was feeling like I needed to get back to XML Calabash. I looked at the state of my implementation and I wasn’t happy about it. The remaining things that needed doing all felt hard. And after a good long think, I came to the conclusion that they shouldn’t be hard if I had the data models right. What are the right data models? That’s a good question. And maybe a posting for another day. I can’t claim I started with a completely green field. I developed a plan on my first try at 3.0 and much of that remains. It should be (relatively) easy, with the right data models.

The other thing that happened around February is that I began looking at porting some big, ugly Gradle scripts from Groovy to Kotlin. I have never liked Groovy. I like Kotlin a lot more. And suddenly, I had a reason to want to learn more about Kotlin.

I took my two ideas, “use the right data models” and “you need to learn more Kotlin”, and I combined them. I started over in Kotlin. Feeling a need to shed some baggage if I was ever going to finish, I decided my language-agnostic pipeline implementation was a bridge too far. Eager to leave open some avenues for exploration, I settled on a builder API for pipelines. XML Calabash 3.x ships with a builder that parses XProc and builds a pipeline, but other parsers and builders are possible.

It went well. By about April or May, I was feeling like I had the end in sight. [Foreshadowing: the end was not in sight.] My goal was to have a beta version out by XML Prague.

There were only a few things left to implement, among them, p:import-functions. In retrospect, I should have given that more thought early on (data models, you know). It has its hooks deep in the core and I had got it wrong. Badly wrong.

You do not want to get “mostly done” with a big project and then discover that the core needs to be aggressively and violently refactored. It was not ready for XML Prague. It was not even ready for Balisage. (It might have been, but hobby projects get starved for time when life, and work, intervene.) It was ready for (US) Thanksgiving, just. I published “alpha1” on 24 November. I got to 100% passing in the test suite a short while later.

The big refactor has left the code less tidy and elegant than I once thought it was. Honestly, it’s a bit of a mess in places. But starting over, over again is beyond my endurance. I’ll just keep tidying as I go.

It remains something of a sore point with me that some of the hardest things to implement correctly, and well, are features that I think see very little real-world use. (I’m not saying “none”; if one of these is your favorite feature and you use it every day, that’s fine. I just don’t think anyone reading this is going to think that!)

Conditional compilation. The [p:]use-when feature, especially when combined with static variables and the p:step-available() function, is messy. Combined with multiple imports of the same pipelines, it’s even worse. In the end, there’s a whole separate data model and parse just to handle that.
Default pipeline inputs. You have to keep them around, but then not use them if an actual input is provided. Except that’s only true on the outermost pipeline because nested pipelines always have to have inputs.
Import functions. Oh, there’s nothing wrong with this feature (though I do bet it’s rarely used), I’m just annoyed with myself for having failed to plan appropriately for it.
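
To make the default-inputs rule a little more concrete, here is a rough sketch of the logic as described above. It is not XML Calabash’s actual code (that’s Kotlin, and considerably more involved); it’s in Python for brevity, and `resolve_input` and its parameters are names made up for illustration.

```python
from typing import Optional, Sequence


def resolve_input(
    provided: Optional[Sequence[str]],
    default: Optional[Sequence[str]],
    outermost: bool,
) -> Sequence[str]:
    """Decide which documents an input port actually reads.

    Hypothetical helper: documents are represented as plain strings.
    """
    if provided is not None:
        # An actual input always wins; the stored default is ignored.
        return provided
    if outermost and default is not None:
        # Only the outermost pipeline falls back to its declared default.
        return default
    # In this simplified model, a nested pipeline must be given an input.
    raise ValueError("no input connected to port")
```

The point is only that the default has to be carried around in the model even though most paths never consult it.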

I’m going to let the alphas roll along for a while. At least until I get some external feedback that maybe it mostly works. I published alpha6 today, but you want the most recent release, whatever that is.

I’ve got some other stuff to talk about, but I’ll save that for other posts.

Happy holidays to them that celebrate. Enjoy the festivals of axial tilt, whether those are winter festivals or summer festivals in your neck of the woods!