
Invisible XML in production

Volume 8, Issue 20; 24 Dec 2024

Using iXML to parse a command line interface.

I have written dozens, maybe hundreds, of Invisible XML grammars. The vast majority were for testing, either in the test suite or in my own testing for NineML. It’s an odd reality that implementing something doesn’t always mean using it, at least not in the way that we expect others to use it.

I have a few iXML grammars knocking about that I use for various personal things, but I think the command line parser for my debugger feature is my first production use.

It started out, as these things often do, with some ad hoc parsing. The syntax is simple. Things like this:

inputs
models pipeline
options
run
set $variable = expression
show expression
stack
step
subpipeline
subpipeline for-each
up

It’s easy enough to tokenize on spaces, do a little string comparison…has to be quicker than doing it “the right way,” right? Then you get to the first slightly more complicated case:

break on delete
break on add output result when //ex:chap[@role]
break clear add
break on declare-step_2
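To make the problem concrete, here is a minimal sketch of the tokenize-and-compare approach. It is not the parser from the debugger; the names `tokenize` and `describe` are hypothetical, and only a handful of the commands above are handled. It shows how the simple commands fall out easily, while anything carrying an expression ends up as a bag of loose tokens with its original spacing already lost.

```kotlin
// Hypothetical ad hoc parser: split on whitespace, then compare strings.
fun tokenize(line: String): List<String> =
    line.trim().split(Regex("\\s+")).filter { it.isNotEmpty() }

fun describe(line: String): String {
    val tokens = tokenize(line)
    return when (tokens.firstOrNull()) {
        null -> "empty"
        // One-word commands are trivial.
        "run", "step", "stack", "up" -> "simple command: ${tokens[0]}"
        // Rejoining the tokens collapses any whitespace inside the expression.
        "show" -> "show expression: ${tokens.drop(1).joinToString(" ")}"
        // "break" needs real structure (on/clear, event, when-expression);
        // all a tokenizer offers is a flat list to pick through by hand.
        "break" -> "break with loose tokens: ${tokens.drop(1)}"
        else -> "unknown: ${tokens[0]}"
    }
}

fun main() {
    println(describe("step"))
    println(describe("show 'a  b'"))
    println(describe("break on add output result when //ex:chap[@role]"))
}
```

The second call is the telling one: the two spaces inside the string literal come back as one, so the expression has been silently changed before it ever reaches an evaluator.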

I was about 100 lines into my stupid parser and regretting my life choices when I decided to spend the half hour it would take to set up the iXML parser and do it that way.

Man, that was the right answer! Especially a couple of hours later when I wanted to change things around. And today when I added two new commands.

The grammar is a bit unusual. (The business about version “1.1” at the top, if you went and looked, has to do with the use of “aliasing”. My current NineML implementation doesn’t support that for grammars labeled, implicitly or explicitly, “1.0”. That’s a bug, but since aliasing is only available in the current editor’s draft, I’m not losing sleep over it.) It’s got things in it like this:

-set = set-command, rs, -'$', varname, s, -':'?, -'=', expr .
-set-command = name-set, s, -"set" .
name-set>name = +"set" .

We have a set nonterminal that we suppress, and then an insertion for “set” that is always serialized as name. The practical upshot of this is that the parsed output always has a “name” element containing the name of the command, rather than an element with the same name as the command. So set $foo = 3 produces:

<command>
   <name>set</name>
   <varname>foo</varname>
   <expr> 3</expr>
</command>

(I don’t care about the extra space at the beginning of the expression because XPath doesn’t care.) Then, somewhat sacrilegiously, I use a NineML feature to produce JSON instead of XML,

{"command":{"name":"set","varname":"foo","expr":" 3"}}

which I quickly parse into an XDM map, then a JVM map, so I can dispatch off the values (in Kotlin, but that’s not really important):

val command = parseJson(dtree.asJSON())
when (command["name"]) {
    null -> {
        println("Parse failed: ${line}")
        continue
    }
    "base-uri" -> doBaseUri(command)
    "breakpoint" -> doBreakpoint(command)
    "catch" -> doCatchpoint(command)
    "define" -> doDefine(command)
    "down" -> doDown(command)
    …
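For anyone who wants to try the dispatch pattern in isolation, here is a small self-contained sketch. It assumes the parsed command has already been flattened to a plain `Map<String, String>`; the `Command` alias, the `dispatch` function, and the two handlers are hypothetical stand-ins, not the debugger's actual code, and they return strings only so the result is easy to inspect.

```kotlin
// Hypothetical stand-in for the parsed command: a flat map of strings.
typealias Command = Map<String, String>

// Illustrative handlers; the real ones would act on the debugger state.
fun doSet(command: Command): String =
    "set ${command["varname"]} to ${command["expr"]?.trim()}"

fun doShow(command: Command): String =
    "show ${command["expr"]?.trim()}"

// Every command carries its name in the same "name" key, so a single
// `when` over that one value is enough to route it.
fun dispatch(command: Command): String =
    when (command["name"]) {
        null -> "Parse failed"
        "set" -> doSet(command)
        "show" -> doShow(command)
        else -> "Unknown command: ${command["name"]}"
    }

fun main() {
    // Mirrors the JSON shown earlier for: set $foo = 3
    val command = mapOf("name" to "set", "varname" to "foo", "expr" to " 3")
    println(dispatch(command)) // prints "set foo to 3"
}
```

This is where the “name” element pays off: because the grammar always emits the command name under the same key, adding a command means adding one grammar rule and one branch.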

I think this iXML thing has legs!

#Invisible XML #XML Calabash #XProc