
Converting ABNF to iXML

Volume 7, Issue 37; 05 Aug 2023

Converting ABNF to iXML…with iXML and XSLT.

This bit of silliness came to me on Monday morning when I should have been preparing for my Balisage talk. It’s part of a slightly larger bit of silliness that’s still lurking somewhere in the back of my mind.

The IETF publishes grammars for specifications (URIs, JSON, JSONPath, etc.) using a notation called Augmented BNF (or ABNF for short). Michael Sperberg-McQueen created an Invisible XML grammar for ABNF as a grammar sample.

Armed with that grammar, you can turn ABNF into XML. With a few small modifications, that grammar can turn ABNF into reasonably nice XML. So far, so good. Once you have it in XML, of course, you can use XSLT to transform it into some other kind of XML: for example, the XML format of Invisible XML grammars.

Then it’s just a matter of getting a few marks in the right place, and you have an Invisible XML grammar for the language specified in the IETF RFC, generated directly from the published ABNF grammar with no hand authoring required.

I do actually think that’s kind of cool.
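To make the idea of a direct translation concrete, here is a minimal Python sketch that rewrites a single ABNF rule in iXML syntax. It is not the pipeline described above, which parses ABNF with an iXML grammar and transforms the result with XSLT; the function names (`tokenize`, `convert_rule`, and so on) are made up for illustration. It only handles rule names, quoted strings, alternation, grouping, and optional brackets. It ignores repetition, numeric values, and comments, and it glosses over the fact that ABNF string literals are case-insensitive while iXML string literals are not.

```python
import re

# Tokens for a small ABNF subset: quoted strings, rule names, punctuation.
# (Hypothetical helper names; this is an illustration, not the XSLT pipeline.)
TOKEN = re.compile(r'\s*("[^"]*"|[A-Za-z][A-Za-z0-9-]*|[=/()\[\]])')


def tokenize(text):
    """Split one ABNF rule into tokens, rejecting anything outside the subset."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported ABNF at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def alternatives(tokens, pos):
    """ABNF 'a / b' becomes iXML 'a; b'."""
    alts = []
    while True:
        seq, pos = sequence(tokens, pos)
        alts.append(seq)
        if pos < len(tokens) and tokens[pos] == "/":
            pos += 1
        else:
            return "; ".join(alts), pos


def sequence(tokens, pos):
    """ABNF concatenation 'a b' becomes iXML 'a, b'; '[ x ]' becomes '(x)?'."""
    items = []
    while pos < len(tokens) and tokens[pos] not in ("/", ")", "]"):
        tok = tokens[pos]
        if tok == "=":
            raise ValueError("unexpected '=' in rule body")
        if tok in ("(", "["):
            close = ")" if tok == "(" else "]"
            inner, pos = alternatives(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != close:
                raise ValueError(f"missing {close!r}")
            pos += 1
            items.append(f"({inner})" + ("?" if tok == "[" else ""))
        else:
            items.append(tok)
            pos += 1
    if not items:
        raise ValueError("empty sequence")
    return ", ".join(items), pos


def convert_rule(abnf):
    """Translate one ABNF rule (subset only) into one iXML rule."""
    tokens = tokenize(abnf)
    if len(tokens) < 3 or tokens[1] != "=":
        raise ValueError("expected 'name = ...'")
    body, pos = alternatives(tokens[2:], 0)
    if pos != len(tokens) - 2:
        raise ValueError("unbalanced brackets")
    return f"{tokens[0]}: {body}."


print(convert_rule('URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]'))
# prints: URI: scheme, ":", hier-part, ("?", query)?, ("#", fragment)?.
```

The output has the same shape as the ABNF rule, just with iXML punctuation, which is the sense in which the translation is literal.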

Three observations:

One interesting consequence of this approach is that the iXML grammar you get is a literal translation of the ABNF. That’s what you wanted, right? Well, yeah, except that ABNF grammars don’t impose the iXML rule that the start symbol must be (exclusively) the first rule in the grammar. The first rule in the (ABNF) URI grammar is:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

That parses absolute URIs that begin with a scheme. But the ABNF grammar can also match other things, such as /absolute/path/file.xml. An iXML processor can’t parse those with the translated grammar, because it always starts from the first rule, URI.

I fixed this in CoffeePot by adding a --start-symbol option to specify an alternate start symbol. That’s not conformant with the iXML specification, but it gets the job done.
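Since iXML takes the first rule as the start symbol, another way to get a different start symbol is to move the rule you want to the front before converting. The following Python sketch does that on ABNF text. It is an assumption about how one might work around the issue, not something CoffeePot or abnf2ixml does, and `split_rules` and `promote_rule` are hypothetical names. It assumes every rule starts at the left margin and continuation lines are indented; anything before the first rule is dropped.

```python
import re

# A rule starts at the left margin with a name followed by "=" (or "=/").
RULE_START = re.compile(r"([A-Za-z][A-Za-z0-9-]*)\s*=")


def split_rules(abnf):
    """Group lines into (name, lines) pairs; indented lines continue a rule."""
    rules = []
    for line in abnf.splitlines():
        match = RULE_START.match(line)
        if match:
            rules.append((match.group(1), [line]))
        elif rules and line.strip():
            rules[-1][1].append(line)
    return rules


def promote_rule(abnf, start):
    """Return the ABNF text with the rule(s) named `start` moved to the front."""
    rules = split_rules(abnf)
    wanted = start.lower()  # ABNF rule names are case-insensitive
    first = [rule for rule in rules if rule[0].lower() == wanted]
    if not first:
        raise KeyError(f"no rule named {start!r}")
    rest = [rule for rule in rules if rule[0].lower() != wanted]
    return "\n".join(line for _, lines in first + rest for line in lines)


# Two rules from the RFC 3986 grammar, the first wrapped to show continuation.
sample = '''URI = scheme ":" hier-part
      [ "?" query ] [ "#" fragment ]
path-absolute = "/" [ segment-nz *( "/" segment ) ]
'''
print(promote_rule(sample, "path-absolute"))
```

Reordering keeps the resulting grammar conformant, at the cost of producing a separate grammar for each start symbol you care about.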

The second thing I observed is that the JSON grammar in RFC 8259 is horribly ambiguous. It has lots and lots of places where optional whitespace can occur next to other optional whitespace. The sample document in section 13 has 192 parses. It has 12,288 parses if you leave it indented as it appears in the specification. (With other parsing technologies, you probably resolve the whitespace in the lexer so it’s not really an issue, but it’s still a little disappointing.)
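To see why adjacent optional whitespace multiplies parses, consider that RFC 8259 defines JSON-text as ws value ws and begin-array as ws %x5B ws. Whitespace before a top-level opening bracket can therefore be claimed by either of two ws symbols. The Python sketch below counts the ways a run of whitespace can be divided among adjacent ws symbols and multiplies the counts across independent runs. It is a rough model of the mechanism under that assumption, with hypothetical function names, and it makes no attempt to reproduce the 192 or 12,288 figures.

```python
from itertools import product
from math import comb, prod


def ways_to_split(run_length, slots):
    """Ways to divide run_length whitespace characters among adjacent ws slots.

    Each slot may take zero or more characters, in order, so this is the
    number of compositions of run_length into `slots` non-negative parts.
    """
    return comb(run_length + slots - 1, slots - 1)


def ways_to_split_brute_force(run_length, slots):
    """Same count by enumeration, as a check on the formula."""
    return sum(
        1
        for lengths in product(range(run_length + 1), repeat=slots)
        if sum(lengths) == run_length
    )


def parse_count(sites):
    """Multiply the choices at each independent (run_length, slots) site."""
    return prod(ways_to_split(n, k) for n, k in sites)


# One space shared by two ws symbols can go to either one.
print(ways_to_split(1, 2))  # prints: 2
# Four characters of indentation shared by two ws symbols.
print(ways_to_split(4, 2))  # prints: 5
# Three such sites: two with a single space, one with no whitespace at all.
print(parse_count([(1, 2), (1, 2), (0, 2)]))  # prints: 4
```

Because the counts multiply, even a handful of single spaces in the wrong places adds up quickly, and longer runs of indentation make it worse.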

My final observation: kudos to the folks working on JSONPath: that grammar is unambiguous. 🙂

If you want to play with any of this: I put the presentation I gave in the open mic session online (see especially the last slide). All the code and examples are in https://github.com/nineml/abnf2ixml.

#Balisage #Invisible XML