
Modularity in iXML

Volume 9, Issue 24; 20 Oct 2025

What does it mean for an iXML grammar to be modular? From a purely practical perspective, it means you can reuse rules defined in other grammars. But how does it work?

I was reminded recently that I opened an issue about modular grammars back in 2022. I think my initial comment is a bit naive and, unsurprisingly, the last comment in 2022, from Michael (😭), digs right into the heart of the matter: what, semantically, are we trying to do?

I’ve implemented a form of modularity, so I’m going to sketch out how I think about the semantics that I implemented. There may be other ways to think about it, and I may think about it differently tomorrow, but one has to start somewhere.

It’s tempting, if we look at XInclude, or XSLT, or XML Schema, to think of “inclusion” as a sort of textual process. Not literally textual, necessarily, but pulling the included thing into the including thing. I didn’t find that model helpful. It’s true, in some sense, but at the level of the grammar, I found it very complicated to think about it that way.

To explore this, let’s consider an absolutely trivial case of reuse. We start with a grammar for digits, digits.ixml, that defines decimal and hexadecimal digits, digit and hexdigit, respectively:

+shares digit, hexdigit .

-digit = ["0"-"9"] .
-hexdigit = digit | hexadecimal .
-hexadecimal = ["A"-"F" | "a"-"f" ] .

and a grammar for numbers, numerics.ixml, that uses digit and hexdigit from digits.ixml to define decimal and hexadecimal:

+uses digit, hexdigit from "digits.ixml" .
+shares decimal, hexadecimal .

number = decimal | hexadecimal .

-decimal = digit+ .
-hexadecimal = hexdigit+ .

(It also defines a top-level number, but that’s not shared and not relevant here.)

Graphs are going to be important in a moment, so let’s represent these graphically. First, digits.ixml:

[Figure: graph of digits.ixml. Edges: -digit -> [ "0"-"9" ]; -hexdigit -> -digit; -hexdigit -> -hexadecimal; -hexadecimal -> [ "A"-"F" | "a"-"f" ].]

I’ve put the shared nonterminals in bold, and I’ve distinguished between nonterminals and terminals by using boxes for nonterminals and “notes” for terminals.

Then numerics.ixml:

[Figure: graph of numerics.ixml. Edges: number -> -decimal; number -> -hexadecimal; -decimal -> digit (+); -hexadecimal -> hexdigit (+). Here digit and hexdigit are placeholders.]

In this diagram, digit and hexdigit are represented with ovals as placeholders. Those are the nonterminals that we’re going to import.

Finally, we have partno.ixml:

+uses decimal from "numerics.ixml" .

partno: decimal .

which simply uses decimal from numerics.ixml. Graphically:

[Figure: graph of partno.ixml. Edge: partno -> decimal, where decimal is a placeholder.]

If we combine the three graphs, we can connect the placeholders to the actual rules in the imported grammars:

[Figure: combined graph with one cluster each for digits.ixml, numerics.ixml, and partno.ixml. Edges: partno -> -decimal; number -> -decimal; number -> -hexadecimal (numerics.ixml); -decimal -> -digit (+); -hexadecimal (numerics.ixml) -> -hexdigit (+); -hexdigit -> -digit; -hexdigit -> -hexadecimal (digits.ixml); -digit -> [ "0"-"9" ]; -hexadecimal (digits.ixml) -> [ "A"-"F" | "a"-"f" ].]

In this view, it’s clear how the pieces fit together. Note, in particular, that the two different rules for the hexadecimal nonterminal are unambiguous. Any sort of textual inclusion mechanism would run the risk of creating a situation where there were two rules for hexadecimal in the grammar, which would be an error.
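The combination step can be sketched in Python. This is only an illustration of the idea, not the implementation described here: the `GRAMMARS` dictionary and the `resolve` and `combine` functions are hypothetical, and each rule is reduced to the list of nonterminals it references. The point is that nodes are keyed by (grammar, name) pairs, so the two hexadecimal rules never collide.

```python
# Hypothetical in-memory model of the three grammars (not a real iXML API).
# Each rule maps to the nonterminals it references; terminals are omitted.
GRAMMARS = {
    "digits.ixml": {
        "shares": {"digit", "hexdigit"},
        "uses": {},
        "rules": {
            "digit": [],                      # character class only
            "hexdigit": ["digit", "hexadecimal"],
            "hexadecimal": [],                # character class only
        },
    },
    "numerics.ixml": {
        "shares": {"decimal", "hexadecimal"},
        "uses": {"digit": "digits.ixml", "hexdigit": "digits.ixml"},
        "rules": {
            "number": ["decimal", "hexadecimal"],
            "decimal": ["digit"],
            "hexadecimal": ["hexdigit"],
        },
    },
    "partno.ixml": {
        "shares": set(),
        "uses": {"decimal": "numerics.ixml"},
        "rules": {"partno": ["decimal"]},
    },
}


def resolve(module, name):
    """Return the (module, name) node that `name` refers to inside `module`."""
    grammar = GRAMMARS[module]
    if name in grammar["rules"]:
        return (module, name)                 # defined locally
    source = grammar["uses"].get(name)
    if source is None:
        raise KeyError(f"{name} is undefined in {module}")
    if name not in GRAMMARS[source]["shares"]:
        raise KeyError(f"{source} does not share {name}")
    return resolve(source, name)              # follow the placeholder


def combine():
    """Build one graph whose nodes are (module, name) pairs."""
    graph = {}
    for module, grammar in GRAMMARS.items():
        for name, refs in grammar["rules"].items():
            graph[(module, name)] = [resolve(module, ref) for ref in refs]
    return graph
```

In this model, `combine()[("partno.ixml", "partno")]` points at `("numerics.ixml", "decimal")`, and the two hexadecimal rules remain separate nodes, `("digits.ixml", "hexadecimal")` and `("numerics.ixml", "hexadecimal")`.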

Using the Invisible XML aliasing feature, we can rename one of the hexadecimal rules so that it has a unique name, but will still serialize as <hexadecimal>:

[Figure: the combined graph after renaming. The hexadecimal rule from digits.ixml is now labelled "-x1 > hexadecimal", and -hexdigit points to it; the hexadecimal rule in numerics.ixml keeps its name. All other edges are unchanged.]
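A minimal sketch of how such renaming might be automated, in Python. The function `unique_names` is hypothetical, and two choices are assumptions rather than anything stated above: the first rule seen keeps its name, and generated names take the form `x1`, `x2`, and so on (following the label in the figure).

```python
from itertools import count


def unique_names(nodes):
    """Map each (module, name) node to a (unique_name, alias_or_None) pair."""
    taken = {name for _, name in nodes}   # names that must not be reused
    fresh = count(1)
    seen = set()
    result = {}
    for module, name in nodes:
        if name not in seen:
            # First rule with this name: keep it as-is, no alias needed.
            seen.add(name)
            result[(module, name)] = (name, None)
            continue
        # Name clash: invent a new name and keep the old one as the alias,
        # so the rule still serializes under its original name.
        candidate = f"x{next(fresh)}"
        while candidate in taken:
            candidate = f"x{next(fresh)}"
        taken.add(candidate)
        result[(module, name)] = (candidate, name)
    return result


# Rules listed from the root grammar outwards (an assumed ordering).
NODES = [
    ("partno.ixml", "partno"),
    ("numerics.ixml", "number"),
    ("numerics.ixml", "decimal"),
    ("numerics.ixml", "hexadecimal"),
    ("digits.ixml", "digit"),
    ("digits.ixml", "hexdigit"),
    ("digits.ixml", "hexadecimal"),
]
NAMES = unique_names(NODES)
```

With that ordering, the hexadecimal rule in numerics.ixml keeps its name and the one in digits.ixml becomes `x1` with the alias `hexadecimal`.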

Now, we could merge all the rules together into a single grammar, but before we do that, there is one final step: we can remove any nonterminals that are unreachable from the start symbol, partno.

[Figure: the simplified graph, labelled (composed.ixml). Edges: partno -> -decimal; -decimal -> -digit (+); -digit -> [ "0"-"9" ].]

That’s the final “composed” grammar that will be used for parsing. It never needs to be represented in iXML, but if it were, it would look like this:

partno = -decimal .
decimal = -digit+ .
digit = ["0"-"9"] .
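Removing the unreachable nonterminals is an ordinary graph traversal. As a rough Python sketch (the `COMBINED` dictionary is a hand-written stand-in for the renamed graph, with terminals omitted, and `reachable` is a hypothetical helper):

```python
def reachable(graph, start):
    """Return every node that can be reached from `start`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


# The combined graph after renaming; x1 is the renamed hexadecimal rule
# from digits.ixml.
COMBINED = {
    "partno": ["decimal"],
    "number": ["decimal", "hexadecimal"],
    "decimal": ["digit"],
    "hexadecimal": ["hexdigit"],
    "hexdigit": ["digit", "x1"],
    "digit": [],
    "x1": [],
}

keep = reachable(COMBINED, "partno")
composed = {name: refs for name, refs in COMBINED.items() if name in keep}
```

Starting from partno, only partno, decimal, and digit survive, which matches the three rules in the composed grammar.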

One thing that’s easy to imagine at this level is an override mechanism. Suppose that part numbers are decimal but can also include “X”. We could invent a syntax for this:

+uses decimal from "numerics.ixml" overrides digit .
+shares partno .

partno = -decimal .
-digit = ["0"-"9" | "X" ] .

The intent is clear from the corresponding graph:

[Figure: combined graph with the override, with one cluster each for partno, numerics, and digits. In partno: partno -> -decimal; -digit -> [ "0"-"9" | "X" ]. In numerics: number -> -decimal; number -> -hexadecimal; -decimal -> -digit from partno (+); -decimal -> "-x1 > digit" (dotted); -hexadecimal -> -hexdigit (+). In digits: -hexdigit -> "-x1 > digit"; -hexdigit -> "-x2 > hexadecimal"; "-x1 > digit" -> [ "0"-"9" ]; "-x2 > hexadecimal" -> [ "A"-"F" | "a"-"f" ].]

(The dotted line isn’t really in the graph; it just shows where the decimal link would have gone without the override.)
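One way to read that graph is that the override redirects references to digit made by rules in numerics.ixml, while references made inside digits.ixml (hexdigit, for example) still reach the original rule. That scoping is an inference from the figure, not something stated above. A Python sketch under that assumption, with hypothetical `ROOT`, `OVERRIDES`, and `USES` tables:

```python
ROOT = "partno.ixml"

# Assumed scope: the override applies to references made by rules in the
# grammar named in the "+uses ... overrides ..." declaration.
OVERRIDES = {"numerics.ixml": {"digit"}}

# For each grammar, where its imported names come from.
USES = {
    "partno.ixml": {"decimal": "numerics.ixml"},
    "numerics.ixml": {"digit": "digits.ixml", "hexdigit": "digits.ixml"},
    "digits.ixml": {},
}


def resolve(module, ref):
    """Return the (module, name) node that `ref` means inside `module`."""
    if ref in OVERRIDES.get(module, set()):
        return (ROOT, ref)                 # redirected to the overriding rule
    # Otherwise use the import if there is one, else the local rule.
    return (USES[module].get(ref, module), ref)
```

Here `resolve("numerics.ixml", "digit")` lands on the digit rule in partno.ixml, while `resolve("digits.ixml", "digit")` still lands on the original.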

The ability to rename rules automatically solves most of the grammar modularity problem. But it still breaks down if you need to use symbols from two different grammars that have the same name:

{ This is an error }
+uses pound from "currencies.ixml" .
+uses pound from "imperial-units.ixml" .
...

I think that can be finessed as well, along the lines of Python imports, if necessary:

+uses pound as currency-pound from "currencies.ixml" .
+uses pound as weight-pound from "imperial-units.ixml" .
...

In the meantime, there’s an obvious workaround: use intermediate grammars to create the renamed nonterminals:

currency-pound.ixml:

+uses pound from "currencies.ixml" .
currency-pound = pound .

weight-pound.ixml:

+uses pound from "imperial-units.ixml" .
weight-pound = pound .

And finally:

+uses currency-pound from "currency-pound.ixml" .
+uses weight-pound from "weight-pound.ixml" .
...

I don’t really have a sense of how often that’s likely to be necessary. And I haven’t tried to implement “uses … as …” (yet).
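For what it’s worth, the bookkeeping for “uses … as …” looks small. The following Python sketch is purely hypothetical (the `bind` function and its tuple format are invented for illustration): each import is bound under its local name, and a duplicate local name is reported as an error.

```python
def bind(uses):
    """Build a local symbol table from (name, local_name_or_None, source)."""
    table = {}
    for name, local, source in uses:
        local = local or name              # without "as", keep the name
        if local in table:
            raise ValueError(f"{local} is imported more than once")
        table[local] = (source, name)
    return table


# With "as", the two pounds get distinct local names.
renamed = bind([
    ("pound", "currency-pound", "currencies.ixml"),
    ("pound", "weight-pound", "imperial-units.ixml"),
])

# Without "as", the second import clashes with the first.
try:
    bind([
        ("pound", None, "currencies.ixml"),
        ("pound", None, "imperial-units.ixml"),
    ])
    clash = None
except ValueError as err:
    clash = str(err)
```

The table maps each local name back to its source grammar and original name, which is the same information the intermediate-grammar workaround encodes by hand.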

#Invisible XML #MarkupMonday #XML
