
What about 3 to 6 occurrences?

Volume 9, Issue 30; 17 Nov 2025

A proposal to simplify Invisible XML grammars that match a range of occurrences.

Invisible XML has operators for “one, optionally”, “zero or more”, and “one or more” occurrences of a string. (Technically a “factor”, not a string. That means a terminal, a nonterminal, an insertion, or a parenthesized expression. But I’m going to stick with strings here for simplicity.) Those operators are ? ("a"? matches “” or “a”), * ("a"* matches “” or “a” or “aa”, etc.), and + ("a"+ matches “a” or “aa”, etc.).

But what if you want at least three but not more than six occurrences?

You can do that too, but you have to work a little harder:

"a", "a", "a", ("a", ("a", ("a")?)?)?

That could be simplified slightly to

"aaa", ("a", ("a", "a"?)?)?

but I think that obscures the pattern a bit. It’s worth noting that you can also simplify it this way:

"a", "a", "a", "a"?, "a"?, "a"?

Unfortunately, that will result in ambiguous parses. In the string “aaaa”, that fourth “a” could match any one of the first, second, or third optional “a”.
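To make the ambiguity concrete, here is a small Python sketch that counts the distinct ways the flat form can match a run of “a”s. The function name `count_parses` is my own invention for illustration; it models the three optional factors as independent use-or-skip slots and isn’t tied to any ixml processor.

```python
from itertools import product


def count_parses(required: int, optional: int, length: int) -> int:
    """Count the ways `length` identical characters can be matched by
    `required` mandatory factors followed by `optional` independent
    optional factors (the flat "a"?, "a"?, "a"? form)."""
    extra = length - required
    if extra < 0:
        return 0
    # Each optional slot is either used (1) or skipped (0). A parse is any
    # assignment that uses exactly `extra` of the slots.
    return sum(
        1
        for choice in product((0, 1), repeat=optional)
        if sum(choice) == extra
    )


for n in range(3, 7):
    print("a" * n, count_parses(3, 3, n))
```

Under that model, “aaaa” and “aaaaa” each have three parses, while “aaa” and “aaaaaa” have one. The nested form avoids this because each optional “a” can only be reached through the one before it.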

In issue 308, martian-a observes that the inability to specify a specific number of repetitions leads to grammars that are harder to read. The example given is:

code: -" ", [Lu], [Lu], ["0"-"9"], ["0"-"9"], ["0"-"9"], ["0"-"9"], ["0"-"9"], ["0"-"9"] .

Quick! How many digits are allowed? This initial question is about a single, specific number of occurrences, but in the first follow-up comment, vincentml observes that it applies to ranges as well.

At this point, I entered the chat and started noodling on some ideas. I’ve pulled those ideas together into a proposal for specified repetitions with and without a separator.

The basic idea is to allow <m,n> in addition to ?, *, and +. The new syntax expresses that at least “m” occurrences are required, but no more than “n” are allowed. If “n” is omitted, it’s assumed to be the same as “m”. If you want at least “m” occurrences with no upper bound, use * for “n”: <3,*> is at least three but as many as you like.
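One way to see what <m,n> means is to rewrite it using only the forms ixml already has. The following Python sketch does that as a string transformation. The function `expand_range` and its convention of passing "*" for an unbounded upper limit are assumptions of mine for illustration; this is not necessarily the rewriting the proposal itself describes, and it does no validation of the bounds.

```python
def expand_range(factor, m, n=None):
    """Rewrite factor<m,n> using only sequences, ? and *.

    `n` may be an int, the string "*" (no upper bound), or None
    (omitted, which means the same as m).
    """
    if n is None:
        n = m
    required = [factor] * m
    if n == "*":
        # At least m occurrences, then as many more as you like.
        tail = factor + "*"
    else:
        # Build the nested optionals from the inside out so that each
        # extra occurrence is only reachable through the previous one.
        tail = ""
        for _ in range(n - m):
            tail = f"({factor}, {tail})?" if tail else f"({factor})?"
    return ", ".join(required + ([tail] if tail else []))


print(expand_range('"a"', 3, 6))
print(expand_range('"a"', 3, "*"))
```

For `"a"<3,6>` this produces the same unambiguous nested form shown earlier: `"a", "a", "a", ("a", ("a", ("a")?)?)?`.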

The example martian-a introduced can be simplified to:

code: -" ", [Lu], [Lu], ["0"-"9"]<6>.

And if you wanted 3 to 6 occurrences, you’d write:

code: -" ", [Lu], [Lu], ["0"-"9"]<3,6>.

The repetition counts have to be greater than or equal to zero. I don’t think -3 “a”s makes any sense. I also don’t think there’s any obvious interpretation for <6,3>, so I propose that the second number must be greater than or equal to the first. Technically, that leaves <0,0>, but that seems really awkward, so I also propose that the second number must be greater than zero. ("a"<0> matches no “a”s. But so does {"a"}. Invisible XML uses curly braces for comments, and commenting out the string is equally effective at removing it and less potentially confusing.)
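Those constraints are easy to state as a check. Here is a minimal Python sketch, assuming the spec is written without internal whitespace; the function `parse_bounds` and its error messages are hypothetical, and a real ixml processor would do this in its grammar rather than with a regular expression.

```python
import re

# <m>, <m,n>, or <m,*> with non-negative decimal integers.
_SPEC = re.compile(r"<([0-9]+)(?:,([0-9]+|\*))?>")


def parse_bounds(spec):
    """Return (m, n) for a repetition spec; n is None when unbounded."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        # Covers negative counts such as <-3> as well as malformed input.
        raise ValueError(f"not a repetition spec: {spec!r}")
    m = int(match.group(1))
    upper = match.group(2)
    if upper == "*":
        return (m, None)
    n = m if upper is None else int(upper)
    if n < m:
        raise ValueError(f"upper bound {n} is less than lower bound {m}")
    if n == 0:
        raise ValueError("upper bound must be greater than zero")
    return (m, n)


print(parse_bounds("<6>"))    # (6, 6)
print(parse_bounds("<3,6>"))  # (3, 6)
print(parse_bounds("<3,*>"))  # (3, None)
```

With these rules, <6,3>, <0,0>, and <0> are all rejected, while <0,1> is accepted.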

Invisible XML has two more repetition forms, ** and ++. They indicate repetition with a separator: "a"**"," matches zero or more “a”s separated by commas: “”, “a”, “a,a”, “a,a,a”, etc. And ++ matches one or more such “a”s.

Following the pattern of repeating the character, I propose <<m,n>> for this purpose. The pattern:

"a"<<3,6>>","

matches between three and six “a”s, separated by commas.
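The separated form can also be rewritten in terms of existing syntax. This Python sketch handles only the bounded case; `expand_separated` is a name I made up for illustration, and the output is one possible rewriting, not necessarily the one given in the proposal.

```python
def expand_separated(factor, sep, m, n):
    """Rewrite factor<<m,n>>sep using only sequences and ?."""
    if not (0 <= m <= n and n > 0):
        raise ValueError("need 0 <= m <= n and n > 0")
    # The first occurrence has no separator before it, so treat it
    # specially; every later occurrence is "sep, factor".
    first_count = max(m, 1)
    tail = ""
    for _ in range(n - first_count):
        if tail:
            tail = f"({sep}, {factor}, {tail})?"
        else:
            tail = f"({sep}, {factor})?"
    parts = [factor]
    for _ in range(first_count - 1):
        parts += [sep, factor]
    body = ", ".join(parts + ([tail] if tail else []))
    # With a lower bound of zero, the whole thing is optional.
    return body if m >= 1 else f"({body})?"


print(expand_separated('"a"', '","', 3, 6))
```

For `"a"<<3,6>>","` that yields `"a", ",", "a", ",", "a", (",", "a", (",", "a", (",", "a")?)?)?`.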

In the proposal, I extended the “hints for implementors” section to show how these new syntactic forms can be simplified to existing forms.

We’re left with the fact that each of the current occurrence indicators can be expressed using the new syntax (? is <0,1>, * is <0,*>, + is <1,*>). I think we should just live with that. Yes, those forms could be forbidden, or the existing occurrence indicators could be removed, but neither of those seems worthwhile.

If, at this point, you want to say that you don’t like the choice of < and >, well, okay. What do you propose instead? I think it’s useful if they’re a matched pair of delimiters. But parentheses, square brackets, and curly braces are all unavailable. That uses up the “easy to type on a regular keyboard” choices, I think. You could, instead, use the same character before and after, for example /3,6/ and //3,6//. I think that could be made to parse. I don’t think it’s better. (Or much worse, TBH.) Bikeshed all the things!

Repetition on an insertion is allowed, and a bit odd. But we allow the existing occurrence indicators and those are awful too:

"a", +"X", "a"

matches “aa” and serializes as “aXa”.

"a", +"X"*, "a"

is infinitely ambiguous. It matches “aa” and serializes as “a” followed by 0, 1, 2, 3, … ∞ “X”s, followed by “a”. The syntactic form +"X"<3,6> isn’t more infinitely ambiguous.

#Invisible XML #MarkupMonday #XML
