You can’t parse XML DTDs

Volume 9, Issue 18; 01 Sep 2025

Another surprising (to me) observation about the XML grammar for XML.

I was thinking some more about parsing DTDs with the specification grammar for XML and I realized that you can’t. I mean, obviously, you can parse XML DTDs, but you can’t do it by using the grammar and off-the-shelf tools to build a parser from the grammar.

I don’t think this is news. And maybe I knew it 30 years ago and have just forgotten. In any event, iXML has rekindled my interest in parsers, so I’m coming at this with a different perspective now than I would have had back then.

This problem, like the ambiguity, arises in the definition of conditional sections. But this one is insoluble.

Consider the example from the specification:

<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >

<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>

It just flat out doesn’t parse according to the grammar because the grammar for conditional sections only allows the literal keywords INCLUDE or IGNORE and we’ve got parameter entity references (%draft; and %final;) in that example.
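To make the mismatch concrete, here is a minimal sketch in Python. It assumes a regular expression that mirrors only the opening of the includeSect and ignoreSect productions ('<![' S? keyword S? '['); the name opens_conditional_section is made up for illustration and the rest of the grammar is not modelled.

```python
import re

# Mirrors only the start of a conditional section as the grammar writes it:
# '<![' S? ('INCLUDE' | 'IGNORE') S? '['
# Hypothetical helper for illustration; not a full DTD grammar.
SECTION_OPEN = re.compile(r"<!\[[ \t\r\n]*(INCLUDE|IGNORE)[ \t\r\n]*\[")


def opens_conditional_section(text):
    """Return True if text starts with a literal INCLUDE/IGNORE section opener."""
    return SECTION_OPEN.match(text) is not None
```

With this check, `<![INCLUDE[` is accepted and `<![%draft;[` is rejected, which is the failure described above: the grammar has no place for a reference where the keyword goes.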

To be fair to the XML specification, this is called out explicitly:

If the keyword of the conditional section is a parameter-entity reference, the parameter entity MUST be replaced by its content before the processor decides whether to include or ignore the conditional section.

That means the parser has to keep track of what parameter entity declarations it’s seen and replace them in a conditional section before attempting to parse the conditional section. At that point, you’re so far outside the scope of what an off-the-shelf grammar-based parser can do that you might just as well special case the whole task of parsing conditional sections.
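As a rough sketch of that bookkeeping, the following Python makes a single left-to-right pass, records parameter entity declarations as it sees them, and rewrites a conditional section keyword that is a parameter entity reference. Everything here is an assumption for illustration: the function name resolve_section_keywords is invented, entity names are simplified to word characters plus "." and "-", only internal entities with literal values are handled, and declarations inside comments or ignored sections are not treated specially.

```python
import re

# One pattern, two alternatives, so matches are visited in document order:
#   1. a parameter entity declaration with a quoted literal value
#   2. a conditional section whose keyword is a parameter entity reference
TOKEN = re.compile(
    r"<!ENTITY\s+%\s+(?P<name>[\w.-]+)\s+(?P<q>['\"])(?P<value>.*?)(?P=q)\s*>"
    r"|<!\[\s*%(?P<ref>[\w.-]+);\s*\[",
    re.S,
)


def resolve_section_keywords(dtd):
    """Replace %name; keywords in conditional sections with the entity's value."""
    entities = {}

    def handle(match):
        if match.group("name") is not None:
            # The first declaration of an entity is the binding one in XML.
            entities.setdefault(match.group("name"), match.group("value"))
            return match.group(0)  # leave the declaration untouched
        ref = match.group("ref")
        if ref not in entities:
            raise ValueError("undeclared parameter entity: %" + ref + ";")
        return "<![" + entities[ref] + "["

    return TOKEN.sub(handle, dtd)
```

Run over the example from the specification, this turns `<![%draft;[` into `<![INCLUDE[` and `<![%final;[` into `<![IGNORE[`. Only then does the text look like something the grammar describes, and note that this step happens entirely outside the grammar.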

I expect this is why no one noticed that the grammar was ambiguous.

And note that fixing the grammar so that includeSect and ignoreSect could parse a parameter entity reference in addition to the literal words INCLUDE and IGNORE wouldn’t help. The content model of the section depends on what the parameter entity resolves to!
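To see why the keyword matters so much, consider what the body of an ignored section looks like to a parser. Under IGNORE, the body is not parsed as declarations at all; it is arbitrary text in which only the `<![` and `]]>` delimiters count, with nesting. A minimal sketch of that scan, assuming a hypothetical helper named ignore_section_end:

```python
def ignore_section_end(text, start):
    """Return the index just past the ']]>' that closes an IGNORE section.

    `start` is the index of the first character of the section body.
    Only '<![' and ']]>' are significant; everything else is skipped.
    """
    depth = 1
    i = start
    while depth:
        opener = text.find("<![", i)
        closer = text.find("]]>", i)
        if closer == -1:
            raise ValueError("unterminated conditional section")
        if opener != -1 and opener < closer:
            depth += 1  # a nested section opens before the next close
            i = opener + 3
        else:
            depth -= 1  # the next delimiter closes the current level
            i = closer + 3
    return i
```

An included section, by contrast, has to be parsed as ordinary markup declarations. The same bytes call for two different parsing strategies, and which one applies isn’t known until the parameter entity has been looked up.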

In SGML, you could put conditional sections in your document. They were sometimes quite useful, especially because, unlike comments, they can be nested. But in XML, they can only occur in the DTD and I bet they are very nearly unused.

I’m just saying up front, XMLn’t ain’t gonna parse ’em if they’re defined with parameter entities.

#XML
