
Annotating the XML Catalogs specification

Volume 5, Issue 10; 28 Jun 2021

A few notes on the production of an annotated XML Catalogs specification.

In the course of working on the XML Resolver, I wrote a lot more tests. And that sent me back to the XML Catalogs specification where I discovered a couple of things that I found confusing. Not because they’re actually wrong, just because I hadn’t read the spec in a decade or more so I was coming to it cold.

Of course, everyone in the world, except perhaps the editor (me!) and the committee that approved the spec, comes to it cold. If it’s confusing from that perspective, we should fix that. I wanted to get a couple of clarifying errata published, but it turns out that’s not easy because the Technical Committee that produced the spec has long since ceased to exist.

Another thing I noticed is that 15-year-old HTML isn’t the prettiest thing to read. I thought I could fix both of these issues at once by reformatting the specification with a more modern set of stylesheets and adding some annotations.

Reformatting

It’s important that the reformatted draft is as close to the original as possible. I want it to be uncontroversial and easy to see that it’s a republication of the original, not a derived work.

The original specification is in DocBook 4.0 and uses a bunch of entities and I wanted it in DocBook 5.0 for republishing. That’s a problem because a hands-off stylesheet conversion from 4.0 to 5.0 would be clean in one way (no opportunity for random user error) but less faithful in another (the entity declarations and uses would be lost). Conversely, a hand translation would leave the text looking “more the same” but could introduce other errors.

For better or worse, I went with a hand translation. So I assert that the source I’m using is a faithful copy of the original, but I can’t claim it’s “unedited” in the sense of never having opened it up in Emacs and poked at it. (That is, in fact, just not true; see below.)

DocBook to more modern HTML with the DocBook xslTNG Stylesheets was mostly easy. I did have to write a small customization layer to deal with the extension markup used for the tableaux for the catalog elements.

Annotating

I’m quite pleased with the way that annotations are processed by my new DocBook stylesheets. They’re rendered as unobtrusive pop-ups in a JavaScript capable browser and the sources for them can be stored completely out-of-band. I didn’t have to edit the XML Catalogs specification to add them, just point the stylesheets at the annotations file.

Unfortunately, two of the places I really wanted to annotate don’t have any ID values. By this point in the process, I was really keen on the “no changes to the specification” so I didn’t want to just add ID values.

Before about four days ago, you could only annotate elements that had IDs because location/annotation cross referencing was strictly by ID.

I fixed that by adding an extension to the stylesheets. It’s now possible to add annotations by XPath expression:

<annotation
  annotates="xpath:(//db:informaltable
                      [position() = 1 or position() = 2])
                    /db:tgroup/db:tbody/db:row[1]/db:entry[4]">
…
</annotation>

If the annotates attribute contains the string “xpath:” then everything following that marker is taken to be an XPath expression. Since “xpath:” can’t be an ID value, this can’t cause any non-extended use of annotations to be misinterpreted. (Luckily the annotates attribute is not defined as IDREFS, but simply as text, because that’s necessary for valid out-of-band annotation markup.)
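To make the rule concrete, here is a minimal Python sketch of the dispatch described above. It is not the stylesheet code (that lives in XSLT); the function name `parse_annotates` and the tuple it returns are hypothetical, chosen only for illustration.

```python
def parse_annotates(value):
    """Classify the value of an annotates attribute.

    Returns ("xpath", expression) when the "xpath:" marker is present,
    otherwise ("ids", [id, ...]) for a whitespace-separated list of IDs.
    This is an illustrative sketch, not the xslTNG implementation.
    """
    marker = "xpath:"
    index = value.find(marker)
    if index >= 0:
        # Everything following the marker is taken to be the expression.
        return ("xpath", value[index + len(marker):].strip())
    # No marker: fall back to the ordinary, ID-based interpretation.
    return ("ids", value.split())


print(parse_annotates("intro sect-2"))
print(parse_annotates("xpath: //db:informaltable[1]"))
```

Because a colon can’t appear in an ID, the two branches never collide: any value that used to work as a list of IDs still takes the second branch.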

Accessibility

Having the annotations in a separate file meant that I could publish them separately, which both makes them accessible if your browser doesn’t support JavaScript popups and means they can have their own metadata (publication date, version, etc.).

Along the lines of accessibility, another thing I wanted to do was make the specification easier to read on a mobile device. I don’t have a lot of experience trying to do that and I learn something new every time.

The first thing I noticed when I looked at the spec on my phone was that the little “persistent ToC” icon in the upper right corner was way off the screen. I initially assumed this was a CSS error on my part, so it took me a while to find the root cause: <pre> is not your friend.

The element tableaux (see, for example, 6.5.1 The catalog Entry) were rendered with pre. I’d played a few games in the stylesheets to add explicit line breaks to the “Content:” line, but those line breaks were way too wide on a mobile device.

That pushed everything that was floating to the right about 100% too far to the right. (Like US politics, then? [Hush now, Mr. Snarkypants -ed]) It also made the tableaux almost unreadable.

I reworked the stylesheet so that those sections are rendered as nested div elements with CSS. This allows them to wrap when the column is narrow. They’re not quite as pretty if you look at them without CSS, but they’re not unreadable by any means.

The next thing I discovered was that long URIs, namespace names for example, caused a similar problem. I resolved this by the expedient of adding soft hyphens in the source. So I have, in fact, made changes to the original specification, but they aren’t significant changes. (I assert.)
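I added the soft hyphens by hand, but the idea is mechanical enough to sketch. The Python below is an assumption-laden illustration, not what I actually ran: the function name `add_soft_hyphens` and the choice to break after `:` and `/` are mine, made up for the example.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, invisible unless the line breaks there


def add_soft_hyphens(uri, separators=":/"):
    """Insert a soft hyphen after each separator that precedes a letter or digit.

    Requiring a following letter or digit avoids putting break
    opportunities in the middle of runs such as "://".
    """
    pattern = "([" + re.escape(separators) + "])(?=[A-Za-z0-9])"
    return re.sub(pattern, lambda m: m.group(1) + SOFT_HYPHEN, uri)


namespace = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
softened = add_soft_hyphens(namespace)
# Show the inserted break opportunities with a visible stand-in character.
print(softened.replace(SOFT_HYPHEN, "|"))
```

Removing every U+00AD from the result gives back the original string, which is the sense in which the change isn’t significant.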

The mobile rendering is still imperfect. On a sufficiently narrow device, the table in section 6.3 is also problematic, for example. I mess with its font size a bit below a certain width. Truth is, I don’t really have a solid grasp of how to efficiently use media queries and CSS to optimize the mobile experience. (And I’m somewhat loath to get into the game of writing a dozen or more media queries for every conceivable device size.)

But it’s better. Suggestions for improvement (and annotations in this case, if there are other parts of the XML Catalogs spec that you find confusing) most welcome.
