DocBook xslTNG (progress report)

Volume 6, Issue 17; 28 Dec 2022

Will I be done by Christmas? No. By Boxing Day? Also no. New Year’s Day? Maybe.

Back in early November, I got a bug report. The underlying bug is that the xslTNG stylesheets get localization wrong for Chinese. What happened was, when I reworked things for the xslTNG stylesheets, I tried to remove some complexity around generated text.

By generated text, I mean the parts of titles (for example) that aren’t in your markup. If the fifth <chapter> element in your book begins:

<chapter xml:lang="en">
<title>The next big thing</title>

You might get “Chapter 5. The next big thing” at the top of the page for that chapter and something similar, but perhaps not exactly the same, in the Table of Contents, and in cross references, etc.

The word “Chapter” and the number “5” are generated text. There are lots of other examples: titles, labels, list numbers, list separators, etc.

In xslTNG, I tried to replace a complex system of templates with a model that decomposed the text that had to be generated into logical parts. Worked fine for English. Not so much for Chinese. And possibly not so much for other languages too. (I was, in retrospect, incredibly naive and short-sighted.)

I don’t consider myself competent to construct a logical model for generated text across all human languages, so I decided to give up and go back to having templates. That puts the power in the hands of the authors of localizations.
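To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is not the stylesheets' actual XSLT, and it does not claim to reproduce the exact failure from the bug report. The function names, the `TEMPLATES` table, and the template strings are all hypothetical; the `%n` (number) and `%t` (title) placeholders are only loosely modelled on the style of the older DocBook localization files.

```python
# Hypothetical per-language title templates (illustration only).
# %n stands for the generated number, %t for the author's title.
TEMPLATES = {
    "en": "Chapter %n. %t",
    "zh": "第 %n 章 %t",
}


def decomposed_title(label, number, separator, title):
    """'Logical parts' model: the code fixes the order label, number, separator, title."""
    return f"{label} {number}{separator} {title}"


def template_title(lang, number, title):
    """Template model: the localization author controls order and punctuation."""
    template = TEMPLATES[lang]
    # Substitute the number first and the title last, so that a title which
    # happens to contain "%n" is not substituted a second time.
    return template.replace("%n", str(number)).replace("%t", title)


if __name__ == "__main__":
    print(decomposed_title("Chapter", 5, ".", "The next big thing"))
    print(template_title("en", 5, "The next big thing"))
    print(template_title("zh", 5, "标题"))
```

The fixed-order function has no way to wrap the number in text on both sides, which is what the hypothetical Chinese template does. A template leaves that decision to whoever writes the localization.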

At this point, we have to ask: “could I make the xslTNG stylesheets use exactly the same localization data files as the previous generation of stylesheets?” Depending on how much work you want to make “could I” do, I suppose it might be possible, but I decided not to. The original data comes from the XSLT 1.0 stylesheets from a couple of decades ago. That’s not exactly the same as the data used by the XSLT 2.0 stylesheets from almost a decade ago. And the xslTNG stylesheets already have a completely different model.

At the same time, I’m hampered by the fact that I don’t know how to read or write Afrikaans, Albanian, Amharic, Arabic, Assamese, Asturian, Azerbaijani, Bangla, Basque, Bosnian, Bulgarian, Catalan, Chinese, Chinese (Taiwan), Chinese Simplified, Croatian, Czech, Danish, Dutch, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indian Bangla, Indonesian, Irish, Italian, Japanese, Kannada, Kirghiz, Korean, Latin, Latvian, Lithuanian, Low German, Malayalam, Marathi, Mongolian, Northern Sami, Norwegian Bokmål, Norwegian Nynorsk, Oriya, Polish, Portuguese, Portuguese (Brazil), Punjabi, Romanian, Russian, Serbian in Cyrillic script, Serbian in Latin script, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh (well enough to count!), or Xhosa.

That means I need to stay somewhat close to the model that’s been demonstrated to work for DocBook in the past.

It’s slow going (and if I’m being perfectly honest, not especially fun), but I think I’m making progress. I got all the tests to pass, started writing the documentation, decided I needed to change something, and broke a bunch of tests. Lather. Rinse. Repeat.