xproc (for DocBook)

Volume 9, Issue 27; 03 Nov 2025

Making it much easier to format DocBook documents.

DocBook is a great tag set. And the xslTNG Stylesheets are a great way to transform DocBook documents into HTML or, with the help of a CSS formatter, printed pages. (Yes, I would say those things, wouldn’t I?)

They are, however, a little bit fiddly to set up. Not hard, exactly, but:

  1. You want access to the schemas, so you can validate, and the stylesheets, for transformations.
  2. You want all of the resources associated with the stylesheets (CSS, JavaScript, icons, etc.) to be transparently available.
  3. You want to avoid getting everything from the web every time, so you want local copies of things. But you don’t want to manage that by hand.
  4. You want to load the Java extensions so that you get transparent access to the appropriate XML Catalogs and to functions that improve the quality of, for example, image processing.
  5. You want a wrapper around all the steps that have to be done so that XML goes in one end and HTML (or PDF) comes out the other.

If you’re setting up a new project, say the documentation for a sophisticated bit of software, these are one-time setup tasks and the payoff is obvious. But what if you aren’t?

That was the situation I found myself in last week. I had a single DocBook document that I wanted to turn into a PDF. I didn’t have a project set up to do it, and setting up a project to do one document seemed like an awful lot of faff just to avoid using LibreOffice. (LibreOffice is great, but I hate WYSIWYG word processing. I don’t want to see it; I want to tell you what I want and I want you to do it, with the minimum of fuss, thank you very much.) I realized that what I wanted was to write a document, stick it in a directory, type “do the thing”, and have PDF come out.

So, on Saturday, I wrote that.

Clone the project, put document.xml in the xml directory, and type (or use .\gradlew on Windows; if you’re on Windows, just mentally replace “/” in filenames and commands with “\”)

./gradlew document.pdf

in your shell window, and out pops document.pdf. Type

./gradlew document.html

and out pops document.html. Type

./gradlew document.chunk.html

and out pops a chunked version of the document.

You have to write the XML using DocBook, of course, but if that’s not a thing you want to do, I’m surprised you’ve read this far. 🙂 If you want PDF, you have to configure a CSS formatter. Antenna House, Prince, and WeasyPrint are all options.

The system is extensible; if you add more documents or more pipelines, you automatically get more commands (technically, build targets). Write document2.xml and you can immediately format it. Add a PDF pipeline called foo.xpl and automatically

./gradlew document.foo.pdf

becomes a command that produces PDF with that pipeline.
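To make the naming convention concrete, here is a minimal sketch of how the command names follow from the files you have. It is only an illustration of the pattern described above, not the project’s actual build logic (that lives in build.gradle.kts and may well differ). The function name list_targets and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): print the target names
# that the naming convention implies for each document in a directory.
# Usage: list_targets XML_DIR [PDF_PIPELINE_NAME...]
list_targets() {
    xml_dir=$1
    shift
    for doc in "$xml_dir"/*.xml; do
        [ -e "$doc" ] || continue          # no .xml files: nothing to list
        base=$(basename "$doc" .xml)
        echo "$base.pdf"                   # default PDF
        echo "$base.html"                  # single-page HTML
        echo "$base.chunk.html"            # chunked HTML
        for pipeline in "$@"; do
            echo "$base.$pipeline.pdf"     # PDF via an extra pipeline (e.g. foo.xpl)
        done
    done
}
```

With only document.xml in the xml directory, list_targets xml foo would print document.pdf, document.html, document.chunk.html, and document.foo.pdf, each of which you would pass to ./gradlew.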

About lunch time, I decided that I wanted to do two more things: I wanted to scratch another personal itch (writing correspondence), and I wanted to demonstrate how the system could be extended.

Writing business correspondence isn’t in DocBook’s wheelhouse, so I extended the schema a bit. I added a letter document type with appropriate furniture (from and to addresses, a salutation, and a closing). I’m not saying it’s the best schema for correspondence, but it’s good enough for my purposes.

Then I wrote a few XSLT templates to do the correct thing with letter documents. There’s a bit of flexibility depending on how you format the names of the correspondents and whether you mark the letter as formal or informal, but it’s not rocket science.

If you write letter.xml (there’s no documentation for the schema (yet?) 🙁, but there’s an example in the repository and the schema isn’t complicated) and run

./gradlew letter.pdf

you get a PDF letter. If you want HTML, you can get that too. (I’ve made no attempt to support chunking in letters, so I don’t advise it.)

All of which makes it easy for me to write correspondence and demonstrates extensibility.

Finally, before dinner, I decided I wanted letterhead. Initially, I wrote a PDF pipeline called letterhead.xpl which used letterhead.xsl. That meant the command

./gradlew letter.letterhead.pdf

produced letter.pdf on letterhead. But that only worked for PDF. I could have made copies of things so that it also worked for HTML, but that seemed redundant. (There’s an unresolved issue here: the letterhead is confined within the margins of the page. I can completely imagine that you might want a letterhead that spans the whole page, with bleeds, even, perhaps. I haven’t worked out how practical that is in CSS paged media. Suggestions welcome.)

Instead, I moved some things around and implemented it as an option. Adding -Pletterhead=true to any Gradle invocation will put the resulting output on letterhead. For example,

./gradlew -Pletterhead=true letter.pdf

(The letterhead is awful; you’ll want to replace it with your own.)
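Since the option applies to any invocation, it can be combined with several targets at once. The following is a small sketch of a wrapper that formats one document as both HTML and PDF, optionally on letterhead. The function name format_both and the GRADLEW override are hypothetical conveniences, not part of the project; the sketch assumes only the -Pletterhead=true option and the target names described above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the project): build DOC.html and DOC.pdf,
# adding -Pletterhead=true when the second argument is "letterhead".
# Set GRADLEW=echo to preview the arguments instead of running Gradle.
# Usage: format_both DOC [letterhead]
format_both() {
    doc=$1
    if [ "${2:-}" = "letterhead" ]; then
        "${GRADLEW:-./gradlew}" -Pletterhead=true "$doc.html" "$doc.pdf"
    else
        "${GRADLEW:-./gradlew}" "$doc.html" "$doc.pdf"
    fi
}
```

For example, format_both letter letterhead would run ./gradlew -Pletterhead=true letter.html letter.pdf from the project directory.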

If you want to add your own options, there are breadcrumbs in build.gradle.kts. Feel free to open an issue if you get stuck.

It was fun, I think it’s useful, I’ve scratched a couple of personal itches, and I’m now seven-for-seven in #MarkupMonday posts. Not bad for a day’s hacking. (Oh, and later today, I’ll finally be able to format the document that started all of this!)

P.S. The xslTNG Stylesheets do quite a lot with the fn:transform() function. So, unlike the DocBook XSLT 2.0 Stylesheets, you don’t have to use XProc to use them. But I decided to use XProc because writing XProc pipelines is sometimes way easier than working out a complex stylesheet customization layer. Use the best tool for the job.

I’ve used XML Calabash, but there are no extension steps involved. It should be possible to port this to MorganaXProc-III if you prefer (and why shouldn’t you, it’s a great product!). The only wrinkle I can think of is that you’ll have to find a way to tell Saxon to register the extension functions (the equivalent of

-init:org.docbook.xsltng.extensions.Register

on the command line). But I assume that’s possible.

#DocBook #MarkupMonday #XML #XSLT
