xproc (for DocBook)
Making it much easier to format DocBook documents.
DocBook is a great tag set. And the xslTNG Stylesheets are a great way to transform DocBook documents into HTML or, with the help of a CSS formatter, printed pages. (Yes, I would say those things, wouldn’t I?)
They are, however, a little bit fiddly to set up. Not hard, exactly, but:
- You want access to the schemas, so you can validate, and the stylesheets, for transformations.
- You want all of the resources associated with the stylesheets (CSS, JavaScript, icons, etc.) to be transparently available.
- You want to avoid getting everything from the web every time, so you want local copies of things. But you don’t want to manage that by hand.
- You want to load the Java extensions so that you get transparent access to the appropriate XML Catalogs and to functions that improve the quality of, for example, image processing.
- You want a wrapper around all the steps that have to be done so that XML goes in one end and HTML (or PDF) comes out the other.
If you’re setting up a new project, say the documentation for a sophisticated bit of software, these are one-time setup tasks and the payoff is obvious. But what if you aren’t?
That was the situation I found myself in last week. I had a single DocBook document that I wanted to turn into a PDF. I didn’t have a project set up to do it, and setting up a project to do one document seemed like an awful lot of faff just to avoid using LibreOffice. (LibreOffice is great, but I hate WYSIWYG word processing. I don’t want to see it, I want to tell you what I want and I want you to do it, with the minimum of fuss, thank you very much.) I realized that what I wanted was to write a document, stick it in a directory, type “do the thing”, and have PDF come out.
So, on Saturday, I wrote that.
Clone the project and put document.xml in the xml directory. (On Windows, use .\gradlew in the commands below and mentally replace “/” in filenames and commands with “\”.) Then type
./gradlew document.pdf
in your shell window, and out pops document.pdf. Type
./gradlew document.html
and out pops document.html. Type
./gradlew document.chunk.html
and out pops a chunked version of the document.
You have to write the XML using DocBook, of course, but if that’s not a thing you want to do, I’m surprised you’ve read this far. 🙂 If you want PDF, you have to configure a CSS formatter. Antenna House, Prince, and WeasyPrint are all options.
The system is extensible; if you add more documents or more pipelines, you
automatically get more commands (technically, build targets). Write
document2.xml and you can immediately format it. Add a PDF pipeline
called foo.xpl and automatically
./gradlew document.foo.pdf
becomes a command that produces PDF with that pipeline.
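The naming convention can be illustrated with a small shell sketch. This is not the project's actual logic (that lives in the Gradle build); it only assumes, from the examples above, that a target looks like `<document>[.<pipeline>].<format>`. The function name `parse_target` and the placeholder pipeline name `default` are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a target such as "document.foo.pdf" into
# its document name, pipeline, and output format.
# Assumption: targets follow <document>[.<pipeline>].<format>; the real
# build may resolve names differently.
parse_target() {
    target=$1
    format=${target##*.}      # text after the last dot, e.g. "pdf"
    rest=${target%.*}         # everything before the last dot
    case $rest in
        *.*) pipeline=${rest##*.}; document=${rest%.*} ;;  # explicit pipeline
        *)   pipeline=default;     document=$rest ;;       # no pipeline given
    esac
    echo "document=$document pipeline=$pipeline format=$format"
}

parse_target document.pdf
parse_target document.chunk.html
parse_target document.foo.pdf
```

Note that under this assumed convention a document whose own name contains a dot would be ambiguous, so plain names like document.xml are the safe choice.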
Around lunchtime, I decided that I wanted to do two more things: I wanted to scratch another personal itch (writing correspondence), and I wanted to demonstrate how the system could be extended.
Writing business correspondence isn’t in DocBook’s wheelhouse, so I extended the
schema a bit. I added a letter document type with appropriate furniture
(from and to addresses, a salutation, and a closing). I’m not saying
it’s the best schema for correspondence, but it’s good enough for my purposes.
Then I wrote a few XSLT templates to do the correct thing with letter
documents. There’s a bit of flexibility depending on how you format the names of
the correspondents and whether you mark the letter as formal or informal, but
it’s not rocket science.
There’s no documentation for the schema (yet?) 🙁, but there’s an example in the repository and the schema isn’t complicated. If you write letter.xml and run
./gradlew letter.pdf
you get a PDF letter. If you want HTML, you can get that too. (I’ve made no attempt to support chunking in letters, so I don’t advise it.)
All of which makes it easy for me to write correspondence and demonstrates extensibility.
Finally, before dinner, I decided I wanted letterhead. Initially, I wrote a PDF pipeline called letterhead.xpl which used letterhead.xsl. That meant the command
./gradlew letter.letterhead.pdf
produced letter.pdf on letterhead. (There’s an unresolved issue here. The letterhead is confined within the margins of the page. I can completely imagine that you might want a letterhead that spans the whole page, with bleeds, even, perhaps. I haven’t worked out how practical that is in CSS paged media. Suggestions welcome.) But that only worked on PDF. I could make copies of things so that it also worked on HTML, but that seemed redundant.
Instead, I moved some things around and implemented it as an option. Adding
-Pletterhead=true to any Gradle invocation will put the resulting output on letterhead.
For example,
./gradlew -Pletterhead=true letter.pdf
(The letterhead is awful, you’ll want to replace that with your own.)
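Since `-P` simply sets a Gradle project property, the option composes with any target. If you use it a lot, a tiny shell wrapper saves typing. This is only a sketch: the function name `letterhead` and the `GRADLEW` override variable are inventions for this example, not part of the project.

```shell
#!/bin/sh
# Hypothetical convenience wrapper: run any target with the letterhead
# option switched on.
# GRADLEW is an assumption added for this sketch; it defaults to ./gradlew
# and can be set to "echo" for a dry run that only prints the arguments.
letterhead() {
    "${GRADLEW:-./gradlew}" -Pletterhead=true "$@"
}
```

With that defined, `letterhead letter.pdf` runs the same command as the example above, and `letterhead letter.html` does the same for HTML.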
If you want to add your own options, there are breadcrumbs in build.gradle.kts.
Feel free to open an issue if you get stuck.
It was fun, I think it’s useful, I’ve scratched a couple of personal itches, and I’m now seven-for-seven in #MarkupMonday posts. (Oh, and later today, I’ll finally be able to format the document that started all of this!) Not bad for a day’s hacking.
P.S. The xslTNG Stylesheets do quite a lot with the fn:transform() function.
So, unlike the DocBook XSLT 2.0 Stylesheets, you don’t have to use XProc to use
them. But I decided to use XProc because writing XProc pipelines is sometimes way easier
than working out a complex stylesheet customization layer. Use the best tool for the job.
I’ve used XML Calabash, but there are no extension steps involved. It should be possible to port this to MorganaXProc-III if you prefer (and why shouldn’t you, it’s a great product!). The only wrinkle I can think of is that you’ll have to find a way to tell Saxon to register the extension functions (the equivalent of
-init:org.docbook.xsltng.extensions.Register
on the command line). But I assume that’s possible.
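For reference, here is roughly where that flag sits in a plain Saxon invocation. This is a dry-run sketch that only prints the command; the jar names, stylesheet path, and file names are placeholders I've assumed for illustration, and `saxon_command` is a made-up helper. Only the `-init:` value comes from the text above.

```shell
#!/bin/sh
# Dry-run sketch: print a Saxon command line that registers the xslTNG
# extension functions via -init. Nothing is executed.
# All paths and jar names passed in are placeholders (assumptions).
saxon_command() {
    classpath=$1 source=$2 stylesheet=$3 output=$4
    echo java -cp "$classpath" net.sf.saxon.Transform \
        -init:org.docbook.xsltng.extensions.Register \
        "-s:$source" "-xsl:$stylesheet" "-o:$output"
}

saxon_command "saxon.jar:docbook-xslTNG.jar" document.xml docbook.xsl document.html
```

A port to another XProc processor would need whatever mechanism that processor offers for passing the same initializer class through to Saxon.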