so

🧩 XML 🧩

Volume 10, Issue 13; 09 Mar 2026

I love how the pieces fit together. In this case, for a possibly mad idea about publishing with Typst.

On Saturday afternoon, I stumbled across a reference to Typst. I think I might have seen it once before, but I hadn’t really paid much attention. I’m not interested in using an online platform for authoring, but the typst compiler is a cross-platform, open-source application that I can run locally.

The idea is simple enough: you type in a sort of elaborated Markdown and the typst compiler turns it into an accessible PDF (and maybe other things). The typst markup language is, in fact, a complete programming language with variables and conditionals and loops and all. That’s an idea that’s as old as TeX, possibly older. But typst is a modern reinvention.

I took a look and I wondered if I could use it to generate PDF. By which I mean, can I generate its flavor of Markdown from XML? (Because of course that’s what I mean!)

The answer to this question is obviously “yes”, but it’d be tedious and perhaps difficult to write an XML-vocabulary-to-typst stylesheet. And when I was done, I’d have that, but nothing especially reusable. Nothing that would help me when I had some other flavor of XML (say, expense reports in a bespoke vocabulary) that I wanted to typeset.

Speaking of TeX, I’ve had exactly this idea before. My intuition is that rather than going from the source vocabulary to TeX directly, it would make more sense to define an XML vocabulary for the TeX data model, transform to that XML from the source vocabulary, and transform from that XML to TeX. Trouble is, the TeX data model is…large, complicated, and rather loose. Every time I imagine trying to write an XML vocabulary for it, I get lost in the weeds before I get to the first hurdle.

Typst is a lot simpler and has a much cleaner model. I had the wild thought that this might be an opportunity to try out the conversion idea and test something else I’ve wondered about. Peter Flynn and I have talked about the TeX transformation; Peter thinks that the place to do the conversion is in a custom serialization method. Might work, but I’d have to try it to know for sure.

My brain, having found about the third rabbit hole of the weekend, plunged in. I’ve no particular interest in recreating the whole typst programming environment and language in XML, but if I can model enough of it, I’ll have a playground for comparing conversion-with-XSLT and conversion-with-serializer. (And maybe a new publishing tool.)

So what do we need? We need an XML model for typst, we need a schema for that model so that we have some guard rails, we need a stylesheet to convert that vocabulary to Markdown (serializer experiment to come later), and we need a stylesheet to convert an arbitrary XML vocabulary (cough DocBook) into the typst model.

I started looking at the typst documentation and constructing a RELAX NG grammar to model it in XML. A half hour in, I’d got a few things sort of working, but it was feeling a bit daunting. There’s a lot of detail and copying it from the documentation into RELAX NG is…wait. What? Why am I doing that? What is wrong with me?

The typst documentation lays out the model in a clear, regular format. This, for example, is a paragraph:

par(
  leading: length,
  spacing: length,
  justify: bool,
  justification-limits: dictionary,
  linebreaks: auto str,
  first-line-indent: length dictionary,
  hanging-indent: length,
  content,
) -> content

Clear, regular formats can be parsed. I grabbed all the model descriptions and stuffed them in a text file. Then I wrote an iXML grammar to turn them into XML:

<model name="par">
   <keyword name="leading">
      <choice name="length"/>
   </keyword>
   <keyword name="spacing">
      <choice name="length"/>
   </keyword>
   <keyword name="justify">
      <choice name="bool"/>
   </keyword>
   <keyword name="justification-limits">
      <choice name="dictionary"/>
   </keyword>
   <keyword name="linebreaks">
      <choice name="auto"/>
      <choice name="str"/>
   </keyword>
   <keyword name="first-line-indent">
      <choice name="length"/>
      <choice name="dictionary"/>
   </keyword>
   <keyword name="hanging-indent">
      <choice name="length"/>
   </keyword>
   <content>
      <choice name="content"/>
   </content>
</model>

The markup isn’t really important. It would be a little nicer if it was hanging-indent/length instead of keyword[@name='hanging-indent']/choice[@name='length'], but that’s not possible directly in iXML. (Another rabbit hole opens up nearby, wait, I have an idea…) It’s a more-or-less 1:1 translation of the function signatures into XML. Critically, it distinguishes between positional and keyword arguments and enumerates what the choices are for each argument.
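To make the shape of that translation concrete, here is a minimal sketch in Python. It is not the iXML grammar described above, just a string-splitting stand-in that produces the same kind of model element for the par signature. It assumes every keyword argument is written as name: followed by space-separated type names, and that a positional argument is a single bare type name; the function name parse_signature is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The "par" signature as it appears in the typst documentation excerpt.
SIGNATURE = """\
par(
  leading: length,
  spacing: length,
  justify: bool,
  justification-limits: dictionary,
  linebreaks: auto str,
  first-line-indent: length dictionary,
  hanging-indent: length,
  content,
) -> content
"""

def parse_signature(text):
    """Turn one function signature into a <model> element (hypothetical helper)."""
    head, _, rest = text.partition("(")
    body = rest.rpartition(")")[0]          # drop ") -> content"
    model = ET.Element("model", name=head.strip())
    for arg in body.split(","):
        arg = arg.strip()
        if not arg:
            continue                        # the trailing comma leaves an empty piece
        if ":" in arg:
            # Keyword argument: a name, then space-separated type choices.
            name, types = arg.split(":", 1)
            parent = ET.SubElement(model, "keyword", name=name.strip())
            choices = types.split()
        else:
            # Positional argument: assumed to be a single bare type name.
            parent = ET.SubElement(model, arg)
            choices = [arg]
        for choice in choices:
            ET.SubElement(parent, "choice", name=choice)
    return model

model = parse_signature(SIGNATURE)
ET.indent(model, space="   ")               # ET.indent needs Python 3.9 or later
print(ET.tostring(model, encoding="unicode"))
```

A real grammar earns its keep on the signatures that this sketch would get wrong (defaults, nested parentheses, variadic arguments), which is a good reason to reach for iXML rather than string splitting.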

One XSLT stylesheet later and we have a fully formed RELAX NG grammar for all of typst, of which this is the corresponding part:

<define name="par">
   <element name="par">
      <optional>
         <ref name="par-leading"/>
      </optional>
      <optional>
         <ref name="par-spacing"/>
      </optional>
      <optional>
         <ref name="par-justify"/>
      </optional>
      <optional>
         <ref name="par-justification-limits"/>
      </optional>
      <optional>
         <ref name="par-linebreaks"/>
      </optional>
      <optional>
         <ref name="par-first-line-indent"/>
      </optional>
      <optional>
         <ref name="par-hanging-indent"/>
      </optional>
      <ref name="content"/>
   </element>
</define>

The par-leading pattern defines the leading element, but by using distinct pattern names, I don’t have to worry about some other model that uses leading in some different way.
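The mapping from model to grammar is mechanical enough to sketch in a few lines. The following Python is an illustration of the idea, not the XSLT stylesheet itself: each keyword argument becomes an optional reference to a pattern named after both the model and the argument, and the positional argument becomes a plain reference. The RELAX NG namespace is left out to match the excerpt above, the input is trimmed to three arguments, and model_to_define is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# A trimmed version of the "par" model, enough to show each case.
MODEL = """\
<model name="par">
   <keyword name="leading"><choice name="length"/></keyword>
   <keyword name="linebreaks"><choice name="auto"/><choice name="str"/></keyword>
   <content><choice name="content"/></content>
</model>
"""

def model_to_define(model):
    """Build a <define> for one model (hypothetical helper, namespace omitted)."""
    name = model.get("name")
    define = ET.Element("define", name=name)
    element = ET.SubElement(define, "element", name=name)
    for arg in model:
        if arg.tag == "keyword":
            # Keyword arguments are optional; prefixing the model name keeps
            # par-leading distinct from any other model's "leading".
            optional = ET.SubElement(element, "optional")
            ET.SubElement(optional, "ref", name=f"{name}-{arg.get('name')}")
        else:
            # The positional argument is required and refers to a shared pattern.
            ET.SubElement(element, "ref", name=arg.tag)
    return define

define = model_to_define(ET.fromstring(MODEL))
ET.indent(define, space="   ")              # ET.indent needs Python 3.9 or later
print(ET.tostring(define, encoding="unicode"))
```

The choice elements are ignored here; in a full conversion they would presumably drive the definitions of the par-leading, par-linebreaks, and similar patterns.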

And I lied. That’s not really a complete RELAX NG grammar because it doesn’t have patterns for the basic data types (int, str, bool, …). There are few enough of them that I just scratched out a datatypes.rnc file and generated an include for it in the typst.rng file.

Given what I’ve described so far, we need to process the text file with Invisible XML, convert the resulting XML into RELAX NG, and convert the datatypes.rnc file to an rng file. Then we’ll have a grammar we can use to validate our typst XML input.

One short XProc pipeline later and that’s sorted.

But does it work? Let’s find out! Here’s a test document:

<article xmlns="http://docbook.org/ns/docbook">
<title>A test article</title>
<para>This is generated from <emphasis>DocBook</emphasis>.</para>
</article>

Two more (very experimental and very incomplete) stylesheets and another XProc pipeline later and we can turn our test document into this:

<typst xmlns="http://nwalsh.com/ns/typst">
   <document>
      <title>
         <content>A test article</content>
      </title>
   </document>
   <content>
      <heading>
         <level>
            <int>1</int>
         </level>
         <content>A test article</content>
      </heading>
      <par>
         <content>This is generated from <emph>
               <content>DocBook</content>
            </emph>.</content>
      </par>
   </content>
</typst>

and then this:

#set document(title: [A test article])
#heading(level: 1)[A test article]
#par()[This is generated from #emph()[DocBook].]

and then compile it with typst. Out comes an entirely reasonable PDF file!

Win!

Obviously, neither the typst XML format nor the resulting Markdown output is anything a user would want to type by hand. But that’s not important because no user is going to.
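The last step, from typst XML to typst markup, is regular enough that a rough sketch fits on a page. The Python below is a stand-in for the stylesheet, under several assumptions: every element other than content is either a function call or a keyword argument, a keyword argument holds exactly one value, and no text needs escaping (a literal # or bracket in the prose would break it, and str values would need quoting). All of the function names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://nwalsh.com/ns/typst}"

# The test document from above, with the indentation removed.
SOURCE = """\
<typst xmlns="http://nwalsh.com/ns/typst">
<document><title><content>A test article</content></title></document>
<content>
<heading><level><int>1</int></level><content>A test article</content></heading>
<par><content>This is generated from <emph><content>DocBook</content></emph>.</content></par>
</content>
</typst>
"""

def local(el):
    """Element name without the namespace (str.removeprefix needs Python 3.9+)."""
    return el.tag.removeprefix(NS)

def content(el):
    """Serialize mixed content: text interleaved with nested function calls."""
    parts = [el.text or ""]
    for child in el:
        parts.append(call(child))
        parts.append(child.tail or "")
    return "".join(parts)

def value(arg):
    """Serialize the single value inside a keyword argument."""
    child = arg[0]
    if local(child) == "content":
        return "[" + content(child) + "]"
    return (child.text or "").strip()       # int, bool, ... written as-is

def arguments(el):
    """Every child except <content> is treated as a keyword argument."""
    return ", ".join(f"{local(a)}: {value(a)}" for a in el if local(a) != "content")

def call(el):
    """Serialize one element as #name(args)[body]."""
    body = el.find(NS + "content")
    trailing = "" if body is None else "[" + content(body) + "]"
    return f"#{local(el)}({arguments(el)}){trailing}"

def serialize(root):
    lines = []
    for child in root:
        if local(child) == "document":
            lines.append(f"#set document({arguments(child)})")
        else:
            # Top-level block content: one call per line, whitespace ignored.
            lines.extend(call(block) for block in child)
    return "\n".join(lines)

print(serialize(ET.fromstring(SOURCE)))
```

Run against the test document, this prints the same three lines of typst shown above. The split between content (text kept) and the top-level loop (text ignored) is a crude version of the inline/block distinction.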

There are still some aspects of the typst data model that I don’t understand. I’m having trouble, for example, working out exactly what’s allowed in content in various places. Behind the scenes, I have a small hack that distinguishes between “inline content” where text is allowed and “block content” where it isn’t. But I’m pretty confident that’s me holding the wrong end of some stick.

Is this going to go anywhere? I dunno. But it was fun and I absolutely love how quickly things came together with the XML stack: Invisible XML, XML, XSLT, RELAX NG, and XProc really make short work of things.

#DocBook #Invisible XML #Markdown #MarkupMonday #XML
