so

GetEntity

Volume 5, Issue 16; 23 Jul 2021

The .NET System.Xml.XmlResolver API’s GetEntity method returns the resolved entity as an unadorned Stream. That is an unfortunate design choice.

I previously argued that the .NET System.Xml.XmlResolver API is fundamentally broken. I’ve since discovered another problem, though this one is less “totally broken” and more “badly designed.”

To refresh your memory, here’s the API in question:

object? GetEntity(Uri absoluteUri,
                  string? role,
                  Type? ofObjectToReturn);

My previous comments were about the input parameters, but I think the return type is an error as well.

Consider the following document:

<!DOCTYPE book SYSTEM "https://xmlresolver.org/ns/sample/sample.dtd">
<book xmlns="https://xmlresolver.com/ns/sample">
<title>Testing</title>
<article>
<p>This is a test.</p>
</article>
</book>

The DTD URI will be passed to GetEntity. Suppose that the resolver maps that DTD to a local copy at file:///usr/local/XML/sample/sample.dtd. So far, so good. But all the resolver can return is the open stream for that local resource. Nothing in that stream tells the parser the actual base URI of the resource.

Compare this to the SAX API:

InputSource resolveEntity(String name,
                          String publicId,
                          String baseURI,
                          String systemId)

The InputSource API isn’t my favorite, but it does provide a way for the resolver to tell the parser where the resource is actually coming from.
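To make that concrete, here’s a minimal sketch of a SAX resolver that reports where a resource really came from. The class name LocalCopyResolver and its map of local copies are my own inventions for illustration; only EntityResolver2 and InputSource come from the SAX API. It doesn’t open any streams: it simply hands back an InputSource whose system identifier is the local URI and leaves the parser to open it.

```java
import java.util.Map;
import org.xml.sax.InputSource;
import org.xml.sax.ext.EntityResolver2;

// Hypothetical resolver: maps published URIs to local copies.
final class LocalCopyResolver implements EntityResolver2 {
    private final Map<String, String> localCopies;

    LocalCopyResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        return null; // never supply a DTD the document did not ask for
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        String local = (systemId == null) ? null : localCopies.get(systemId);
        if (local == null) {
            return null; // not mapped: let the parser fetch it as usual
        }
        // The InputSource carries the local URI as its system identifier,
        // which is how the resolver tells the parser the real location.
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return resolveEntity(null, publicId, null, systemId);
    }

    public static void main(String[] args) {
        LocalCopyResolver resolver = new LocalCopyResolver(Map.of(
            "https://xmlresolver.org/ns/sample/sample.dtd",
            "file:///usr/local/XML/sample/sample.dtd"));
        InputSource source = resolver.resolveEntity(
            "[dtd]", null, null,
            "https://xmlresolver.org/ns/sample/sample.dtd");
        System.out.println(source.getSystemId());
    }
}
```

An object? that turns out to be a Stream has nowhere to put the equivalent of that system identifier.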

Why does this matter?

The sample.dtd includes a parameter entity reference:

<!ENTITY % blocks PUBLIC "-//Sample//DTD Simple Blocks 1.0//EN"
                         "blocks.dtd">
%blocks;

In the SAX case, the parser will have been told that the base URI of the DTD was file:///usr/local/XML/sample/sample.dtd. It will use that location to compute the absolute URI of blocks.dtd: file:///usr/local/XML/sample/blocks.dtd. That means that the resolver doesn’t need to know anything about the resources contained within a DTD (or schema file, or stylesheet) if they’re available locally at the appropriate relative locations.

With the .NET API, the parser thinks it loaded that DTD from the HTTP URI, so the system identifier that it will pass to GetEntity will be: https://xmlresolver.org/ns/sample/blocks.dtd. That means your resolver has to know about every single resource referenced in the DTD (or schema, or stylesheet). And anytime any of those relative references are changed, the resolver’s knowledge will also have to be updated.
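The difference comes down to which base URI the relative reference is resolved against. Here’s a small sketch of that computation using java.net.URI; the class and method names are made up for the example, and a real parser will have its own resolution code. It prints the scheme and path for the file case because, as far as I know, java.net.URI renders a resolved file URI in the shorter file:/ form.

```java
import java.net.URI;

// Hypothetical demo of how the base URI decides where blocks.dtd "is".
final class BaseUriDemo {
    // Resolve a system identifier against the base URI that the parser
    // believes the containing entity was loaded from.
    static URI absolutize(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId);
    }

    public static void main(String[] args) {
        // SAX case: the resolver reported the local location of sample.dtd.
        URI local = absolutize(
            "file:///usr/local/XML/sample/sample.dtd", "blocks.dtd");

        // .NET case: the parser only knows the URI it originally asked for.
        URI remote = absolutize(
            "https://xmlresolver.org/ns/sample/sample.dtd", "blocks.dtd");

        System.out.println(local.getScheme() + " " + local.getPath());
        System.out.println(remote);
    }
}
```

In the first case the parser can open blocks.dtd straight off the disk. In the second, the request goes back to the resolver, which needs its own entry for blocks.dtd.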

It’s not an insurmountable problem; it’s just tedious and error prone. If there’s ever the will to update the System.Xml.XmlResolver API, it’d be nice to see this fixed as well.