GetEntity
The .NET System.Xml.XmlResolver API’s GetEntity method returns the resolved entity as an unadorned Stream. That is an unfortunate design choice.
I previously argued that the .NET System.Xml.XmlResolver
interface is fundamentally broken. I’ve discovered another problem,
though this one is less “totally broken” and more “badly designed.”
To refresh your memory, here’s the API in question:
object? GetEntity(Uri absoluteUri,
string? role,
Type? ofObjectToReturn);
My previous comments were about the input parameters, but I think the return type is an error as well.
Consider the following document:
<!DOCTYPE book SYSTEM "https://xmlresolver.org/ns/sample/sample.dtd">
<book xmlns="https://xmlresolver.com/ns/sample">
<title>Testing</title>
<article>
<p>This is a test.</p>
</article>
</book>
The DTD URI will be passed to GetEntity. Suppose that the resolver
maps that DTD to a local copy at
file:///usr/local/XML/sample/sample.dtd. So far, so good. But all
GetEntity can return is the open stream for that local resource.
Nothing in that stream provides the actual base URI of the resource.
Compare this to the SAX API (EntityResolver2 in Java):
InputSource resolveEntity(String name,
String publicId,
String baseURI,
String systemId)
The InputSource API isn’t my favorite, but it does provide a way for
the resolver to tell the parser where the resource is actually coming
from.
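As a rough sketch of what that looks like on the Java side, here is a minimal EntityResolver2 that maps the sample DTD to its local copy and reports the local location through setSystemId. The class name LocalDtdResolver, the single hard-coded mapping, and the in-memory stand-in for the local file are assumptions made for illustration; a real resolver would consult a catalog and open the actual file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;
import org.xml.sax.ext.EntityResolver2;

// Hypothetical resolver: maps one remote DTD to one local copy.
class LocalDtdResolver implements EntityResolver2 {
    static final String REMOTE = "https://xmlresolver.org/ns/sample/sample.dtd";
    static final String LOCAL = "file:///usr/local/XML/sample/sample.dtd";

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        if (!REMOTE.equals(systemId)) {
            return null; // not ours: let the parser use its default behavior
        }
        InputSource source = new InputSource(openLocalCopy());
        source.setPublicId(publicId);
        // This is the part a bare Stream cannot express: the system
        // identifier tells the parser where the bytes really came from.
        source.setSystemId(LOCAL);
        return source;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Plain SAX1-style callback: delegate to the richer method.
        return resolveEntity(null, publicId, null, systemId);
    }

    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        return null; // this sketch never supplies an external subset
    }

    // Assumption: an in-memory stand-in so the sketch runs anywhere;
    // a real resolver would open the file at LOCAL instead.
    private static InputStream openLocalCopy() {
        byte[] dtd = "<!ELEMENT book ANY>".getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(dtd);
    }
}
```

The returned InputSource carries both the stream and the local system identifier, so the parser has what it needs to treat the local file as the base for anything the DTD references.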
Why does this matter?
The sample.dtd includes a parameter entity reference:
<!ENTITY % blocks PUBLIC "-//Sample//DTD Simple Blocks 1.0//EN"
"blocks.dtd">
%blocks;
In the SAX case, the parser will have been told that the base URI of
the DTD was file:///usr/local/XML/sample/sample.dtd. It will use that
location to compute the absolute URI of blocks.dtd:
file:///usr/local/XML/sample/blocks.dtd. That means that the
resolver doesn’t need to know anything about the resources contained
within a DTD (or schema file, or stylesheet) if they’re available
locally at the appropriate relative locations.
With the .NET API, the parser thinks it loaded that DTD from the HTTPS
URI, so the system identifier that it will pass to GetEntity will be:
https://xmlresolver.org/ns/sample/blocks.dtd. That means your resolver
has to know about every single resource referenced in the DTD (or
schema, or stylesheet). And anytime any of those relative
references are changed, the resolver’s knowledge will also have to be
updated.
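The difference comes down to which base URI the relative reference blocks.dtd is resolved against. The following is only an illustration using java.net.URI, not what either parser literally does internally; the class name BaseUriDemo and the helper resolveAgainst are hypothetical.

```java
import java.net.URI;

// Hypothetical demo: the same relative reference, two different bases.
class BaseUriDemo {
    // Resolve a relative reference against a base URI (RFC 3986 style).
    static URI resolveAgainst(String base, String relative) {
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        String relative = "blocks.dtd";

        // Base reported by the resolver (the SAX case): the result is a
        // sibling of the local sample.dtd.
        System.out.println(
            resolveAgainst("file:///usr/local/XML/sample/sample.dtd", relative));

        // Base the parser assumes when it only gets a Stream (the .NET case):
        // the result is back on the remote server.
        System.out.println(
            resolveAgainst("https://xmlresolver.org/ns/sample/sample.dtd", relative));
    }
}
```

With the local base, the result points at blocks.dtd under /usr/local/XML/sample/, which the parser can simply open. With the remote base, the result is https://xmlresolver.org/ns/sample/blocks.dtd, which lands back in GetEntity and needs its own mapping.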
It’s not an insurmountable problem; it’s just tedious and error-prone.
If there’s ever the will to update the System.Xml.XmlResolver API,
it’d be nice to see this fixed as well.