Volume 5, Issue 14; 06 Jul 2021

This .NET API is fundamentally broken. If anyone reading this thinks it might be possible to persuade someone to fix it, I’d be happy to help.

I wasn’t going to write this post. It feels like I’d be too easy to read it as a snarky post where I make fun of Microsoft for a bad API design. Like I haven’t made some bad API design choices in my day. (cough javax.xml.namespace cough)

But, in fact, it’s a .NET Core API that I think is completely broken and exposes a very small security hole, so I’m going to lay out what I think is wrong in the hopes that someone out there might be able to take steps towards getting it fixed. At the very least, I can point out where the small security issue resides.

A little background, familiar to most of you I expect, but relevant to the problem in this API: XML external entities (for example DTDs, but also external general parsed entities and parameter entities included in them, and in the internal subset when that’s relevant) are declared with an external identifier that consists of a system identifier (a URI in XML) and an optional public identifier.

For example:

<!DOCTYPE book PUBLIC "-//Sample//DTD Simple 1.0//EN"

Public identifiers predate URIs and provided a means for entities to have globally unique names even when the local system identifiers weren’t. For example:

<!DOCTYPE book PUBLIC "-//Sample//DTD Simple 1.0//EN"

That system identifier probably doesn’t resolve on your filesystem. Heck, it doesn’t even resolve on mine, but the presence of the public identifier provides a name for the resource that we can share. One of the reasons entity catalogs exist at all is in order to have a standard way to share these mappings.

XML parsers usually provide some callback mechanism to plug in your own entity resolver. Here’s the SAX EntityResolver2 interface:

InputSource resolveEntity(String name,
                          String publicId,
                          String baseURI,
                          String systemId)

If you’re parsing the XML document we started with using a SAX parser and its resolver interface, the parser will call resolveEntity to get the DTD:

resolveEntity("book", "-//Sample//DTD Simple 1.0//EN",

Your resolver can use any combination of the system and public identifiers and even the entity name and base URI to decide what to return. The parser will use whatever is returned as the book DTD.

Here’s the System.Xml.XmlResolver API:

object? GetEntity(Uri absoluteUri,
                  string? role,
                  Type? ofObjectToReturn);

We can ignore role and ofObjectToReturn (they’re always null and System.IO.Stream respectively, as far as I can tell). The obvious deficiency in this API is that it only passes in the absolute system identifier.

If that was the only issue, I’d probably frown a bit, but I’d get over it. Public identifiers are much less important in XML and if you’re using a resolver (which is, after all, why I’m looking at this API in the first place!), you can always put the full, public URI in the system identifier and let the resolver give you the local resource.

Unfortunately, that’s not the only issue. When the standard System.Xml.XmlReader parser class uses the resolver, it treats the public identifier as a relative URI, makes it absolute with respect to the current base URI, and passes it in as the absolute URI.

I am not even kidding.

If you’re parsing the XML document we started with using a System.Xml.XmlReader parser and its resolver interface, the parser will call getEntity to get the DTD.In my XML Catalog resolver, I tried to be clever and detect this case so that I could support public ID resolution. Unfortunately, only the public identifier is available and since the system identifier is not optional in XML, it all falls over anyway even if the resolution succeeds. It would be possible to twist the API further to make it work, but I’m not sure it’s a good idea anyway. First it will call:

resolveEntity("file:///home/ndw/-/Sample/DTD Simple 1.0/EN",
              null, typeof(Stream))

If that returns null, the parser will try again with:

              null, typeof(Stream))

From a security perspective, it would never have occurred to me that a maliciously formatted public identifier could cause a direct filesystem access. I expect that’s a teeny, tiny security problem in the grand scheme of things, but it’s still there.

What I’d really like to see in the .NET platform is a resolver interface more like the entityResolver2 interface. But even if that’s not possible, the underlying parser should stop treating the public identifier like a relative URI!