A better API for resolving resources?
With an eye towards correcting an oversight in the XML Resolver APIs, opinions on a new design most humbly solicited.
A while back, I added an API to the XML Resolver for resolving URIs with a given nature and purpose. This is largely the design pioneered by the RDDL project years ago. The API is the NamespaceResolver, so named because the use envisioned by RDDL was that you’d request a resource for a given namespace URI with a given nature and purpose:
- “For the DocBook namespace, give me the document with the RELAX NG nature for the purpose of validation.”
- “For the XHTML namespace, give me the document with the XSLT nature for the purpose of transformation.”
But, in fact, the ability to specify the nature and purpose of the resource you want is handy (especially the nature). For example, if you’re trying to resolve a URI for the unparsed-text function, it’s useful to be able to say that you’re looking for a plain text document.
Unfortunately, if you want to use the nature and purpose in the request, the only API available is the NamespaceResolver, and that API is fundamentally flawed in the following way. It defines this method:
public Source resolveNamespace(String uri,
String nature,
String purpose)
throws TransformerException;
But the return type Source is only appropriate for XML resources. Returning a plain text resource through that API is, at best, an abuse of the API. And, in fact, not unreasonably, the Saxon unparsed text resolver isn’t prepared to deal with a Source reply, so it’s kind of a mess. Mea culpa.
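To make the mismatch concrete, here is a minimal sketch of what squeezing plain text through a Source-returning API looks like from both sides. The class and method names (SourceAbuseSketch, resolvePlainText, readPlainText) are hypothetical and are not part of the XML Resolver or Saxon; only the javax.xml.transform types are real.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration only; not code from the XML Resolver.
final class SourceAbuseSketch {
    // Resolver side: the only way to hand back text is to wrap it in an
    // XML-oriented Source, even though nothing about it is XML.
    static Source resolvePlainText(String text, String systemId) {
        return new StreamSource(new StringReader(text), systemId);
    }

    // Caller side: the text can only be recovered by casting to a concrete
    // Source type and hoping it carries a character stream.
    static String readPlainText(Source source) throws IOException {
        if (!(source instanceof StreamSource)) {
            throw new IOException("Cannot read plain text from "
                    + source.getClass().getName());
        }
        Reader reader = ((StreamSource) source).getReader();
        if (reader == null) {
            throw new IOException("StreamSource has no character stream");
        }
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader in = reader) {
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

A caller that receives any other kind of Source (a DOMSource, say) has no sensible way to treat it as text, which is roughly the position the unparsed text resolver finds itself in.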
On top of that, I’ve been asked a couple of times if the resolver could return more detail about the response. In the course of processing the request, for example, the resolver may have done an HTTP request and now be in possession of a status code, headers, and other details.
In fact, the resolver is in possession of these things; they just can’t be returned through any of the existing APIs. If you squint really hard at the return types (i.e., if you cast them) you can find some of these details.
My proposal is to create a new API, the GenericResourceResolver for lack of a better name (proposals for better names most welcome):
public GenericResourceResponse resolve(GenericResourceRequest request);
Where a GenericResourceRequest is:
/**
* A generic URI resource request.
* <p>The resource request must identify the URI requested. It may
* also identify a number of other features (base URI, nature,
* purpose, etc.). If the nature is either {@link
* ResolverConstants#DTD_NATURE} or {@link
* ResolverConstants#EXTERNAL_ENTITY_NATURE}, the resource requested
* is assumed to be an XML external identifier and is resolved
* accordingly. Otherwise, it's assumed to be any other kind of
* resource, identified by a URI.</p>
*/
public abstract class GenericResourceRequest {
/**
* The URI of the request
* @return the URI
*/
public abstract String getUri();
/**
* The base URI of the request
* @return the base URI, or null if none was provided
*/
public abstract String getBaseURI();
/**
* The absolute URI of the request.
* <p>If the request URI is absolute, returns that URI. If the
* request URI is not absolute and a base URI has been provided,
* returns the uri resolved against the base URI.</p>
* @return the resolved URI, or null if no absolute URI can be
* constructed
* @throws URISyntaxException if the uri or base URI are
* syntactically invalid URIs
*/
public abstract URI getAbsoluteURI() throws URISyntaxException;
/**
* The nature of the request.
* @return the nature, or null if none was provided
*/
public abstract String getNature();
/**
* The purpose of the request
* @return the purpose, or null if none was provided
*/
public abstract String getPurpose();
/**
* The system identifier of the request (this is the same as the URI)
* <p>For historical reasons, there's a distinction between the
* system identifier of an external entity in XML and a URI in a
* more general context (like an XML Schema or a Query module).
* This API conflates them. For the purpose of resolving a
* request, the URI and the system identifier are the same.</p>
* @return the URI
*/
public abstract String getSystemId();
/**
* The public identifier of the request
* @return the public identifier, or null if no public identifier
* was provided
*/
public abstract String getPublicId();
/**
* The entity name.
* <p>For external parsed entities (and doctypes), it's possible
* to provide the name of the entity.</p>
* @return the entity name, or null if no name was provided
*/
public abstract String getEntityName();
/**
* The requested encoding.
* <p>For some resource types, it may be logical to request a
* particular encoding. The response is not guaranteed to be in
* this encoding, regardless of the request.</p>
* @return the requested encoding, or null if no encoding was provided
*/
public abstract String getEncoding();
}
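The getAbsoluteURI contract is the one method here with some logic behind it. A minimal sketch of one way to satisfy it with java.net.URI might look like the following. The class name AbsoluteUriSketch and the static helper are hypothetical, and treating a non-absolute base URI as "no absolute URI can be constructed" is an assumption, since the Javadoc doesn't address that case.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; one possible reading of the getAbsoluteURI() contract.
final class AbsoluteUriSketch {
    static URI absoluteURI(String uri, String baseURI) throws URISyntaxException {
        // new URI(...) throws URISyntaxException for syntactically invalid input.
        URI requested = new URI(uri);
        if (requested.isAbsolute()) {
            return requested;          // already absolute: return it unchanged
        }
        if (baseURI == null) {
            return null;               // relative, and nothing to resolve against
        }
        URI base = new URI(baseURI);
        if (!base.isAbsolute()) {
            return null;               // assumption: a relative base can't help
        }
        return base.resolve(requested);
    }
}
```

For example, resolving "schema.rng" against the base "https://example.com/docbook/" yields "https://example.com/docbook/schema.rng", while a relative URI with no base URI yields null.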
The generic response includes all of the details discovered in the course of resolving the request (and it’s an abstract class so more can be added in the future):
/**
* A generic URI resource response.
* <p>This class provides as much detail as possible about the
* resolved resource.</p>
*/
public abstract class GenericResourceResponse {
/**
* The original request.
* <p>The original request is returned as a convenience.</p>
* @return the original request
*/
public abstract GenericResourceRequest getRequest();
/**
* Did resolution succeed?
* <p>If resolution did not succeed, there's no reliable detail
* in the response except the request.</p>
* @return true if the resolver found (or retrieved) a resource
* to satisfy the request
*/
public abstract boolean isResolved();
/**
* The resolved URI.
* <p>This is the absolute URI of the resolved request. This is
* often, but not always, the absolute URI of the request.
* However, when a catalog maps the request to a local file,
* that file may be returned as the resolved URI.</p>
* <p>Whether or not {@code jar} or {@code classpath} URIs are
* returned depends on the setting of resolver features.</p>
* @return the resolved URI
*/
public abstract URI getResolvedURI();
/**
* The local URI, if there is one
* <p>If the resolver finds the resource locally, be it a file
* identified by a catalog, or a cached resource, or a {@code jar}
* or {@code classpath} URI, that URI is the local URI.</p>
* @return the local URI
*/
public abstract URI getLocalURI();
/**
* The response headers, if applicable
* <p>If resolution followed a protocol (for example https:) that
* includes headers, those headers are available here.</p>
* @return the headers, or an empty map if no headers are
* available
*/
public abstract Map<String, List<String>> getHeaders();
/**
* Return the (first) header with the specified name.
* <p>This is a convenience method partly because header names are
* case-insensitive. Returns only the first value, or null if
* there are no values.</p>
* @param name the name of the header whose value should be returned.
* @return the value of the first header with the specified name
* @throws NullPointerException if name is null
*/
public abstract String getHeader(String name);
/**
* The input stream.
* <p>The resource can be read from this stream.</p>
* @return the input stream
*/
public abstract InputStream getInputStream();
/**
* The resource content type, if applicable.
* <p>If resolution followed a protocol (for example https:) that
* includes a content type, or if a content type was inferred from
* the filename (as is common on many systems when file: URIs are
* accessed), it is available here.</p>
* @return the content type, or null if no content type was available
*/
public abstract String getContentType();
/**
* The resource encoding.
* <p>If resolution identified a specific encoding, that encoding
* is available here. An implementation might provide a way to
* establish the encoding explicitly, or the encoding might be
* read from the content-type header if that's available.</p>
* @return the encoding, or null if no encoding was identified
*/
public abstract String getEncoding();
/**
* The request status code.
* <p>If resolution followed a protocol (for example https:) that
* includes a status code, that code is provided here. It's common
* to set this code to 200 for consistency with HTTP even when the
* protocol doesn't provide a status. Generally speaking, the
* resolver will follow redirects, so this reflects the final
* status code.</p>
* @return the status code, or -1 if no code is available
*/
public abstract int getStatusCode();
}
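The getHeader convenience method exists largely because header names are case-insensitive while map keys are not. As a rough sketch of the documented behavior (first value, null when there are no values, NullPointerException for a null name), an implementation could delegate to something like the helper below. The class name HeaderLookupSketch and the static method are hypothetical; skipping null keys is a defensive assumption, made because java.net.HttpURLConnection.getHeaderFields() stores the status line under a null key.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper showing one way to implement getHeader(String).
final class HeaderLookupSketch {
    static String firstHeader(Map<String, List<String>> headers, String name) {
        Objects.requireNonNull(name, "name");
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String key = entry.getKey();
            // Compare case-insensitively; ignore a null key if one is present.
            if (key != null && key.equalsIgnoreCase(name)) {
                List<String> values = entry.getValue();
                if (values == null || values.isEmpty()) {
                    return null;
                }
                return values.get(0);   // only the first value is returned
            }
        }
        return null;                    // no header with that name
    }
}
```

With a header map containing "Content-Type" mapped to a single value, looking up "content-type" returns that value.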
It’s easy to see how the existing resolver APIs can be recast in terms of this more generic API. In my initial experiments, this has removed some duplicated code and cleaned up a few things. And all the tests still pass, so it doesn’t appear to be fundamentally broken.
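As a sketch of what that recasting could look like for a Source-returning API, the adapter below turns a generic response into a StreamSource. ResolvedResourceSketch is a trimmed, hypothetical stand-in for GenericResourceResponse that keeps only the three members the adapter needs, and LegacyAdapterSketch is likewise an invented name. This is not the actual refactoring in the resolver.

```java
import java.io.InputStream;
import java.net.URI;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Trimmed, hypothetical stand-in for the proposed GenericResourceResponse.
abstract class ResolvedResourceSketch {
    abstract boolean isResolved();
    abstract URI getResolvedURI();
    abstract InputStream getInputStream();
}

// Hypothetical adapter from the generic response to a Source-based API.
final class LegacyAdapterSketch {
    static Source toSource(ResolvedResourceSketch response) {
        if (!response.isResolved()) {
            // javax.xml.transform.URIResolver signals "not resolved" with null.
            return null;
        }
        // The resolved URI becomes the system identifier of the Source.
        return new StreamSource(response.getInputStream(),
                response.getResolvedURI().toString());
    }
}
```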
Of course, existing applications will still call the resource and URI resolvers as they do now. But if you’re writing a new resolver, an unparsed text URI resolver, for example, you can use the new API and get back all of the details available about the resolved resource. New applications won’t have to try to shoehorn the request through an existing API that may not exactly suit their needs.
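For the unparsed text case, a consumer of the new API might look roughly like the sketch below. TextResponseSketch is a trimmed, hypothetical stand-in for GenericResourceResponse with only the members used here, and UnparsedTextSketch is an invented name. Falling back to UTF-8 when the response reports no encoding is an assumption made for the example, not something the proposal specifies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Trimmed, hypothetical stand-in for the proposed GenericResourceResponse.
abstract class TextResponseSketch {
    abstract boolean isResolved();
    abstract InputStream getInputStream();
    abstract String getEncoding();
}

// Hypothetical consumer: reads a resolved plain text resource into a String.
final class UnparsedTextSketch {
    static String readText(TextResponseSketch response) throws IOException {
        if (!response.isResolved()) {
            return null;   // nothing reliable in the response except the request
        }
        String encoding = response.getEncoding();
        // Assumption: default to UTF-8 when no encoding was identified.
        Charset charset = (encoding == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(encoding);
        try (InputStream in = response.getInputStream()) {
            return new String(in.readAllBytes(), charset);
        }
    }
}
```

No casting is involved: the stream and the encoding arrive through methods that were designed to carry them.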
I think this API can be added to the resolver without introducing breaking changes. But that remains to be proven. And updating the C# API also remains to be done.
Comments most welcome.