Namespaces in JSON?

Volume 1, Issue 10; 24 May 2017

Published by Norman Walsh

More than 140 characters about property names in JSON.

The first step towards wisdom is calling things by their right names.

In the last couple of days (relative to 24 May 2017), I’ve been entangled in a conversation on Twitter with @asbjornu, @danbri, @hsivonen, @robinberjon, @RubenVerborgh, and perhaps others, about JSON, JSON-LD, XML, and XML namespaces. I haven’t been able to decompose my thoughts into 140-character snippets, so I’m writing this instead. It’s maybe a bit rambly.

I’m trying to avoid politics and personalities. I’m only interested in the technical issues.

The problem · The heart of the question, I think, is “does it matter that JSON property names have no global identity?” Consider this document:

{
  "name": "Norman Walsh",
  "uri": "https://nwalsh.com/"
}

The property “name” isn’t grounded anywhere. It has no self-evident relationship to any other property or any declared reference to what “name” means. The same is true of “uri”.

For reference, this is in contrast to this document:

<personname xmlns="http://docbook.org/ns/docbook"
>Norman Walsh</personname>

Here “personname” is explicitly the idea of personal name defined in DocBook.

Does this matter? · Well, surely “that depends”. Countless JavaScript objects (and hashes in other programming languages) are created every moment of every day. They exist for some period of time, and then they go away. The overwhelming majority of these objects exist only in the context of a specific application; there would be no benefit to establishing any sort of grounding for the names in these objects. They’re never observable outside the application that uses them.

But some subset of JSON documents do become visible: they’re returned from service APIs, they’re stored in databases or otherwise cached and accessed by multiple applications, etc.

It’s at least conceivable that (some of) those documents would benefit from having property names that are grounded in some way.

Declaration of bias · I like XML and I like XML Namespaces. I advocate always putting every XML document in a namespace. In the common case, documents only have one namespace and having an explicit one is no more expensive than not having one.

On the occasion where you do want to combine documents, the fact that you can distinguish different elements (or correlate elements that are the same) is very useful.

Hypothetical scenario · Suppose you’re ingesting data from a few different web services. It’s possible that you want to combine these results and correlate across them. If you get multiple objects with a “name” property, are they the same?
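
To make the ambiguity concrete, here is a minimal sketch in JavaScript. Both response objects, and the services they supposedly come from, are invented for illustration; the point is only that a naive merge treats equal names as equal properties.

```javascript
// Hypothetical responses from two different services (invented for illustration).
const fromPeopleService = { name: "Norman Walsh", uri: "https://nwalsh.com/" };
const fromPackageService = { name: "example-package", version: "1.0.0" };

// A naive merge assumes identical property names mean the same thing,
// so the later "name" silently overwrites the earlier one.
const merged = { ...fromPeopleService, ...fromPackageService };
console.log(merged.name);

// List the property names the two objects share. Nothing in the data
// says whether a shared name carries a shared meaning.
const shared = Object.keys(fromPeopleService).filter((key) =>
  Object.prototype.hasOwnProperty.call(fromPackageService, key)
);
console.log(shared);
```

The code can detect that both objects have a “name”, but deciding whether the two are the same property still takes a human.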

Ad hoc is OK! · One position on the Twitter thread, as I understand it, is that no general solution to the grounding problem is necessary or desirable. Human beings can look at the data, search the web, work out what the properties are, and work out how to combine the data within their application.

That’s undeniably true, but I don’t personally find it very satisfying.

Use JSON-LD · Another position on the thread, again as I understand it, is that JSON-LD should be used.

{
  "@context": {
    "name": "http://schema.org/name",
    "uri": {
      "@id": "http://xmlns.com/foaf/0.1/homepage",
      "@type": "@id"
    }
  },
  "name": "Norman Walsh",
  "uri": "https://nwalsh.com/"
}

If you have tools that understand JSON-LD, this is likely more appealing than ad hoc solutions. In deference to the folks who prefer ad hoc solutions, I do find that JSON-LD does more violence to my data than I would like. (Though perhaps making @context a URI instead of a literal would help; I confess, I’m not very familiar with JSON-LD.)
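
As a rough illustration of what the @context provides, here is a deliberately simplified JavaScript sketch that rewrites each property name to the IRI the context maps it to. This is not the JSON-LD expansion algorithm: real processors produce a different expanded form and handle far more cases. The function name `groundProperties` is invented for this example.

```javascript
// Simplified illustration only: replace each term with the IRI that the
// document's own @context assigns to it. Not a JSON-LD processor.
function groundProperties(doc) {
  const context = doc["@context"] ?? {};
  const grounded = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context") continue;
    const definition = context[key];
    // A term definition is either an IRI string or an object with "@id".
    const iri =
      typeof definition === "string" ? definition : definition?.["@id"];
    // This sketch keeps unmapped terms under their original name.
    grounded[iri ?? key] = value;
  }
  return grounded;
}

const doc = {
  "@context": {
    "name": "http://schema.org/name",
    "uri": { "@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id" }
  },
  "name": "Norman Walsh",
  "uri": "https://nwalsh.com/"
};

const groundedDoc = groundProperties(doc);
console.log(groundedDoc);
```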

Real world scenario · My team is building a big app that stores a lot of JSON data in MarkLogic. Most of the data that’s being ingested is not entirely under my team’s control. We need to annotate the incoming data with some additional properties. With namespaces, that would be easy.

Without them, I have to do something ad hoc. I could, for example, nest the data into another object where I put my properties:

{
  "id": "1234",
  "data": {
    "name": "Norman Walsh",
    "uri": "https://nwalsh.com/"
  }
}

Or I could just invent a property name and hope:

{
  "myapp_id": "1234",
  "name": "Norman Walsh",
  "uri": "https://nwalsh.com/"
}

What I really want here is namespaces to avoid property name collision. Near as I can tell, JSON-LD isn’t going to help me with that.
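
The two ad hoc options above can be sketched in JavaScript. The function names `wrap` and `annotate` and the default `myapp_` prefix are assumptions made for illustration. Without namespaces, the collision check in the second option has to be written by hand.

```javascript
// Option 1: nest the incoming data under "data" so that added
// properties can never collide with incoming ones.
function wrap(incoming, id) {
  return { id, data: incoming };
}

// Option 2: add a prefixed property next to the incoming ones.
// Nothing reserves the prefix, so check for a collision explicitly.
function annotate(incoming, id, prefix = "myapp_") {
  const key = `${prefix}id`;
  if (Object.prototype.hasOwnProperty.call(incoming, key)) {
    throw new Error(`incoming data already has a "${key}" property`);
  }
  return { [key]: id, ...incoming };
}

const incoming = { name: "Norman Walsh", uri: "https://nwalsh.com/" };
console.log(JSON.stringify(wrap(incoming, "1234")));
console.log(JSON.stringify(annotate(incoming, "1234")));
```

Wrapping is collision-proof but changes the path to every incoming property; prefixing keeps the shape but only makes a collision unlikely, not impossible.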

I don’t hold out much hope for any solution to this problem. There’s no syntactic space in JSON for providing better names. I suppose longer property names are a solution:

{
  "http://schema.org/name": "Norman Walsh",
  "http://xmlns.com/foaf/0.1/homepage": "https://nwalsh.com/"
}

But:

They complicate property access in JavaScript.
They’re more than a bit cumbersome.
Unless everyone does it, all I’m doing is picking “unlikely” names. They’re not something that’s…well, grounded.
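
The first point is easy to see in JavaScript: a URI isn’t a valid identifier, so dot notation is unavailable and every access needs bracket notation. A short sketch (the constant and variable names are chosen for illustration):

```javascript
const doc = {
  "http://schema.org/name": "Norman Walsh",
  "http://xmlns.com/foaf/0.1/homepage": "https://nwalsh.com/"
};

// doc.http://schema.org/name does not parse as a property access,
// so bracket notation with the full string is required.
const viaBrackets = doc["http://schema.org/name"];

// Constants reduce the risk of typos in the long names.
const SCHEMA_NAME = "http://schema.org/name";
const FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage";

// Destructuring needs computed keys and explicit local names.
const { [SCHEMA_NAME]: personName, [FOAF_HOMEPAGE]: homepage } = doc;

console.log(viaBrackets, personName, homepage);
```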

I’m not advocating for any changes or solutions, just making a few observations. I fully expect to see JSON applications grow in complexity over time until everything in XML has been reinvented.

I still regret that the XML community wasn’t able to make XML more palatable to the browser vendors.