XML Unicode update

Volume 3, Issue 33; 25 Nov 2019

Some small improvements to my Emacs package for entering Unicode characters.

A few days ago, Dave asked me a question about XML and Unicode and Emacs. The precise question isn’t really relevant; it was simply the impetus to go off and look at XMLUnicode again.

Why bother, you may ask, now that Emacs has Unicode support in insert-char? It’s a fair question. Certainly, this code is a lot less significant than it once was. I still think it beats the built-in support on two grounds:

  1. If the character isn’t displayable, my code will insert a numeric character reference instead. Ok, that’s only interesting in a markup context, but that’s like 90% of my life, so it’s still interesting to me.
  2. If you’re an old XML hack and still remember that rarr is the ISO entity name for →, you can search for that or, with a little configuration, just literally type “&” “r” “a” “r” “r” “;” and the right thing will happen.
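
To make those two behaviours concrete, here is a minimal sketch of the logic in Python. It is not the package's Emacs Lisp. The function names and the `displayable` callback are hypothetical, and Python's HTML 4 entity table stands in for the ISO entity names (the two overlap for names like rarr, but they are not the same list).

```python
from html.entities import name2codepoint


def resolve_entity(name):
    """Return the character for an entity name such as 'rarr', or None.

    Assumption: Python's HTML 4 table stands in for the ISO entity sets.
    """
    codepoint = name2codepoint.get(name)
    return chr(codepoint) if codepoint is not None else None


def char_or_reference(char, displayable):
    """Return char itself, or a hex numeric character reference.

    `displayable` is a hypothetical callback that reports whether the
    editor can render the character; how to decide that is editor-specific.
    """
    if displayable(char):
        return char
    return "&#x{:X};".format(ord(char))
```

Passing the displayability check in as a callback keeps the fallback rule separate from whatever the editor uses to decide what it can render.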

If that seems compelling, here’s what I improved:

  1. I’ve updated the list of characters from the Unicode 3.1 list to the 12.1.0 list. I’ve also added a script so you can build the data for any version you’d like.
  2. I figured out how to test if a character can be displayed or not. That allowed me to get rid of the “list of undisplayable characters”.
  3. I tweaked the utility function xmlunicode-character-list so that you can specify a range if you want. (It still defaults to the BMP.)
  4. I moved the Helm-related functions into a separate file. They need to (require 'helm), and I didn’t want to add Helm as a dependency for everyone.
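
For a rough idea of what building the character data involves, here is a sketch in Python that reads lines in the UnicodeData.txt format (semicolon-separated fields, with the code point first and the name second) and keeps the characters in a given range, defaulting to the BMP. This is not the script that ships with the package. The function name and the choice to skip bracketed placeholder names such as `<control>` are assumptions made for illustration.

```python
def parse_unicode_data(lines, first=0x0000, last=0xFFFF):
    """Return (codepoint, name) pairs for named characters in [first, last].

    `lines` is any iterable of UnicodeData.txt lines. The default range
    is the Basic Multilingual Plane.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        codepoint = int(fields[0], 16)
        name = fields[1]
        # Assumption: skip placeholders like <control> and range markers,
        # which are not usable as searchable character names.
        if name.startswith("<"):
            continue
        if first <= codepoint <= last:
            pairs.append((codepoint, name))
    return pairs


sample = [
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "2192;RIGHTWARDS ARROW;Sm;0;ON;;;;;N;RIGHT ARROW;;;;",
    "1F600;GRINNING FACE;So;0;ON;;;;;N;;;;;",
]
bmp_only = parse_unicode_data(sample)
everything = parse_unicode_data(sample, 0x0000, 0x10FFFF)
```

With the default range, the sample yields only the two BMP characters. Widening the range to 0x10FFFF also picks up GRINNING FACE.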

Along the way, I made what felt like about a dozen false starts, errors, and broken releases. Because the smallest, simplest changes are always the hardest.