Unicode in Java

Volume 7, Issue 46; 18 Nov 2023

TIL: Java 11 supports Unicode 10.0 with two extensions. With. Two. Extensions.

In principle, you can work out what version of Unicode an Invisible XML processor supports by testing the character categories that are matched by a carefully chosen set of characters. (If U+0560 is in the category Ll, for example, you’re using at least Unicode 11.0.)

Except you can’t rely on Java 11’s answers. Java 11 will tell you that U+32FF is in the So category even though that character wasn’t introduced until Unicode 12.1.
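A minimal sketch of that kind of probe, to make the idea concrete. The class name `UnicodeProbe` and its helper methods are made up for this illustration and are not taken from any Invisible XML processor. It checks the two characters mentioned above, once with a regular expression category match and once with `Character.getType`. What it prints depends on the JVM that runs it.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any real processor.
final class UnicodeProbe {

    // True if this JVM's character database considers the code point assigned.
    public static boolean isAssigned(int codePoint) {
        return Character.getType(codePoint) != Character.UNASSIGNED;
    }

    // True if the code point matches the given general category (e.g. "Ll", "So")
    // when tested through a java.util.regex category match.
    public static boolean matchesCategory(int codePoint, String category) {
        String s = new String(Character.toChars(codePoint));
        return Pattern.matches("\\p{" + category + "}", s);
    }

    // Prints what this JVM reports for one probe character.
    static void report(int codePoint, String category, String introducedIn) {
        System.out.printf("U+%04X (introduced in Unicode %s): assigned=%b, matches \\p{%s}=%b%n",
                codePoint, introducedIn, isAssigned(codePoint),
                category, matchesCategory(codePoint, category));
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        report(0x0560, "Ll", "11.0"); // ARMENIAN SMALL LETTER TURNED AYB
        report(0x32FF, "So", "12.1"); // SQUARE ERA NAME REIWA
    }
}
```

If the second probe reports a match while the first does not, the runtime is in the awkward position described above: it knows a character from Unicode 12.1 without knowing one from 11.0, so no single probe can be trusted to pin down the version.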

Leveraging Java regular expression matches against character categories is probably not the most efficient implementation strategy anyway.

#Invisible XML #Java #TIL #Unicode