Unicode in Java
Issue 46; 18 Nov 2023
TIL: Java 11 supports Unicode 10.0 with two extensions. With. Two. Extensions.
In principle, you can work out what version of Unicode an Invisible
XML processor supports by testing the character categories that are
matched by a carefully chosen set of characters. (If U+0560 is in the
category Ll, for example, you’re using at least Unicode 11.0.)
Except you can’t rely on Java 11’s answers. Java 11 will tell you that
U+32FF is in the So category even though that character wasn’t
introduced until Unicode 12.1.
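To see the mismatch for yourself, you can ask the runtime directly. This is only a rough sketch: it assumes that Character.getType reflects the same Unicode data your processor ends up using, and the class and method names are invented for illustration.

```java
// Hypothetical helper; UnicodeProbe and its method names are illustrative only.
final class UnicodeProbe {
    /** True if the running JVM reports the code point in category Ll. */
    static boolean isLl(int codePoint) {
        return Character.getType(codePoint) == Character.LOWERCASE_LETTER;
    }

    /** True if the running JVM reports the code point in category So. */
    static boolean isSo(int codePoint) {
        return Character.getType(codePoint) == Character.OTHER_SYMBOL;
    }

    /**
     * True when the runtime knows U+32FF (introduced in Unicode 12.1) but not
     * U+0560 (introduced in Unicode 11.0). No unextended Unicode version
     * would give that combination of answers.
     */
    static boolean hasMixedAnswers() {
        return isSo(0x32FF) && !isLl(0x0560);
    }

    public static void main(String[] args) {
        System.out.println("U+0560 is Ll: " + isLl(0x0560));
        System.out.println("U+32FF is So: " + isSo(0x32FF));
        System.out.println("Mixed answers: " + hasMixedAnswers());
    }
}
```

If the runtime behaves as described above, Java 11 should report mixed answers, and a probe that only looked at U+32FF would wrongly conclude it was running on Unicode 12.1 or later.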
Leveraging Java regular expression matches against character categories is probably not the most efficient implementation strategy anyway.
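For comparison, here are the two approaches side by side for a single category. This is a sketch rather than anything a particular processor actually does: CategoryCheck and its methods are made-up names, and it makes no measured claim about which approach is faster.

```java
import java.util.regex.Pattern;

// Hypothetical comparison; CategoryCheck is not part of any real processor.
final class CategoryCheck {
    // Compiled once; \p{Ll} matches a single lowercase letter.
    private static final Pattern LL = Pattern.compile("\\p{Ll}");

    /** Category test via a regular expression match on a one-code-point string. */
    static boolean isLlByRegex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return LL.matcher(s).matches();
    }

    /** The same test via a direct category lookup, with no string or matcher. */
    static boolean isLlByType(int codePoint) {
        return Character.getType(codePoint) == Character.LOWERCASE_LETTER;
    }

    public static void main(String[] args) {
        int cp = 'a';
        System.out.println("regex:  " + isLlByRegex(cp));
        System.out.println("lookup: " + isLlByType(cp));
    }
}
```

The regex version has to build a string and a matcher for every character tested, while the lookup version works on the code point itself.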