Making a Java SafeString that works with all Unicode characters
In Java, String has a problem with characters outside the Basic Multilingual Plane, which UTF-16 stores as two 16-bit code units (a surrogate pair, 4 bytes in total). substring() and similar methods index by char rather than by character, so they can split such a character in the middle. I was thinking that switching Strings to UTF-8 might be a good idea, and there are currently two JEPs for Java 9 that are somewhat related: JEP 226: UTF-8 Property Resource Bundles and JEP 254: Compact Strings. But after thinking about this a little more, I don’t necessarily want a UTF-8 String class; I want a String class that works with all Unicode characters. Here’s how I did it.
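To make the problem concrete, here is a minimal sketch, assuming Java 8 or later (for String.codePoints()). It is not the SafeString implementation from this post: the class CodePointString and its methods are hypothetical names chosen for illustration. It stores one int per code point, so length() and substring() count characters rather than UTF-16 chars, and a surrogate pair can never be cut in half.

```java
import java.util.Arrays;

public class CodePointDemo {

    // Hypothetical sketch, not this post's SafeString: a string indexed by
    // Unicode code point instead of by UTF-16 char.
    static final class CodePointString {
        private final int[] codePoints; // one int per code point

        CodePointString(String s) {
            this.codePoints = s.codePoints().toArray(); // Java 8+
        }

        private CodePointString(int[] codePoints) {
            this.codePoints = codePoints;
        }

        // Number of code points, not UTF-16 chars.
        int length() {
            return codePoints.length;
        }

        int codePointAt(int index) {
            return codePoints[index];
        }

        // begin (inclusive) and end (exclusive) are code point indices,
        // so the result can never end in half of a surrogate pair.
        CodePointString substring(int begin, int end) {
            if (begin < 0 || end > codePoints.length || begin > end) {
                throw new IndexOutOfBoundsException(
                        "begin " + begin + ", end " + end + ", length " + codePoints.length);
            }
            return new CodePointString(Arrays.copyOfRange(codePoints, begin, end));
        }

        @Override
        public String toString() {
            // Encodes supplementary code points back into surrogate pairs.
            return new String(codePoints, 0, codePoints.length);
        }
    }

    public static void main(String[] args) {
        // U+1F600 is outside the BMP, so UTF-16 stores it as two chars.
        String s = "a\uD83D\uDE00b";

        // java.lang.String counts chars: cutting at index 2 leaves a lone high surrogate.
        String naive = s.substring(0, 2);
        System.out.println(s.length());                                 // 4
        System.out.println(Character.isHighSurrogate(naive.charAt(1))); // true

        // The code-point version sees 3 characters and keeps the emoji whole.
        CodePointString cps = new CodePointString(s);
        System.out.println(cps.length());                                           // 3
        System.out.println(cps.substring(0, 2).toString().equals("a\uD83D\uDE00")); // true
    }
}
```

The trade-off in this sketch is memory: an int[] costs 4 bytes per character, versus 2 bytes per char for java.lang.String (or 1 byte for Latin-1 text under Java 9's Compact Strings). If copying is a concern, String.offsetByCodePoints(index, codePointOffset) can translate code point indices into char indices on an ordinary String instead.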