> On Oct 6, 2015, at 6:04 , Philippe Verdy <verd...@wanadoo.fr> wrote:
>
> In those conditions, normalizing the Java string will leave those lone
> surrogates (and non-characters) as is, or will throw an exception, depending
> on the API used. Java strings do not have any implied encoding (their "char"
> members are also unrestricted 16-bit code units, they have some basic
> properties but only in BMP, defined in the builtin Character class API:
> properties for non-BMP characters require using a library to provide them,
> such as ICU4J).

The Java Character class was enhanced in J2SE 5.0 to support supplementary characters. The String class was specified to be based on UTF-16, and string processing throughout the platform was updated to support supplementary characters based on UTF-16. These changes have been available to the public since 2004. For a summary, see

http://www.oracle.com/technetwork/articles/java/supplementary-142654.html

Norbert
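A minimal sketch (not taken from the Oracle article above) of what that supplementary-character support looks like in practice, assuming Java 8 or later for String.codePoints(). The class name SupplementaryDemo and the helper fromCodePoint are hypothetical; U+1D400 MATHEMATICAL BOLD CAPITAL A is just an arbitrary example of a non-BMP letter. It shows a String storing one supplementary code point as a UTF-16 surrogate pair, the int (code point) overloads of Character returning real properties for it, and java.text.Normalizer handling it.

```java
import java.text.Normalizer;

public class SupplementaryDemo {

    /** Hypothetical helper: builds a String from one code point (a surrogate pair if non-BMP). */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the BMP.
        int cp = 0x1D400;
        String s = fromCodePoint(cp);

        // UTF-16 storage: two char code units (a surrogate pair) for one code point.
        System.out.println("length = " + s.length());                           // 2
        System.out.println("code points = " + s.codePointCount(0, s.length())); // 1
        System.out.println("surrogate pair = "
                + Character.isSurrogatePair(s.charAt(0), s.charAt(1)));         // true

        // Property lookups: the int overloads take full code points,
        // while the char overloads only see a lone high surrogate here.
        System.out.println("isLetter(int)  = " + Character.isLetter(s.codePointAt(0))); // true
        System.out.println("isLetter(char) = " + Character.isLetter(s.charAt(0)));      // false
        System.out.println("isUpperCase    = " + Character.isUpperCase(cp));            // true

        // Iterate by code point rather than by char (codePoints() needs Java 8+).
        s.codePoints().forEach(c -> System.out.printf("U+%04X%n", c));          // U+1D400

        // java.text.Normalizer also handles supplementary characters:
        // NFKC folds the bold math letter to a plain "A".
        String nfkc = Normalizer.normalize(s, Normalizer.Form.NFKC);
        System.out.println("NFKC = " + nfkc);                                    // A
    }
}
```

The char-based methods are kept for compatibility, so code that inspects one char at a time still sees surrogates; the point is that the code-point-based methods have been in the core platform since J2SE 5.0.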