Chris,

Even with C3 (Cocoon 3.0 beta), unless you specify the optional encoding in your Serializer config, you fall back to the default UTF-8:
org.apache.cocoon.optional.servlet.components.sax.serializers.util

public class ConfigurationUtils {

    private ConfigurationUtils() {
    }

    public static String getEncoding(Map<String, ? extends Object> configuration) {
        String encoding = (String) configuration.get("encoding");
        // Fall back to UTF-8 when no encoding is configured.
        if (encoding == null || "".equals(encoding)) {
            encoding = "UTF-8";
        }
        return encoding;
    }
    ...

Greetings,
Greg

2017-06-08 20:11 GMT+02:00 gelo1234 <gelo1...@gmail.com>:

>
> It depends on what type of Serializer you use and what kind of Serializer
> config you put into your sitemap.
>
> By default XMLSerializer/HTMLSerializer uses UTF-8 encoding. So instead of
> 1 UTF-16 char you get 2 chars UTF-8 encoded.
> Of course there might also be an issue with the emoji charset, but I would
> first try to change the encoding in the Serializer config (to UTF-16).
>
> Greetings,
> -Greg
>
> 2017-06-07 10:43 GMT+02:00 Flynn, Peter <pfl...@ucc.ie>:
>
>> I had a related problem with 3–4 CJK characters being converted to their
>> &#hex; format. Very weird, but it turned out to be the old and buggy copy
>> of jtidy, and I can't figure out how to replace it.
>>
>> I haven't had the problem you describe, though, and I have a user who has
>> implemented emoji in Cocoon, see http://research.ucc.ie/emojis/
>>
>> P
>>
>> --
>> Peter Flynn | Academic and Collaborative Technologies | IT Services |
>> University College Cork | Ireland | pfl...@ucc.ie |
>> http://research.ucc.ie/profiles/H505/pflynn
>>
>>
>> On 2017-06-06 17:08:51+01:00 Christopher Schultz wrote:
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> All,
>>
>> I've been testing my application for use with high Unicode code points
>> such as emoji like 😍 which is this
>> one: http://www.fileformat.info/info/unicode/char/1F60D/index.htm
>>
>> My application and database can handle this code point, but Cocoon
>> butchers it in a way that I have seen before -- the way that
>> commons-lang's StringEscapeUtils.escapeXml/escapeHtml seems to do.
>>
>> Instead of letting the character through as-is, it tries to convert it
>> into these two numbered entities:
>>
>> &#55357;&#56845;
>>
>> Oddly enough, those are the two 16-bit UTF-16 code units (the surrogate
>> pair) you'd get, but they shouldn't be split up like that, I don't think.
>>
>> I haven't found a version of commons-lang 2.x that doesn't break these
>> kinds of characters. commons-lang3 does the right thing, but the two
>> libraries are incompatible.
>>
>> Does anyone know the code well enough to know how difficult it would
>> be to change the way Cocoon 2.1 escapes its output? For example, by
>> using commons-lang3?
>>
>> I haven't tried Cocoon 2.2 yet, and I can't tell what dependencies it
>> has. I also can't exactly tell what to do now that I've downloaded the
>> binary package. Can this just be used as a drop-in replacement for
>> Cocoon 2.1.x? Cocoon 2.1.x could build a WAR file that I then
>> customized for my own application, adding various libraries and
>> configuration files to it. I think I'll follow up with a separate post
>> about this.
>>
>> - -chris
>>
>> -----BEGIN PGP SIGNATURE-----
>> Comment: GPGTools - http://gpgtools.org
>>
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>
>> iQIcBAEBCAAGBQJZNtOBAAoJEBzwKT+lPKRYEuIP/3gSJZDNEbzsHkI5zYjMZbFf
>> vKvRRnBSl+6IdrcUasftf+AkXIIYwj6xnUQ7winsLW/n8TdDG6jPqsg4Khsozc6z
>> aa23qDly62gmCsqpLohXxt/ZNKdPY4sOTghaaEUFTtTgpeD3M/INF90myT8SwO4K
>> WUtqVparSqp/Zf9JMm3OCIguMKbsRNYWVIQuiJxDQJkWYwrw0iVk2v8mc6iz/mDF
>> w6np4EvFr9fqdDufKpPw8anEkrp5JEuTx47vMOtz4sixVr2C6ehgP4zs3kVzdVid
>> QPeUsrosV1tsRC9bMVLGmjo7UhNseeXCp/AceIT6AQE8Q1clgy9GcoNMf60dgGku
>> et0xoGptYgbCfmJL+PuA9y7fJYjgTTQheqzuC721n2/sx+kyBSBWSMIhqia2sd4y
>> spcT4kw+uChsWjwoeGOHOm4IimrVgXkfJeHVSXV4m66sHS9t+bDiiErwS1SikvSV
>> qF64/L0u8hYFLD1ehURoHBi4foE1Td3eRGOGHgodcYL9C8U+Yv+fWaiYQ5O4CCnW
>> pToFvVoQOdZY+VVC8hz1ggbRMSxjT2GQLLJ2mjbGzGUJjlwyQaoZnADSSu0efj88
>> O2AlWB2Bf/Ag6E4C9jEjj+cauBfR+1NIK7F1Jo6C02yY1SUOSoOAFDZ7EkO4qYAO
>> YhvgSQXNmKps6rusNjNZ
>> =q8Eh
>> -----END PGP SIGNATURE-----
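
For reference, the splitting described in the original report can be reproduced with a small standalone sketch. This is not Cocoon's or commons-lang's actual code: the class and method names below are hypothetical, and the helpers only turn non-ASCII characters into numeric references (they do not escape &, < or >). The sketch assumes the escaper walks the string one UTF-16 char at a time, which is what would produce two surrogate references for U+1F60D instead of one.

```java
public class SupplementaryEscapeDemo {

    // Hypothetical helper: escapes each non-ASCII UTF-16 code unit on its own,
    // so a character above U+FFFF comes out as two surrogate references.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: walks the string by code point, so a surrogate
    // pair is emitted as a single numeric character reference.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 2 chars for supplementary code points, else by 1.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F60D));
        System.out.println(escapePerChar(emoji));      // &#55357;&#56845;
        System.out.println(escapePerCodePoint(emoji)); // &#128525;
    }
}
```

The first form is not a valid character reference pair in XML, since surrogate code points are not legal XML characters; the second form is valid whatever output encoding the serializer is configured with.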