Hi Chris,

I suppose you cannot use two different encodings in one Serializer, so if you changed your Serializer config to UTF-16, you also have to use _external_ UTF-16 encoded CSS styles. Of course, you can define a different Serializer config for each pipeline.
By default commons-lang/Cocoon uses a 2-byte char sequence as its encoding base. A 32-bit code point is 4 code units (each 8 bits) in UTF-8, but it ends up escaped as 1 PAIR of 2-byte chars. If you switch to UTF-16, the same code point is 2 code units (each 16 bits), written as 1 SINGLE 4-byte sequence.

Greetings,
Greg

2017-06-20 22:14 GMT+02:00 Christopher Schultz <ch...@christopherschultz.net>:

> Greg,
>
> On 6/20/17 4:11 PM, Christopher Schultz wrote:
> > Greg,
> >
> > On 6/8/17 2:17 PM, gelo1234 wrote:
> >> Chris,
> >>
> >> Even with C3 (cocoon 3.0 beta) unless you specify optional
> >> encoding in your Serializer config, you fallback to default
> >> UTF-8:
> >>
> >> org.apache.cocoon.optional.servlet.components.sax.serializers.util
> >>
> >> public class ConfigurationUtils {
> >>
> >>     private ConfigurationUtils() { }
> >>
> >>     public static String getEncoding(Map<String, ? extends Object>
> >>             configuration) {
> >>         String encoding = (String) configuration.get("encoding");
> >>
> >>         if (encoding == null || "".equals(encoding)) {
> >>             encoding = "UTF-8";
> >>         }
> >>
> >>         return encoding;
> >>     }
> >> ...
> >
> > I would have expected the Unicode codepoint to be converted into a
> > single 4-byte UTF-8 sequence without any &-encoding at all. It looks
> > like what I got was a pair of 2-byte characters with &-encoding.
> >
> > I'll try UTF-16 but my expectation is that it's going to get
> > worse, not better.
>
> Interestingly enough, my emojis are now showing (which I don't totally
> understand why!) but it looks like my CSS aren't being loaded. That's
> a separate problem I'll have to figure out for myself.
>
> In my own application, switching from commons-lang to commons-lang3
> HTML/XML escaping allowed me to use these 4-byte emojis and UTF-8
> together. I'm surprised that Cocoon can't do the same thing. (I think
> it comes down to exactly how the character-escaper makes its decisions).
>
> Thanks,
> -chris
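
PS: to make the byte counts in this thread concrete, here is a small plain-JDK sketch. It is not the commons-lang or Cocoon code; the class name and both escape helpers are hypothetical, written only to show how escaping per 16-bit char differs from escaping per code point. The emoji U+1F600 is just an example of a character outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

public class EmojiEncodingDemo {

    // Hypothetical escaper that looks at one 16-bit Java char at a time.
    // A surrogate pair is therefore emitted as two separate entities.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical escaper that walks whole code points, so a surrogate
    // pair becomes a single entity.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600, built from its code point to avoid source-encoding issues.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(emoji.length());                                   // 2 (Java chars)
        System.out.println(emoji.codePointCount(0, emoji.length()));          // 1 (code point)
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);    // 4 (bytes)
        System.out.println(emoji.getBytes(StandardCharsets.UTF_16BE).length); // 4 (bytes)

        System.out.println(escapePerChar(emoji));      // &#55357;&#56832;
        System.out.println(escapePerCodePoint(emoji)); // &#128512;
    }
}
```

The first escaper produces the "pair of 2-byte characters with &-encoding" Chris described; the second produces one entity for the whole character, which is the behaviour he reports getting after moving to commons-lang3.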