Hi,

I have a route like the following:

from(sjms2)
    .unmarshal().jaxb("myjaxbpackage");

When I send an XML message with the following content to the sjms2 endpoint:

<?xml version="1.0" encoding="ISO-8859-1"?>
... Rest of XML content here ...

any Danish characters (e.g. ø) in the message get mangled. The Camel message
body in this example is a String.

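Looking at the unmarshal implementation, it looks like Camel converts the
message body to an InputStream (seemingly using UTF-8 by default) before
passing it to the JAXB data format. See
https://github.com/apache/camel/blob/3312243b32af03ac39c3af170e318f03e01d64f0/core/camel-support/src/main/java/org/apache/camel/support/processor/UnmarshalProcessor.java#L56
If that is right, the String is encoded as UTF-8 bytes, while the XML
declaration tells the parser to decode those bytes as ISO-8859-1, so ø
(0xC3 0xB8 in UTF-8) is read back as "Ã¸".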
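To make the suspected mechanism concrete, here is a small stand-alone check
(plain JDK, no Camel) that encodes ø as UTF-8 and decodes the bytes as
ISO-8859-1. That UTF-8 is the encoding Camel uses here is my assumption, and
the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {

    // Encode as UTF-8 (assumed Camel default), then decode as ISO-8859-1
    // (the encoding the XML declaration tells the parser to use).
    static String mangle(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String oSlash = "\u00F8"; // ø: one byte in Latin-1, two bytes (0xC3 0xB8) in UTF-8
        String result = mangle(oSlash);
        // Print code points so the output does not depend on the console encoding.
        // Prints U+00C3 and U+00B8, i.e. "Ã¸".
        result.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Plain ASCII survives this round trip unchanged, which is why only the Danish
characters show the problem.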

I can work around this by converting the message body to a Latin-1 InputStream
before unmarshalling, or by setting the encoding property on the data format,
but I'm wondering why Camel is implemented this way. For JAXB unmarshalling at
least, there is no reason to encode a String as an InputStream before handing
it to JAXB, and doing so is less flexible than passing the String to JAXB
directly: my code now has to determine the input message's charset, which
JAXB would otherwise handle for me.
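The difference between the byte path and the character path can be reproduced
with the JDK's own XML parser. This is only a stand-in: it uses JAXP's
DocumentBuilder rather than JAXB (which is no longer bundled with the JDK since
Java 11), and the one-element document and class name are hypothetical.
Encoding the String as UTF-8 mangles ø, encoding it as ISO-8859-1 (my
workaround) fixes it, and parsing from a StringReader needs no charset
decision at all.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class DeclaredEncodingDemo {

    // Hypothetical stand-in for the message body: a String whose prolog says ISO-8859-1.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>\u00F8</name>";

    // Byte path: the String is encoded with the given charset, and the parser
    // decodes the bytes according to the XML declaration.
    static String parseBytes(String xml, Charset encodeWith) {
        return parse(new InputSource(new ByteArrayInputStream(xml.getBytes(encodeWith))));
    }

    // Character path: the parser reads chars directly, so no charset is chosen.
    static String parseChars(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    // Parse the source and return the root element's text content.
    private static String parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String expected = "\u00F8";
        System.out.println(parseBytes(XML, StandardCharsets.UTF_8).equals(expected));      // false: mangled
        System.out.println(parseBytes(XML, StandardCharsets.ISO_8859_1).equals(expected)); // true: workaround
        System.out.println(parseChars(XML).equals(expected));                              // true: no charset needed
    }
}
```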

In the current code, this conversion appears to be necessary because
DataFormat.unmarshal takes an Exchange and an InputStream. Wouldn't it be more
flexible to pass only the Exchange to the DataFormat, and leave the
implementation free to check whether the message body is already in a form it
can process before converting it to bytes? For instance, the JAXB data format
could check whether the input is a Reader or a String, and use the matching
JAXB Unmarshaller methods.
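A rough sketch of what I mean, with everything in it hypothetical:
BodyAwareDataFormat is not a Camel interface, the Exchange parameter is left
out so the example compiles without Camel, and JAXP again stands in for JAXB
(a JAXB version would call Unmarshaller.unmarshal(Reader) or
Unmarshaller.unmarshal(InputStream) instead).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BodyAwareUnmarshalSketch {

    // Hypothetical: the format receives the body as-is instead of a pre-built InputStream.
    interface BodyAwareDataFormat {
        Object unmarshal(Object body);
    }

    // Hypothetical XML format that picks the input path from the body type.
    static class XmlFormat implements BodyAwareDataFormat {
        @Override
        public Object unmarshal(Object body) {
            InputSource source;
            if (body instanceof String) {
                // Already decoded text: the declared encoding is irrelevant.
                source = new InputSource(new StringReader((String) body));
            } else if (body instanceof Reader) {
                source = new InputSource((Reader) body);
            } else if (body instanceof InputStream) {
                // Raw bytes: let the parser apply the XML declaration's encoding.
                source = new InputSource((InputStream) body);
            } else if (body instanceof byte[]) {
                source = new InputSource(new ByteArrayInputStream((byte[]) body));
            } else {
                String type = body == null ? "null" : body.getClass().getName();
                throw new IllegalArgumentException("Unsupported body type: " + type);
            }
            try {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            } catch (Exception e) {
                throw new IllegalStateException("Could not parse XML body", e);
            }
        }
    }

    // Helper: text content of the root element of a parsed document.
    static String text(Object doc) {
        return ((Document) doc).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>\u00F8</name>";
        BodyAwareDataFormat format = new XmlFormat();
        // A String body is parsed as characters; a byte[] body encoded with the
        // declared charset is decoded by the parser. Both yield the same text.
        boolean same = text(format.unmarshal(xml))
                .equals(text(format.unmarshal(xml.getBytes(StandardCharsets.ISO_8859_1))));
        System.out.println(same); // true
    }
}
```

The point of the design is that only the byte-oriented branches ever need a
charset, and those can defer to the XML declaration instead of a Camel-wide
default.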

