We have a route that calls a SOAP web service. The response message contains
UTF-8 encoded content. For some reason this results in the following
exception. I wonder what we're doing wrong?

2014-01-01 15:13:01,375 | INFO  | ler-ura_Worker-1 | JobRunShell | 216 - org.apache.servicemix.bundles.quartz - 1.8.6.1 | Job DEFAULT.quartz-endpoint82 threw a JobExecutionException:
org.quartz.JobExecutionException: java.io.IOException:
javax.xml.bind.UnmarshalException
 - with linked exception:
[com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x3c (at char
#408, byte #127)] [See nested exception: java.io.IOException:
javax.xml.bind.UnmarshalException
 - with linked exception:
[com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x3c (at char
#408, byte #127)]]
        at
org.apache.camel.component.quartz.QuartzEndpoint.onJobExecute(QuartzEndpoint.java:117)[218:org.apache.camel.camel-quartz:2.10.6]
        at
org.apache.camel.component.quartz.CamelJob.execute(CamelJob.java:61)[218:org.apache.camel.camel-quartz:2.10.6]
        at
org.quartz.core.JobRunShell.run(JobRunShell.java:223)[216:org.apache.servicemix.bundles.quartz:1.8.6.1]
        at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)[216:org.apache.servicemix.bundles.quartz:1.8.6.1]
Caused by: java.io.IOException: javax.xml.bind.UnmarshalException
 - with linked exception:
[com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x3c (at char
#408, byte #127)]
        at
org.apache.camel.converter.jaxb.JaxbDataFormat.unmarshal(JaxbDataFormat.java:153)[222:org.apache.camel.camel-jaxb:2.10.6]
        at
org.apache.camel.dataformat.soap.SoapJaxbDataFormat.unmarshal(SoapJaxbDataFormat.java:275)[241:org.apache.camel.camel-soap:2.10.6]
        at
org.apache.camel.processor.UnmarshalProcessor.process(UnmarshalProcessor.java:57)[100:org.apache.camel.camel-core:2.10.6]
        at
org.apache.camel.util.AsyncProcessorConverterHelper$ProcessorToAsyncProcessorBridge.process(AsyncProcessorConverterHelper.java:61)[100:org.apache.camel.camel-core:2.10.6]

The relevant parts of our route look like this:

FooRoutes.java:
        from("direct:foo1").routeId("foo1").
                errorHandler(defaultErrorHandler().
                    maximumRedeliveries(3).
                    redeliveryDelay(100).
                    retryAttemptedLogLevel(WARN)).
                to(uraService).
                log(INFO, "1: before unmarshal: ${body}").
                unmarshal(soapJaxbDataFormat).
                log(INFO, "2: ${body}");

foo-camel-context.xml:
    <bean id="soapJaxbDataFormat"
class="org.apache.camel.model.dataformat.SoapJaxbDataFormat">  
        <property name="contextPath" value="fi.ourdomain.xsd._1"/>
    </bean> 

    <camelcxf:cxfEndpoint id="uraService"
                          address="${ura.cxf}"
                          serviceClass="fi.ourdomain._1_0.UraPort">
        <camelcxf:properties>
            <entry key="dataFormat" value="MESSAGE"/>
        </camelcxf:properties>
    </camelcxf:cxfEndpoint>

So we are calling the SOAP web service "uraService", for which we have the
JAX-WS generated interface "UraPort". We then try to unmarshal the XML
response into JAXB beans. This works fine when the content has no special
characters, but when I set the content to contain, for example, "Ä" (U+00C4,
UTF-8 bytes c3 84, LATIN CAPITAL LETTER A WITH DIAERESIS), it breaks with the
exception above.
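
For reference, here is a small standalone sketch (not part of our route) that prints the bytes Java produces for "Ä" in UTF-8 and in ISO-8859-1. The class name UmlautBytes and the helper hex are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Hypothetical helper: renders bytes as lowercase hex separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00C4"; // LATIN CAPITAL LETTER A WITH DIAERESIS
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // c3 84
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // c4
    }
}
```

The single ISO-8859-1 byte c4 has the bit pattern of a UTF-8 lead byte, which is why the two encodings are easy to confuse on the wire.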

When I test the web service directly, the character seems to be fine. When I
inspect the response from the WS with a hex editor, I see the character Ä
represented with bytes 0xc3 and 0x83, as I think it should be. "Ä" is the
last letter of the element content and is followed by "<" (byte 0x3c). The
exception seems to hint that the unmarshalling thinks the character that
starts with 0xc3 and 0x83 does not end there but continues with 0x3c, which
is wrong. Or something like that...
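
To illustrate what "invalid middle byte 0x3c" can mean, here is a minimal sketch using only the JDK, under the assumption (not confirmed for our case) that somewhere a single-byte ISO-8859-1 "Ä" (0xc4) ends up directly in front of "<" and is then parsed as UTF-8. The class name StrictUtf8Check and the method decodeStrict are invented for this example; Woodstox itself is not involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    // Returns the decoded text, or null if the bytes are not well-formed UTF-8.
    public static String decodeStrict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "Ä<" encoded as UTF-8: lead byte c3, continuation byte 84, then '<'.
        byte[] utf8 = {(byte) 0xc3, (byte) 0x84, 0x3c};
        // "Ä<" encoded as ISO-8859-1: c4 looks like a UTF-8 lead byte,
        // but the next byte 3c is not a valid continuation byte.
        byte[] latin1 = {(byte) 0xc4, 0x3c};

        System.out.println(decodeStrict(utf8) != null);   // true
        System.out.println(decodeStrict(latin1) != null); // false
    }
}
```

A strict UTF-8 parser would reject the second byte sequence at the "<", which resembles the error in the stack trace, but this is only one possible way to arrive at it.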

The problematic part of the message:

         <HankkeenKuvaus>kuvaus jossa viimeinen merkki on skandiÄ</HankkeenKuvaus>

This is OK when I access the WS directly with SoapUI. When my route logs the
first log message, the problematic character looks garbled in the log:

2014-01-01 15:13:01,062 | INFO  | ault-workqueue-1 | foo1 | 100 - org.apache.camel.camel-core - 2.10.6 | !!!! ennen unmarshallia:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><FooAResponse xmlns="http://ourdomain/xsd/1.0"><FooB><FooC>....<HankkeenKuvaus>kuvaus jossa viimeinen merkki on skandi�</HankkeenKuvaus>

and the second log step is never reached. I don't know how Camel and
ServiceMix logging works encoding-wise, so this may or may not be a sign that
things went wrong already when calling the service rather than when trying to
unmarshal the response.
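
As a rough illustration of how a "�" can end up in a log, the sketch below decodes bytes leniently with the plain JDK String constructor, which substitutes U+FFFD for malformed input instead of failing. It assumes the same hypothetical input as before (an ISO-8859-1 "Ä" followed by "<"); whether Camel or the ServiceMix logger does something equivalent is not something I know. The class name LenientDecodeDemo and the method decodeLenient are invented for this example.

```java
import java.nio.charset.StandardCharsets;

public class LenientDecodeDemo {
    // Decodes bytes as UTF-8; malformed sequences become U+FFFD rather than an error.
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed input: ISO-8859-1 byte for "Ä" (c4) followed by '<' (3c).
        byte[] latin1 = {(byte) 0xc4, 0x3c};
        String text = decodeLenient(latin1);

        // The lone lead byte is replaced by the replacement character.
        System.out.println(text.charAt(0) == '\uFFFD'); // true
        System.out.println(text.charAt(1) == '<');      // true
    }
}
```

If something like this happens before the log statement, a garbled character in the log would point at the bytes already being wrong (or wrongly labelled) before the unmarshal step.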

I am using ServiceMix 4.5.2 and the bundled Camel 2.10.6.


