Hi, yes, sorry, I should have been more specific: the characters I
mentioned are invalid for the charset declared in the response's
Content-Type header. We're told to expect UTF-8, yet these characters come
from a Latin character set. Woodstox is justified in throwing an exception,
so I really just want to make sure we're extracting or replacing the illegal
characters in the most efficient way.

The easiest and best solution would be to fix the legacy service, but
unfortunately that isn't an option for us at this point. Sadly, because this
is a content-type issue, I think serializing to a string, extracting or
replacing the characters, and deserializing again is probably our best
option.
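Here is a minimal Java sketch of the serialize/replace step. It assumes the legacy service really writes ISO-8859-1 (Latin-1) bytes. It decodes the raw response as Latin-1, which cannot fail, and turns every non-ASCII character into a numeric character reference. One rule like this could stand in for a hand-maintained table of ~40 characters. The class and method names are hypothetical. If the service actually uses windows-1252, decoding with Charset.forName("windows-1252") would be safer. Character references are not expanded inside comments or CDATA sections, and they are not allowed in element or attribute names. So this only suits payloads whose non-ASCII content sits in text and attribute values, as in the sample response below.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: repairs a payload that is labelled UTF-8 but was
 * really written as ISO-8859-1 (ASSUMPTION), by replacing every non-ASCII
 * character with a numeric character reference.
 */
final class Latin1Escaper {

    private Latin1Escaper() {}

    /** Decodes raw bytes as Latin-1 and returns pure-ASCII bytes. */
    static byte[] escapeNonAscii(byte[] raw) {
        // Every byte 0x00-0xFF maps to exactly one Latin-1 character,
        // so this decoding step never fails.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                // e.g. the pound sign (U+00A3) becomes &#163;
                out.append("&#").append((int) c).append(';');
            }
        }
        // Pure ASCII is also valid UTF-8, matching the declared charset.
        return out.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] legacy = "<description>Some description including a \u00a3 (pound) sign</description>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(escapeNonAscii(legacy), StandardCharsets.US_ASCII));
    }
}
```

Running main should print the description with &#163; in place of the pound sign. Because the output is plain ASCII, it also agrees with the UTF-8 charset the Content-Type header declares.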

Kind Regards
Matthew


On 14 February 2012 18:42, Andreas Veithen <[email protected]> wrote:

> Axiom/Woodstox is of course able to handle special characters,
> provided that it gets the right information about the charset encoding
> of the message. To me this looks like the Content-Type header doesn't
> contain the correct charset encoding.
>
> Andreas
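To illustrate Andreas's point, here is a small Java sketch that uses only the JDK's strict CharsetDecoder. The class name is hypothetical. It decodes the bytes a Latin-1 producer would write for "é<", which are 0xE9 0x3C. As UTF-8, 0xE9 starts a three-byte sequence and 0x3C ('<') is not a valid continuation byte. That is the same kind of failure as Woodstox's "Invalid UTF-8 middle byte 0x3c". Decoded as ISO-8859-1, the same bytes are fine. My unverified guess is that the character just before the failing '<' was a Latin-1 letter of this kind. A lone pound sign (0xA3) is a continuation byte in UTF-8, so I'd expect it to be reported as an invalid start byte instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: identical bytes, different outcome per charset. */
final class CharsetMismatchDemo {

    private CharsetMismatchDemo() {}

    /** Decodes strictly: malformed input throws instead of becoming U+FFFD. */
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // What a Latin-1 producer writes for "e-acute followed by <": 0xE9 0x3C
        byte[] latin1 = "\u00e9<".getBytes(StandardCharsets.ISO_8859_1);
        try {
            decodeStrict(latin1, StandardCharsets.UTF_8);
            System.out.println("UTF-8 decoding unexpectedly succeeded");
        } catch (CharacterCodingException e) {
            // 0xE9 opens a 3-byte UTF-8 sequence, but 0x3C is not a continuation byte
            System.out.println("As UTF-8: " + e);
        }
        // Latin-1 maps every byte to a character, so this cannot fail
        System.out.println("As ISO-8859-1: " + decodeStrict(latin1, StandardCharsets.ISO_8859_1));
    }
}
```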
>
> On Tue, Feb 14, 2012 at 17:41, Hiranya Jayathilaka <[email protected]>
> wrote:
> > Hi Matthew,
> >
> > We use Axiom as the underlying XML object model. AFAIK it usually works
> > well with special characters, so I'm not sure why it cannot handle this
> > pound sign. Maybe Andreas can shed some light on the matter? Actually, in
> > this case the exception is thrown by the Woodstox parser, which sits at a
> > layer below Axiom, so this could be a Woodstox issue.
> >
> > However, if the underlying XML parser cannot handle this payload, then I
> > don't think any of our built-in utils will be able to parse it without
> > throwing an error. So your best option is to serialize it into a string
> > buffer or a byte buffer and run the necessary replacement operations.
> > Anyway, let's wait and see what others have to say.
> >
> > Thanks,
> > Hiranya
> >
> > On Tue, Feb 14, 2012 at 7:55 PM, Matthew Clark
> > <[email protected]> wrote:
> >
> >> Sure. The service I'm looking at right now is very simple; the input
> >> just looks like this:
> >>
> >> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
> >>   <function>findOrderByReference</function>
> >>   <args>
> >>       <arg id="1">SomeRef123</arg>
> >>   </args>
> >> </oxxml>
> >>
> >> The response then looks like this (I've removed a large chunk of it):
> >>
> >> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
> >>   <response function="findOrderByReference" uuid="4444-4444-4444-4444">
> >>        <matches count="1">
> >>            <order id="1234567">
> >>               <description>Some description including a £ (pound)
> >> sign</description>
> >>            </order>
> >>       </matches>
> >>   </response>
> >> </oxxml>
> >>
> >> The pound sign causes StAX to throw an exception, so I'd like to replace
> >> it as follows:
> >>
> >> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
> >>   <response function="findOrderByReference" uuid="4444-4444-4444-4444">
> >>        <matches count="1">
> >>            <order id="1234567">
> >>               <description>Some description including a &#163; (pound)
> >> sign</description>
> >>            </order>
> >>       </matches>
> >>   </response>
> >> </oxxml>
> >>
> >>
> >> On 14 February 2012 13:16, Hiranya Jayathilaka <[email protected]>
> >> wrote:
> >>
> >> > On Tue, Feb 14, 2012 at 5:01 PM, Matthew Clark
> >> > <[email protected]> wrote:
> >> >
> >> > > Hi, thanks for that. For some reason I had overlooked the message
> >> > > builders.
> >> > >
> >> > > I have a rudimentary version of this working now, but given the various
> >> > > classes available (XMLStreamReader, StAXBuilder and so on), what would
> >> > > be the most efficient way to do the replacement?
> >> > >
> >> >
> >> > If the input byte stream contains invalid characters, then I don't think
> >> > you can use any of the above classes to process your inputs.
> >> >
> >> >
> >> > >
> >> > > I have about 40 characters (such as the pound sign) that I would like to
> >> > > replace with entity references. For the first version, I simply
> >> > > converted the payload to a string and used StringUtils.replaceEach(),
> >> > > but this is obviously not ideal.
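If you'd rather keep an explicit list of characters, you can make a single pass with a per-character lookup instead of matching ~40 separate search strings. Below is a hypothetical Java sketch. The three table entries are only examples, since the real list isn't in the thread. It assumes the payload was already decoded into a String using the charset it was really written in. If it had been decoded as UTF-8, the offending bytes would already have turned into U+FFFD and there would be nothing left to match.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical single-pass replacement driven by a lookup table, as an
 * alternative to StringUtils.replaceEach() with ~40 search/replace pairs.
 */
final class CharRefTable {

    // Illustrative entries only; the real list of ~40 characters is not known.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u00a3', "&#163;"); // pound sign
        REPLACEMENTS.put('\u00e9', "&#233;"); // e with acute accent
        REPLACEMENTS.put('\u00a9', "&#169;"); // copyright sign
    }

    private CharRefTable() {}

    /** Scans the input once, substituting any character found in the table. */
    static String replace(CharSequence in) {
        StringBuilder out = null; // allocated lazily: clean payloads are returned as-is
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            String rep = REPLACEMENTS.get(c);
            if (rep != null) {
                if (out == null) {
                    // First hit: copy everything scanned so far
                    out = new StringBuilder(in.length() + 32);
                    out.append(in, 0, i);
                }
                out.append(rep);
            } else if (out != null) {
                out.append(c);
            }
        }
        return out == null ? in.toString() : out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace(
                "<description>Some description including a \u00a3 (pound) sign</description>"));
    }
}
```

Characters not in the table pass through unchanged. So this variant only helps if the list really covers everything the legacy service emits.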
> >> > >
> >> >
> >> > Can you please share an input message and a preprocessed message so we
> >> > can get a better understanding of your requirement?
> >> >
> >> > Thanks,
> >> > Hiranya
> >> >
> >> >
> >> > >
> >> > >
> >> > > On 14 February 2012 04:32, Hiranya Jayathilaka <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi Matthew,
> >> > > >
> >> > > > If you want to preprocess the responses, then I'd recommend writing a
> >> > > > custom message builder. You can register the custom message builder in
> >> > > > the axis2.xml file against the content type of your responses. There
> >> > > > you will be able to include any custom logic, including code for
> >> > > > handling invalid characters in the payload.
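Here is a rough Java sketch of the preprocessing such a builder could run, again assuming the legacy bytes are really ISO-8859-1. Instead of inserting references, it re-encodes the stream as genuine UTF-8, so the parser sees the charset the Content-Type header promises. As far as I know, an Axis2 1.x builder implements org.apache.axis2.builder.Builder. Its processDocument(InputStream, String, MessageContext) method receives the raw stream. That wiring is omitted here; only the JDK-only transcoding step is shown, under a hypothetical class name. It buffers the whole response in memory. If a payload carried an XML declaration with an encoding attribute, that might need rewriting too.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical preprocessing step for a custom message builder: re-encodes a
 * response that is (ASSUMPTION) really ISO-8859-1 into genuine UTF-8 before
 * it is handed to the XML parser.
 */
final class Latin1ToUtf8 {

    private Latin1ToUtf8() {}

    /** Buffers the whole stream and re-encodes it from Latin-1 to UTF-8. */
    static InputStream transcode(InputStream latin1) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = latin1.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // Decoding as Latin-1 cannot fail; every character is then re-encoded
        // as valid UTF-8 (e.g. 0xA3 becomes the two bytes 0xC2 0xA3).
        String text = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] legacy = "<d>\u00a3</d>".getBytes(StandardCharsets.ISO_8859_1);
        InputStream fixed = transcode(new ByteArrayInputStream(legacy));
        int b;
        while ((b = fixed.read()) != -1) {
            System.out.printf("%02X ", b); // print each byte in hex
        }
        System.out.println();
    }
}
```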
> >> > > >
> >> > > > Here are some useful resources I found on the web:
> >> > > >
> >> > > > http://charithwiki.blogspot.com/2010/11/how-to-write-axis2-message-builder.html
> >> > > >
> >> > > > http://wso2.org/library/articles/axis2-configuration-part2-learning-axis2-xml
> >> > > >
> >> > > > Thanks,
> >> > > > Hiranya
> >> > > >
> >> > > > On Tue, Feb 14, 2012 at 4:34 AM, Matthew Clark
> >> > > > <[email protected]> wrote:
> >> > > >
> >> > > > > Hi all, I'd really appreciate some help with this one... it's
> >> > > > > hurting my brain!
> >> > > > >
> >> > > > > We have a legacy service that I would like to include in some of
> >> > > > > our ESB operations. The legacy service uses XML for both request
> >> > > > > and response payloads, making it a very easy integration.
> >> > > > >
> >> > > > > I've created a very simple proxy service (see below).
> >> > > > >
> >> > > > > The problem I am having is that the legacy service can return some
> >> > > > > invalid characters, which cause the StAX parser to blow up in such
> >> > > > > a way that I can't even handle it gracefully with a fault sequence.
> >> > > > > I'd really like to pre-process the responses (before they are
> >> > > > > parsed/built), as 99% of the time it is simply a case of replacing
> >> > > > > characters with numeric character references or character entity
> >> > > > > references.
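One caveat on "character entity references": XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;. A named reference such as &pound; is valid only if a DTD declares it. The DisallowDoctypeDeclStreamReaderWrapper frame in the stack trace below suggests Axiom rejects DTDs here anyway. Numeric references like &#163; need no declaration. The hypothetical Java sketch below checks this with the JDK's built-in StAX API (javax.xml.stream).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical check: numeric character references always parse, while a
 * named entity such as &pound; needs a DTD declaration.
 */
final class RefKinds {

    private RefKinds() {}

    /** Returns the text content of the root element, or null if parsing fails. */
    static String rootText(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                r.nextTag();               // advance to the root START_ELEMENT
                return r.getElementText(); // collects the text up to its END_ELEMENT
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            return null;                   // e.g. an undeclared entity reference
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText("<d>&#163;</d>"));  // numeric character reference
        System.out.println(rootText("<d>&pound;</d>")); // undeclared named entity
    }
}
```

With the JDK's bundled StAX implementation, I'd expect the first call to return the pound sign and the second to return null. The parser treats an undeclared entity as a fatal well-formedness error.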
> >> > > > >
> >> > > > > We are unable to modify the legacy service to stop these erroneous
> >> > > > > responses.
> >> > > > >
> >> > > > > Here's the proxy config (I said it was simple!) followed by the
> >> > > > > exception thrown. The exception causes the service to hang, and the
> >> > > > > fault sequence is only entered after a 60-second timeout.
> >> > > > >
> >> > > > > <proxy xmlns="http://ws.apache.org/ns/synapse" name="legacyservice"
> >> > > > >        transports="http" startOnLoad="true">
> >> > > > >   <target endpoint="legacyXMLReceiver">
> >> > > > >     <inSequence>
> >> > > > >       <log level="full">
> >> > > > >         <property name="MESSAGE" value="InSequence" />
> >> > > > >       </log>
> >> > > > >     </inSequence>
> >> > > > >     <outSequence>
> >> > > > >       <log level="full">
> >> > > > >         <property name="MESSAGE" value="OutSequence" />
> >> > > > >       </log>
> >> > > > >       <send />
> >> > > > >     </outSequence>
> >> > > > >     <faultSequence>
> >> > > > >       <makefault version="soap11">
> >> > > > >         <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"
> >> > > > >               value="soap11Env:Server" />
> >> > > > >         <reason expression="get-property('ERROR_MESSAGE')" />
> >> > > > >         <role />
> >> > > > >       </makefault>
> >> > > > >       <log level="full">
> >> > > > >         <property name="MESSAGE" value="FaultSequence" />
> >> > > > >       </log>
> >> > > > >       <property name="HTTP_SC" value="500" scope="axis2" />
> >> > > > >       <send />
> >> > > > >     </faultSequence>
> >> > > > >   </target>
> >> > > > > </proxy>
> >> > > > >
> >> > > > > <endpoint xmlns="http://ws.apache.org/ns/synapse" name="legacyXMLReceiver">
> >> > > > >   <address uri="http://a.b.c.d:8080/legacyService/LegacyServlet"
> >> > > > >            format="pox">
> >> > > > >   </address>
> >> > > > > </endpoint>
> >> > > > >
> >> > > > >
> >> > > > > ERROR {org.apache.axis2.transport.base.threads.NativeWorkerPool} - Uncaught exception {org.apache.axis2.transport.base.threads.NativeWorkerPool}
> >> > > > > org.apache.axiom.om.OMException: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x3c (at char #714, byte #127)
> >> > > > >     at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:296)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMElementImpl.buildNext(OMElementImpl.java:653)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMNodeImpl.getNextOMSibling(OMNodeImpl.java:122)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMElementImpl.getNextOMSibling(OMElementImpl.java:343)
> >> > > > >     at org.apache.axiom.om.impl.traverse.OMChildrenIterator.getNextNode(OMChildrenIterator.java:36)
> >> > > > >     at org.apache.axiom.om.impl.traverse.OMAbstractIterator.hasNext(OMAbstractIterator.java:58)
> >> > > > >     at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:555)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
> >> > > > >     at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
> >> > > > >     at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
> >> > > > >     at org.apache.axiom.soap.impl.llom.SOAPEnvelopeImpl.internalSerialize(SOAPEnvelopeImpl.java:230)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMSerializableImpl.serialize(OMSerializableImpl.java:125)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMSerializableImpl.serialize(OMSerializableImpl.java:113)
> >> > > > >     at org.apache.axiom.om.impl.llom.OMElementImpl.toString(OMElementImpl.java:988)
> >> > > > >     at java.lang.String.valueOf(String.java:2826)
> >> > > > >     at java.lang.StringBuffer.append(StringBuffer.java:219)
> >> > > > >     at org.apache.synapse.mediators.builtin.LogMediator.getFullLogMessage(LogMediator.java:184)
> >> > > > >     at org.apache.synapse.mediators.builtin.LogMediator.getLogMessage(LogMediator.java:123)
> >> > > > >     at org.apache.synapse.mediators.builtin.LogMediator.mediate(LogMediator.java:91)
> >> > > > >     at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:60)
> >> > > > >     at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:114)
> >> > > > >     at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.injectMessage(Axis2SynapseEnvironment.java:229)
> >> > > > >     at org.apache.synapse.core.axis2.SynapseCallbackReceiver.handleMessage(SynapseCallbackReceiver.java:370)
> >> > > > >     at org.apache.synapse.core.axis2.SynapseCallbackReceiver.receive(SynapseCallbackReceiver.java:160)
> >> > > > >     at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:181)
> >> > > > >     at org.apache.synapse.transport.nhttp.ClientWorker.run(ClientWorker.java:275)
> >> > > > >     at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:173)
> >> > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> > > > >     at java.lang.Thread.run(Thread.java:680)
> >> > > > > Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x3c (at char #714, byte #127)
> >> > > > >     at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
> >> > > > >     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
> >> > > > >     at org.apache.axiom.util.stax.wrapper.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:225)
> >> > > > >     at org.apache.axiom.util.stax.dialect.DisallowDoctypeDeclStreamReaderWrapper.next(DisallowDoctypeDeclStreamReaderWrapper.java:34)
> >> > > > >     at org.apache.axiom.util.stax.wrapper.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:225)
> >> > > > >     at org.apache.axiom.om.impl.builder.StAXOMBuilder.parserNext(StAXOMBuilder.java:681)
> >> > > > >     at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:214)
> >> > > > >     ... 31 more
> >> > > > > Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x3c (at char #714, byte #127)
> >> > > > >     at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
> >> > > > >     at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
> >> > > > >     at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
> >> > > > >     at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
> >> > > > >     at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
> >> > > > >     at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1046)
> >> > > > >     at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1053)
> >> > > > >     at com.ctc.wstx.sr.StreamScanner.getNextInCurrAfterWS(StreamScanner.java:892)
> >> > > > >     at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:2963)
> >> > > > >     at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
> >> > > > >     at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
> >> > > > >     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Hiranya Jayathilaka
> >> > > > Associate Technical Lead;
> >> > > > WSO2 Inc.;  http://wso2.org
> >> > > > E-mail: [email protected];  Mobile: +94 77 633 3491
> >> > > > Blog: http://techfeast-hiranya.blogspot.com
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
>
