Roman,

This has come up on the Woodstox mailing list at least once before.  You
might want to try there.

-- Matt


From: Roman Dolgov <[EMAIL PROTECTED]>
Reply-To: <[email protected]>
Date: Fri, 3 Aug 2007 10:55:32 -0700
To: <[email protected]>
Subject: Re: [xfire-user] what is the best way to deal with illegal xml
characters

Nope, for now I just clean javabeans before feeding them into
Xfire/Woodstox.
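
For what it's worth, the cleaning step itself can stay small. Below is a minimal sketch, assuming that replacing each offending character with a space is acceptable. `XmlChars`, `isLegal` and `clean` are hypothetical names, not part of XFire or Woodstox; the ranges follow the Char production of the XML 1.0 specification.

```java
/** Hypothetical helper (not an XFire or Woodstox class). */
final class XmlChars {
    private XmlChars() {}

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns s with every illegal character replaced by a space. */
    static String clean(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = null; // allocated only if something has to change
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegal(cp)) {
                // Every illegal code point is a single char (control characters,
                // unpaired surrogates, 0xFFFE, 0xFFFF), so the index in the copy
                // matches the index in the original string.
                if (out == null) {
                    out = new StringBuilder(s);
                }
                out.setCharAt(i, ' ');
            }
            i += Character.charCount(cp);
        }
        return out == null ? s : out.toString();
    }
}
```

A bean property would then be cleaned with something like `bean.setTitle(XmlChars.clean(bean.getTitle()))`, where `bean` and its `title` property are made up for illustration.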

I've been thinking about tweaking the Woodstox implementation to allow an
'illegal character handling' mode, but haven't had time to do it.
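
Short of patching Woodstox, one way to approximate such a mode might be to wrap the `XMLStreamWriter` in a dynamic proxy that cleans text before it reaches the real writer. This is only a sketch under several assumptions: `SanitizingWriters` and `replaceIllegal` are made-up names, only the String overloads of `writeCharacters`, `writeAttribute`, `writeCData` and `writeComment` are covered, and the check is the same "control characters below space, except CR, LF and TAB" rule used elsewhere in this thread. How such a wrapper would be plugged into XFire is not shown.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical wrapper factory (not an XFire or Woodstox class). */
final class SanitizingWriters {
    private SanitizingWriters() {}

    /** Replaces control characters below 0x20 (except TAB, LF, CR) with a space. */
    static String replaceIllegal(String s) {
        if (s == null) {
            return null;
        }
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c < 0x20 && c != 0x9 && c != 0xA && c != 0xD) {
                chars[i] = ' ';
            }
        }
        return new String(chars);
    }

    /** Returns a writer that cleans text values before delegating to target. */
    static XMLStreamWriter wrap(final XMLStreamWriter target) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                String name = method.getName();
                boolean writesText = name.equals("writeCharacters")
                        || name.equals("writeAttribute")
                        || name.equals("writeCData")
                        || name.equals("writeComment");
                // In the String overloads of these methods the text value is the
                // last argument; writeCharacters(char[], int, int) is not handled.
                if (writesText && args != null && args[args.length - 1] instanceof String) {
                    args[args.length - 1] = replaceIllegal((String) args[args.length - 1]);
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the writer's own exception
                }
            }
        };
        return (XMLStreamWriter) Proxy.newProxyInstance(
                SanitizingWriters.class.getClassLoader(),
                new Class<?>[] {XMLStreamWriter.class},
                handler);
    }
}
```

The proxy keeps the sketch short because `XMLStreamWriter` has several dozen methods; a hand-written delegating class would do the same job with more boilerplate.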

Regards,
Roman



On 8/1/07, Matthew Kerle <[EMAIL PROTECTED]> wrote:
> Hi Roman,
> 
> did you ever find an answer to your question about how to get Woodstox
> to encode problem fields as character literal to deal with the illegal
> character problem? The workaround that I came up with is fine but we'd
> prefer not to use a 'hack' like this in a production system.
> 
> I'm thinking there must be something we can set in the aegis mapping
> file (or as an annotation attribute) that will tell xFire to encode that
> field such that illegal characters can be passed through. Anyone have
> any ideas?
> 
> looking in the aegis mapping XSD there doesn't seem to be much in the
> way of parameters to mark a field that might contain bad data...
> http://xfire.codehaus.org/schemas/1.0/mapping.xsd
> 
> *Matthew Kerle
> **IT Consultant
> **SRA Information Technology*
> 
> *Canberra*
> Ground Floor, 40 Brisbane Avenue
> BARTON  ACT  2600
> 
> Office:    +61 2 6273 6122
> Fax:         +61 2 6273 6155
> Mobile:  +61404 096 863
> Email:    [EMAIL PROTECTED]
> Web:     www.sra.com.au <http://www.sra.com.au>
> 
> 
> Matthew Kerle wrote:
>> > I had this problem. I don't have an answer for Woodstox, but here's a
>> > valid **workaround** using reflection that you can call once you have
>> > your POJOs...
>> >
>> > call it like this:
>> > ...
>> > List list = delegate.doSearch(queryObject);
>> > Validator val = new Validator();
>> > val.cleanStrings(list);
>> > return list;
>> > ...
>> >
>> >
>> > // <<<<<<<< Start Validator.java
>> > package au.gov.environment.imgws.service.helper;
>> >
>> > import org.apache.commons.logging.Log;
>> > import org.apache.commons.logging.LogFactory;
>> >
>> > import java.util.Collection;
>> > import java.lang.reflect.Method;
>> >
>> > import au.com.sra.framework.vo.IValueObject;
>> > import au.gov.environment.imgws.service.vo.ImageSearch;
>> >
>> > /**
>> >  * Created by IntelliJ IDEA.
>> >  * User: mkerle
>> >  * Date: 26/07/2007
>> >  * Time: 10:25:09
>> >  */
>> > public class Validator {
>> >     private static final Log log = LogFactory.getLog(Validator.class);
>> >
>> >     /**
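>> >      * Some of the data returned by the ImageSearch contains control
>> >      * characters that stuff up the Woodstox XmlWriter. This method
>> >      * replaces them with spaces, in place, on the String properties of
>> >      * a single bean or of every bean in a collection.
>> >      * @param dirty a bean, or a Collection of beans, to clean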
>> >      */
>> >     public void cleanStrings(Object dirty){
>> >         if(dirty instanceof Collection){
>> >             //handle a collection
>> >             Collection c = (Collection)dirty;
>> >             for(Object o: c){
>> >                 cleanObject(o);
>> >             }
>> >         }else if(dirty instanceof IValueObject){
>> >             //IValueObject is our pojo bean marker interface
>> >             cleanObject(dirty);
>> >         }
>> >         //any other type is left untouched
>> >     }
>> >
>> >     //handles any kind of JavaBean
>> >     private void cleanObject(Object dirty){
>> >         Method[] methods = dirty.getClass().getMethods();
>> >         Class[] paramTypes = {String.class };
>> >         String attr;
>> >         for(Method m: methods){
>> >             if(m.getReturnType().equals(String.class)
>> >                     && m.getName().startsWith("get")
>> >                     && m.getParameterTypes().length == 0){
>> >                 //we now have a getter for a String property on a bean
>> >                 try{
>> >                     attr = m.getName();
>> >                     attr = attr.substring(3,attr.length());
>> >                     String s = (String)m.invoke(dirty);
>> >                     if(s!=null){
>> >                         s = validateASCII(dirty, attr, s);
>> >                     }
>> >                     if(s!=null){
>> >                         Method setString;
>> >                         try{
>> >                             setString = dirty.getClass().getMethod("set"+attr, paramTypes);
>> >                         }catch(NoSuchMethodException e){
>> >                             //getMethod throws instead of returning null
>> >                             throw new IllegalStateException("Missing setter for getter '"
>> >                                     + attr + "' on " + dirty.getClass().getCanonicalName());
>> >                         }
>> >                         setString.invoke(dirty, s);
>> >                     }
>> >                 }catch(Exception e){
>> >                     log.error(e);
>> >                 }
>> >             }
>> >         }
>> >     }
>> >
>> >
>> >
>> >     private String validateASCII(Object dirty, String column, String s) {
>> >         boolean changed = false;
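>> >         //debugging hook: somewhere to set a breakpoint for one known-bad record
>> >         if(dirty instanceof ImageSearch){
>> >             ImageSearch image = (ImageSearch) dirty;
>> >             //constant first, so a null barcode cannot cause a NullPointerException
>> >             if("eadig00962".equals(image.getBarcode())){
>> >                 image.getBarcode();
>> >             }
>> >         }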
>> >         for(int i = 0; i <  s.length(); i++){
>> >             char c = s.charAt(i);
>> >             if(c < ASCII.SPACE.value()
>> >                     && c != ASCII.CR.value()
>> >                     && c != ASCII.LF.value()
>> >                     && c != ASCII.TAB.value()){
>> >                 //ah-ha! we've got one of the buggers! ok lets sanitise it
>> >                 // and log it to output.
>> >                 log.error("Illegal character '0x" + Integer.toHexString(c)
>> >                         + "' found in attribute " + column);
>> >                 log.error(dirty.toString());
>> >                 s = s.substring(0,i) + " " + s.substring(i+1, s.length());
>> >                 changed = true;
>> >             }
>> >         }
>> >         return changed?s:null;
>> >     }
>> > }
>> >
>> > enum ASCII{
>> >     CR(13),
>> >     LF(10),
>> >     TAB(9), //horizontal tab is 9; 8 is backspace
>> >     SPACE(32);
>> >
>> >     private final int c;
>> >     ASCII(int b){c = b;  }
>> >     int value(){return c;}
>> > }
>> > // <<<<<<<<<<<<<end Validator.java
>> >
>> > *Matthew Kerle*
>> >
>> >
>> > Roman Dolgov wrote:
>>> >> Hi All,
>>> >>
>>> >> Is there any way to have Woodstox convert illegal characters into
>>> >> character entities (or strip them out)?
>>> >>
>>> >> The problem I am facing is that some of my data may contain illegal
>>> >> characters, which cause a com.ctc.wstx.exc.WstxIOException: Invalid
>>> >> white space character.. exception.
>>> >> I want to avoid running a check on all my Java beans and instead
>>> >> have one place where everything gets 'fixed'.
>>> >>
>>> >> Any ideas on how best to handle this problem, either by configuring
>>> >> Woodstox or by providing some custom writer, are welcome.
>>> >>
>>> >> Thanks,
>>> >> Roman
>> > ---------------------------------------------------------------------
>> > To unsubscribe from this list please visit:
>> > http://xircles.codehaus.org/manage_email
> 
> 


