Roman,

This has come up on the Woodstox mailing list at least once before. You might want to try there.
-- Matt

From: Roman Dolgov <[EMAIL PROTECTED]>
Reply-To: <[email protected]>
Date: Fri, 3 Aug 2007 10:55:32 -0700
To: <[email protected]>
Subject: Re: [xfire-user] what is the best way to deal with illegal xml characters

Nope. For now I just clean the JavaBeans before feeding them into XFire/Woodstox. I've thought about tweaking the Woodstox implementation to add an 'illegal character handling' mode, but haven't had time to do it.

Regards,
Roman

On 8/1/07, Matthew Kerle <[EMAIL PROTECTED]> wrote:
> Hi Roman,
>
> Did you ever find an answer to your question about how to get Woodstox
> to encode problem fields as character literals to deal with the illegal
> character problem? The workaround I came up with is fine, but we'd
> prefer not to use a 'hack' like this in a production system.
>
> I'm thinking there must be something we can set in the Aegis mapping
> file (or as an annotation attribute) that will tell XFire to encode that
> field so that illegal characters can be passed through. Does anyone have
> any ideas?
>
> Looking at the Aegis mapping XSD, there doesn't seem to be much in the
> way of parameters to mark a field that might contain bad data:
> http://xfire.codehaus.org/schemas/1.0/mapping.xsd
>
> Matthew Kerle
> IT Consultant
> SRA Information Technology
>
> Canberra
> Ground Floor, 40 Brisbane Avenue
> BARTON ACT 2600
>
> Office: +61 2 6273 6122
> Fax: +61 2 6273 6155
> Mobile: +61404 096 863
> Email: [EMAIL PROTECTED]
> Web: www.sra.com.au
>
>
> Matthew Kerle wrote:
>> I had this problem. I don't have an answer for Woodstox, but here's a
>> valid *workaround* using reflection that you can call once you have
>> your POJOs.
>>
>> Call it like this:
>>     ...
>>     List list = delegate.doSearch(queryObject);
>>     Validator val = new Validator();
>>     val.cleanStrings(list);
>>     return list;
>>     ...
>>
>> // <<<<<<<< Start Validator.java
>> package au.gov.environment.imgws.service.helper;
>>
>> import org.apache.commons.logging.Log;
>> import org.apache.commons.logging.LogFactory;
>>
>> import java.lang.reflect.Method;
>> import java.util.Collection;
>>
>> import au.com.sra.framework.vo.IValueObject;
>>
>> /**
>>  * Created by IntelliJ IDEA.
>>  * User: mkerle
>>  * Date: 26/07/2007
>>  * Time: 10:25:09
>>  */
>> public class Validator {
>>     private static final Log log = LogFactory.getLog(Validator.class);
>>
>>     /**
>>      * Some of the data returned by the ImageSearch contains ASCII control
>>      * characters that are illegal in XML 1.0 and stuff up the Woodstox
>>      * XmlWriter. This method replaces each such character in the String
>>      * properties of the given bean(s) with a space.
>>      * @param dirty a Collection of beans, or a single IValueObject bean
>>      */
>>     public void cleanStrings(Object dirty) {
>>         if (dirty instanceof Collection) {
>>             // handle a collection
>>             for (Object o : (Collection<?>) dirty) {
>>                 cleanObject(o);
>>             }
>>         } else if (dirty instanceof IValueObject) {
>>             // IValueObject is our POJO bean marker interface
>>             cleanObject(dirty);
>>         }
>>         // anything else is left untouched
>>     }
>>
>>     // handles any kind of JavaBean
>>     private void cleanObject(Object dirty) {
>>         Class<?>[] paramTypes = { String.class };
>>         for (Method m : dirty.getClass().getMethods()) {
>>             if (m.getReturnType().equals(String.class)
>>                     && m.getName().startsWith("get")
>>                     && m.getParameterTypes().length == 0) {
>>                 // we now have a getter for a String property on a bean
>>                 String attr = m.getName().substring(3);
>>                 try {
>>                     String s = (String) m.invoke(dirty);
>>                     if (s != null) {
>>                         s = validateASCII(dirty, attr, s);
>>                     }
>>                     if (s != null) {
>>                         // getMethod never returns null: it throws
>>                         // NoSuchMethodException when there is no setter
>>                         Method setString = dirty.getClass().getMethod("set" + attr, paramTypes);
>>                         setString.invoke(dirty, s);
>>                     }
>>                 } catch (NoSuchMethodException e) {
>>                     log.error("Missing setter for getter '" + attr + "' on "
>>                             + dirty.getClass().getCanonicalName(), e);
>>                 } catch (Exception e) {
>>                     log.error("Could not clean property '" + attr + "'", e);
>>                 }
>>             }
>>         }
>>     }
>>
>>     // Replaces every char below 0x20 other than CR, LF and TAB with a space.
>>     // Returns the cleaned string, or null if nothing needed changing.
>>     private String validateASCII(Object dirty, String column, String s) {
>>         StringBuilder sb = null;
>>         for (int i = 0; i < s.length(); i++) {
>>             char c = s.charAt(i);
>>             if (c < ASCII.SPACE.value()
>>                     && c != ASCII.CR.value()
>>                     && c != ASCII.LF.value()
>>                     && c != ASCII.TAB.value()) {
>>                 // ah-ha! we've got one of the buggers! ok, let's sanitise it
>>                 // and log it to output.
>>                 log.error("Illegal character '0x" + Integer.toHexString(c)
>>                         + "' found in attribute " + column);
>>                 log.error(dirty.toString());
>>                 if (sb == null) {
>>                     sb = new StringBuilder(s);
>>                 }
>>                 sb.setCharAt(i, ' ');
>>             }
>>         }
>>         return sb == null ? null : sb.toString();
>>     }
>> }
>>
>> enum ASCII {
>>     CR(13),
>>     LF(10),
>>     TAB(9),   // horizontal tab is 9; 8 is backspace, which XML 1.0 forbids
>>     SPACE(32);
>>
>>     private final int c;
>>     ASCII(int b) { c = b; }
>>     int value() { return c; }
>> }
>> // <<<<<<<<<<<<< end Validator.java
>>
>> Matthew Kerle
>>
>>
>> Roman Dolgov wrote:
>>> Hi All,
>>>
>>> Is there any way to make Woodstox convert illegal characters into
>>> character entities (or strip them out)?
>>>
>>> The problem I am facing is that some of my data may contain illegal
>>> characters, which causes a com.ctc.wstx.exc.WstxIOException: "Invalid
>>> white space character" exception.
>>> I want to avoid running a check on all my Java beans and instead just
>>> have one place where everything gets 'fixed'.
>>>
>>> Any ideas on how best to handle this problem, either by configuring
>>> Woodstox or by providing a custom writer, are welcome.
>>>
>>> Thanks,
>>> Roman
>> ---------------------------------------------------------------------
>> To unsubscribe from this list please visit:
>> http://xircles.codehaus.org/manage_email
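Roman's question asks for one place where everything gets fixed. A minimal sketch of that idea, assuming you can get hold of the XMLStreamWriter that XFire/Woodstox writes through and substitute a wrapper for it (how to hook it into XFire is not shown here and would need checking against XFire itself), is a delegating writer built with java.lang.reflect.Proxy that cleans text, CDATA and attribute values before they reach the real writer. The allowed set comes from the XML 1.0 "Char" production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The spec also requires character references to match that production, so under XML 1.0 a character such as 0x1 cannot be passed through as &#x1; either; replacing or stripping it is the only option. The class SanitizingXmlWriters and its methods are hypothetical names, and only the standard javax.xml.stream API is used, not any Woodstox-specific setting.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: wraps a StAX writer so characters XML 1.0 forbids never reach it. */
public final class SanitizingXmlWriters {

    private SanitizingXmlWriters() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns s with every code point outside the XML 1.0 Char set replaced by a space. */
    static String sanitize(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);   // an unpaired surrogate comes back as itself
            sb.appendCodePoint(isXmlChar(cp) ? cp : ' ');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Wraps a writer so text, CDATA and attribute values are sanitized first. */
    public static XMLStreamWriter wrap(XMLStreamWriter delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            Object[] callArgs = args;
            if ("writeCharacters".equals(name) && args != null && args.length == 3) {
                // char[] overload: rebuild the text and send it through the String overload
                String text = new String((char[]) args[0], (Integer) args[1], (Integer) args[2]);
                delegate.writeCharacters(sanitize(text));
                return null;
            }
            if (("writeCharacters".equals(name) || "writeCData".equals(name)
                    || "writeAttribute".equals(name)) && args != null && args.length > 0) {
                // in every one of these overloads the text or value is the last argument
                callArgs = args.clone();
                int last = callArgs.length - 1;
                callArgs[last] = sanitize((String) callArgs[last]);
            }
            try {
                return method.invoke(delegate, callArgs);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // rethrow the real writer's own exception
            }
        };
        return (XMLStreamWriter) Proxy.newProxyInstance(
                SanitizingXmlWriters.class.getClassLoader(),
                new Class<?>[] { XMLStreamWriter.class },
                handler);
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = wrap(XMLOutputFactory.newInstance().createXMLStreamWriter(out));
        w.writeStartElement("note");
        w.writeAttribute("title", "bad\u0001value");
        w.writeCharacters("bell\u0007 and backspace\b are replaced");
        w.writeEndElement();
        w.flush();
        System.out.println(out);
    }
}
```

Replacing with a space mirrors what the Validator above does; appending only the allowed code points would strip them instead. The sketch covers writeCharacters, writeCData and writeAttribute only, so comments and processing instructions pass through unchanged.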
