Hi,

Thanks very much for the help! I checked the Solr source code. What happens
is that for XML text inside an element, Solr does not call URLDecoder (but to
pass a CTRL character, I have to call urlencode from PHP).

So either I remove the CTRL characters on the PHP side, or I change the Solr
XMLReader slightly to call URLDecoder on the text.
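
To illustrate the behaviour described above, here is a minimal sketch. It is
written in Python rather than PHP purely so it is easy to run, and it uses
Python's own XML library, not Solr's parser, so it only demonstrates the
general point: an XML parser returns element text as-is and does not undo URL
encoding. The function name roundtrip_through_xml and the field name "text"
are made up for this example. Python's quote_plus, like PHP's urlencode,
turns a space into "+".

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus, unquote_plus


def roundtrip_through_xml(value: str) -> str:
    """URL-encode a value, place it in an XML text node, and parse it back."""
    field = ET.Element("field", {"name": "text"})  # hypothetical field name
    field.text = quote_plus(value)                 # space becomes "+"
    xml_bytes = ET.tostring(field)
    # The parser hands back the text exactly as written; no URL decoding.
    return ET.fromstring(xml_bytes).text


stored = roundtrip_through_xml("abc xyz")
print(stored)                # abc+xyz
print(unquote_plus(stored))  # abc xyz (only after an explicit decode step)
```

Whichever side does the decoding, it has to be an explicit step; otherwise
the encoded form is what ends up stored.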

Thanks and best regards, Lisheng


-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, January 16, 2013 2:41 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr exception when parsing XML


In Apache Nutch we strip non-character code points with a simple method. Check
the patch; the relevant part is easily ported to any language:
https://issues.apache.org/jira/browse/NUTCH-1016
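
As a rough idea of what such a filter can look like, here is a minimal sketch
in Python. This is not the code from the Nutch patch. It is an independent
example that assumes the goal is to drop every character outside the "Char"
production of the XML 1.0 specification (tab, line feed, carriage return,
U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF). The name
strip_invalid_xml_chars is made up for this example.

```python
import re

# Matches any character that XML 1.0 does not allow in a document.
_INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character XML 1.0 forbids removed."""
    return _INVALID_XML_CHARS.sub("", text)


# Control characters are dropped; ordinary text and whitespace are kept.
print(strip_invalid_xml_chars("abc\x01 xyz"))  # abc xyz
```

Running the value through a filter like this before building the XML means
no URL encoding is needed on the client, so nothing has to be decoded on the
Solr side.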

 
 
-----Original message-----
> From:Zhang, Lisheng <lisheng.zh...@broadvision.com>
> Sent: Wed 16-Jan-2013 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Solr exception when parsing XML
> 
> Hi Alex,
> 
> Thanks very much for the help! I am using PHP on the client side, and I
> switched to
> 
> createTextNode(urlencode($value))
> 
> so the CTRL character problem is avoided, but I noticed that Solr did
> not perform urldecode($value), so my initial value
> 
> abc xyz
> 
> becomes 
> 
> abc+xyz 
> 
> I have not fully read through the Solr code for this part, but I guess it
> may be a configuration issue (when using CDATA I do not have this issue)?
> 
> Thanks and best regards, Lisheng
> 
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, January 15, 2013 12:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr exception when parsing XML
> 
> 
> Interesting point. Looks like CDATA is more limiting than I thought:
> http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
> recommendation is to avoid CDATA and automatically encode characters such
> as yours, as well as the less-than and greater-than signs and the ampersand.
> 
> Regards,
>    Alex.
> 
