@Arnold: Are these non-UTF-8 control characters (which is what the Nutch issue was about), or otherwise legal UTF-8 characters that Solr is choking on for some reason?
If you could provide a full stack trace, it would be really helpful.

On Thu, Sep 14, 2017 at 2:55 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hello,
>
> You cannot do this in Solr; you cannot even send non-character code
> points in the first place. For Apache Nutch we solved the problem by
> stripping those non-character code points from Strings before putting them
> in SolrDocument. Check the ticket; you can easily reuse the strip method.
>
> Perhaps it would be a good idea to move the method to SolrDocument or
> somewhere in SolrJ in the first place, so others don't have to bother with
> this problem.
>
> Regards,
> Markus
>
> https://issues.apache.org/jira/browse/NUTCH-1016
>
> -----Original message-----
> > From: Arnold Bronley <arnoldbron...@gmail.com>
> > Sent: Thursday 14th September 2017 19:46
> > To: solr-user@lucene.apache.org
> > Subject: How to remove control characters in stored value at Solr side
> >
> > I know I can apply PatternReplaceFilterFactory to remove control characters
> > from the indexed value. However, is it possible to do a similar thing for the
> > stored value? Because of some control characters included in the indexing
> > request, Solr throws an Illegal Character Exception.
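For illustration, here is a minimal Java sketch of the client-side stripping Markus describes, applied to each String before it is added to the document. It is not the strip method from NUTCH-1016; the class name `ControlCharStripper` and its methods are hypothetical. The set of removed characters is also an assumption: C0 control characters other than tab, LF and CR, plus lone surrogates (roughly what the XML 1.0 `Char` production rejects), plus all Unicode noncharacters. This rests on the guess that the exception comes from the XML update format.

```java
/**
 * Hypothetical helper, NOT the NUTCH-1016 implementation.
 * Assumption: the characters Solr rejects are C0 controls (except tab, LF, CR),
 * lone surrogates, and Unicode noncharacters.
 */
public final class ControlCharStripper {

    private ControlCharStripper() {}

    /** Unicode noncharacters: U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane. */
    static boolean isNonCharacter(int cp) {
        // (cp & 0xFFFE) == 0xFFFE holds exactly when the low 16 bits are FFFE or FFFF.
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    /** C0 control characters other than tab (0x09), line feed (0x0A) and carriage return (0x0D). */
    static boolean isDisallowedControl(int cp) {
        return cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D;
    }

    /** Unpaired surrogate halves, which String.codePoints() yields as-is. */
    static boolean isLoneSurrogate(int cp) {
        return cp >= 0xD800 && cp <= 0xDFFF;
    }

    /** Returns a copy of input without the characters described above; null stays null. */
    public static String strip(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        // Iterate by code point so supplementary-plane noncharacters (e.g. U+1FFFF) are caught.
        input.codePoints()
             .filter(cp -> !isNonCharacter(cp) && !isDisallowedControl(cp) && !isLoneSurrogate(cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // NUL, vertical tab and U+FFFE are removed; tab and newline are kept.
        String dirty = "Hello" + (char) 0x00 + " wor" + (char) 0x0B + "ld" + (char) 0xFFFE + "\tok\n";
        System.out.println(strip(dirty).equals("Hello world\tok\n"));
    }
}
```

You would call `strip` on each String value before passing it to SolrJ's `SolrInputDocument.addField`. Working on code points rather than `char`s is the main design choice here: it keeps valid surrogate pairs intact while still dropping noncharacters outside the Basic Multilingual Plane.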