@Arnold: are these non-UTF-8 control characters (which is what the Nutch
issue was about), or otherwise legal UTF-8 characters that Solr is choking
on for some reason?

If you could provide a full stack trace it would be really helpful.
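
In the meantime, here is a rough sketch of the client-side stripping
approach Markus describes below. It is not the Nutch code from NUTCH-1016,
only an illustration under a few assumptions: the class and method names
(StripChars, strip) are made up, it drops Unicode non-character code points
plus C0 control characters other than tab, line feed and carriage return,
and it leaves everything else (including unpaired surrogates) untouched.

```java
// Hypothetical helper, not part of Solr, SolrJ or Nutch.
final class StripChars {

    // Unicode non-characters: U+FDD0..U+FDEF, plus the last two code
    // points of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    static boolean isNonCharacter(int cp) {
        return (cp & 0xFFFE) == 0xFFFE || (cp >= 0xFDD0 && cp <= 0xFDEF);
    }

    // C0 control characters, except tab, line feed and carriage return.
    static boolean isDisallowedControl(int cp) {
        return cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D;
    }

    // Returns a copy of the input without the code points matched above.
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as a unit
            if (!isNonCharacter(cp) && !isDisallowedControl(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

The idea would be to run each String value through something like
StripChars.strip(...) before adding it to the document that gets sent to
Solr, so that both the indexed and the stored value are clean.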


On Thu, Sep 14, 2017 at 2:55 PM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello,
>
> You cannot do this in Solr; you cannot even send non-character code
> points in the first place. For Apache Nutch we solved the problem by
> stripping those non-character code points from Strings before putting them
> in SolrDocument. Check the ticket; you can easily reuse the strip method.
>
> Perhaps it would be a good idea to move the method to SolrDocument or
> somewhere in SolrJ in the first place, so others don't have to bother with
> this problem.
>
> Regards,
> Markus
>
> https://issues.apache.org/jira/browse/NUTCH-1016
>
>
>
> -----Original message-----
> > From: Arnold Bronley <arnoldbron...@gmail.com>
> > Sent: Thursday 14th September 2017 19:46
> > To: solr-user@lucene.apache.org
> > Subject: How to remove control characters in stored value at Solr side
> >
> > I know I can apply PatternReplaceFilterFactory to remove control
> > characters from indexed value. However, is it possible to do similar
> > thing for stored value? Because of some control characters included in
> > indexing request, Solr throws Illegal Character Exception.
> >
>
