Re: verifying that an index contains ONLY utf-8

Markus Jelsma Wed, 12 Jan 2011 14:28:50 -0800

Encoding is supposed to be dealt with outside the index, before the
documents are sent to Solr. All input must be UTF-8 encoded. Failing to do
so will give unexpected results.
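
For what it's worth, a check like that can live in whatever client code
feeds the documents to Solr. Below is a minimal sketch in Python, not
anything Solr ships: the function name to_utf8 and the fallback_encoding
parameter are made up for illustration, and it assumes you have the raw
bytes of each document and, if you want conversion, that you already know
which encoding the supplier really used. Strictly valid UTF-8 passes
through unchanged, anything else is either transcoded from the declared
fallback or rejected.

```python
from typing import Optional


def to_utf8(raw: bytes, fallback_encoding: Optional[str] = None) -> bytes:
    """Return `raw` as valid UTF-8 bytes, or raise ValueError.

    Hypothetical helper for a pre-indexing step; not part of Solr.
    """
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        raw.decode("utf-8", errors="strict")
        return raw
    except UnicodeDecodeError:
        if fallback_encoding is None:
            # No known source encoding: reject rather than guess.
            raise ValueError("input is not valid UTF-8")
        # Transcode from the encoding the supplier is assumed to have used.
        # A failure here raises UnicodeDecodeError, a ValueError subclass.
        text = raw.decode(fallback_encoding, errors="strict")
        return text.encode("utf-8")
```

Calling to_utf8(data) with no fallback acts as a pure validator, while
to_utf8(data, "latin-1") converts on the fly. Keep in mind that a
single-byte encoding such as latin-1 accepts every possible byte, so a
wrong guess yields garbled text instead of an error.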


> We've created an index from a number of different documents that are
> supplied by third parties. We want the index to only contain UTF-8
> encoded characters. I have a couple questions about this:
> 
> 1) Is there any way to be sure during indexing (by setting something
> in the solr configuration?) that the documents that we index will
> always be stored in utf-8? Can solr convert documents that need
> converting on the fly, or can solr reject documents containing illegal
> characters?
> 
> 2) Is there a way to scan the existing index to find any string
> containing non-utf8 characters? Or is there another way that I can
> discover if any crept into my index?
