Re: verifying that an index contains ONLY utf-8

Peter Karich Wed, 12 Jan 2011 15:29:26 -0800

Converting on the fly is not supported by Solr, but it should be
relatively easy in Java.
Scanning is also relatively simple (accept only characters in an allowed
range). Detection too:
http://www.mozilla.org/projects/intl/chardet.html
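
To make the Java side concrete, here is a minimal sketch using only the
JDK. The class and method names (Utf8Check, isValidUtf8, toUtf8) are made
up for illustration, and it assumes you have the raw bytes of each
document before it is sent to Solr and that you already know (or have
detected) the source charset when converting:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr.
final class Utf8Check {

    // Returns true only if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting U+FFFD for bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Re-encodes bytes from a known source charset to UTF-8.
    // The caller must supply the correct source charset (assumption).
    static byte[] toUtf8(byte[] bytes, Charset source) {
        return new String(bytes, source).getBytes(StandardCharsets.UTF_8);
    }
}
```

One possible use: run isValidUtf8 on each incoming document, and reject
it or pass it through toUtf8 with the detected charset when the check
fails.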


> We've created an index from a number of different documents that are
> supplied by third parties. We want the index to only contain UTF-8
> encoded characters. I have a couple questions about this:
>
> 1) Is there any way to be sure during indexing (by setting something
> in the solr configuration?) that the documents that we index will
> always be stored in utf-8? Can solr convert documents that need
> converting on the fly, or can solr reject documents containing illegal
> characters?
>
> 2) Is there a way to scan the existing index to find any string
> containing non-utf8 characters? Or is there another way that I can
> discover if any crept into my index?
>


-- 
http://jetwick.com open twitter search
