verifying that an index contains ONLY utf-8

Paul Wed, 12 Jan 2011 14:17:12 -0800

We've created an index from a number of different documents that are
supplied by third parties. We want the index to contain only UTF-8
encoded characters. I have a couple of questions about this:


1) Is there any way to be sure during indexing (by setting something
in the Solr configuration?) that the documents that we index will
always be stored in UTF-8? Can Solr convert documents that need
converting on the fly, or can Solr reject documents containing illegal
characters?

2) Is there a way to scan the existing index to find any string
containing non-UTF-8 characters? Or is there another way that I can
discover whether any crept into my index?
