Re: Does solr supports indexing of files other than UTF-8

Yonik Seeley Fri, 28 Jan 2011 09:00:16 -0800

On Thu, Jan 27, 2011 at 3:51 AM, prasad deshpande
<prasad.deshpand...@gmail.com> wrote:
> The size of docs can be huge, like suppose there are 800MB pdf file to index
> it I need to translate it in UTF-8 and then send this file to index.


PDF is a binary format AFAIK, so you shouldn't need to do any charset
translation before sending it to Solr or to any other extraction
library.  If you're using Solr Cell, it's the Tika component that is
responsible for detecting the format and pulling out the text.
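
To illustrate, here is a rough sketch of posting the raw PDF bytes to
the extracting handler with no conversion step at all. The host, port,
handler path (/update/extract), document id and file name are
assumptions for the example, not details from your setup, so adjust
them to match your solrconfig.xml.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: a local Solr with the extracting handler
# (Solr Cell) registered at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_request(pdf_bytes, doc_id, base_url=SOLR_EXTRACT_URL):
    """Build a POST that carries the PDF bytes untouched (no charset step)."""
    # literal.id sets the document's id field; commit=true commits right away.
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{base_url}?{params}",
        data=pdf_bytes,  # raw binary body, sent exactly as read from disk
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


if __name__ == "__main__":
    # "example.pdf" and "doc1" are hypothetical names.
    with open("example.pdf", "rb") as f:  # binary mode: no decoding happens
        req = build_extract_request(f.read(), "doc1")
    with urlopen(req) as resp:
        print(resp.status)
```

This reads the whole file into memory, which is fine for a sketch but
probably not what you want for an 800MB document; a streaming upload
would be the better choice there.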

-Yonik
http://lucidimagination.com
