On Thu, Jan 27, 2011 at 3:51 AM, prasad deshpande
<prasad.deshpand...@gmail.com> wrote:
> The size of docs can be huge; suppose there is an 800MB PDF file to index.
> I need to translate it into UTF-8 and then send this file to be indexed.

PDF is binary AFAIK... you shouldn't need to do any charset
translation before sending it to Solr or to any other extraction
library. If you're using Solr Cell, then it's the Tika component that
is responsible for pulling out the text in the right format.
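For illustration, here is a minimal Python sketch of posting a PDF to Solr Cell as raw bytes, with no charset conversion on the client side. It assumes a Solr instance at http://localhost:8983/solr with the ExtractingRequestHandler mapped at /update/extract (as in the example solrconfig.xml). The helper names and the document id are hypothetical.

```python
import urllib.parse
import urllib.request


def build_extract_request(pdf_bytes: bytes, doc_id: str,
                          solr_url: str = "http://localhost:8983/solr"
                          ) -> urllib.request.Request:
    """Build a POST to the (assumed) /update/extract handler.

    The PDF bytes are used as the request body unchanged: no UTF-8
    translation is done here, because Tika inside Solr parses the binary
    PDF and extracts the text itself.
    """
    # Assumed parameters: literal.id sets the unique key, commit=true
    # makes the document visible right away.
    query = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        f"{solr_url}/update/extract?{query}",
        data=pdf_bytes,  # raw binary body, untouched
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def send_pdf(path: str, doc_id: str) -> int:
    """Read a PDF in binary mode and send it; returns the HTTP status."""
    with open(path, "rb") as f:  # "rb": bytes, no decoding step
        req = build_extract_request(f.read(), doc_id)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be something like `send_pdf("report.pdf", "doc1")` (hypothetical file name). Note that this sketch reads the whole file into memory; for something like an 800MB PDF, streaming the body would be kinder to the client, but either way the bytes go over the wire as-is.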

-Yonik
http://lucidimagination.com
