On Thu, Jan 27, 2011 at 3:51 AM, prasad deshpande <prasad.deshpand...@gmail.com> wrote:
> The size of docs can be huge, like suppose there are 800MB pdf file to index
> it I need to translate it in UTF-8 and then send this file to index.

PDF is binary AFAIK... you shouldn't need to do any charset translation before sending it to solr, or any other extraction library. If you're using solr-cell then it's the Tika component that is responsible for pulling out the text in the right format.

-Yonik
http://lucidimagination.com
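
For illustration only, here is a rough Python sketch of what "no charset translation" looks like on the client side: the PDF is opened in binary mode and the bytes are posted untouched to the solr-cell extract handler, leaving text extraction to Tika. The base URL, the `/update/extract` path as mounted on your install, the `literal.id` value and the helper names `build_extract_request` and `post_pdf` are assumptions for the example, not something taken from the original setup.

```python
import os
import urllib.parse
import urllib.request


def build_extract_request(solr_base_url, body, doc_id):
    """Build a POST carrying the raw PDF to the solr-cell extract handler.

    `body` is bytes or a file object opened in binary mode. It is passed
    through as-is: no decoding and no re-encoding to UTF-8.
    """
    # literal.id sets the document id; commit=true commits after the update.
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    # Assumed handler path; adjust to wherever the handler is mounted.
    url = solr_base_url.rstrip("/") + "/update/extract?" + params
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def post_pdf(path, solr_base_url, doc_id):
    """Send a PDF from disk without loading the whole file into memory."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:  # binary mode: the bytes are sent unchanged
        req = build_extract_request(solr_base_url, f, doc_id)
        # An explicit length lets the file be sent as a plain sized body.
        req.add_header("Content-Length", str(size))
        with urllib.request.urlopen(req) as resp:
            return resp.status
```

Something like `post_pdf("big.pdf", "http://localhost:8983/solr", "doc1")` would then send the file as-is. Passing the open file object rather than its contents matters for an 800MB document, since the client then never needs the whole file in memory at once.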