The docs can be huge; suppose there is an 800MB PDF file to index. I need to
convert it to UTF-8 and then send the file for indexing. Since any number of
clients may upload files at the same time, this conversion will hurt
performance. Also, our product already supports localization with local
encodings.
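One way to limit the performance cost described above is to transcode the file as a stream instead of loading it into memory. The sketch below is only an illustration under assumptions not in the thread: it assumes the input is plain text in a known source charset (here Big5), and the class and method names are hypothetical, not part of Solr.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Stream-convert text from one charset to another using a small
    // fixed-size buffer, so memory use stays flat even for very large files.
    public static void transcode(InputStream in, OutputStream out,
                                 Charset from, Charset to) throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, from));
             Writer writer = new BufferedWriter(new OutputStreamWriter(out, to))) {
            char[] buf = new char[8192];   // 8 KB buffer, regardless of file size
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a short Big5-encoded sample to UTF-8 as a demonstration.
        byte[] big5Bytes = "中文".getBytes(Charset.forName("Big5"));
        ByteArrayInputStream in = new ByteArrayInputStream(big5Bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(in, out, Charset.forName("Big5"), StandardCharsets.UTF_8);
        System.out.println(out.toString(StandardCharsets.UTF_8.name()));
    }
}
```

Note that a binary format such as PDF cannot simply be byte-transcoded; text would first have to be extracted, which is a separate step this sketch does not cover.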

Thanks,
Prasad

On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht <p...@hoplahup.net> wrote:

> Why is converting documents to utf-8 not feasible?
> Nowadays any platform offers such services.
>
> Can you give a detailed failure description (maybe with the URL to a sample
> document you post)?
>
> paul
>
>
> On 27 Jan 2011, at 07:31, prasad deshpande wrote:
> > I am able to successfully index/search non-English data (like Hebrew and
> > Japanese) that was encoded in UTF-8.
> > However, when I tried to index data encoded in a local encoding like
> > Big5, I could not get the desired results.
> > When I searched for all indexed documents, the contents of the
> > Big5-encoded document looked garbled.
> >
> > Converting a complete document to UTF-8 is not feasible.
> > I am not very clear on how Solr supports localization with encodings
> > other than UTF-8.
> >
> >
> > I checked the links below:
> > 1. http://lucene.apache.org/java/3_0_3/api/all/index.html
> > 2. http://wiki.apache.org/solr/LanguageAnalysis
> >
> > Thanks and Regards,
> > Prasad
>
>