Use the iconv library in your server-side language.

Convert it to UTF-8, store it with a field describing what encoding it was in,
and re-encode it if you wish.
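
A minimal sketch of that workflow, using Python's standard codecs module rather
than iconv itself. The function names and the "content" / "original_encoding"
field names are hypothetical, chosen only for illustration; they are not part of
Solr or of any schema mentioned in this thread. The sample text is Chinese
because Big5 is a Traditional Chinese encoding. Decoding is done chunk by chunk,
so a large file does not have to be loaded into memory to be transcoded.

```python
import codecs
import io


def transcode_to_utf8(src, dst, source_encoding, chunk_size=64 * 1024):
    """Copy src to dst chunk by chunk, converting source_encoding to UTF-8.

    src and dst are binary file-like objects. The incremental decoder keeps
    any partial multi-byte character until the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder(source_encoding)(errors="strict")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush the decoder; raises UnicodeDecodeError if the input ended mid-character.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


def build_document(doc_id, utf8_bytes, source_encoding):
    """Build a document dict; the field names are assumptions for this sketch."""
    return {
        "id": doc_id,
        "content": utf8_bytes.decode("utf-8"),
        "original_encoding": source_encoding,
    }


def restore_original(doc):
    """Re-encode the stored text back into the encoding it arrived in."""
    return doc["content"].encode(doc["original_encoding"])


if __name__ == "__main__":
    raw = "中文測試".encode("big5")
    out = io.BytesIO()
    # A tiny chunk size forces multi-byte characters to be split across reads.
    transcode_to_utf8(io.BytesIO(raw), out, "big5", chunk_size=3)
    doc = build_document("doc-1", out.getvalue(), "big5")
    print(doc["original_encoding"], restore_original(doc) == raw)
```

For a really large upload, src and dst would be files opened in binary mode
instead of in-memory buffers, and the UTF-8 output would be what gets sent for
indexing.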

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: prasad deshpande <prasad.deshpand...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Fri, January 28, 2011 12:41:29 AM
Subject: Re: Does solr supports indexing of files other than UTF-8

Thanks paul.

However, I want to support indexing of files in their local encodings. How
would I achieve that?

On Thu, Jan 27, 2011 at 2:46 PM, Paul Libbrecht <p...@hoplahup.net> wrote:

> At least in Java, UTF-8 transcoding is done on a stream basis. No issue
> there.
>
> paul
>
>
> On 27 Jan 2011, at 09:51, prasad deshpande wrote:
>
> > The documents can be huge. Suppose there is an 800MB PDF file to index:
> > I need to convert it to UTF-8 and then send the file to be indexed. Now
> > suppose any number of clients can upload files at the same time; that
> > will affect performance. Also, our product already supports localization
> > with local encodings.
> >
> > Thanks,
> > Prasad
> >
> > On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
> >
> >> Why is converting documents to UTF-8 not feasible?
> >> Nowadays any platform offers such services.
> >>
> >> Can you give a detailed failure description (maybe with the URL to a
> >> sample document you post)?
> >>
> >> paul
> >>
> >>
> >> On 27 Jan 2011, at 07:31, prasad deshpande wrote:
> >>> I am able to successfully index/search non-English data (like Hebrew,
> >>> Japanese) that was encoded in UTF-8.
> >>> However, when I tried to index data that was encoded in a local encoding
> >>> like Big5 for Japanese, I could not see the desired results.
> >>> The contents after indexing looked garbled for the Big5-encoded document
> >>> when I searched for all indexed documents.
> >>>
> >>> Converting a complete document to UTF-8 is not feasible.
> >>> I am not very clear about how Solr supports these localizations with
> >>> encodings other than UTF-8.
> >>>
> >>>
> >>> I checked the links below:
> >>> 1. http://lucene.apache.org/java/3_0_3/api/all/index.html
> >>> 2. http://wiki.apache.org/solr/LanguageAnalysis
> >>>
> >>> Thanks and Regards,
> >>> Prasad
> >>
> >>
>
>
