There are many steps that can go wrong. Your platform should have UTF-8 as its default encoding; Windows and Mac OS do not. I had to configure Chrome to use UTF-8 as its default display encoding. Also, if you use Tomcat, it has to be configured for UTF-8:
http://wiki.apache.org/solr/SolrTomcat

The characters you posted are not transferring correctly. I think you need to decode them using one of the online Unicode utility pages.

On Mon, May 21, 2012 at 4:57 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> Is it possible that your text editor/display does not support UTF-8
> encoding?
>
> Assuming the data is properly encoded, do you have the encoding="UTF-8"
> attribute in your DIH dataSource tag?
>
> -- Jack Krupansky
>
> -----Original Message----- From: KP Sanjailal
> Sent: Monday, May 21, 2012 7:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing & Searching MySQL table with Hindi and English data
>
> Hi,
>
> Thank you so much for replying.
>
> The MySQL database server is running on a Fedora Core 12 machine with Hindi
> language support enabled. Details of the database are: ENGINE=MyISAM and
> DEFAULT CHARSET=utf8
>
> Data is imported using the Solr DataImportHandler (mysql jdbc driver).
> In the schema.xml file the title field is defined as:
> <field name="title" type="text_general" indexed="true" stored="true"/>
>
> I tried saving the query results directly to a text file from the MySQL
> command prompt but it is not storing the results correctly. The file
> contains the following characters.
>
> à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥<8d>à ¤Åà ¤¾ Saur oorja
>
> Please suggest what I have to do to solve this issue.
>
> Regards,
>
> Sanjailal KP
> --
>
> On Sun, May 20, 2012 at 6:59 AM, Lance Norskog <goks...@gmail.com> wrote:
>
>> Also, try saving data from a query into a file and verify that it is
>> UTF-8 and the characters are correct.
>>
>> On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky <j...@basetechnology.com>
>> wrote:
>> > Check the analyzers for the field types containing Hindi text to be sure
>> > that they are not using a character mapping or "folding" filter that might
>> > mangle the Hindi characters. Post the field type, say for the "title" field.
>> >
>> > Also, try manually (using curl or the post jar) adding a single document
>> > that has Hindi data and see if that works.
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message----- From: KP Sanjailal
>> > Sent: Thursday, May 17, 2012 5:55 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: Indexing & Searching MySQL table with Hindi and English data
>> >
>> > Hi,
>> >
>> > I tried to set up indexing of MySQL tables in Apache Solr 3.6.
>> >
>> > Everything works fine, but text in Hindi script (only some 10% of total
>> > records) is not getting indexed properly.
>> >
>> > A search with a keyword in Hindi retrieves an empty result set. Also, a
>> > retrieved Hindi record displays junk characters.
>> >
>> > The database tables contain bibliographical details of books such as
>> > title, author, publisher, isbn, publishing place, series etc., and
>> > about 10% of the total records contain text in Hindi in the title,
>> > author, publisher fields.
>> >
>> > Example:
>> >
>> > *Search Results from MySQL using PHP*
>> >
>> > 1.
>> > <http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac>
>> > *Title:* सौर ऊर्जा Saur
>> > oorja<http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac>
>> > *Author(s):* विनोद कुमार मिश्र MISHRA (VK)
>> > *Material:* Books
>> >
>> > *Search Results from Apache Solr (searched using keyword in English)*
>> >
>> > 1.
>> > <http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac>
>> > *Title:* सौर ऊरॠजा Saur
>> > oorja<http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac>
>> > *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK)
>> > *Material:* Books
>> >
>> > How do I go about solving this language problem?
>> >
>> > Thanks in advance.
>> >
>> > K. P. Sanjailal
>> > --
>>
>> --
>> Lance Norskog
>> goks...@gmail.com

--
Lance Norskog
goks...@gmail.com
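
P.S. The junk in the saved file looks like UTF-8 bytes that were displayed through a single-byte encoding. Here is a minimal Python sketch of that effect and of how to reverse it. It assumes the wrong encoding was Latin-1; your client may have used a different code page (for example windows-1252), in which case the round trip may not be exact. The function names garble and repair are made up for illustration.

```python
def garble(text: str) -> str:
    """Simulate the bug: encode as UTF-8, then misread each byte as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


# The Hindi title from the thread, as it should appear.
title = "सौर ऊर्जा"

# Each Devanagari character is three bytes in UTF-8, so it shows up as
# three junk characters when the bytes are read as Latin-1.
junk = garble(title)

# Reversing the wrong decode step brings the original text back.
restored = repair(junk)
```

If repair() raises UnicodeDecodeError on your file, the bytes were probably damaged further somewhere along the way (or the wrong encoding was not Latin-1), and it is better to fix the connection and display settings than to patch the text afterwards.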