Re: Encoding problem with ExtractRequestHandler for HTML indexing

2010-03-24 Thread Teruhiko Kurosaka
I suppose you mean Extract_ing_RequestHandler. Out of curiosity, I sent in a Japanese HTML file in EUC-JP encoding, and it was converted to Unicode properly and the index has the correct Japanese words. Do your HTML files have a META tag for Content-Type with charset= in its value? For example,
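
The META tag Kurosaka asks about is how an HTML page declares its own encoding, e.g. <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">. The Java sketch below shows what that declaration makes possible: scan the raw bytes for a declared charset and decode the page with it, falling back to UTF-8. The class and method names are hypothetical, and this is only an illustration; it is not how Solr Cell or Tika actually detect encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharsetSniffer {
    // Matches both <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9_.:\\-]*)",
            Pattern.CASE_INSENSITIVE);

    // Returns the charset declared in a META tag, or the fallback if none is
    // found or the JVM does not support the declared name.
    static Charset declaredCharset(byte[] html, Charset fallback) {
        // The tag itself is ASCII, and ISO-8859-1 maps every byte to one char,
        // so the first few KB can be scanned without knowing the real encoding.
        int n = Math.min(html.length, 4096);
        String head = new String(html, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    // Decodes the page with its declared charset, defaulting to UTF-8.
    static String decode(byte[] html) {
        return new String(html, declaredCharset(html, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of javac's source encoding.
        String words = "\u65e5\u672c\u8a9e";
        String page = "<html><head><meta http-equiv=\"Content-Type\""
                + " content=\"text/html; charset=EUC-JP\"></head><body>"
                + words + "</body></html>";
        byte[] bytes = page.getBytes(Charset.forName("EUC-JP"));
        System.out.println(declaredCharset(bytes, StandardCharsets.UTF_8)); // EUC-JP
        System.out.println(decode(bytes).contains(words));                  // true
    }
}
```

Without such a declaration, a parser has to guess, which is one common reason the same file indexes correctly on one setup and not another.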

RE: encoding problem

2009-09-01 Thread Bernadette Houghton
To: 'solr-user@lucene.apache.org' Subject: RE: encoding problem Still having a few issues with encoding, although I've been able to resolve the particular issue below by just re-editing the affected record. The other encoding issue is with Greek characters. With solr turned off in our user

RE: encoding problem

2009-08-30 Thread Bernadette Houghton
...@deakin.edu.au] Sent: Friday, 28 August 2009 9:31 AM To: 'solr-user@lucene.apache.org'; 'yo...@lucidimagination.com' Subject: RE: encoding problem Shalin, the XML from solr admin for the relevant field is displaying as - str name=citation_ta title=Browse by Author Name for Moncrieff, Joan href

RE: encoding problem

2009-08-27 Thread Bernadette Houghton
Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:50 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat

Re: encoding problem

2009-08-27 Thread Yonik Seeley

RE: encoding problem

2009-08-27 Thread Bernadette Houghton
Shalin, the XML from solr admin for the relevant field is displaying as - str name=citation_ta title=Browse by Author Name for Moncrieff, Joan href=/fez/list/author/Moncrieff%2C+Joan/Moncrieff, Joan/a, a title=Browse by Author Name for Macauley, Peter

RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM??? Regards Bern -----Original Message----- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:10 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem

Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM??? When you execute the java executable, just add -Dfile.encoding=UTF-8 as a command line argument to the
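
One way to confirm that the -Dfile.encoding flag actually reached the JVM is to print what the JVM reports. The small Java program below (the class name is hypothetical) shows both the file.encoding system property and Charset.defaultCharset(), which is what APIs without an explicit charset, such as new String(byte[]) and FileReader, fall back to.

```java
import java.nio.charset.Charset;

class ShowDefaultEncoding {
    // Reports both the file.encoding system property (what -Dfile.encoding
    // sets) and the default charset the JVM actually uses for APIs that take
    // no explicit charset, such as new String(byte[]) and FileReader.
    static String report() {
        return "file.encoding = " + System.getProperty("file.encoding")
                + System.lineSeparator()
                + "defaultCharset = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it as java -Dfile.encoding=UTF-8 ShowDefaultEncoding should report UTF-8 on both lines; without the flag, an older JVM on Western-European Windows typically reports Cp1252 / windows-1252. Since JDK 18 (JEP 400) the default charset is UTF-8 regardless of platform, so this particular pitfall mostly affects older JVMs like the ones in this thread.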

RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat? SOLR is part of the

Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a

RE: encoding problem

2009-08-26 Thread Fuad Efendi
If your complaint is about a web application other than Solr (probably behind Apache HTTPD) having an encoding problem, try troubleshooting it with Mozilla Firefox and the Live HTTP Headers plugin. Look at the Content-Type HTTP response header (its charset parameter), and don't forget about the meta http-equiv... tag
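
When inspecting response headers, the character encoding travels in the charset parameter of Content-Type (Content-Encoding describes compression such as gzip). As a minimal Java sketch, with a hypothetical class name and a deliberately simplified parser that handles only the common forms, extracting that parameter looks like this:

```java
import java.util.Locale;
import java.util.Optional;

class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type header value, e.g.
    // "text/html; charset=ISO-8859-1" -> "ISO-8859-1". Simplified: does not
    // handle semicolons inside quoted parameter values.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) return Optional.empty();
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().toLowerCase(Locale.ROOT).equals("charset")) {
                String value = p.substring(eq + 1).trim();
                // Strip optional quotes: charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));          // Optional[UTF-8]
        System.out.println(charsetOf("text/html"));                          // Optional.empty
        System.out.println(charsetOf("text/xml; Charset=\"iso-8859-1\""));   // Optional[iso-8859-1]
    }
}
```

If the header carries no charset, browsers fall back to the page's meta http-equiv declaration or to guessing, which is why both places are worth checking.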

Re: Encoding problem

2009-04-01 Thread Rui Pereira
Thanks, I detected the same problem. My system file encoding is CP1252, and I was saving the data-config.xml file in UTF-8. DIH was reading it using the default encoding. One possible workaround was using InputStream and OutputStream like DIH, but the files won't be in UTF-8 if the system has
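
The general remedy Rui is circling is to name the charset explicitly on both the writing and the reading side, so the platform default (CP1252 here) never enters the picture. A minimal Java sketch using java.nio.file.Files follows; the class and method names are hypothetical, and this is not DIH's own code.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8ConfigIO {
    // Writes the config as UTF-8 regardless of the platform default (e.g. Cp1252).
    static void writeUtf8(Path file, String xml) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(xml);
        }
    }

    // Reads it back with the same explicit charset, never the default.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("data-config", ".xml");
        String xml = "<dataConfig><!-- Jo\u00e3o --></dataConfig>";
        writeUtf8(tmp, xml);
        System.out.println(readUtf8(tmp).equals(xml)); // true
        Files.delete(tmp);
    }
}
```

Because neither call consults Charset.defaultCharset(), the round trip behaves the same on a CP1252 machine as on a UTF-8 one.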

Re: Encoding problem

2009-03-27 Thread aerox7
Hi, I had the same problem with DataImportHandler: I have a UTF-8 MySQL database, but it seems that DIH imports the data as Latin-1... So I just use a Transformer to (re)encode my strings in UTF-8. Rui Pereira-2 wrote: I'm having problems with encoding in responses from search queries. The
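
aerox7's Transformer code is not shown. A common way to re-encode such strings, assuming the UTF-8 bytes were wrongly decoded as ISO-8859-1, is to re-encode with Latin-1 (recovering the original bytes) and decode those as UTF-8. The Java helper below is a hypothetical sketch that a custom DIH Transformer could call on each affected String column; it is not taken from the post.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Assumes the value's UTF-8 bytes were wrongly decoded as ISO-8859-1
    // (Latin-1): re-encoding with Latin-1 recovers the original bytes
    // losslessly, and decoding them as UTF-8 restores the intended text.
    static String latin1ToUtf8(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "Jos\u00c3\u00a9";          // what "José" looks like after a Latin-1 misread
        String repaired = latin1ToUtf8(garbled);
        System.out.println(repaired.equals("Jos\u00e9")); // true
    }
}
```

Apply it only to values known to be mis-decoded: running it on text that is already correct turns accented characters into U+FFFD replacement characters, and it cannot recover data that was misread as windows-1252 rather than true Latin-1 if it hit one of the bytes windows-1252 leaves undefined.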

Re: Encoding problem

2009-03-27 Thread Shalin Shekhar Mangar
On Fri, Mar 27, 2009 at 8:41 PM, Rui Pereira ruipereira...@gmail.com wrote: I'm having problems with encoding in responses from search queries. The encoding problem only occurs in the topologyname field; if an instancename has accents, it is returned correctly. In all my configurations I have

Re: Encoding problem

2009-03-27 Thread Shalin Shekhar Mangar
On Sat, Mar 28, 2009 at 12:51 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: I see that you are specifying the topologyname's value in the query itself. It might be a bug in DataImportHandler because it reads the data-config as a string from an InputStream. If your default platform
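
The failure Shalin suspects can be reproduced in isolation: the same UTF-8 bytes decode to a different string when the reader assumes windows-1252. The Java sketch below uses a made-up sample value and a hypothetical class name; it does not use DIH itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetPitfall {
    // Decodes bytes with an explicitly chosen charset.
    static String decodeAs(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String value = "S\u00e3o Paulo";                       // sample value with an accent
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);  // as saved in a UTF-8 file

        // What a Cp1252 default produces: here the 2-byte UTF-8 sequence for
        // the accented letter becomes two separate characters.
        String wrong = decodeAs(utf8, Charset.forName("windows-1252"));
        String right = decodeAs(utf8, StandardCharsets.UTF_8);
        System.out.println(wrong.equals(value)); // false
        System.out.println(right.equals(value)); // true

        // new String(byte[]) with no charset uses Charset.defaultCharset(),
        // so this line prints true only when that default is UTF-8.
        System.out.println(new String(utf8).equals(value));
    }
}
```

This matches the symptom in the thread: values embedded in data-config.xml are affected, while values read from the database through a correctly configured connection are not.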