Re: Conceptual Question

2007-06-22 Thread Chris Hostetter
: > There is a change interface in JIRA, as long as all of the fields originally sent are stored. : : Do you remember the JIRA issue, or a keyword to search for it? It sounds useful in some cases, for example when you are working on analysers. That could be real life for me in the future. https://i

Re: Use Windows 1252 encoding...

2007-06-22 Thread Chris Hostetter
: Is it possible to use Windows 1252 encoding instead of UTF-8 for Solr? The [...] Not at the moment... https://issues.apache.org/jira/browse/SOLR-96 -Hoss

Re: Use Windows 1252 encoding...

2007-06-22 Thread Nick Jenkin
Have you tried using the PHP functions utf8_decode/utf8_encode? As far as I understand, only UTF-8 is supported (but I could be wrong on that!). -Nick On 6/23/07, escher2k <[EMAIL PROTECTED]> wrote: Is it possible to use Windows 1252 encoding instead of UTF-8 for Solr? The application runs on Lin
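The "weird" characters described in this thread are typically mojibake: UTF-8 bytes decoded as Windows-1252 (or the reverse). The following is a minimal Java sketch of transcoding on the client side, assuming the front end receives UTF-8 from Solr and wants Windows-1252 bytes. It is only roughly analogous to the PHP functions Nick mentions (PHP's utf8_decode targets ISO-8859-1, not Windows-1252), it is not a Solr feature, and it uses `StandardCharsets`, which needs Java 7 or later rather than the JDK 1.5 mentioned in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Wrong: UTF-8 bytes decoded as Windows-1252 produce mojibake.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    // Right: decode as UTF-8 first, then encode to Windows-1252.
    // Characters with no Windows-1252 mapping are replaced with '?'.
    static byte[] utf8ToCp1252(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.getBytes(CP1252);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute e, as a Unicode escape
        System.out.println(misdecode(original)); // two garbage characters replace the accented e
        byte[] cp1252 = utf8ToCp1252(original.getBytes(StandardCharsets.UTF_8));
        System.out.println(cp1252.length); // 4: one byte per character in Windows-1252
    }
}
```

The accented e is two bytes (0xC3 0xA9) in UTF-8 but one byte (0xE9) in Windows-1252, which is why decoding with the wrong charset yields two wrong characters instead of one right one.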

Re: page rank

2007-06-22 Thread Nick Jenkin
Hi David, 1) You will have to re-add the documents; Solr does not support an update operation (only add/delete). 2) Same as above: Solr does not support an update operation, so you will need to re-add the document with the updated numberField. If it's any help, I have a popularity field in my index (3 mi
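Because there is no in-place update, changing numberField means sending the whole document again. The following is a minimal Java sketch that builds a Solr XML `<add>` message for such a re-add. The field names `id` and `title` and all values are hypothetical, and it assumes the schema declares `id` as the uniqueKey, so the re-added document replaces the old one. Every field must be re-sent, not just the changed one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReAddMessage {
    // Minimal XML escaping for names and values; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds an <add> message with every field of the document,
    // because the re-added document fully replaces the stored one.
    static String buildAdd(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc-42");           // hypothetical unique key
        doc.put("title", "Fish & Chips");  // hypothetical field, re-sent unchanged
        doc.put("numberField", "17");      // the updated value
        System.out.println(buildAdd(doc));
        // The message would then be POSTed to Solr's update handler, followed by <commit/>.
    }
}
```

For hundreds of thousands of documents, this only works if the original field values are still available, either from the source system or because the fields are stored in the index.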

Use Windows 1252 encoding...

2007-06-22 Thread escher2k
Is it possible to use Windows 1252 encoding instead of UTF-8 for Solr? The application runs on Linux/JDK 1.5. We are using PHP for the front end. The problem we are having is that some characters are displayed incorrectly because of the encoding. Thanks.

RE: Multi-language Tokenizers / Filters recommended?

2007-06-22 Thread Teruhiko Kurosaka
Hi Daniel, As you know, Chinese and Japanese do not use spaces or any other delimiters to separate words. To overcome this problem, CJKTokenizer uses a method called bi-gram tokenization, in which a run of ideographic (i.e., Chinese) characters is made into tokens of two neighboring characters. So a run of five chara
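The bi-gram idea can be illustrated with a minimal Java sketch. This is not the CJKTokenizer source. It handles only a single run of ideographs, ignores surrogate pairs (BMP characters only), and emits a lone character as a one-character token; that last choice is this sketch's assumption.

```java
import java.util.ArrayList;
import java.util.List;

class BigramSketch {
    // Overlapping bigrams: for characters c1 c2 c3 c4 the tokens are
    // c1c2, c2c3, c3c4. Assumes BMP characters only.
    static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {
            tokens.add(run); // keep a single ideograph rather than dropping it
            return tokens;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four ideographs, written as Unicode escapes to avoid source-encoding issues
        String run = "\u4e2d\u534e\u4eba\u6c11";
        System.out.println(bigrams(run).size()); // 3
    }
}
```

A run of n characters yields n - 1 overlapping tokens, so any two-character word in the run is guaranteed to appear as a token, at the cost of also indexing pairs that straddle word boundaries.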

Re: add CJKTokenizer to solr

2007-06-22 Thread Chris Hostetter
: What would be the best way to not hide their use? : : How about just... -Hoss

Re: add CJKTokenizer to solr

2007-06-22 Thread Mike Klaas
On 21-Jun-07, at 10:22 PM, Chris Hostetter wrote: > like I said though: I'm in favor of factories like this ... I just don't think we should do anything to hide their use and make referring to Tokenizer or TokenFilter class names directly use reflection magically. What would be the best way to
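The distinction being debated, an explicit factory versus resolving a Tokenizer class name by reflection, can be sketched in Java. Everything here is hypothetical: `DemoTokenizer` stands in for a Lucene Tokenizer and `DemoTokenizerFactory` for a Solr factory; neither is the real API. Because the factory holds no per-document state, one instance can be shared, and each `create()` call returns a fresh tokenizer for one Reader.

```java
import java.io.Reader;
import java.io.StringReader;
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a Tokenizer: bound to a single Reader.
class DemoTokenizer {
    final Reader input;
    public DemoTokenizer(Reader input) { this.input = input; }
}

// Hypothetical factory interface: stateless, returns a new tokenizer per Reader.
interface DemoTokenizerFactory {
    DemoTokenizer create(Reader input);
}

class FactoryDemo {
    // Explicit factory: the tokenizer class is named in code, no reflection.
    static final DemoTokenizerFactory DIRECT = DemoTokenizer::new;

    // Reflective alternative: resolve a class name (e.g. read from a config file)
    // once, then call its Reader constructor for every new Reader.
    static DemoTokenizerFactory reflective(String className) throws ReflectiveOperationException {
        Constructor<?> ctor = Class.forName(className).getConstructor(Reader.class);
        return input -> {
            try {
                return (DemoTokenizer) ctor.newInstance(input);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        DemoTokenizerFactory byName = reflective(DemoTokenizer.class.getName());
        DemoTokenizer a = DIRECT.create(new StringReader("one"));
        DemoTokenizer b = byName.create(new StringReader("two"));
        System.out.println(a != b); // true: each create() call yields a new instance
    }
}
```

Looking the constructor up once and reusing it keeps the per-document reflective cost to a single `newInstance` call; whether that cost matters is the question the benchmarking later in this thread addresses.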

Re: add CJKTokenizer to solr

2007-06-22 Thread Chris Hostetter
: Sorry I've confused things a bit... Thread safety only has to be considered for the Tokenizers, not the factories. So are the Tokenizers thread safe? Nope... they are constructed using Readers and maintain state about the text they are processing... the only API is a "next()" met
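Why a Reader-bound tokenizer cannot be shared between threads can be shown with a minimal Java sketch. `SketchTokenizer` is a hypothetical whitespace tokenizer written in the `next()` style described above; it is not Lucene's Tokenizer API. Both the Reader position and the scratch buffer are instance state, so each document (and therefore each thread) needs its own instance.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical next()-style tokenizer, not the real Lucene class.
class SketchTokenizer {
    private final Reader input;                             // text source: advanced by every read
    private final StringBuilder buf = new StringBuilder(); // scratch state reused by next()

    SketchTokenizer(Reader input) { this.input = input; }

    // Returns the next whitespace-delimited token, or null at end of input.
    // Two threads calling next() on one instance would interleave reads
    // from the same Reader and overwrite each other's buffer.
    String next() throws IOException {
        buf.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (buf.length() > 0) break; // end of the current token
            } else {
                buf.append((char) c);
            }
        }
        return buf.length() > 0 ? buf.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        // One tokenizer per Reader: create a new instance per document.
        SketchTokenizer t = new SketchTokenizer(new StringReader("solr uses lucene"));
        for (String tok = t.next(); tok != null; tok = t.next()) {
            System.out.println(tok);
        }
    }
}
```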

RE: add CJKTokenizer to solr

2007-06-22 Thread Xuesong Luo
Thanks, Otis. I didn't know CJK is only used for Asian languages. I'll try the German Analyzer. -----Original Message----- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, June 22, 2007 3:18 AM To: solr-user@lucene.apache.org Subject: Re: add CJKTokenizer to solr I'm jumping in th

RE: page rank

2007-06-22 Thread David Xiao
I have a few more questions based on your kind replies to my first question. 1. My Solr instance has already indexed hundreds of thousands of documents, so how can I update these documents to add a new field, "numberField"? 2. At runtime, my application might want to update the value of "numberField" very

Re: add CJKTokenizer to solr

2007-06-22 Thread Otis Gospodnetic
Tokenizers are not thread safe (I made a mistake yesterday saying they are - I don't know what I was thinking). This is why: public abstract class Tokenizer extends TokenStream { /** The text source for this Tokenizer. */ protected Reader input; < oops :(

Re: add CJKTokenizer to solr

2007-06-22 Thread Daniel Alheiros
Sorry, I've confused things a bit... Thread safety only has to be considered for the Tokenizers, not the factories. So, are the Tokenizers thread safe? Regards, Daniel On 22/6/07 11:36, "Daniel Alheiros" <[EMAIL PROTECTED]> wrote: > Hi Hoss. > > I've done a few tests using reflection to

Re: add CJKTokenizer to solr

2007-06-22 Thread Daniel Alheiros
Hi Hoss. I've done a few tests using reflection to instantiate a simple object, and the results vary a lot depending on the JVM. Since the JVM optimizes code as it executes, the results also vary depending on usage, but I think we have something to consider: I've done 1,000 samples (5 clean X loop of
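A comparison like the one Daniel describes can be reproduced with a naive Java sketch that times direct construction against `Constructor.newInstance()`. `Payload`, the warm-up count and the loop sizes are arbitrary assumptions, not Daniel's actual setup. As he notes, the JIT makes such numbers vary by JVM and run. A harness such as JMH gives more trustworthy figures, and escape analysis may even remove the allocations being measured.

```java
import java.lang.reflect.Constructor;

class ReflectionTiming {
    // Hypothetical trivial object to instantiate.
    public static class Payload {
        public Payload() {}
    }

    // Times n direct constructions; returns elapsed nanoseconds.
    static long timeDirect(int n) {
        long start = System.nanoTime();
        Object last = null;
        for (int i = 0; i < n; i++) last = new Payload();
        long elapsed = System.nanoTime() - start;
        if (last == null && n > 0) throw new IllegalStateException(); // use the result
        return elapsed;
    }

    // Times n reflective constructions; the Constructor lookup is done once, outside the loop.
    static long timeReflective(int n) throws ReflectiveOperationException {
        Constructor<Payload> ctor = Payload.class.getConstructor();
        long start = System.nanoTime();
        Object last = null;
        for (int i = 0; i < n; i++) last = ctor.newInstance();
        long elapsed = System.nanoTime() - start;
        if (last == null && n > 0) throw new IllegalStateException();
        return elapsed;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Warm-up rounds give the JIT a chance to compile both paths before measuring.
        for (int round = 0; round < 5; round++) {
            timeDirect(100_000);
            timeReflective(100_000);
        }
        System.out.println("direct:     " + timeDirect(1_000_000) / 1e6 + " ms");
        System.out.println("reflective: " + timeReflective(1_000_000) / 1e6 + " ms");
    }
}
```

Whatever the per-call gap turns out to be, it matters only relative to the cost of tokenizing a document, since the tokenizer is created once per field value rather than once per token.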

Re: add CJKTokenizer to solr

2007-06-22 Thread Otis Gospodnetic
I'm jumping into the middle of the thread here. CJK = Chinese, Japanese, Korean. German = something completely different. Why are you trying to use CJKAnalyzer+Tokenizer for German? Have you tried the German Analyzer from Lucene contrib? Otis

Re: Conceptual Question

2007-06-22 Thread Frédéric Glorieux
Hi Yonik, Sorry to jump on an old post. > There is a change interface in JIRA, as long as all of the fields originally sent are stored. Do you remember the JIRA issue, or a keyword to search for it? It sounds useful in some cases, for example when you are working on analysers. That could be real lif