Re: reusing the term-frequency count while indexing

2011-10-24 Thread Simon Willnauer
So you are saying you have (uniqueTerm, freq) tuples and you want Lucene to use them directly? I think the easiest way is to write a simple TokenFilter that emits each term X times, where X is the term frequency. There is no easy way to pass these tuples to Lucene directly. simon On Mon, Oct 24,
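
A rough sketch of the "emit the term X times" idea, written in plain Java so it runs without Lucene on the classpath. It is not a TokenFilter: it expands the tuples into a whitespace-separated string that could then be handed to a whitespace-based analyzer. The class and method names (TermFrequencyExpander, expand) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class TermFrequencyExpander {

    // Repeats each term as many times as its frequency, separated by single spaces.
    static String expand(Map<String, Integer> termFreqs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(entry.getKey());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, Integer> termFreqs = new LinkedHashMap<>();
        termFreqs.put("lucene", 3);
        termFreqs.put("index", 1);
        System.out.println(expand(termFreqs));
    }
}
```

The expansion costs time and memory proportional to the sum of the frequencies, which is the overhead the original poster was hoping to avoid.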

custome index rule

2011-10-24 Thread janwen
Hi, I want to implement a custom index rule. Assume a sentence like the following (note the commas): I am in China,I am in USA,I am in UK. I hope Lucene can index the above sentence based on this rule: 1) split the sentence on commas (,), so we get (I am in China)(I am in USA)(I am in UK); 2) then lucene just

Re: Merging several taxonomy indexes for faceted search

2011-10-24 Thread Christoph Kaser
Hi Shai, thank you very much for pointing me to the TaxonomyMergeUtils class, it does exactly what I need. I had only included the maven artifact for facet support and therefore did not see the (very helpful) examples package before. Best regards, Christoph Am 19.10.2011 21:02, schrieb Shai

Re: Merging several taxonomy indexes for faceted search

2011-10-24 Thread Shai Erera
I think that by mistake, the examples were not packaged with Lucene 3.4.0. I fixed it in the current 3x branch, so hopefully they will be available with 3.5.0, when it's out. Shai On Mon, Oct 24, 2011 at 11:43 AM, Christoph Kaser wrote: > Hi Shai, > > thank you very much for pointing me to the T

Re: custome index rule

2011-10-24 Thread Ian Lea
You can achieve pretty much anything by customizing parsers and tokenizers but for your simple case I'd just use String.split() and add the phrases one by one. Something like Document d = ... String[] phrases = sentence.split(","); for (String phrase : phrases) { d.add(new Field("phrase", phras
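
The splitting step on its own might look like the sketch below. It is plain Java with no Lucene dependency, and the names PhraseSplitter and splitPhrases are invented for the example. Trimming whitespace and skipping empty pieces are assumptions, not something the thread asks for. Each returned phrase would then be added to the document as its own field value, as in the message above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class PhraseSplitter {

    // Splits on commas, trims each piece, and drops pieces that are empty.
    static List<String> splitPhrases(String sentence) {
        List<String> phrases = new ArrayList<>();
        for (String part : sentence.split(",")) {
            String phrase = part.trim();
            if (!phrase.isEmpty()) {
                phrases.add(phrase);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        for (String phrase : splitPhrases("I am in China,I am in USA,I am in UK")) {
            System.out.println(phrase);
        }
    }
}
```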

Re: Re: custome index rule

2011-10-24 Thread janwen
Thanks, Ian. I will try your idea. 2011-10-24 janwen | China website : http://www.qianpin.com/ From:Ian Lea Date:2011-10-24 18:01 Subject:Re: custome index rule To:java-user Cc: You can achieve pretty much anything by customizing parsers and tokenizers but for your simple case I'd just use

Re: Language Identifier with Lucene?

2011-10-24 Thread Mead Lai
Luca, I would like to know: how many languages can your system identify? In my view, the difficult part of your system is how *one person* can collect so many of the world's languages and character sets... Regards, Mead On Sun, Oct 23, 2011 at 1:27 AM, Petite Abeille wrote: > > On Oct 22, 2011, at

Re: performance question - number of documents

2011-10-24 Thread sol myr
Hi, Thanks for this reply. Could I please just ask - doesn't Lucene keep the data sorted, at least partially (heuristically)? E.g. if the reverse index says "the word DOE appears in documents #1, #7, #5" . Won't Lucene do some smart sorting on this list of documents? Maybe by frequency, first

AlreadySetException ?

2011-10-24 Thread Clemens Wyss
I am seeing this stack trace in my logs: org.apache.lucene.util.SetOnce$AlreadySetException: The object cannot be set twice! at org.apache.lucene.util.SetOnce.set(SetOnce.java:69) at org.apache.lucene.index.MergePolicy.setIndexWriter(MergePolicy.java:271) at org.apache.luc

Re: AlreadySetException ?

2011-10-24 Thread Dawid Weiss
> What can possibly cause this exception? I can't be calling the constructor of > IndexWriter twice, can I ;) I bet Chuck Norris can do that! :) Dawid

SIGSEGV in JCCEnv::setClassPath

2011-10-24 Thread Stein, Ruben
Hello, I am using the ReviewBoard software, which internally uses PyLucene for its search function. Almost every time I use the search functionality however, I get a segmentation fault, which gets logged by apache: # A fatal error has been detected by the Java Runtime Environment: # # SIGS

RE: AlreadySetException ?

2011-10-24 Thread Uwe Schindler
Hi, You cannot use the same IndexWriterConfig for two different IndexWriters. Clone it before, if you want to use it multiple times. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Clemens Wyss [ma

Re: Language Identifier with Lucene?

2011-10-24 Thread Luca Rondanini
Mead, it just depends on how many languages you enter in the system! Collecting the data is not a huge problem: I'm using news websites in 19 languages! The quality of the content is usually high and they "talk" a lot! Watch out that the real problem is the encoding: you want to be sure everythi

Re: AlreadySetException ?

2011-10-24 Thread Simon Willnauer
Clone won't help here, you need to create your MergePolicy again even if you clone since MP is not cloneable etc. simon On Mon, Oct 24, 2011 at 5:08 PM, Uwe Schindler wrote: > Hi, > > You cannot use the same IndexWriterConfig for two different IndexWriters. > Clone it before, if you want to use
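
To illustrate why sharing one configuration between two writers fails, here is a much simplified stand-in for the set-once holder that appears in the stack trace. It is not Lucene's org.apache.lucene.util.SetOnce: the class name SetOnceSketch is invented, it is not thread-safe, and it throws a plain IllegalStateException instead of AlreadySetException. The strings "writer-1" and "writer-2" stand in for two IndexWriter instances that share the same merge policy.

```java
// Simplified, hypothetical stand-in for a set-once holder; not the Lucene class.
class SetOnceSketch<T> {
    private T value;
    private boolean alreadySet;

    // Accepts the first value and rejects every later attempt.
    void set(T newValue) {
        if (alreadySet) {
            throw new IllegalStateException("The object cannot be set twice!");
        }
        value = newValue;
        alreadySet = true;
    }

    T get() {
        return value;
    }

    public static void main(String[] args) {
        // One holder shared by two "writers", as happens when a merge policy is reused.
        SetOnceSketch<String> owner = new SetOnceSketch<>();
        owner.set("writer-1");
        try {
            owner.set("writer-2");
        } catch (IllegalStateException e) {
            System.out.println("second set rejected: " + e.getMessage());
        }
    }
}
```

Under this model, the merge policy remembers the first writer it was attached to, so a second writer needs a freshly created merge policy, as Simon points out.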

Re: reusing the term-frequency count while indexing

2011-10-24 Thread prasenjit mukherjee
That's exactly what I was trying to avoid :( I can afford to do that during indexing time, but it will be time-consuming to do that at search time. On Mon, Oct 24, 2011 at 1:05 PM, Simon Willnauer wrote: > so you are saying you got (uniqueTerm, freq) tuples and you want to > make lucene use this direc

setting up lucene for use on mac OSX

2011-10-24 Thread Daniel Quach
Hi all, I am unable to get the lucene demo to run on my macbook pro. I downloaded the jars into my home directory and then I set the CLASSPATH variable to point to them. However, once I run the example command for the lucene demo, it still complains to me about the missing class. Is there som

MoreLikeThis and TermVector relationship

2011-10-24 Thread Saurabh Gokhale
Hi, In my project, my intention is to show similar documents to the user based on the documents searched by the user. *As per Lucid Solr reference guide...* For best results, use stored TermVectors in the schema.xml for fields specified for similarity. For example: If termVectors are not stored,

AW: AlreadySetException ?

2011-10-24 Thread Clemens Wyss
Simon & Uwe, thanks a lot! - Clemens > -Ursprüngliche Nachricht- > Von: Simon Willnauer [mailto:simon.willna...@googlemail.com] > Gesendet: Montag, 24. Oktober 2011 19:35 > An: java-user@lucene.apache.org > Betreff: Re: AlreadySetException ? > > Clone won't help here, you need to create y

AW: AlreadySetException ?

2011-10-24 Thread Clemens Wyss
>Chuck Norris ... with his swiss army knife ... ;) Greetings from Switzerland - Clemens-having-a-swiss-army-knife-too > -Ursprüngliche Nachricht- > Von: Dawid Weiss [mailto:dawid.we...@gmail.com] > Gesendet: Montag, 24. Oktober 2011 17:01 > An: java-user@lucene.apache.org > Betreff: Re: A

Re: SIGSEGV in JCCEnv::setClassPath

2011-10-24 Thread Stein, Ruben
On 2011-10-24 17:06, ruben wrote: >Find attached the complete error log from apache. # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7fab1f777874, pid=16468, tid=140373040924416 # # JRE version: 6.0_26-b0

RE: SIGSEGV in JCCEnv::setClassPath

2011-10-24 Thread Uwe Schindler
Hi Ruben, This mailing list is about Lucene Core (Java), which of course does the work behind the PyLucene JCC wrapper. But this SIGSEGV is not caused by Java code, so you'd better ask on the PyLucene Developer mailing list, as this seems to be a problem in the JCC wrapper: http://lucene.apache.o