Re: Wildcard query with untokenized punctuation

2007-03-10 Thread Doron Cohen
Hi Colin, Is it possible that you are using an analyzer that breaks words on non-letters, for instance SimpleAnalyzer? If so, the doc text: pagefile.sys is indexed as two words: pagefile sys. At search time, the query text: pagefile.sys is also parsed and tokenized into a two-word query: prof
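
As an illustration of the point about analyzers that break on non-letters, here is a minimal plain-Java sketch. It does not call Lucene: the class LetterSplitSketch and its tokenize method are hypothetical names, and the split-on-non-letters plus lower-casing behaviour is an assumption modelled on what SimpleAnalyzer is described as doing above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: imitates an analyzer that
// breaks text into runs of letters and lower-cases each run.
public class LetterSplitSketch {

    /** Returns the maximal runs of letters in the text, lower-cased. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (such as '.') ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The dot is dropped, so one file name becomes two tokens.
        System.out.println(tokenize("pagefile.sys")); // prints [pagefile, sys]
    }
}
```

Because the dot never reaches the index under such an analyzer, a query term that still contains the dot has nothing to match against unless it is tokenized the same way.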

Re: Find related question

2007-03-10 Thread markharw00d
>>most of the body text is the same, but I want to group them all under one result. I created this analyzer class to identify content that was "mostly similar" but not necessarily identical. http://issues.apache.org/jira/browse/LUCENE-725 If you feed a small set of documents through it (say y

index optimisation - disk fill-up

2007-03-10 Thread Dino Korah
Hi All, I understand Lucene requires free disk space of double the index size on the disk on which the index is being optimised. But if the disk fills up during optimisation, what will happen to the index, theoretically? Is there an effective way of avoiding this? Many T
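
One possible precaution is to compare free space with the index size before starting an optimise. The following is a minimal sketch in plain Java, not Lucene API: OptimizeSpaceCheck, directorySize, hasRoom and hasRoomToOptimize are hypothetical names, and the factor of two is simply the rule of thumb quoted in the question, taken here as an assumption.

```java
import java.io.File;

// Hypothetical pre-flight check, not part of Lucene.
public class OptimizeSpaceCheck {

    /** Sums the sizes of the regular files directly inside the directory. */
    public static long directorySize(File dir) {
        long total = 0;
        File[] files = dir.listFiles(); // null if dir is missing or unreadable
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    total += f.length();
                }
            }
        }
        return total;
    }

    /** True if the free space is at least factor times the index size. */
    public static boolean hasRoom(long indexBytes, long freeBytes, long factor) {
        return freeBytes >= indexBytes * factor;
    }

    /** Applies the assumed "double the index size" rule to a directory. */
    public static boolean hasRoomToOptimize(File indexDir) {
        return hasRoom(directorySize(indexDir), indexDir.getUsableSpace(), 2);
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("Enough free space to optimise: "
                + hasRoomToOptimize(indexDir));
    }
}
```

A check like this only reduces the risk: other processes can still consume disk space between the check and the end of the optimise.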

Re: index optimisation - disk fill-up

2007-03-10 Thread Michael McCandless
"Dino Korah" <[EMAIL PROTECTED]> wrote: > I understand lucene has a requirement of double the size of index > available > free on the disk on which the index is being optimised. But if in case > the > disk gets filled up during optimisation, what will happen to the index, > theoretically? Is there

Need a help

2007-03-10 Thread Chaminda Amarasinghe
Hi all, I'm new to this group. I'm using Lucene for indexing and I have a problem. Any help is greatly appreciated. Please see the following code // three fields MultiFieldQueryParser parser = new MultiFieldQueryParser(new String[]{"title", "tags", "content"}, new StandardAnalyzer())

HITS and termDoc give different results

2007-03-10 Thread dziadgba
Hi, I want to extract documents which contain a specific term. I tried to do it in two different ways: 1 Using the 'iterator' termdocs = reader.termDocs(term); 2 Using search and examining Hits. It turns out that the results are sometimes equal, sometimes the first is a subset of the second and som

Re: updating index

2007-03-10 Thread no spam
BTW Erick this works brilliantly with UN_TOKENIZED. SUPER fast :) On 2/25/07, Erick Erickson <[EMAIL PROTECTED]> wrote: Yes, I'm pretty sure you have to index the field (UN_TOKENIZED) to be able to fetch it with TermDocs/TermEnum! The loop I posted works like this for each term in the ind

RE: Wildcard query with untokenized punctuation

2007-03-10 Thread McGuigan, Colin
Doron; You're entirely correct about the analyzer (I'm using one that breaks on non-alphanumeric characters, so all punctuation is ignored). To be honest, I hadn't thought about altering this, but I guess I could; just reticent that there might be unforeseen consequences. But I'm still curious a

Re: index optimisation - disk fill-up

2007-03-10 Thread Dino Korah
Cheers Michael. On 10/03/07, Michael McCandless <[EMAIL PROTECTED]> wrote: "Dino Korah" <[EMAIL PROTECTED]> wrote: > I understand lucene has a requirement of double the size of index > available > free on the disk on which the index is being optimised. But if in case > the > disk gets filled u

Re: Query String for a phrase?

2007-03-10 Thread ruchi thakur
does that mean* jakarta&apache* should search for * jakartaapache* But using *jakarta&apache* I am able to search for *jakarta apache*, and I was confused because I could not find any reference to this query string (jakarta&apache) anywhere on the net. Regards, Ruchi On 3/8/07, Doron Cohen <[EMAIL PROTECTED]> wrot

help!!!!

2007-03-10 Thread ashwin kumar
Hi all, my name is Ashwin. I am trying to connect my servlet front end to my backend Lucene search program. These are the two programs: <> import javax.servlet.*; import javax.servlet.http.*; import java.io.*; import java.lang.*; //import java.io.*; import java.io.FileReader; import java.io.Reader;

Re: A solution to HitCollector-based searches problems

2007-03-10 Thread Mohammad Norouzi
Hi Oramas, if I use that jar file, it conflicts with the lucene-core.jar file. For example, the IndexSearcher class that you defined is different from the original one. Do I have to remove the lucene-core jar file? If yes, how about the other original classes? On 3/8/07, oramas martín <[EMAIL PROTECTED]> wr

Re: HITS and termDoc give different results

2007-03-10 Thread Doron Cohen
Is "Text" the only field in the index? Note that the search only looks at field "Text", while the terms() iteration as appears in that code might bump into a term with same text but in another field. A better comparison would be to create a Term ("Text",), and compare TermQuery(thatTerm) to termDo

RE: Wildcard query with untokenized punctuation

2007-03-10 Thread Doron Cohen
"McGuigan, Colin" <[EMAIL PROTECTED]> wrote on 10/03/2007 11:04:37: > You're entirely correct about the analyzer (I'm using one that breaks on > non-alphanumeric characters, so all punctuation is ignored). To be > honest, I hadn't thought about altering this, but I guess I could; just > reticent

Delete document with keyword field

2007-03-10 Thread Harini Raghavan
Hi All, I have a lucene index with many fields, one of which is a Keyword field IS. The values stored in this field are the document ids like _839930494, _839930492. But I am unable to delete the documents using this id. Is this something to do with the underscore? Can someone suggest how I shou

Re: Query String for a phrase?

2007-03-10 Thread Doron Cohen
"ruchi thakur" <[EMAIL PROTECTED]> wrote on 10/03/2007 19:32:14: > does that mean* jakarta&apache* should search for * jakartaapache* I assume '*' here is for emphasizing the query text, - this is somewhat confusing because '*' is part of Lucene's query syntax for wildcard search. To the questi