Low hits

2007-01-23 Thread DECAFFMEYER MATHIEU
Hi, I'm pretty new to Lucene and am looking for some help here. I added the title of the document: doc.add(Field.Text("title", title)); e.g. the title is "Constructions". When I search on this title, the result shows 2%. Can someone help me understand what I am doing wrong? Thank you.

Re: Low hits

2007-01-23 Thread Erick Erickson
What version of Lucene are you using? 2.0 doesn't have a doc.add like that. You'd do something like doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED)); So I really don't understand what you're trying to do. Nor do I understand what "2%" means in this context. But there

Re: Websphere and Dark Matter

2007-01-23 Thread Nadav Har'El
On Mon, Jan 22, 2007, John Haxby wrote about "Re: Websphere and Dark Matter": > Nadav Har'El wrote: > Are you implying that the process memory shrinks, that memory is > returned to the kernel? I didn't read the page you referenced that way. > I know that if I allocate memory by memory mapping anon

Re: Long Query Performance

2007-01-23 Thread Somnath Banerjee
Thanks for all the replies. I'll try the methods you suggested and will post the results of my experiment. Chris, I was measuring the query time only. I have increased the Java heap size to 1 GB. Now a 5 - 8 word query takes about 0.1 - 0.4 seconds. That's reasonable, I guess. Thanks, Somnath

RE: Low hits

2007-01-23 Thread DECAFFMEYER MATHIEU
Actually I am using Regain over Lucene for URL indexing, and the latest stable release of Regain uses Lucene 1.4.3. When I index the whole website and then type the title of a document, I get a score of around 60 to 70%. When I index only one page and then type the title, I get a score of around 2%.

NO_NORMS and TOKENIZED?

2007-01-23 Thread Nadav Har'El
Hi, When adding a field to a document, Field.Index gives me four options: NO, NO_NORMS, TOKENIZED and UN_TOKENIZED. NO_NORMS means, according to the documentation "index the field's value without an Analyzer, and disable the storing of norms." What can I do if I want to index the field's value *

Re: NO_NORMS and TOKENIZED?

2007-01-23 Thread Yonik Seeley
On 1/23/07, Nadav Har'El <[EMAIL PROTECTED]> wrote: Hi, When adding a field to a document, Field.Index gives me four options: NO, NO_NORMS, TOKENIZED and UN_TOKENIZED. NO_NORMS means, according to the documentation "index the field's value without an Analyzer, and disable the storing of norms."

Re: NO_NORMS and TOKENIZED?

2007-01-23 Thread Nadav Har'El
On Tue, Jan 23, 2007, Yonik Seeley wrote about "Re: NO_NORMS and TOKENIZED?": > >When adding a field to a document, Field.Index gives me four options: NO, > >NO_NORMS, TOKENIZED and UN_TOKENIZED. >.. > >What can I do if I want to index the field's value *with* an Analyzer, but > >still disable the

Re: Lucene Internals question

2007-01-23 Thread Grant Ingersoll
You might also be interested in https://issues.apache.org/jira/browse/LUCENE-755 (aka the Payloads patch) which will enable storing information at the token level and allow for plugging in more scoring options related to it. There has been a variety of discussions over on java-dev related t

Re: Lucene Internals question

2007-01-23 Thread Yonik Seeley
On 1/23/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: You might also be interested in https://issues.apache.org/jira/browse/LUCENE-755 (aka the Payloads patch) which will enable storing information at the token level and allow for plugging in more scoring options related to it. There has been

query parser syntax -- does "-" require no-space after it to act as "prohibitor"?

2007-01-23 Thread Felix Litman
Does a special character like a "-" prohibitor operator require no-space after it in order to work as a prohibitor? Typically on the web, e.g. Google and others, the "-" operator works as a boolean prohibitor only when not followed by a space. Otherwise it is treated as just a dash query t

Re: query parser syntax -- does "-" require no-space after it to act as "prohibitor"?

2007-01-23 Thread Yonik Seeley
On 1/23/07, Felix Litman <[EMAIL PROTECTED]> wrote: Does a special character like a "-" prohibitor operator require no-space after it in order to work as a prohibitor? Typically on the web, e.g. Google and others, the "-" operator works as a boolean prohibitor only when not followed by a spa

Re: proximity and location scoring

2007-01-23 Thread Doron Cohen
Felix Litman <[EMAIL PROTECTED]> wrote on 23/01/2007 10:01:00: > Is there a straightforward way to extend the "standard" parser to > incorporate proximity into the score in multi-word queries, > including boost factors? Current parser supports relaxed phrase syntax: http://lucene.apache.org/java/

Re: query parser syntax -- does "-" require no-space after it to act as "prohibitor"?

2007-01-23 Thread Chris Hostetter
: If you want literals, put quotes around your terms... : "Sales +service" or if you don't want a full phrase and just want "-" to be treated as a term match, you can escape it, or quote it by itself... Sales \- service Sales "-" service -Hoss
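
The escaping option mentioned above can be illustrated with a small stand-alone sketch. It is not Lucene code: the class name, the method, and the list of special characters are assumptions made for illustration, and it only shows the idea of prefixing an operator character with a backslash so it is read as a literal. Lucene's QueryParser also provides a static escape method, which is probably the better choice in real code; check the version you are using.

```java
// Stand-alone sketch (hypothetical helper, not part of Lucene).
final class QueryEscapeSketch {
    // Assumed set of single-character operators; adjust to the syntax of your Lucene version.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    // Prefix every assumed special character with a backslash.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The dash is escaped, so it reads as a literal term rather than a prohibitor.
        System.out.println(escape("Sales - service"));
    }
}
```

Quoting the dash by itself, as in the second form above, avoids the need for escaping altogether.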

Re: Long Query Performance

2007-01-23 Thread Chris Hostetter
: Chris, I was measuring the query time only. I have increased the heap size that still doesn't tell us what you are doing -- "query time" can mean a lot of things ... are you using the Hits class? are you iterating over results? are you pulling out stored fields? are you sorting? are you using

RE: proximity and location scoring

2007-01-23 Thread Chima Echeruo
What about implementing a scoring policy that computes the score based only on which word position the term is matched? If the match occurred in the first word position, the score should be highest, if in the second word position it would be next highest, etc. Finally for matches that share th

Extending scoring to eliminate sorting on timestamp

2007-01-23 Thread rayvittal-lists
For various reasons, we'd like to eliminate the sort step. Our current query interface takes a start time and end time as an input range: RangeFilter rf = new RangeFilter("day", start, end, true, true); hits = searcher.search(query,rf,new Sort(new SortField[]{

RE: Low hits

2007-01-23 Thread Chris Hostetter
: When I index the whole website, then when I type a title of a document I : have like 60 to 70 % as score. : When I index only one page, then when I type the title I have like 2% as : score. I don't know what Regain is ... but this sounds like some issue between how it reports the scores Lucene

RE: proximity and location scoring

2007-01-23 Thread Chris Hostetter
: What about implementing a scoring policy that computes the score based : only on which word position the term is matched? if you wrote your own Similarity class and used SpanFirst queries that should be possible. It's the same basic principle as a Similarity that scores entirely by tf, except
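
The position-only scoring idea discussed in this thread can be pictured with a plain-Java sketch. Nothing here uses Lucene's Similarity or SpanFirst classes; the class name, the method, and the 1/(position+1) decay are assumptions chosen purely for illustration.

```java
// Stand-alone sketch (hypothetical, not Lucene API): score by first match position only.
final class PositionScoreSketch {
    // Returns 1.0 for a match at the first word, 0.5 at the second, and so on;
    // returns 0 when the term does not occur at all.
    static float score(String[] tokens, String term) {
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                return 1.0f / (i + 1);
            }
        }
        return 0f;
    }

    public static void main(String[] args) {
        String[] title = {"lucene", "in", "action"};
        System.out.println(score(title, "lucene"));
        System.out.println(score(title, "action"));
    }
}
```

In Lucene itself the same effect would have to come from the query and Similarity implementation, as suggested above; this sketch only shows the intended ranking behaviour.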

Re: custom similarity based on tf but greater than 1.0

2007-01-23 Thread Vagelis Kotsonis
Even if I get what I want using the coord method, I would still have the same problem because the similarity would return a number > 1 and afterwards, the scoring mechanisms would normalize these numbers to something <1.0 Thank you! Vagelis Otis Gospodnetic wrote: > > Jumping in at this point

Re: query parser syntax -- does "-" require no-space after it to act as "prohibitor"?

2007-01-23 Thread Felix Litman
Thank you. Lucene documentation is vague on this subject. On the LIA-book search powered by Lucene it seems the "-" operator works as a prohibitor regardless of the number of spaces after the "-". Still can't tell if this is a bug or by design. A Nutch parser, however, seems to have changed t

Re: custom similarity based on tf but greater than 1.0

2007-01-23 Thread Vagelis Kotsonis
So the normalization was made through Hits. That was something I didn't understand. If I was alone I would search in Scorer and query classes. Thank you for this. Finally I used the following: final HitQueue hq = new HitQueue(results.length()); searcher.search(qr, new HitCollector
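
The normalization referred to here can be sketched in plain Java. To my understanding, the Hits class in Lucene versions of this era scales all scores by the reciprocal of the top score when that top score exceeds 1.0, and leaves them untouched otherwise; the class and method names below are hypothetical and the code does not call Lucene.

```java
// Stand-alone sketch (hypothetical names) of Hits-style score normalization.
final class ScoreNormSketch {
    // Scale scores so the maximum is 1.0, but only when the maximum exceeds 1.0.
    static float[] normalize(float[] raw) {
        float max = 0f;
        for (float s : raw) {
            max = Math.max(max, s);
        }
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scaled = normalize(new float[] {4f, 2f, 1f});
        for (float s : scaled) {
            System.out.println(s);
        }
    }
}
```

Collecting raw scores through a HitCollector, as described above, bypasses this step and returns the unscaled values.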

Re: Long Query Performance

2007-01-23 Thread Somnath Banerjee
Here is the code. Let me know if you need any clarification // MaxConcepts is set to 100 long stTime = System.currentTimeMillis(); // bq is the Boolean query constructed out of the title of the query document TopDocs docs = searcher.search(bq, null, MaxConcepts); // Store the title of the resu