Hi,
I'm pretty new to Lucene and I'm trying to find some help here.
I added the title of the document:
doc.add(Field.Text("title", title));
e.g. the title is "Constructions"
When I do a search on this title, the result comes back with a score of 2%.
Can someone help me understand what I am doing wrong?
Thank you.
___
What version of Lucene are you using? 2.0 doesn't have a doc.add like that.
You'd do something like
doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
So I really don't understand what you're trying to do, nor do I understand
what "2%" means in this context.
But there
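For reference, a minimal indexing sketch using that Lucene 2.0 Field constructor (the field name and value are just placeholders, and RAMDirectory is used only to keep the example self-contained):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class IndexTitle {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        // 'true' means create a new index (overwriting any existing one)
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // Store the original value and run it through the Analyzer for searching
        doc.add(new Field("title", "Constructions",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();
    }
}
```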
On Mon, Jan 22, 2007, John Haxby wrote about "Re: Websphere and Dark Matter":
> Nadav Har'El wrote:
> Are you implying that the process memory shrinks, that memory is
> returned to the kernel? I didn't read the page you referenced that way.
> I know that if I allocate memory by memory mapping anon
Thanks for all the replies. I'll try the methods you suggested and will post
the results of my experiment.
Chris, I was measuring the query time only. I have increased the heap size
of java to 1 GB. Now, 5 - 8 words query is taking about 0.1 - 0.4 second.
That's reasonable I guess.
Thanks,
Somnath
Actually, I am using Regain on top of Lucene for URL indexing, and Regain's
last stable release uses Lucene 1.4.3.
When I index the whole website and then search for the title of a document, I
get a score of around 60 to 70%.
When I index only one page, searching for the title gives a score of around
2%.
Hi,
When adding a field to a document, Field.Index gives me four options: NO,
NO_NORMS, TOKENIZED and UN_TOKENIZED.
NO_NORMS means, according to the documentation "index the field's value
without an Analyzer, and disable the storing of norms."
What can I do if I want to index the field's value *with* an Analyzer, but
still disable the storing of norms?
On 1/23/07, Nadav Har'El <[EMAIL PROTECTED]> wrote:
Hi,
When adding a field to a document, Field.Index gives me four options: NO,
NO_NORMS, TOKENIZED and UN_TOKENIZED.
NO_NORMS means, according to the documentation "index the field's value
without an Analyzer, and disable the storing of norms."
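One workaround sketch for this (my own assumption, not something confirmed in this thread): keep Field.Index.TOKENIZED so the Analyzer runs, and neutralize the length normalization with a custom Similarity.

```java
import org.apache.lucene.search.DefaultSimilarity;

// Tokenize with an Analyzer as usual (Field.Index.TOKENIZED), but make the
// length normalization a constant so field length no longer affects scoring.
// Note: the norm byte is still stored per document; this only flattens it.
public class NoLengthNormSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;  // ignore field length entirely
    }
}
```

You would need to set this on both sides, e.g. writer.setSimilarity(new NoLengthNormSimilarity()) at index time and searcher.setSimilarity(...) at query time.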
On Tue, Jan 23, 2007, Yonik Seeley wrote about "Re: NO_NORMS and TOKENIZED?":
> >When adding a field to a document, Field.Index gives me four options: NO,
> >NO_NORMS, TOKENIZED and UN_TOKENIZED.
>..
> >What can I do if I want to index the field's value *with* an Analyzer, but
> >still disable the
You might also be interested in https://issues.apache.org/jira/browse/
LUCENE-755 (aka the Payloads patch) which will enable storing
information at the token level and allow for plugging in more scoring
options related to it.
There has been a variety of discussions over on java-dev related t
On 1/23/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
You might also be interested in https://issues.apache.org/jira/browse/
LUCENE-755 (aka the Payloads patch) which will enable storing
information at the token level and allow for plugging in more scoring
options related to it.
There has been
Does a special character like the "-" prohibit operator require no space after
it in order to work as a prohibitor?
Typically on the web, e.g. Google and others, the "-" operator works as a
boolean prohibitor only when not followed by a space. Otherwise it is treated
as just a dash query t
On 1/23/07, Felix Litman <[EMAIL PROTECTED]> wrote:
Does a special character like the "-" prohibit operator require no space after
it in order to work as a prohibitor?
Typically on the web, e.g. Google and others, the "-" operator works as a
boolean prohibitor only when not followed by a spa
Felix Litman <[EMAIL PROTECTED]> wrote on 23/01/2007 10:01:00:
> Is there a straightforward way to extend the "standard" parser to
> incorporate proximity into the score in multi-word queries,
> including boost factors?
Current parser supports relaxed phrase syntax:
http://lucene.apache.org/java/
: If you want literals, put quotes around your terms...
: "Sales +service"
or if you don't want a full phrase, you just want "-" to be treated as
a term match you can escape it, or quote it by itself...
Sales \- service
Sales "-" service
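A sketch of the two parses above (the field name "body" and WhitespaceAnalyzer are arbitrary choices here):

```java
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class EscapeDash {
    public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser("body", new WhitespaceAnalyzer());

        // Unescaped: '-' is the prohibit operator, so 'service' is excluded
        Query prohibited = parser.parse("Sales -service");

        // Escaped: '\-' is parsed as a literal term instead of an operator
        Query literal = parser.parse("Sales \\- service");

        System.out.println(prohibited);
        System.out.println(literal);
    }
}
```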
-Hoss
---
: Chris, I was measuring the query time only. I have increased the heap size
that still doesn't tell us what you are doing -- "query time" can mean a
lot of things ... are you using the Hits class? are you iterating over
results? are you pulling out stored fields? are you sorting? are you using
What about implementing a scoring policy that computes the score based
only on which word position the term is matched?
If the match occurred in the first word position, the score should be
highest; if in the second word position, the next highest, and so on.
Finally for matches that share th
For various reasons, we'd like to eliminate the sort step.
Our current query interface takes a start time and end time as an input range:
RangeFilter rf = new RangeFilter("day", start, end, true, true);
hits = searcher.search(query,rf,new Sort(new SortField[]{
: When I index the whole website, then when I type a title of a document I
: have like 60 to 70 % as score.
: When I index only one page, then when I type the title I have like 2% as
: score.
I don't know what Regain is ... but this sounds like some issue between
how it reports the scores Lucene
: What about implementing a scoring policy that computes the score based
: only on which word position the term is matched?
if you wrote your own Similarity class and used SpanFirst queries that
should be possible. It's the same basic principle as a Similarity
that scores entirely by tf, except
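A sketch of that SpanFirst idea (field and term are placeholders): SpanFirstQuery only matches when the term occurs before a given end position, so several of them with different limits and boosts could approximate position-graded scoring.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class FirstPositionQuery {
    public static void main(String[] args) {
        // Matches documents where "lucene" occurs within the first 3 positions
        SpanTermQuery term = new SpanTermQuery(new Term("title", "lucene"));
        SpanFirstQuery firstFew = new SpanFirstQuery(term, 3);
        System.out.println(firstFew);
    }
}
```

Combined with a custom Similarity that flattens tf/idf (as suggested above), the position limit becomes the dominant scoring signal.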
Even if I get what I want using the coord method, I would still have the
same problem, because the similarity would return a number > 1 and
afterwards the scoring mechanism would normalize these numbers to something
< 1.0.
Thank you!
Vagelis
Otis Gospodnetic wrote:
>
> Jumping in at this point
Thank you. Lucene documentation is vague on this subject.
On the LIA book's search (powered by Lucene) it seems the "-" operator works as
a prohibitor regardless of the number of spaces after the "-". Still can't tell
if this is a bug or by design.
A Nutch parser, however, seems to have changed t
So the normalization is done through Hits. That was something I didn't
understand.
On my own, I would have searched in the Scorer and Query classes.
Thank you for this.
Finally I used the following:
final HitQueue hq = new HitQueue(results.length());
searcher.search(qr, new HitCollector
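For reference, a self-contained version of that pattern, assuming the Lucene 1.9/2.0 HitCollector API (the raw scores passed to collect() are not normalized the way Hits caps them at 1.0):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class RawScoreCollector {
    // Collects (doc, score) pairs without the score normalization done by Hits
    public static List collectRawScores(IndexSearcher searcher, Query query)
            throws Exception {
        final List results = new ArrayList();
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                // raw score: may be greater than 1.0
                results.add(new float[] { doc, score });
            }
        });
        return results;
    }
}
```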
Here is the code. Let me know if you need any clarification
// MaxConcepts is set to 100
long stTime = System.currentTimeMillis();
// bq is the Boolean query constructed out of the title of the query document
TopDocs docs = searcher.search(bq, null, MaxConcepts);
// Store the title of the resu