Hi Steven,
Thanks for your clarification.
I am using the Searcher.search(query, filter, n, sort) method.
I presume this method doesn't have the same problem, since I already
pass it the maximum number of results to return.
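That call pattern could be sketched as follows (a minimal sketch against the Lucene 2.x API; the index path, field names, and query values are illustrative stand-ins):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

public class BoundedSearchExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical index path
        Searcher searcher = new IndexSearcher("/path/to/index");
        Query query = new TermQuery(new Term("title", "lucene"));
        Filter filter = null; // or e.g. a QueryFilter/RangeFilter
        Sort sort = new Sort("date", true); // by a "date" field, descending
        // Unlike iterating a Hits object, this collects at most n results
        // up front, so the query is not silently re-executed while iterating.
        TopFieldDocs top = searcher.search(query, filter, 10, sort);
        for (int i = 0; i < top.scoreDocs.length; i++) {
            System.out.println(searcher.doc(top.scoreDocs[i].doc).get("title"));
        }
        searcher.close();
    }
}
```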
Regards,
Cedric
On 8/15/07, Steven Rowe <[EMAIL PROTECTED]> wrote:
> Hi Cedr
>
> Some options:
> 1) Try to minimise leaping around the disk - maybe sorting your selected terms
> will help. Look at methods in TermEnum and TermDocs which you can use to
> build your own bitset from your (sorted) list of terms.
Thanks, I'll try this method.
> 2) Can you add higher-level terms
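Option 1 above could be sketched like this (a hedged sketch against the Lucene 2.x API; the field name and term values are illustrative):

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class SortedTermBitSet {
    // Build a filter bitset from a list of terms, visiting them in
    // sorted order so TermDocs walks the index mostly sequentially.
    public static BitSet bitsForTerms(IndexReader reader, String field,
                                      String[] values) throws IOException {
        Arrays.sort(values); // visit terms in index order
        BitSet bits = new BitSet(reader.maxDoc());
        TermDocs termDocs = reader.termDocs();
        try {
            for (int i = 0; i < values.length; i++) {
                termDocs.seek(new Term(field, values[i]));
                while (termDocs.next()) {
                    bits.set(termDocs.doc());
                }
            }
        } finally {
            termDocs.close();
        }
        return bits;
    }
}
```

The resulting BitSet can back a custom Filter's bits() method.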
Hi
I am using WhitespaceAnalyzer and the query is " icdCode:H* ", but there are
no results. However, I know that there are many documents with this field value,
such as H20, H20.5, etc. This field is tokenized and indexed; what is
wrong with this?
When I test this query with Luke, it will return no res
Could be normalized relative to the max score among the matching documents -
but I realize that this can only be done AFTER collecting the documents (as
the Hits class does currently). It could also be normalized to some
"absolute relevance score" that is comparable across queries, but there is
no
I added this under Use Cases. Thanks for the suggestion.
Peter
On 8/13/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> There is also a Use Cases item on the Wiki...
>
> On Aug 13, 2007, at 3:26 PM, Peter Keegan wrote:
>
> > I suppose it could go under performance or HowTo/Interesting uses of
On 14 Aug 2007, at 21:34, John Paul Sondag wrote:
What exactly is a RAMDirectory? I didn't see it mentioned on that page.
Is there example code of using it? Do I just create a RAMDirectory
and then use it like it's a normal directory?
Yes, it is just like FSDirectory, but resides in RAM a
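A minimal sketch of using RAMDirectory like any other Directory (Lucene 2.x API; the analyzer and field names are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class RAMDirectoryExample {
    public static void main(String[] args) throws Exception {
        // An in-memory Directory; everything else works exactly as
        // it would with an FSDirectory.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("body", "hello ram directory",
                Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(new TermQuery(new Term("body", "hello")));
        System.out.println(hits.length() + " hit(s)");
        searcher.close();
    }
}
```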
Lukas,
One last thing: be sure to log only when a user clicks on a result,
and in Hadoop, document_id will be a key in the map phase.
The Lucene-related steps are the same.
Best,
Peter W.
On Aug 14, 2007, at 1:28 PM, Peter W. wrote:
When users perform
a search, log the unique document_id, IP add
Hey Lukas,
You can get a basic demo of this working in Lucene
first then make a more advanced and efficient version.
First, give each document in your index a score field
using NumberTools so it's sortable. When users perform
a search, log the unique document_id, IP address and
result position f
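The sortable score-field step could be sketched as follows (assuming the Lucene 2.x NumberTools class; the field name and value are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumberTools;

public class SortableScoreField {
    public static void main(String[] args) {
        long popularity = 42; // hypothetical per-document score
        Document doc = new Document();
        // NumberTools pads/encodes the long so plain string comparison
        // matches numeric order, which makes the field sortable.
        doc.add(new Field("score", NumberTools.longToString(popularity),
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Encoded values compare correctly as strings: 2 sorts before 10.
        System.out.println(NumberTools.longToString(2)
                .compareTo(NumberTools.longToString(10)) < 0);
    }
}
```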
: [1] I need to rank matches by some combination of keyword match, popularity
: and recency of the doc. I read the docs about CustomScoreQuery and it seems to
: be a reasonable fit. An alternative way of achieving my goals is to use a
: custom sort. What are the trade-offs between these two approaches?
: Thanks for pointing me at the DisjunctionMaxQuery, though you're
: correct, this is close but not exactly what I want.
:
: I think the difference lies in that it's not which subexpression had
: the greater score, but that a normally lower scoring document should
: get its rank elevated becaus
Hello again,
The files are local. Sorry for using /mounts in my earlier message; I can see
where that is confusing.
What exactly is a RAMDirectory? I didn't see it mentioned on that page. Is
there example code of using it? Do I just create a RAMDirectory and then
use it like it's a normal director
On Aug 14, 2007, at 11:57 AM, Walt Stoneburner wrote:
Grant,
Thanks for pointing me at the DisjunctionMaxQuery, though you're
correct, this is close but not exactly what I want.
I think the difference lies in that it's not which subexpression had
the greater score, but that a normally low
Wow Mark, quite the hint. Thanks so much.
Spencer
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: August 14, 2007 12:07 PM
To: java-user@lucene.apache.org
Subject: Re: MultiSearcher with mulitple filter
Here is a hint:
package org.apache.lucene.search;
import jav
Here is a hint:

package org.apache.lucene.search;

import java.io.IOException;

/**
 * Implements search over a set of Searchables using multiple filters.
 */
public class MultiFilterMultiSearcher extends MultiSearcher {
    public MultiFilterMultiSearcher(Searchable[] searchables)
        throws
Hi List,
Thanks in advance for the help. I can't wrap my head around the
MultiSearcher. I need to search across multiple indexes, but also need to
filter documents based on users' access rights. The problem seems to be that
MultiSearcher takes only one filter; however, my filter varies from one index t
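One possible workaround, sketched under the Lucene 2.x API (the index paths, "acl" field, and filter queries are illustrative): search each index with its own filter and merge the results yourself, instead of pushing one filter through MultiSearcher:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

public class PerIndexFilterExample {
    public static void main(String[] args) throws Exception {
        String[] indexPaths = {"/idx/a", "/idx/b"}; // hypothetical paths
        // One access filter per index
        Filter[] filters = {
            new QueryFilter(new TermQuery(new Term("acl", "group1"))),
            new QueryFilter(new TermQuery(new Term("acl", "group2"))),
        };
        Query query = new TermQuery(new Term("body", "lucene"));
        for (int i = 0; i < indexPaths.length; i++) {
            IndexSearcher s = new IndexSearcher(indexPaths[i]);
            // Each index gets its own filter here
            TopDocs top = s.search(query, filters[i], 10);
            System.out.println(indexPaths[i] + ": " + top.totalHits + " hits");
            s.close();
        }
    }
}
```

The cost is that score-ordered merging across indexes becomes your responsibility, which is what a MultiSearcher subclass like the hint above would encapsulate.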
Hi Cedric,
Cedric Ho wrote:
> On 8/13/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>> Are you iterating through a Hits object that has more than
>> 100 (maybe it's 200 now) entries? Are you loading each document that
>> satisfies the query? Etc. Etc.
>
> Unfortunately, yes. And I know this is an
Grant,
Thanks for pointing me at the DisjunctionMaxQuery, though you're
correct, this is close but not exactly what I want.
I think the difference lies in that it's not which subexpression had
the greater score, but that a normally lower scoring document should
get its rank elevated because i
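For context, constructing a DisjunctionMaxQuery looks like this (Lucene 2.x API; the field names and tieBreaker value are illustrative). It scores a document by its best-matching subquery, plus tieBreaker times the scores of the other matching subqueries, rather than summing them all:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.TermQuery;

public class DisMaxExample {
    public static void main(String[] args) {
        // tieBreaker 0.1f: non-max subqueries still contribute a little
        DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.1f);
        dmq.add(new TermQuery(new Term("title", "albino")));
        dmq.add(new TermQuery(new Term("body", "albino")));
        System.out.println(dmq.toString());
    }
}
```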
Hi
We have a system like 'Google News'. We currently parse and index
over 180,000 headlines per day. One month of data is 10 GB, and the
indexing process takes about 2 hours; the index size is about 6 GB.
(We're using mergeFactor 40, setMaxBufferedDocs 10,
setRAMBufferSizeMB 500 and useCompou
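The settings mentioned could be applied like this (a sketch only; the index path and analyzer are illustrative, and setRAMBufferSizeMB may only be available in newer Lucene versions than the one described):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class WriterTuningExample {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), true);
        writer.setMergeFactor(40);      // fewer, larger merges
        writer.setMaxBufferedDocs(10);  // flush trigger by document count
        writer.setRAMBufferSizeMB(500); // flush trigger by buffered RAM
        writer.close();
    }
}
```

Note that the document-count and RAM-size triggers are alternatives; whichever threshold is hit first causes a flush.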
Thomas Arni wrote:
> Hello Luceners
>
> I have started a new project and need to index PDF documents.
> There are several projects around which allow extracting the content,
> such as pdfbox, xpdf and pjclassic.
>
> As far as I have studied the FAQs and examples, all these
> tools allow simple text ex
Hi Rohit,
The way I showed you doesn't suit your need, because FieldNormModifier
is meant for modifying all fieldNorm values of the field specified as a
command-line parameter, in batch mode.
You can have an extra field other than content and store the points
in that field. Then use Fu
Thanks for the help;
please provide the code to do that.
I tried this one, but it didn't work:
Query filterQuery = MultiFieldQueryParser.parse(new String[] {query1, query2,
query3, query4}, new String[] {field1, field2, field1, field2, ... },
new KeywordAnalyzer());
this results in:
field
>> Do you mean it will count the number of documents for each publication source?
Lucene does that for all terms. The Luke plugin simply offers a visualisation
of the variance in term frequencies for a field. It looks something like this:
http://www.ucl.ac.uk/~ucbplrd/zipf.png
>>each set can be
Hi Koji,
Please give me an example. Let me explain what I want to do:
I have indexed some documents. Now I want to update the ranking of the
documents based on the following criteria:
1.) The documents which appear in the search results should get one point.
2.) The documents which are viewed by the user s
Hi,
I want to define different implementations for the functions in the
Similarity class. For example, I need to define sloppyFreq() differently
for the fields "foo" and "bar". Is there a way around this?
Also, I wonder why the field name is passed to some of the functions in
Similarity (such as Similari