Hello all,
I am looking for a strategy to exclude duplicate entries when searching
multiple indexes which may contain the same document. I have an email
system which archives and indexes emails on a per-recipient basis. So, each
email recipient has their own index. In the case where the same
On Sun, 2005-01-23 at 22:09 -0800, Otis Gospodnetic wrote:
A number of people have tried putting Lucene indices in RDBMS. As far
as I know, all were slower than FSDirectory.
Do you know if the Berkeley DB back end also has a performance hit?
--
Miles Barr [EMAIL PROTECTED]
Runtime
On Jan 24, 2005, at 09:14, Jason Polites wrote:
I am aware of the Filter object however the unique identifier of my
document is a field within the lucene document itself (messageid); and
I am reluctant to access this field using the public API for every Hit
as I fear it will have drastic
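One way to avoid touching stored fields for every Hit is to work out, once per index, which documents to keep, and hand that to the search as a Filter. In Lucene 1.4 a Filter supplies a BitSet from bits(IndexReader). The sketch below shows only that bookkeeping, in plain Java with no Lucene dependency. It makes several assumptions. messageIdsByDoc is an array of messageid values indexed by document number, loaded once per reader (for example from something like Lucene's FieldCache, if your version has it). The masks are built up front with the indexes always visited in the same order, not lazily from bits() under ParallelMultiSearcher, which may run the sub-searches concurrently. Every copy of a message is indexed identically, so keeping the first copy loses nothing. DuplicateMessageMask and keepBits are hypothetical names.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch (hypothetical class): one "keep" BitSet per index so that each
 * messageid survives only once across all per-recipient indexes.
 */
class DuplicateMessageMask {
    // messageids already claimed by an earlier index (or an earlier doc)
    private final Set<String> seen = new HashSet<String>();

    /**
     * Returns the bits for one index. Call it for every index, always in
     * the same order. messageIdsByDoc[doc] is that doc's messageid (assumed input).
     */
    BitSet keepBits(String[] messageIdsByDoc) {
        BitSet bits = new BitSet(messageIdsByDoc.length);
        for (int doc = 0; doc < messageIdsByDoc.length; doc++) {
            String id = messageIdsByDoc[doc];
            // documents without a messageid are always kept;
            // Set.add returns true only the first time an id is seen
            if (id == null || seen.add(id)) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

Because the mask does not depend on the query, it can be cached and rebuilt only when one of the indexes changes.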
Do stemming algorithms take into consideration abbreviations too? Some
examples:
mg = milligrams
US = United States
CD = compact disc
vcr = video cassette recorder
And, the next logical question, if stemming does not take care of
abbreviations, are there any solutions that include abbreviations
On Jan 24, 2005, at 7:24 AM, Kevin L. Cobb wrote:
Do stemming algorithms take into consideration abbreviations too?
No, they don't. Adding abbreviations, aliases, synonyms, etc is not
stemming.
And, the next logical question, if stemming does not take care of
abbreviations, are there any
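The two really are separate steps. Abbreviation expansion is a dictionary lookup applied to tokens, independent of any stemmer. The sketch below is plain Java with a hypothetical AbbreviationExpander and a hand-built dictionary. In Lucene the same idea would usually sit in a custom TokenFilter at index time or in query expansion. That is an assumption about how you would wire it in, not a stock component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch (hypothetical class): expand known abbreviations, separately from stemming. */
class AbbreviationExpander {
    private final Map<String, List<String>> expansions = new HashMap<String, List<String>>();

    /** Registers e.g. "mg" -> "milligrams" or "CD" -> "compact disc". */
    void add(String abbreviation, String expansion) {
        expansions.put(abbreviation.toLowerCase(Locale.ROOT),
                Arrays.asList(expansion.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    /** Returns each (lower-cased) token, followed by the words of its expansion if known. */
    List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            String key = token.toLowerCase(Locale.ROOT);
            out.add(key);
            List<String> words = expansions.get(key);
            if (words != null) {
                out.addAll(words);
            }
        }
        return out;
    }
}
```

Lower-casing the lookup key makes "US" collide with the pronoun "us", so a real dictionary may need case-sensitive entries for some abbreviations.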
I spent some time reading the Lucene in Action book this weekend (great job,
btw), and came across the section on using custom filters. Since the data
that I need to use to filter my hit set with comes from a database, I
thought it would be worth my effort this morning to write a custom filter
On Sun, 2005-01-23 at 22:09 -0800, Otis Gospodnetic wrote:
A number of people have tried putting Lucene indices in RDBMS. As far
as I know, all were slower than FSDirectory.
Do you know if the Berkeley DB back end also has a performance hit?
Try it; it all depends on how you configure it. And
Jerry,
On Monday 24 January 2005 18:26, Jerry Jalenak wrote:
I spent some time reading the Lucene in Action book this weekend (great job,
btw), and came across the section on using custom filters. Since the data
that I need to use to filter my hit set with comes from a database, I
thought it
On Jan 24, 2005, at 12:26 PM, Jerry Jalenak wrote:
I spent some time reading the Lucene in Action book this weekend
(great job, btw)
Thanks!
public class AccountFilter extends Filter
I see where the AccountFilter is setting the corresponding 'bits', but I end
up without any 'hits':
Entering
Paul / Erik -
I'm using the ParallelMultiSearcher to search three indexes concurrently -
hence the three entries into AccountFilter. If I remove the filter from my
query, and simply enter the query on the command line, I get two hits back.
In other words, I can enter this:
smith AND
As Paul suggested, output the Lucene document numbers from your Hits,
and also output which bit you're setting in your filter. Do those sets
overlap?
Erik
On Jan 24, 2005, at 2:13 PM, Jerry Jalenak wrote:
Paul / Erik -
I'm using the ParallelMultiSearcher to search three indexes
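A quick way to run the overlap check Erik describes, assuming the doc numbers printed from Hits are collected into an int array and the BitSet your filter returned is kept. With a (Parallel)MultiSearcher one thing is worth checking, stated here as my understanding rather than something confirmed in this thread. The filter is asked for bits once per sub-index, using that index's own document numbers. The numbers from Hits are in the combined numbering, offset by each sub-index's starting document. toLocal converts between the two, given those starts (hypothetical input: the running total of maxDoc over the preceding indexes, beginning at 0). OverlapCheck is a hypothetical name.

```java
import java.util.BitSet;

/** Sketch (hypothetical class) for comparing hit doc numbers with filter bits. */
class OverlapCheck {
    /** Returns the hit doc numbers whose bit is also set in the filter's BitSet. */
    static BitSet overlap(int[] hitDocs, BitSet filterBits) {
        BitSet hits = new BitSet();
        for (int doc : hitDocs) {
            hits.set(doc);
        }
        hits.and(filterBits); // keep only docs present in both sets
        return hits;
    }

    /**
     * Maps a combined doc number to {subIndex, localDoc}. starts[i] is the first
     * combined doc number of sub-index i (ascending, starts[0] == 0).
     */
    static int[] toLocal(int globalDoc, int[] starts) {
        // scan from the end so an empty sub-index (repeated start) is skipped
        for (int i = starts.length - 1; i >= 0; i--) {
            if (globalDoc >= starts[i]) {
                return new int[] { i, globalDoc - starts[i] };
            }
        }
        throw new IllegalArgumentException("doc before first start: " + globalDoc);
    }
}
```

If overlap() comes back empty while the filter clearly set bits, comparing toLocal() of each hit against the bits of the matching sub-index is the next thing to try.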
Pierrick Brihaye wrote:
Hi,
David Spencer wrote:
One example of expansion with the synonym boost set to 0.9 is the
query big dog expands to:
Interesting.
Do you plan to add expansion on other Wordnet relationships? Hypernyms
and hyponyms would be a good starting point for thesaurus-like
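For readers following along, here is a rough sketch of the shape of such an expanded query. Each word is kept as-is and OR'ed with its synonyms at a lower boost (0.9, as in the thread), written in QueryParser syntax. This is not the sandbox WordNet code. The synonym map is a hypothetical stand-in for WordNet lookups, and only single-word synonyms are handled; multi-word ones would need quoting as phrases.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch (hypothetical class): word -> (word syn1^boost syn2^boost ...). */
class SynonymExpander {
    private final Map<String, List<String>> synonyms;
    private final float boost;

    SynonymExpander(Map<String, List<String>> synonyms, float boost) {
        this.synonyms = synonyms;
        this.boost = boost;
    }

    /** Builds a QueryParser-style string, one parenthesized group per input word. */
    String expand(String query) {
        StringBuilder sb = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            List<String> syns = synonyms.containsKey(word)
                    ? synonyms.get(word) : Collections.<String>emptyList();
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('(').append(word);
            for (String s : syns) {
                // the original word keeps the default boost; synonyms are down-weighted
                sb.append(' ').append(s).append('^').append(boost);
            }
            sb.append(')');
        }
        return sb.toString();
    }
}
```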
<sheepish-look-on-face/>
After re-reading the book (again), and the javadocs (again), it dawned on my
little brain that I needed to have a doc and freq array *the size of
maxDocs* for the index reader. I also needed to iterate through the docs
array and call bitSet.set for each entry in docs (that
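For comparison, here is a sketch of the same loop that does not need maxDoc-sized arrays. TermDocs.read(int[] docs, int[] freqs) returns how many entries it filled, and 0 only once there are no more. A fixed-size buffer can therefore be read repeatedly, using only the first count slots each time. To keep the sketch runnable without Lucene, DocBatchSource is a hypothetical interface mirroring that contract, and ArrayDocSource is a test stand-in. In a real Filter.bits(IndexReader), the source would be something like reader.termDocs(new Term(field, value)), closed when done.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the batch-read part of Lucene's TermDocs. */
interface DocBatchSource {
    /** Fills docs/freqs from index 0; returns the number filled, 0 when exhausted. */
    int read(int[] docs, int[] freqs);
}

class TermBits {
    /** Sets one bit per matching document, reading in fixed-size batches. */
    static BitSet collect(DocBatchSource source, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int[] docs = new int[32];   // buffer size is arbitrary; read() may fill fewer
        int[] freqs = new int[32];  // must be as long as docs
        int count;
        while ((count = source.read(docs, freqs)) > 0) {
            // only the first 'count' slots are valid for this batch
            for (int i = 0; i < count; i++) {
                bits.set(docs[i]);
            }
        }
        return bits;
    }
}

/** Array-backed source for trying the loop without an index (test stand-in). */
class ArrayDocSource implements DocBatchSource {
    private final int[] all;
    private int pos = 0;

    ArrayDocSource(int... docNumbers) {
        this.all = docNumbers;
    }

    public int read(int[] docs, int[] freqs) {
        int n = Math.min(docs.length, all.length - pos);
        for (int i = 0; i < n; i++) {
            docs[i] = all[pos + i];
            freqs[i] = 1;
        }
        pos += n;
        return n;
    }
}
```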
Agreed on the set of unique messages; however, the problem I have is with
the count of the Hits. The Hits object may contain 100 results (for
example), of which only 90 are unique. Because I am paging through results
10 at a time, I need to know the total count without loading each document.
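If the messageid of each hit can be obtained cheaply, the unique count and the pages can both come from one pass over the hits in rank order. That is an assumption: the ids would come, for example, from a per-index array loaded once, rather than from loading each Document. A minimal sketch follows; UniquePager is a hypothetical name, and the list of ids stands in for your Hits.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch (hypothetical class): dedupe an ordered hit list by messageid, then page it. */
class UniquePager {
    private final List<String> unique;

    UniquePager(List<String> messageIdsInHitOrder) {
        // LinkedHashSet keeps the first (highest-ranked) occurrence of each id
        Set<String> ordered = new LinkedHashSet<String>(messageIdsInHitOrder);
        this.unique = new ArrayList<String>(ordered);
    }

    /** Number of unique messages, e.g. 90 out of 100 raw hits. */
    int totalUnique() {
        return unique.size();
    }

    /** Zero-based page of at most pageSize ids; empty past the end. */
    List<String> page(int pageNumber, int pageSize) {
        int from = Math.min(pageNumber * pageSize, unique.size());
        int to = Math.min(from + pageSize, unique.size());
        return unique.subList(from, to);
    }
}
```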
I am working on a publicly accessible Struts-based species database project
where the number of species names is currently at 2.3 million, and in the
near future will be somewhere nearer 4 million (probably the largest there
is). The species names are typically 1 to 7 words in length, and the
Hi,
Do you optimize the index?
Have you tried implementing your own HitCollector?
Stefan
On 25.01.2005 at 01:01, Peter Hollas wrote:
I am working on a publicly accessible Struts-based species database
project where the number of species names is currently at 2.3 million,
and in the near future will
Hi Peter,
I just got on the list a few hours ago and am still reading the source code. I
am not going to send this to the list.
About the 0.2 sec query time over 2 million fields: does that cover displaying
only the first page (100 or so), not all 3,000 results found? It is very
fast I
On Jan 24, 2005, at 7:01 PM, Peter Hollas wrote:
I am working on a publicly accessible Struts-based
Well there's the problem right there :))
(just kidding)
To sort the resultset into alphabetical order, we added the species
names as a separate keyword field, and sorted using it whilst
querying.
Peter,
Currently we can issue a simple search query and expect a response back
in about 0.2 seconds (~3,000 results)
You may want to try something like the following (I do this in FishEye;
it seems to perform well for moderately large field spaces).
Use a custom HitCollector, and store all the
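One reading of that suggestion, sketched in plain Java (not FishEye's code): collect every matching doc number, ignore the score, sort once by a per-document name array, and slice out the requested page. NameSortingCollector is hypothetical. With Lucene on the classpath it would extend org.apache.lucene.search.HitCollector, whose collect(int doc, float score) is called once per match. namesByDoc is an assumption, for example species names loaded once per IndexReader and indexed by document number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Sketch (hypothetical class): collect all hits, then order them by name. */
class NameSortingCollector {
    private final String[] namesByDoc;            // assumed: name for each doc number
    private final List<Integer> docs = new ArrayList<Integer>();
    private List<Integer> sorted;                 // built on the first page() call

    NameSortingCollector(String[] namesByDoc) {
        this.namesByDoc = namesByDoc;
    }

    /** Same shape as HitCollector.collect; called once per matching document. */
    public void collect(int doc, float score) {
        docs.add(doc);  // score ignored: results are ordered alphabetically
        sorted = null;  // invalidate any earlier sort
    }

    int totalHits() {
        return docs.size();
    }

    /** Zero-based page of doc numbers in name order. */
    List<Integer> page(int pageNumber, int pageSize) {
        if (sorted == null) {
            sorted = new ArrayList<Integer>(docs);
            Collections.sort(sorted, new Comparator<Integer>() {
                public int compare(Integer a, Integer b) {
                    return namesByDoc[a].compareTo(namesByDoc[b]);
                }
            });
        }
        int from = Math.min(pageNumber * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return sorted.subList(from, to);
    }
}
```

The sort runs once and is reused for later pages. The memory cost is one Integer per hit plus the name array.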
Hi Guys,
Apologies...
In standalone usage (updating, deleting, and adding documents in the
MergerIndex), my code runs perfectly without any problems.
But when the same code is plugged into a webapp on Tomcat, with a servlet
running in single-thread mode, sometimes
Hi Karthik,
If you are talking about SingleThreadModel (i.e. your servlet
implements javax.servlet.SingleThreadModel), this does not guarantee
that two different instances of your servlet won't be run at the same
time. It only guarantees that each instance of your servlet will only
be run by one
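Because several servlet instances may exist, locking on the servlet instance does not help; the guard has to be shared. Here is a minimal sketch, assuming all index changes in the webapp go through one place. IndexUpdateGate and IndexOperation are hypothetical names. The operation body would open the IndexWriter (or the IndexReader used for deletes), make the change, and close it. Lucene allows only one of those to modify an index at a time, so this serializes them within one JVM. It does not protect against a second webapp or process writing to the same index directory.

```java
/** Hypothetical callback: open writer/reader, modify the index, close it. */
interface IndexOperation {
    void run() throws Exception;
}

/** Sketch (hypothetical class): one JVM-wide lock for all index modifications. */
final class IndexUpdateGate {
    // static, so every servlet instance in this webapp shares the same lock
    private static final Object LOCK = new Object();

    private IndexUpdateGate() {
    }

    static void modify(IndexOperation op) throws Exception {
        synchronized (LOCK) {
            // only one thread in this webapp touches the index at a time
            op.run();
        }
    }
}
```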
Hi,
OK, I still get the exception, even if I try to have a single servlet
instance [maybe through the authentication process], and I made sure
that Lucene's MergerIndexing is controlled by a single initiation...
But even without any shared resources, the exception pops up frequently,