Maybe I should add that I am currently using Lucene 2.0. From other
threads I get the impression that this might be solved in Lucene 2.1.
-Original Message-
From: Andreas Guther [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 10, 2007 10:33 PM
To: java-user@lucene.apache.org
Subject: How t
Hi,
Thanks for your comments!
I was thinking that there could be some method based on frequency and
linguistic research. So far it seems that a manually chosen set of words is
a very common approach, but this leaves some questions open in my mind.
I am not a native English speaker but I think that
Hi,
How can I re-use an IndexSearcher and keep track of changes to the
index?
I am dealing with index directories of several GB. Opening an
IndexSearcher is very expensive and can take several seconds. Therefore
I am caching the IndexSearcher for re-use.
Our indexes are frequently updated.
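Since the index is updated frequently, one common pattern is to keep the cached searcher and re-open it only when IndexReader.isCurrent() reports that a newer index version has been committed on disk. A minimal sketch against the Lucene 2.0-era API (the SearcherCache class name is my own, and the thread-safety caveat in the comment is real):

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SearcherCache {
    private final String indexDir;
    private IndexSearcher searcher;

    public SearcherCache(String indexDir) throws IOException {
        this.indexDir = indexDir;
        this.searcher = new IndexSearcher(indexDir);
    }

    // Returns a searcher reflecting the latest committed index state.
    public synchronized IndexSearcher getSearcher() throws IOException {
        if (!searcher.getIndexReader().isCurrent()) {
            // A writer has committed since we opened; pay the re-open cost now.
            // NOTE: closing while other threads are still searching is unsafe;
            // production code needs reference counting on the old searcher.
            searcher.close();
            searcher = new IndexSearcher(indexDir);
        }
        return searcher;
    }
}
```

This way the several-second open cost is paid only when the index has actually changed, not on every request.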
There is a handy class in contrib/misc.../ that will show you the most frequent
terms in an index. Handy dandy.
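The contrib class Otis refers to is org.apache.lucene.misc.HighFreqTerms. A rough sketch of what it does, simplified for illustration (this version prints every term above a document-frequency threshold, whereas the real class keeps a top-N priority queue):

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

public class TopTerms {
    // Print every term whose document frequency exceeds minDocFreq.
    public static void print(IndexReader reader, int minDocFreq) throws IOException {
        TermEnum terms = reader.terms();
        try {
            while (terms.next()) {
                if (terms.docFreq() > minDocFreq) {
                    System.out.println(terms.term() + " " + terms.docFreq());
                }
            }
        } finally {
            terms.close();
        }
    }
}
```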
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
- Original Message
From: Lukas Vlcek <[EMAIL PROT
Also, from the empirical side, have a look at Luke (after indexing w/o
any stopwords, or just the standard ones) and see what the most
common terms are, and whether they are meaningful or not in the context
of your application.
-Grant
On May 10, 2007, at 7:41 PM, Doron Cohen wrote:
See al
I've thought about a flag field, and I see no reason why that wouldn't
work quite well, it all depends, I suppose, upon how ugly it would
eventually get
But about caching, what does making a filter have to do with Lucene
caching? Sure, there exist Lucene filter caching classes, but
there's no
See also en.wikipedia.org/wiki/Stop_words and
www.ranks.nl/tools/stopwords.html
karl wettin <[EMAIL PROTECTED]> wrote on 10/05/2007 13:57:33:
>
> On 10 May 2007, at 20.39, Lukas Vlcek wrote:
>
> > Can anybody point me to some references on how to create an ideal set
> > of stop
> > words? I know that
Unfortunately, at the moment we don't make good use of Lucene caching, so
setting up the filter on startup doesn't really work for us right
now. Maybe just a general flag field instead of a hasname field
would work better and be more general. You could just fill this field
with any
I think this is a simple question, but I don't know: is there a way to
automatically convert all tokens to wildcard queries for any given input?
I.e., if I enter 'n' it will convert that to 'n*'. Also, I am using multiple
fields, so this is how I presently have it.
MultiFieldQueryParser parser = new
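One simple approach, assuming you only want trailing wildcards, is to rewrite the raw input string before it reaches the parser. WildcardRewriter below is my own helper for illustration, not a Lucene class; it appends '*' to each whitespace-separated token:

```java
// My own helper, not part of Lucene: appends '*' to each token so that
// "n" becomes "n*" before the string is handed to MultiFieldQueryParser.
public class WildcardRewriter {
    public static String toWildcards(String input) {
        StringBuilder out = new StringBuilder();
        for (String token : input.trim().split("\\s+")) {
            if (token.length() == 0) continue;          // empty input
            if (out.length() > 0) out.append(' ');
            out.append(token);
            if (!token.endsWith("*")) out.append('*');  // already a wildcard? leave it
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toWildcards("n"));        // n*
        System.out.println(toWildcards("foo bar*")); // foo* bar*
    }
}
```

Note that the rewritten string still goes through normal query parsing, so terms that the analyzer would alter (stemming, lowercasing) behave as wildcard terms do in Lucene: they are not analyzed.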
Here's a way to do it using the XML query parser in contrib
1) Create this query.xsl file (note use of cached double negative filter)
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
upperTerm="z"/>
I was going to suggest something about TermEnum/TermDocs, but
upon reflection that doesn't work so well because you have to
enumerate all the terms over all the docs for a field. Ouch.
But one could combine the two approaches. Don't index
any "special" values in your firstname or lastname fields.
On 10 May 2007, at 20.39, Lukas Vlcek wrote:
Can anybody point me to some references on how to create an ideal set
of stop
words? I know that this is more like a theoretical question, but how do
Luceners determine which words should be excluded when creating
Analyzers
for new languages?
The id
Would a good solution be to insert a secret string into blank fields
that represents blank? That way you could search for:
firstname:(-Xd8fgrSjg) lastname:(-Xd8fgrSjg) some query string
Les
Les Fletcher wrote:
I like the idea of the filter since I am making heavy use of filters
for this par
On 10 May 2007, at 21.28, Yonik Seeley wrote:
Deleted documents are removed on segment merges (for documents marked
as deleted in those segments).
Due to the nature of an inverted index, it's impossible w/o going over
the complete index looking for all references to that docid.
What about alte
Of course, that doesn't have to be the case. It would be a trivial
change to merge segments and not remove the deleted docs. That
usecase could be useful in conjunction with ParallelReader.
If the behavior of deleted docs during merging or optimization ever changes,
please make this configurab
On 5/10/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
Deleted documents are removed on segment merges (for documents marked
as deleted in those segments).
Of course, that doesn't have to be the case. It would be a trivial
change to merge segments and not remove the deleted docs. That
usecase co
On 5/10/07, karl wettin <[EMAIL PROTECTED]> wrote:
I really want to use document numbers as a secondary key in my object
storage. If I got it all right, the main problem is deleted documents
and optimization. Are there any other issues?
Deleted documents are removed on segment merges (for docum
: But how can you use both the MissingStringLastComparatorSource and also your
: own custom SortComparator (i.e. having a custom getComparable() method)?
:
: I have tried the obvious, which was to make my custom SortComparator extend
: MissingStringLastComparatorSource instead of SortComparator. B
I like the idea of the filter since I am making heavy use of filters for
this particular query, but how would one go about constructing it
efficiently at query time? All I can see is hacking around not being
able to use the * as the first character.
Les
Erick Erickson wrote:
You could crea
Hi,
Can anybody point me to some references on how to create an ideal set of stop
words? I know that this is more like a theoretical question, but how do
Luceners determine which words should be excluded when creating Analyzers
for new languages? And which technique was used for validation of stop
I really want to use document numbers as a secondary key in my object
storage. If I got it all right, the main problem is deleted documents
and optimization. Are there any other issues?
All my tests tell me optimization does this:
Legend:
action
docNum doc.toString()
for (int i=0; i<4; i
You could create a Lucene Filter that had a bit for each document that
had a first or last name and use that at query time to restrict your
results appropriately. You could create this at startup time or at
query time. See CachingWrapperFilter for a way to cache it.
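A sketch of such a filter against the Lucene 2.0-era Filter API (the HasNameFilter class name and the "firstname"/"lastname" field names are assumptions for illustration): it enumerates every term of the two name fields once and sets a bit for each document that has a value, and is meant to be wrapped in a CachingWrapperFilter so the cost is paid once per IndexReader:

```java
import java.io.IOException;
import java.util.BitSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.Filter;

// Bit i is set iff document i has a value in "firstname" or "lastname".
public class HasNameFilter extends Filter {
    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        markField(reader, "firstname", bits);
        markField(reader, "lastname", bits);
        return bits;
    }

    private static void markField(IndexReader reader, String field, BitSet bits)
            throws IOException {
        TermEnum terms = reader.terms(new Term(field, ""));
        TermDocs docs = reader.termDocs();
        try {
            // Walk all terms of this field, marking each doc that contains one.
            while (terms.term() != null && field.equals(terms.term().field())) {
                docs.seek(terms);
                while (docs.next()) {
                    bits.set(docs.doc());
                }
                if (!terms.next()) break;
            }
        } finally {
            docs.close();
            terms.close();
        }
    }
}
```

Wrapped as `new CachingWrapperFilter(new HasNameFilter())`, the BitSet is computed once per reader and re-used on every query.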
Another approach would be to
I have a question about empty fields. I want to run a query that will
search against a few particular fields for the query term, but then also
check to see if two other fields have any value at all. I.e., I
want to search for a set of records but don't want to return a record if
that recor
"Andreas Guther" <[EMAIL PROTECTED]> wrote:
> I opened an issue: https://issues.apache.org/jira/browse/LUCENE-877
Thanks for opening this issue, Andreas. I've fixed this in the trunk so
it will be in Lucene 2.2.
Mike