RE: How to re-open the IndexSearcher's IndexReader

2007-05-10 Thread Andreas Guther
Maybe I should add that I am currently using Lucene 2.0. From other threads I get the impression that this might be solved in Lucene 2.1. -Original Message- From: Andreas Guther [mailto:[EMAIL PROTECTED] Sent: Thursday, May 10, 2007 10:33 PM To: java-user@lucene.apache.org Subject: How t

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
Hi, Thanks for your comments! I was thinking that there could be some method based on frequency and linguistic research. So far it seems that a manually chosen set of words is a very common approach, but this leaves some questions open in my mind. I am not a native English speaker but I think that

How to re-open the IndexSearcher's IndexReader

2007-05-10 Thread Andreas Guther
Hi, How can I re-use an IndexSearcher and keep track of changes to the index? I am dealing with Index Directories of several GB. Opening an IndexSearcher is very expensive and can take several seconds. Therefore I am caching the IndexSearcher for re-use. Our indexes are frequently updated.
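One common way to approach the question above is to keep the cached searcher and re-open it only when the index version has changed. The following is a minimal, library-independent sketch of that pattern. `VersionedCache`, `currentVersion` and `opener` are hypothetical names introduced here, not Lucene API. In a real setup the version supplier would wrap whatever version or "is current" check the Lucene release in use provides (for example something along the lines of `IndexReader.isCurrent()`, which is an assumption to verify against the 2.0 javadoc).

```java
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/**
 * Hypothetical holder: caches an expensive-to-open resource (such as a
 * searcher) and re-opens it only when the observed version changes.
 */
final class VersionedCache<T> {
    private final LongSupplier currentVersion; // assumed: cheap check of the index version
    private final LongFunction<T> opener;      // assumed: expensive open of a new searcher
    private long cachedVersion;
    private T cached;

    VersionedCache(LongSupplier currentVersion, LongFunction<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    /** Returns the cached instance, re-opening it first if the version moved. */
    synchronized T get() {
        long version = currentVersion.getAsLong();
        if (cached == null || version != cachedVersion) {
            cached = opener.apply(version);
            cachedVersion = version;
        }
        return cached;
    }
}
```

The sketch deliberately leaves out closing the old instance: with a real searcher, the old one should only be closed once all in-flight searches that use it have finished.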

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Otis Gospodnetic
There is a handy class in contrib/misc.../ that will show you the most frequent terms in an index. Handy dandy. Otis -- Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Lukas Vlcek <[EMAIL PROT

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Grant Ingersoll
Also, from the empirical side, have a look at Luke (after indexing w/o any stopwords, or just the standard ones) and see what the most common terms are and see if they are meaningful or not in the context of your application. -Grant On May 10, 2007, at 7:41 PM, Doron Cohen wrote: See al
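The empirical approach suggested in this thread (look at the most common terms, then judge whether they carry meaning for the application) can be illustrated without an index at all. The sketch below is a plain-Java stand-in, assuming raw text as input and a naive letter-based tokenizer. `StopWordCandidates` and `mostFrequent` are hypothetical names, and this is not the contrib class or Luke.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class StopWordCandidates {
    /** Returns the n most frequent lower-cased tokens, ties broken alphabetically. */
    static List<String> mostFrequent(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Naive tokenizer (assumption): split on anything that is not a letter.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

The output is only a list of candidates. Whether a frequent term is actually a stop word still has to be judged in the context of the application, as the message above points out.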

Re: query syntax question

2007-05-10 Thread Erick Erickson
I've thought about a flag field, and I see no reason why that wouldn't work quite well; it all depends, I suppose, upon how ugly it would eventually get. But about caching, what does making a filter have to do with Lucene caching? Sure, there exist Lucene filter caching classes, but there's no

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Doron Cohen
See also en.wikipedia.org/wiki/Stop_words and www.ranks.nl/tools/stopwords.html karl wettin <[EMAIL PROTECTED]> wrote on 10/05/2007 13:57:33: > > On 10 May 2007 at 20:39, Lukas Vlcek wrote: > > > Can anybody point me to some references how to create an ideal set > > of stop > > words? I know that

Re: query syntax question

2007-05-10 Thread Les Fletcher
Unfortunately, at the moment we don't make good use of Lucene caching, so setting up the filter on startup doesn't really work for us. Maybe just a general flag field instead of a hasname field would work better and be more general. You could just fill this field with any

Simple, always do wildcard or fuzzy query

2007-05-10 Thread bbrown
I think this is a simple question, or maybe I just don't know. Is there a way to automatically convert all tokens to wildcard queries for any given input? I.e., if I enter 'n' it will convert that to 'n*'. Also, I am using multiple fields, so this is how I presently have it. MultiFieldQueryParser parser = new
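One simple way to get the behaviour asked about here is to rewrite the user's input before it reaches the query parser. The sketch below is a minimal illustration under that assumption. `PrefixRewriter` and `toPrefixQuery` are hypothetical names, and the code only does string manipulation: it does not handle quoted phrases, boolean operators, or field prefixes.

```java
final class PrefixRewriter {
    /** Appends '*' to every whitespace-separated token that has no wildcard or fuzzy suffix yet. */
    static String toPrefixQuery(String input) {
        StringBuilder out = new StringBuilder();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for blank input
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
            char last = token.charAt(token.length() - 1);
            // Leave tokens alone that already end in a wildcard or fuzzy marker.
            if (last != '*' && last != '?' && last != '~') {
                out.append('*');
            }
        }
        return out.toString();
    }
}
```

With this sketch, `PrefixRewriter.toPrefixQuery("n")` yields `n*`, and the rewritten string can then be handed to the parser as usual.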

Re: query syntax question

2007-05-10 Thread markharw00d
Here's a way to do it using the XML query parser in contrib: 1) Create this query.xsl file (note use of cached double negative filter): xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> upperTerm="z"/>

Re: query syntax question

2007-05-10 Thread Erick Erickson
I was going to suggest something about TermEnum/TermDocs, but upon reflection that doesn't work so well because you have to enumerate all the terms over all the docs for a field. Ouch. But one could combine the two approaches. Don't index any "special" values in your firstname or lastname fields.

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread karl wettin
On 10 May 2007 at 20:39, Lukas Vlcek wrote: Can anybody point me to some references how to create an ideal set of stop words? I know that this is more like a theoretical question but how do Luceners determine which words should be excluded when creating Analyzers for a new language? The id

Re: query syntax question

2007-05-10 Thread Les Fletcher
Would a good solution be to insert a secret string into blank fields that represents blank? That way you could search for: firstname:(-Xd8fgrSjg) lastname:(-Xd8fgrSjg) some query string Les Les Fletcher wrote: I like the idea of the filter since I am making heavy use of filters for this par

Re: optimization behaviour

2007-05-10 Thread karl wettin
On 10 May 2007 at 21:28, Yonik Seeley wrote: Deleted documents are removed on segment merges (for documents marked as deleted in those segments). Due to the nature of an inverted index, it's impossible w/o going over the complete index looking for all references to that docid. What about alte

Re: optimization behaviour

2007-05-10 Thread Peter Keegan
Of course, that doesn't have to be the case. It would be a trivial change to merge segments and not remove the deleted docs. That usecase could be useful in conjunction with ParallelReader. If the behavior of deleted docs during merging or optimization ever changes, please make this configurab

Re: optimization behaviour

2007-05-10 Thread Yonik Seeley
On 5/10/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: Deleted documents are removed on segment merges (for documents marked as deleted in those segments). Of course, that doesn't have to be the case. It would be a trivial change to merge segments and not remove the deleted docs. That usecase co

Re: optimization behaviour

2007-05-10 Thread Yonik Seeley
On 5/10/07, karl wettin <[EMAIL PROTECTED]> wrote: I really want to use document numbers as a secondary key in my object storage. If I got it all right, the main problem is deleted documents and optimization. Are there any other issues? Deleted documents are removed on segment merges (for docum
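The reason document numbers make a fragile secondary key follows from the behaviour described above: once deleted documents are dropped during a merge, the surviving documents are packed together and get new numbers. The sketch below is a toy model of that effect, not Lucene's merge code. `MergeRenumberSketch` and `merge` are hypothetical names, and a plain list position stands in for the document number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class MergeRenumberSketch {
    /**
     * Toy model of a merge: documents marked as deleted are dropped and the
     * survivors are packed together, so their list positions (standing in
     * for document numbers) shift down.
     */
    static List<String> merge(List<String> docs, Set<Integer> deletedDocNums) {
        List<String> merged = new ArrayList<>();
        for (int docNum = 0; docNum < docs.size(); docNum++) {
            if (!deletedDocNums.contains(docNum)) {
                merged.add(docs.get(docNum));
            }
        }
        return merged;
    }
}
```

In this model, deleting document 1 out of `[a, b, c, d]` and merging moves `c` from position 2 to position 1, so any external store keyed on the old number would now point at the wrong document.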

Re: Sorting on a field that can have null values

2007-05-10 Thread Chris Hostetter
: But how can you use both the MissingStringLastComparatorSource and also your : own custom SortComparator (i.e. having a custom getComparable() method)? : : I have tried the obvious, which was to make my custom SortComparator extend : MissingStringLastComparatorSource instead of SortComparator. B

Re: query syntax question

2007-05-10 Thread Les Fletcher
I like the idea of the filter since I am making heavy use of filters for this particular query, but how would one go about constructing it efficiently at query time? All I can see is hacking around not being able to use the * as the first character. Les Erick Erickson wrote: You could crea

Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
Hi, Can anybody point me to some references how to create an ideal set of stop words? I know that this is more like a theoretical question but how do Luceners determine which words should be excluded when creating Analyzers for a new language? And which technique was used for validation of stop

optimization behaviour

2007-05-10 Thread karl wettin
I really want to use document numbers as a secondary key in my object storage. If I got it all right, the main problem is deleted documents and optimization. Are there any other issues? All my tests tell me optimization does this: Legend: action docNum doc.toString() for (int i=0; i<4; i

Re: query syntax question

2007-05-10 Thread Erick Erickson
You could create a Lucene Filter that had a bit for each document that had a first or last name and use that at query time to restrict your results appropriately. You could create this at startup time or at query time. See CachingWrapperFilter for a way to cache it. Another approach would be to
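The idea of "a bit for each document" can be sketched with a plain `java.util.BitSet`. The example below is a simplified model under stated assumptions: document fields are given as parallel arrays indexed by document number, and `NameFilterSketch`, `hasName` and `restrict` are hypothetical helper names rather than Lucene classes. Building the bit set once and reusing it corresponds to the caching suggestion in the message above.

```java
import java.util.BitSet;

final class NameFilterSketch {
    /** Sets bit i when document i has a non-blank first name or last name. */
    static BitSet hasName(String[] firstNames, String[] lastNames) {
        BitSet bits = new BitSet(firstNames.length);
        for (int doc = 0; doc < firstNames.length; doc++) {
            if (!isBlank(firstNames[doc]) || !isBlank(lastNames[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Keeps only those query hits whose bit is set in the filter. */
    static BitSet restrict(BitSet queryHits, BitSet filter) {
        BitSet result = (BitSet) queryHits.clone(); // do not mutate the caller's hits
        result.and(filter);
        return result;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Computing the bit set costs one pass over all documents, which is why building it at startup (or caching it after the first query) matters for large indexes.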

query syntax question

2007-05-10 Thread Les Fletcher
I have a question about empty fields. I want to run a query that will search against a few particular fields for the query term but then also check to see if two other fields have any value at all. i.e., I want to search for a set of records but don't want to return a record if that recor

RE: Locking in Lucene 2.1

2007-05-10 Thread Michael McCandless
"Andreas Guther" <[EMAIL PROTECTED]> wrote: > I opened an issue: https://issues.apache.org/jira/browse/LUCENE-877 Thanks for opening this issue, Andreas. I've fixed this in the trunk so it will be in Lucene 2.2. Mike