Deleting from distributed index

2007-07-08 Thread Amadeous
Dear all, I want to use Lucene in a multi-server architecture. Lucene's built-in RMI implementation helped me search my servers concurrently: I have a gateway machine to which queries are handed. It then forwards the query to the server machines and returns the aggregated results
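The gateway described here is a scatter-gather design: send the query to every server, merge the partial hit lists, and, for the deletion question in the subject line, fan deletes out to every server unless documents are routed to a known server by key. Below is a minimal, self-contained Java sketch of that shape. `Shard`, `Hit`, `InMemoryShard` and `Gateway` are hypothetical names standing in for the remote servers; they are not Lucene classes. Merging raw scores from separate indexes also assumes those scores are comparable, which per-index idf differences can break. Lucene's own `ParallelMultiSearcher`, which can wrap `RemoteSearchable` instances, is the built-in route for the search half and is worth checking first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class GatewayDemo {
    public static void main(String[] args) {
        Gateway g = new Gateway(List.of(
                new InMemoryShard(Map.of("a", 0.9f, "b", 0.2f)),
                new InMemoryShard(Map.of("c", 0.7f, "d", 0.5f))));
        System.out.println(g.search("any query", 3)); // [a:0.9, c:0.7, d:0.5]
        System.out.println(g.deleteEverywhere("c"));  // 1
        g.shutdown();
    }
}

// Hypothetical contract for one index server (reached over RMI in the setup above).
interface Shard {
    List<Hit> search(String query, int n);
    int delete(String id);
}

final class Hit {
    final String id;
    final float score;
    Hit(String id, float score) { this.id = id; this.score = score; }
    @Override public String toString() { return id + ":" + score; }
}

// Toy shard: every stored document "matches" and carries a fixed score.
final class InMemoryShard implements Shard {
    private final Map<String, Float> docs = new HashMap<>();
    InMemoryShard(Map<String, Float> docs) { this.docs.putAll(docs); }

    public List<Hit> search(String query, int n) {
        List<Hit> hits = new ArrayList<>();
        docs.forEach((id, score) -> hits.add(new Hit(id, score)));
        hits.sort((x, y) -> Float.compare(y.score, x.score));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public int delete(String id) { return docs.remove(id) == null ? 0 : 1; }
}

final class Gateway {
    private final List<Shard> shards;
    private final ExecutorService pool;

    Gateway(List<Shard> shards) {
        this.shards = shards;
        this.pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
    }

    // Scatter the query to all shards in parallel, then gather and keep the global top n.
    List<Hit> search(String query, int n) {
        List<Future<List<Hit>>> pending = new ArrayList<>();
        for (Shard s : shards) pending.add(pool.submit(() -> s.search(query, n)));
        List<Hit> merged = new ArrayList<>();
        try {
            for (Future<List<Hit>> f : pending) merged.addAll(f.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard query failed", e.getCause());
        }
        merged.sort((x, y) -> Float.compare(y.score, x.score));
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    // Without routing by key the gateway cannot know which server holds a
    // document, so the delete is broadcast and the per-shard counts are summed.
    int deleteEverywhere(String id) {
        int deleted = 0;
        for (Shard s : shards) deleted += s.delete(id);
        return deleted;
    }

    void shutdown() { pool.shutdown(); }
}
```

Broadcasting deletes is the simplest correct choice when documents are spread arbitrarily. If each document's server can be derived from its ID (for example by hashing), the delete can go to one server instead.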

Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Jong Kim
Hi, The MoreLikeThis class in Lucene's contrib/queries project performs noise word filtering based on the case-sensitive comparison of the terms against the user-supplied stopwords set. I need this comparison to be case-insensitive, but I don't see any way of achieving it by extending this cla
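Here is one workaround that avoids subclassing. It assumes that `MoreLikeThis` only calls `contains()` on the stopword set passed to its `setStopWords(Set)` setter. The idea is to pass a `TreeSet` ordered by `String.CASE_INSENSITIVE_ORDER`, whose `contains()` compares with `compareToIgnoreCase`. The plain-Java sketch below (class name hypothetical) shows the set behaving that way.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveStopWords {
    // A TreeSet ordered by CASE_INSENSITIVE_ORDER answers contains() with
    // compareToIgnoreCase, so "The", "THE" and "the" all count as present.
    static Set<String> caseInsensitive(Collection<String> words) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(words);
        return set;
    }

    public static void main(String[] args) {
        Set<String> stop = caseInsensitive(Arrays.asList("the", "and", "of"));
        System.out.println(stop.contains("The"));    // true
        System.out.println(stop.contains("AND"));    // true
        System.out.println(stop.contains("lucene")); // false
    }
}
```

With Lucene on the classpath, the set would then be handed to the `MoreLikeThis` instance through `setStopWords(...)`. A comparator-based `TreeSet` is not consistent with `equals`. That is harmless for pure lookups but worth knowing if the set is compared with other sets.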

Re: problems with deleteDocuments

2007-07-08 Thread Erick Erickson
First, let me say that I ran a few tests to determine the behavior, so it's entirely possible someone who actually understands the code will tell me I'm all wet. The problem here is that for every scenario in which deleting on partial field matches would be good, I can create one where it would b
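Delete-by-term (`IndexWriter.deleteDocuments(Term)` or `IndexReader.deleteDocuments(Term)`) removes only documents that contain exactly the given indexed term. There is no partial or phrase matching. The toy model below makes the consequences visible. It is plain Java: a hypothetical `TermDeleteDemo` class stands in for an inverted index, and lowercase-plus-whitespace splitting stands in for an analyzer. It shows that a multi-word value deletes nothing from a tokenized field, while a single word deletes every document containing it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TermDeleteDemo {
    // docId -> indexed terms of one field (toy stand-in for an inverted index)
    private final Map<Integer, Set<String>> termsByDoc = new TreeMap<>();

    void add(int docId, String value, boolean tokenized) {
        Set<String> terms = new HashSet<>();
        if (tokenized) {
            terms.addAll(Arrays.asList(value.toLowerCase().split("\\s+")));
        } else {
            terms.add(value); // untokenized: the whole value is a single term
        }
        termsByDoc.put(docId, terms);
    }

    // Delete-by-term semantics: a document goes only if it holds exactly this term.
    int deleteByTerm(String term) {
        int before = termsByDoc.size();
        termsByDoc.values().removeIf(terms -> terms.contains(term));
        return before - termsByDoc.size();
    }

    int size() { return termsByDoc.size(); }

    public static void main(String[] args) {
        TermDeleteDemo idx = new TermDeleteDemo();
        idx.add(1, "Lucene in Action", true);   // tokenized
        idx.add(2, "Lucene rocks", true);       // tokenized
        idx.add(3, "Lucene in Action", false);  // untokenized
        System.out.println(idx.deleteByTerm("lucene in action")); // 0
        System.out.println(idx.deleteByTerm("Lucene in Action")); // 1 (doc 3 only)
        System.out.println(idx.deleteByTerm("lucene"));           // 2 (docs 1 and 2)
    }
}
```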

Using KeywordAnalyzer with stop word filter

2007-07-08 Thread Kai Weber
Hello, I want to parse my query string as follows:
* filter out stop words (from GermanAnalyzer)
* ignore every string of the form field:foo
* search words exactly as written (key_word, not "key word") on a certain field
Example (using English words): Querystring: how cool is a crazyanalyzer -test -baz:foo b
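The following is a minimal pre-processing sketch of these rules. It assumes the query is a flat, whitespace-separated list of words. It drops stop words, drops any `field:value` clause (with or without a leading `+` or `-`), and keeps everything else exactly as typed, so `key_word` is never split. The class name and the small English stop set are hypothetical; the German list used by `GermanAnalyzer` would take the stop set's place. The excerpt does not say how `-test` should be handled, so the sketch simply keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class QueryPreFilter {
    // Keeps each whitespace-separated word as typed, except stop words and
    // field:value clauses (optionally prefixed with + or -), which are dropped.
    static List<String> filter(String query, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;                  // blank query
            String bare = token.replaceFirst("^[+-]", "");  // strip a leading operator
            if (bare.indexOf(':') >= 0) continue;           // a field:foo clause
            if (stopWords.contains(bare.toLowerCase())) continue;
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("how", "is", "a"));
        System.out.println(filter("how cool is a crazyanalyzer -test -baz:foo", stop));
        // [cool, crazyanalyzer, -test]
    }
}
```

One option for the next step is to turn each surviving word into a `TermQuery` on the target field, which skips analysis entirely, and to map prohibited words such as `-test` to a `MUST_NOT` clause. This is one possible design, not necessarily what the thread settled on.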

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Chris Hostetter
: I need this comparison to be case-insensitive, but I don't see any way of achieving it by extending this class. I would have created a subclass of MoreLikeThis and overridden the isNoiseWord() method. However, the problem is that neither the isNoiseWord() method nor the instance variables refer

Re: product based term combination for BooleanQuery?

2007-07-08 Thread Chris Hostetter
: At index time, I used a per-document boost (over all fields) and a per-field boost (over all documents). I can certainly factor out the first into a query boost, but I was under the impression that if I ever wanted to combine fields (e.g. to index all "name", "alias" and "title" data in a si
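Here is how I understand the Lucene 2.x indexing code (`DocumentWriter` with `DefaultSimilarity`). The document boost and the field boost are not stored separately. Instead, they are multiplied into the field's norm together with the length normalization 1/sqrt(number of terms). When several `Field` instances are added under the same name, their boosts are also multiplied. The plain-Java sketch below (hypothetical class name) reproduces that arithmetic. It shows why copying boosted "name" and "title" data into one combined field compounds their boosts rather than keeping one boost per source field.

```java
class NormSketch {
    // DefaultSimilarity-style length normalization: 1 / sqrt(number of terms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Index-time norm for one field name in one document: the document boost,
    // times the boost of every Field instance added under that name, times
    // lengthNorm over all of their terms. Lucene then encodes this value into
    // a single lossy byte; that precision loss is ignored here.
    static float norm(float docBoost, float[] instanceBoosts, int totalTerms) {
        float boost = docBoost;
        for (float b : instanceBoosts) boost *= b;
        return boost * lengthNorm(totalTerms);
    }

    public static void main(String[] args) {
        // "name" (boost 3, 2 terms) and "title" (boost 2, 2 terms) copied into one "all" field:
        System.out.println(norm(1f, new float[] {3f, 2f}, 4)); // 3.0
        // the same "name" data kept in a field of its own:
        System.out.println(norm(1f, new float[] {3f}, 2));
    }
}
```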

RE: Too Many Open files Exception

2007-07-08 Thread Chris Hostetter
: Issuing a "limit descriptors", I see that I have it set to 1024.
: In the directory that I'm getting this particular error: 3
: I have 24 different index directories... I think the most I saw at that particular time in any one index was 20.
As I said ... it doesn't matter where in the code you
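Here is a rough worst-case budget from the figures quoted above: a limit of 1024, 24 index directories, and at most 20 files in any one index. It assumes every open reader holds every file of its index. That gives 24 x 20 = 480 descriptors per full set of readers, so only two such sets fit under 1024, and fewer once the JVM's own jars, sockets and logs are counted. If readers or searchers are reopened without closing the previous ones, the limit would be hit after a couple of refreshes. The `reservedForOtherUse` value below is a made-up placeholder, not a measured number.

```java
class DescriptorBudget {
    // Worst case: one "generation" of open readers holds every file of every index.
    static int perGeneration(int indexes, int filesPerIndex) {
        return indexes * filesPerIndex;
    }

    // How many such generations fit under the per-process limit after setting
    // aside descriptors used by everything else in the process.
    static int generationsThatFit(int limit, int reservedForOtherUse, int indexes, int filesPerIndex) {
        return (limit - reservedForOtherUse) / perGeneration(indexes, filesPerIndex);
    }

    public static void main(String[] args) {
        System.out.println(perGeneration(24, 20));                 // 480
        System.out.println(generationsThatFit(1024, 0, 24, 20));   // 2
        System.out.println(generationsThatFit(1024, 100, 24, 20)); // 1
    }
}
```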

RE: Too Many Open files Exception

2007-07-08 Thread Chris Hostetter
: Ok... after spending time looking at the code... I see that a method is not closing a TokenStream in one of the classes (a class that is instantiated quite often); I would imagine this could quite possibly be the culprit?
Can you be more specific about the code in question? I'm not sure
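The usual guard is to close the stream in a `finally` block, so it is released even when tokenizing throws. Whether an unclosed `TokenStream` actually leaks a file descriptor depends on what it wraps. A stream built over a `StringReader` holds no OS resource, while one built over a `FileReader` does. The sketch below is plain Java. A hypothetical `CountedStream` stands in for a file-backed `TokenStream`, and its `next()` returns `null` at the end, as `TokenStream.next()` does in the 2.x API. It counts open handles so the effect of `finally` is visible, and it uses `try`/`finally` rather than try-with-resources to match Java 5 code of the period.

```java
class CloseDemo {
    static int openHandles = 0;

    // Hypothetical stand-in for a TokenStream over a file-backed Reader:
    // constructing it "opens" a handle and close() releases it.
    static final class CountedStream {
        private int remaining;
        CountedStream(int tokens) { remaining = tokens; openHandles++; }
        String next() { return remaining-- > 0 ? "tok" : null; } // null marks end of stream
        void close() { openHandles--; }
    }

    static int countTokens(int tokens) {
        CountedStream ts = new CountedStream(tokens);
        try {
            int n = 0;
            while (ts.next() != null) n++;
            return n;
        } finally {
            ts.close(); // runs on normal return and on exceptions alike
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) countTokens(5);
        System.out.println(openHandles); // 0
    }
}
```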

Search that supports all valid characters in a Unix filename

2007-07-08 Thread Ed Murray
Could someone let me know the best Analyzer to use to get an exact match on a Unix filename when it is inserted into an untokenized field? Filenames obviously contain spaces and forward slashes, along with other characters. I am using a WhitespaceAnalyzer, but when the query is parsed it is chopp
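With an untokenized field, the index holds the whole filename as a single term. As far as I know, QueryParser splits unquoted text on whitespace before the analyzer sees it, so the analyzer choice alone cannot rebuild that term. A common workaround is to bypass the parser for this field and build the query directly, e.g. `new TermQuery(new Term("filename", path))`, where the field name `filename` is hypothetical. The plain-Java sketch below (hypothetical class and path) only illustrates the mismatch. The indexed term is one string, while a whitespace split of the same path yields three pieces, none of which equals it.

```java
import java.util.Arrays;

class FilenameTermDemo {
    // Roughly what whitespace splitting does to the query string.
    static String[] whitespaceTokens(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String path = "/home/ed/My Documents/report v2.txt";
        // An untokenized field stores this path as exactly one term.
        String indexedTerm = path;
        String[] parsed = whitespaceTokens(path);
        System.out.println(Arrays.toString(parsed)); // [/home/ed/My, Documents/report, v2.txt]
        System.out.println(parsed.length == 1 && parsed[0].equals(indexedTerm)); // false
    }
}
```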