Re: Accent Insensitive Search

2008-07-17 Thread Wojtek H
Note that ISOLatin1AccentFilter converts accented characters only from the ISO-8859-1 character set, which means that if you need to convert accents of Eastern European languages you need to write your own accent filter. wojtek 2008/7/16 Petite Abeille [EMAIL PROTECTED]: On Jul 16, 2008, at 10:58 AM,
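
For accents outside ISO-8859-1, a minimal sketch of the folding step might look like the following. It assumes plain Java with java.text.Normalizer rather than a Lucene TokenFilter; the class name AccentFolder and the method fold are hypothetical, and wiring the result into an analyzer chain is not shown.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: strips diacritics from a string.
final class AccentFolder {
    private AccentFolder() {}

    static String fold(String input) {
        // Decompose each accented letter into a base letter plus combining marks (NFD).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining marks, leaving only the base letters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // The Polish l-with-stroke has no Unicode decomposition, so map it explicitly.
        return stripped.replace('\u0142', 'l').replace('\u0141', 'L');
    }
}
```

Other letters that have no decomposition (for example đ or ø) would need the same kind of explicit mapping.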

Re: Document ids in Lucene index

2008-04-13 Thread Wojtek H
Thank you for the answer. So it means that I can, without any problems, iterate over index documents using this algorithm (I don't want to use MatchAllDocsQuery): - check maxDoc() - iterate from 0 to maxDoc() and process each doc if it is not deleted. Am I right? Best, wojtek 2008/4/12, Chris Hostetter
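
The loop described in this message can be sketched as follows. This is only an illustration under stated assumptions: DocIdSource is a hypothetical stand-in exposing the two IndexReader methods named in the thread (maxDoc() and isDeleted(int)), so the example runs without Lucene on the classpath, and LiveDocIds is likewise a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two IndexReader methods named in the thread.
interface DocIdSource {
    int maxDoc();
    boolean isDeleted(int docId);
}

final class LiveDocIds {
    private LiveDocIds() {}

    // Collects every id in [0, maxDoc) that is not marked as deleted.
    static List<Integer> collect(DocIdSource reader) {
        List<Integer> live = new ArrayList<>();
        int max = reader.maxDoc(); // upper bound, exclusive
        for (int id = 0; id < max; id++) {
            if (!reader.isDeleted(id)) {
                live.add(id);
            }
        }
        return live;
    }
}
```

For a huge index it would probably be better to process each document inside the loop instead of collecting the ids into a list first.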

Document ids in Lucene index

2008-04-09 Thread Wojtek H
Hi all, I am wondering whether there can be holes in the set of index document ids. More specifically: is it possible that there exists an integer i between 0 and IndexReader.maxDoc() such that reader.document(i) == null and reader.isDeleted(i) == false? Regards, wojtek

stemming in Lucene

2008-04-01 Thread Wojtek H
Hi all, Snowball stemmers are part of Lucene, but for only a few languages. We have documents in various languages and so need stemmers for many languages (in particular Polish). One of the ideas is to use ispell dictionaries. There are ispell dicts for many languages and so this solution is good

The best way to iterate over document

2008-03-26 Thread Wojtek H
Hi all, our problem is to choose the best (the fastest) way to iterate over a huge set of documents (the basic and most important case is to iterate over all documents in the index). Some slow process accesses the documents, and currently this is done by repeating a query (for instance MatchAllDocsQuery). It

Re: The best way to iterate over document

2008-03-26 Thread Wojtek H
noticeable difference between the first and last request unless you're doing something like accessing the documents before you get to the first one you expect to return. And a TopDocs should even preserve scoring... Best Erick On Wed, Mar 26, 2008 at 5:48 AM, Wojtek H [EMAIL

Is there a way to speed up boolean query if I don't care about score?

2008-03-26 Thread Wojtek H
Hi all, Suppose my query has a normal part, for which I want scores as usual, and another part that is a big disjunction (OR) query, for which I just want documents to match and don't care about scoring. Is there a way to make it fast? As far as I understand if 'no-score' part was the same in many queries