Copying a part of index and index structure

2008-06-18 Anshum
I have 2 indexes, and I would like to move the index data for a few 'selected' and 'specified' terms from one of the indexes to the other. Does anyone have an idea how to do this? Actually, I am looking at splitting my index by keywords (terms), and would like a single index to be distributed over 2 smaller
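One way to picture the split described here is a term-partitioned layout: a fixed set of selected terms is served by the second index, every other term by the first, and each query term is sent to the index that owns it. The sketch below is plain Java and purely illustrative. TermPartitioner, subIndexFor and groupBySubIndex are hypothetical names, not Lucene APIs, and actually moving postings between Lucene indexes is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical term-partitioned layout: the "selected" terms live in sub-index 1,
 * all remaining terms in sub-index 0. Illustrative only, not Lucene code.
 */
final class TermPartitioner {
    private final Set<String> selectedTerms;

    TermPartitioner(Collection<String> selectedTerms) {
        this.selectedTerms = new HashSet<>(selectedTerms);
    }

    /** Returns the sub-index (0 or 1) that holds the postings for this term. */
    int subIndexFor(String term) {
        return selectedTerms.contains(term) ? 1 : 0;
    }

    /** Groups a query's terms by the sub-index that has to be searched for them. */
    Map<Integer, List<String>> groupBySubIndex(List<String> queryTerms) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String term : queryTerms) {
            groups.computeIfAbsent(subIndexFor(term), k -> new ArrayList<>()).add(term);
        }
        return groups;
    }

    public static void main(String[] args) {
        TermPartitioner p = new TermPartitioner(Arrays.asList("lucene", "solr"));
        // prints {0=[search], 1=[lucene, solr]}
        System.out.println(p.groupBySubIndex(Arrays.asList("lucene", "search", "solr")));
    }
}
```

Because a document's terms end up in two indexes under this layout, a query that needs terms from both groups (an AND query, for example) only works if both indexes number documents identically or share a stored ID field, so the two result sets can be intersected; that bookkeeping is assumed here, not shown.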

Re: Snowball Analyzer and apostrophes

2008-06-18 Erick Erickson
This is tricky. If you strip the apostrophe, you'd get interesting results from O'brien, depending on how you stripped it (i.e., whether you closed up the word to Obrien or substituted a space, e.g., O brien). We've generally had the fewest surprises by closing up apostrophes (e.g., Obrien, Charlies).
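The two options contrasted above can be shown in a few lines of plain Java. This is a hypothetical helper, not a Lucene TokenFilter; in a real analyzer the same logic would presumably run in a custom filter ahead of the Snowball stemmer. Treating the typographic apostrophe (U+2019) like the ASCII one is an added assumption about the input text.

```java
/**
 * Hypothetical helper comparing the two apostrophe strategies.
 * Plain string handling, not a Lucene analyzer component.
 */
final class ApostropheNormalizer {
    /** Close up the word: O'brien becomes Obrien, Charlie's becomes Charlies. */
    static String closeUp(String token) {
        return token.replace("'", "").replace("\u2019", "");
    }

    /** Substitute a space, so a tokenizer would later see two words: O'brien becomes "O brien". */
    static String spaceOut(String token) {
        return token.replace('\'', ' ').replace('\u2019', ' ');
    }

    public static void main(String[] args) {
        System.out.println(closeUp("O'brien"));   // Obrien
        System.out.println(closeUp("Charlie's")); // Charlies
        System.out.println(spaceOut("O'brien"));  // O brien
    }
}
```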

Getting irrelevant results using fuzzy query

2008-06-18 László Monda
Hi List, I've been redirected here from [EMAIL PROTECTED] to discuss my issue. -- My original email -- I'm trying to provide relevant results for the users of a lyrics site, even when they misspell a name, by indexing artists and songs with Lucene. The problem is that Lucene

indexing unsupported mime types using Lucene

2008-06-18 Gaurav Sharma
Hi, I am using Lucene for indexing and searching documents. It's working fine for supported documents. Now I want to index documents with unsupported MIME types. Right now I am using LIUS, which is built on Lucene, for indexing the documents. Is there any tool I can use for indexing the

Lucene Search Very Slow

2008-06-18 Sebastin
Hi All, I need to fetch approximately 225 GB of index store records in a web page. The total time to fetch the records and display them to the user is 10 minutes. Is it possible to reduce this to milliseconds? Sample code snippet: IndexReader[] readArray = { indexIR1,
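The snippet is cut off, but an array like readArray is typically handed to MultiReader (or to a MultiSearcher over one searcher per index) so several indexes can be searched as one. As a rough, standalone illustration, such a composite reader can number documents by giving each sub-index a starting offset and locating a document's sub-index with a binary search over those offsets. DocIdMapper is a hypothetical class, not Lucene code, and Lucene's internals may differ in detail.

```java
/**
 * Hypothetical mapping from a global document number to (sub-index, local number),
 * using per-sub-index start offsets. Illustrative only, not Lucene code.
 */
final class DocIdMapper {
    private final int[] starts; // starts[i] = first global doc number of sub-index i; last entry = total

    DocIdMapper(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Binary search for the last sub-index whose start offset is <= doc. */
    int subIndexOf(int doc) {
        if (doc < 0 || doc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc out of range: " + doc);
        }
        int lo = 0, hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= doc) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    /** Document number relative to the start of its sub-index. */
    int localDoc(int doc) {
        return doc - starts[subIndexOf(doc)];
    }

    public static void main(String[] args) {
        DocIdMapper m = new DocIdMapper(new int[] {100, 50, 200});
        System.out.println(m.subIndexOf(120) + " " + m.localDoc(120)); // 1 20
    }
}
```

Empty sub-indexes are skipped automatically, because the search returns the last sub-index whose start offset is at or below the document number.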

Re: Lucene Search Very Slow

2008-06-18 Toke Eskildsen
On Wed, 2008-06-18 at 07:17 -0700, Sebastin wrote: "I need to fetch approximately 225 GB of Index Store records in a web page. The total time to fetch the record and display to the user takes 10 minutes. Is it possible to reduce the time to milliseconds?" Depends on your indexes and your queries,

Re: Getting irrelevant results using fuzzy query

2008-06-18 Daniel Naber
On Wednesday, 18 June 2008, László Monda wrote: Since fuzzy searching is based on the Levenshtein distance, the distance between coldplay and coldplay is 0 and the distance between coldplay and downplay is 3, so how on earth is it possible that when searching for coldplay, Lucene returns
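The distances quoted above can be checked with a standard dynamic-programming Levenshtein implementation. The similarity method applies the formula that Lucene 2.x's FuzzyTermEnum is understood to use when no prefix length is set, 1 - distance / min(len1, len2); the class name FuzzyMath is hypothetical, and rejecting empty strings is a simplification of this sketch.

```java
/**
 * Levenshtein distance plus the FuzzyQuery-style similarity derived from it
 * (assumed formula for prefixLength = 0). Illustrative, not Lucene code.
 */
final class FuzzyMath {
    /** Classic two-row dynamic-programming edit distance (insert, delete, substitute = 1). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // previous row for the next iteration
        }
        return prev[b.length()];
    }

    /** 1 - distance / min(length); only defined here for non-empty strings. */
    static float similarity(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) throw new IllegalArgumentException("empty term");
        return 1.0f - (float) levenshtein(a, b) / Math.min(a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("coldplay", "downplay")); // 3
        System.out.println(similarity("coldplay", "downplay"));  // 0.625
    }
}
```

With FuzzyQuery's default minimum similarity of 0.5, a similarity of 0.625 means downplay qualifies as a fuzzy match for coldplay; how the matching terms are then ranked against each other is a separate, scoring question.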

Re: Getting irrelevant results using fuzzy query

2008-06-18 Daniel Naber
On Wednesday, 18 June 2008, László Monda wrote: "Additional info: Lucene seems to do the right thing when only a few documents are present, but goes crazy when there are about 1.5 million documents in the index." Lucene works well with more documents (currently using it with 9 million), but the

Re: Getting irrelevant results using fuzzy query

2008-06-18 markharw00d
This looks like it is related to an issue I first raised here: http://markmail.org/message/37ywsemfudpos6uh. At the time I identified 2 issues with FuzzyQuery: the usual coord and idf scoring factors shouldn't be applied to fuzzy queries. The coord factor got fixed, but idf remains an
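To see why idf matters for fuzzy queries: Lucene's DefaultSimilarity computes idf as 1 + ln(numDocs / (docFreq + 1)), so a rare term is weighted more heavily than a common one. If the exact term occurs in many documents while a fuzzy variant occurs in only a few, idf favors the variant, which is consistent with the problem described above. The document frequencies below are invented for illustration; only the 1.5 million total echoes a figure mentioned in this thread, and IdfDemo is a hypothetical class.

```java
/**
 * Illustration of the idf factor from Lucene's DefaultSimilarity:
 * idf = 1 + ln(numDocs / (docFreq + 1)). Lucene returns a float; double is used here.
 */
final class IdfDemo {
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1_500_000;               // roughly the index size mentioned in the thread
        double exact = idf(5_000, numDocs);    // hypothetical: exact term in 5,000 documents
        double variant = idf(3, numDocs);      // hypothetical: fuzzy variant in 3 documents
        // The rare variant gets the larger idf, so it can outscore the exact term.
        System.out.printf("exact=%.2f variant=%.2f%n", exact, variant);
    }
}
```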

Re: Displaying and highlighting results from a Wild Card and Fuzzy search using Lucene in Java

2008-06-18 syedfa
Thanks so much for your responses. I have it figured out: Query parser = new WildcardQuery(new Term(LINES, "the*")); parser = parser.rewrite(IndexReader.open(fsDir)); and I was able to get my results highlighted for both wildcard and fuzzy searches. Thanks for the responses. Sincerely,
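The rewrite call is what makes highlighting work: a WildcardQuery or FuzzyQuery holds a pattern rather than the terms that actually appear in the text, and rewriting it against an IndexReader replaces it with a query over the concrete matching terms, which the highlighter can then find. The standalone sketch below mimics only the term-enumeration part of that for a single trailing "*" over an in-memory sorted term list; PrefixExpansion and expand are hypothetical names, not Lucene code, and "?" wildcards are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical illustration of expanding a trailing-wildcard pattern such as "the*"
 * into the concrete terms of a sorted term dictionary.
 */
final class PrefixExpansion {
    static List<String> expand(SortedSet<String> termDictionary, String pattern) {
        if (!pattern.endsWith("*") || pattern.indexOf('*') != pattern.length() - 1) {
            throw new IllegalArgumentException("only patterns like \"the*\" are supported");
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        List<String> matches = new ArrayList<>();
        // The dictionary is sorted, so all matches form one contiguous run starting at the prefix.
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break;
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> dict = new TreeSet<>(Arrays.asList("that", "the", "thee", "then", "there", "this"));
        System.out.println(expand(dict, "the*")); // [the, thee, then, there]
    }
}
```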

Improving search performance with the results returned

2008-06-18 syedfa
Dear fellow Java/Lucene developers: I want to know if there is a way to make searching with Lucene more efficient so that, when a search has hundreds of hits, the results are paged and the user is shown only the best 20 hits first (like Google). If the
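"Best 20 first" paging does not require sorting every hit: it is enough to keep the top (page + 1) × pageSize hits in a bounded min-heap while scanning, then return the requested slice in score order. Lucene's own hit collection is understood to work along similar lines (a bounded priority queue of top hits), but TopHitsPager and its Hit type below are hypothetical, standalone Java.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N pager over (doc, score) hits. Illustrative only, not Lucene code. */
final class TopHitsPager {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Worst to best: lower score first; on equal scores, the higher doc number counts as worse.
    private static final Comparator<Hit> WORST_FIRST =
        Comparator.<Hit>comparingDouble(h -> h.score)
                  .thenComparing((a, b) -> Integer.compare(b.doc, a.doc));

    /** Returns page pageNumber (0-based) of the best hits, pageSize hits per page. */
    static List<Hit> page(Iterable<Hit> allHits, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) throw new IllegalArgumentException("bad page request");
        int needed = (pageNumber + 1) * pageSize;
        // Bounded min-heap: the head is the weakest hit kept so far and is evicted first.
        PriorityQueue<Hit> heap = new PriorityQueue<>(needed, WORST_FIRST);
        for (Hit h : allHits) {
            if (heap.size() < needed) {
                heap.add(h);
            } else if (WORST_FIRST.compare(h, heap.peek()) > 0) {
                heap.poll();
                heap.add(h);
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort(WORST_FIRST.reversed()); // best hit first
        int from = Math.min(pageNumber * pageSize, best.size());
        int to = Math.min(from + pageSize, best.size());
        return new ArrayList<>(best.subList(from, to));
    }
}
```

With Lucene itself, the corresponding approach is to ask the searcher for only the top (page + 1) × pageSize hits, for example through a TopDocs-returning search method, and to load stored fields only for the hits on the page being displayed.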