Re: Fragment Highlighter Phrase?

2009-02-14 Thread Mark Miller
Sorry, I wasn't specific enough. I meant the SpanScorer in the contrib highlighter package - they are very different. I assume the latest Highlighter package has been ported to Lucene .NET, but if not I know of a guy that ported the SpanScorer stuff to C# a while back. I might be able to get my

Re: Fragment Highlighter Phrase?

2009-02-14 Thread Ian Vink
I use the Lucene.NET implementation (2.3). There is a Lucene.Net.Search.Spans.SpanScorer class, but it's not public. I assume I'd have to use it as a base class for my own. Do you have a simple example of how, in Java, to use the SpanScorer to get a highlighter to return only fragments that are par

Re: Multiple indexes vs single index

2009-02-14 Thread Chris Lu
A normal Lucene index should be able to handle it. As long as there are no frequent inserts/updates, which can sometimes cause hiccups for large indexes, one index is enough. If your customer numbers keep growing, you will need to have one index for each customer, which isn't that difficult really, especiall

Re: Multiple indexes vs single index

2009-02-14 Thread Shashi Kant
Take a look at Solr - it should be able to handle the scale you describe. My suggestion is not to partition indexes unless you absolutely have to. - Original Message From: "spr...@gmx.eu" To: java-user@lucene.apache.org Sent: Saturday, February 14, 2009 10:27:58 AM Subject: RE: Multiple

RE: Multiple indexes vs single index

2009-02-14 Thread spring
Hi,
> You get one answer if each document is 1K, another if it's 1G. If you have 2 users or 10,000 users. If you require 100 queries/sec response time or 1 query can take 10 seconds. If you require an update to the index every second or month...
Each doc has up to 10 A4 pages of text. There

Re: Multiple indexes vs single index

2009-02-14 Thread Erick Erickson
Define efficiency. Define document. Define user. Define This kind of question is unanswerable except in gross generalities unless you take the time to provide details. You get one answer if each document is 1K, another if it's 1G. If you have 2 users or 10,000 users. If you require 100 querie

Multiple indexes vs single index

2009-02-14 Thread spring
Hi, We have an application which manages the data of multiple customers. A customer can only search its own data, never the data of other customers. So what is more efficient with respect to performance and resources: One big single index filtered by an index field (customer-Id) or multiple sm
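
For readers comparing the two layouts in this question, here is a toy, library-free sketch of each. It is not Lucene code: the class names `SingleIndex` and `PerCustomerIndexes` and the in-memory dictionaries are assumptions made purely for illustration, and the sketch says nothing about real-world performance.

```python
from collections import defaultdict


class SingleIndex:
    """One shared inverted index; every posting carries a customer id."""

    def __init__(self):
        # term -> set of (customer_id, doc_id) pairs
        self.postings = defaultdict(set)

    def add(self, customer_id, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add((customer_id, doc_id))

    def search(self, customer_id, term):
        # Filter step: drop postings that belong to other customers.
        hits = self.postings.get(term.lower(), ())
        return sorted(d for c, d in hits if c == customer_id)


class PerCustomerIndexes:
    """One small inverted index per customer; no filter is needed."""

    def __init__(self):
        # customer_id -> term -> set of doc_ids
        self.indexes = defaultdict(lambda: defaultdict(set))

    def add(self, customer_id, doc_id, text):
        for term in text.lower().split():
            self.indexes[customer_id][term].add(doc_id)

    def search(self, customer_id, term):
        # Only the requesting customer's index is ever consulted.
        index = self.indexes.get(customer_id, {})
        return sorted(index.get(term.lower(), ()))
```

Both variants return the same hits for a given customer. The difference is where the isolation happens: at query time through a filter on the customer field, or at indexing time through physical separation.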

Re: Partial / starts with searching

2009-02-14 Thread Karl Wettin
You probably only want to use Ngrams for the text fields, leaving the user name field untokenized. As for losing text field words less than 3 characters long: consider letting them through, perhaps by implementing a filter that passes longer words to an Ngram filter while you just return the
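
A minimal sketch of the pass-through idea described in this message, written as plain Python rather than as a Lucene token filter. The function names `ngrams` and `index_terms` and the default gram size of 3 are assumptions for illustration only.

```python
def ngrams(word, n=3):
    """Return all contiguous character n-grams of a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text, n=3):
    """Expand words of n or more characters into n-grams and pass
    shorter words through unchanged, so they are not lost."""
    terms = []
    for word in text.lower().split():
        if len(word) < n:
            terms.append(word)             # short word kept as-is
        else:
            terms.extend(ngrams(word, n))  # longer word becomes n-grams
    return terms
```

For example, `index_terms("Go to Lucene")` yields `['go', 'to', 'luc', 'uce', 'cen', 'ene']`. A user name field would bypass this function entirely and be stored as a single term.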