Sorry, I wasn't specific enough. I meant the SpanScorer in the contrib
highlighter package - they are very different.
I assume the latest Highlighter package has been ported to Lucene.NET,
but if not, I know of someone who ported the SpanScorer code
to C# a while back. I might be able to get my
I use the Lucene.NET implementation (2.3).
There is a Lucene.Net.Search.Spans.SpanScorer class, but it's not public, so I
assume I'd have to use it as a base class for my own.
Do you have a simple example of how, in Java, to use the SpanScorer to get a
highlighter to return only fragments that are par
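To make the question concrete, here is a plain-Java toy (no Lucene classes; the class and method names are hypothetical) of what a position-aware scorer such as the contrib SpanScorer changes compared with plain term highlighting: a term is marked only where it takes part in an in-order phrase match, and only fragments containing such marked tokens are returned. A real implementation works on Lucene token positions and the Highlighter's fragmenter, not on whitespace-split strings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only (hypothetical names, no Lucene classes): mimics the idea of
// position-aware highlighting, where a term is marked only if it is part of an
// in-order phrase match, instead of marking every occurrence of every query term.
public class PhraseFragmentSketch {

    // Returns fixed-size fragments (fragSize tokens each) that contain at least one
    // token of a phrase match; matched tokens are wrapped in <B>...</B>.
    public static List<String> phraseFragments(String text, String[] phrase, int fragSize) {
        String[] tokens = text.toLowerCase().split("\\s+");
        boolean[] hit = new boolean[tokens.length];

        // Mark every position covered by a complete, in-order occurrence of the phrase.
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int k = 0; k < phrase.length; k++) {
                if (!tokens[start + k].equals(phrase[k].toLowerCase())) {
                    match = false;
                    break;
                }
            }
            if (match) {
                for (int k = 0; k < phrase.length; k++) {
                    hit[start + k] = true;
                }
            }
        }

        // Split the tokens into consecutive fragments and keep only those with a marked token.
        List<String> fragments = new ArrayList<>();
        for (int from = 0; from < tokens.length; from += fragSize) {
            int to = Math.min(from + fragSize, tokens.length);
            StringBuilder sb = new StringBuilder();
            boolean anyHit = false;
            for (int i = from; i < to; i++) {
                if (i > from) {
                    sb.append(' ');
                }
                if (hit[i]) {
                    sb.append("<B>").append(tokens[i]).append("</B>");
                    anyHit = true;
                } else {
                    sb.append(tokens[i]);
                }
            }
            if (anyHit) {
                fragments.add(sb.toString());
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        String text = "the quick fox ran while a brown dog slept near the quick brown fox";
        for (String fragment : phraseFragments(text, new String[] {"quick", "brown", "fox"}, 5)) {
            System.out.println(fragment);
        }
    }
}
```

With the sample text, a term-by-term highlighter would also mark the isolated "quick", "fox" and "brown" in the first two fragments; the sketch prints only the last fragment, "the <B>quick</B> <B>brown</B> <B>fox</B>".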
A normal Lucene index should be able to handle it.
As long as there are no frequent inserts/updates, which can sometimes cause hiccups for
large indexes, one index is enough.
If your number of customers keeps growing, you will need one index for
each customer, which isn't that difficult really, especiall
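A minimal sketch of the one-index-per-customer layout, with an in-memory map per customer standing in for a real index (class and method names are hypothetical; in Lucene each map would instead be a separate index, i.e. its own Directory and IndexWriter). The extra bookkeeping is mostly routing each add and each search to the right customer's index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: one tiny in-memory inverted index per customer, standing in
// for one real Lucene index (Directory + IndexWriter) per customer.
public class PerCustomerIndexes {
    // customerId -> (term -> doc ids). Each customer's postings live in their own map,
    // so a search can only ever see that customer's documents.
    private final Map<String, Map<String, Set<Integer>>> indexes = new HashMap<>();

    // Routes the document to its customer's index, creating the index on first use.
    public void add(String customerId, int docId, String text) {
        Map<String, Set<Integer>> index = indexes.computeIfAbsent(customerId, c -> new HashMap<>());
        for (String term : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Looks the term up in that customer's index only; unknown customers get no hits.
    public Set<Integer> search(String customerId, String term) {
        Map<String, Set<Integer>> index = indexes.getOrDefault(customerId, Collections.emptyMap());
        return Collections.unmodifiableSet(index.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }

    public int indexCount() {
        return indexes.size();
    }

    public static void main(String[] args) {
        PerCustomerIndexes indexes = new PerCustomerIndexes();
        indexes.add("acme", 1, "invoice overdue");
        indexes.add("globex", 2, "invoice paid");
        System.out.println(indexes.search("acme", "invoice")); // [1]
        System.out.println(indexes.indexCount());              // 2
    }
}
```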
Take a look at Solr - it should be able to handle the scale you describe. My
suggestion is not to partition indexes unless you absolutely have to.
- Original Message
From: "spr...@gmx.eu"
To: java-user@lucene.apache.org
Sent: Saturday, February 14, 2009 10:27:58 AM
Subject: RE: Multiple
Hi,
> You get one answer if each document is 1K, another if it's
> 1G. If you have 2 users or 10,000 users. If you require
> 100 queries/sec response time or 1 query can take 10
> seconds. If you require an update to the index every
> second or month...
Each doc has up to 10 A4 pages of text.
There
Define efficiency. Define document. Define user. Define
This kind of question is unanswerable except in gross
generalities unless you take the time to provide details.
You get one answer if each document is 1K, another if it's
1G. If you have 2 users or 10,000 users. If you require
100 querie
Hi,
We have an application that manages the data of multiple customers.
A customer can only search its own data, never the data of other customers.
So what is more efficient with respect to performance and resources:
One big single index filtered by an index field (customer-Id) or multiple
sm
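The first option in the question, one shared index filtered by a customer-id field, can be sketched the same way: the customer id is indexed as a single untokenized term, and every search intersects the matching documents with that customer's documents. This is a toy in-memory model with hypothetical names; in Lucene the equivalent would be a required TermQuery clause on the customer-id field, or a filter built from it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: one shared in-memory inverted index where every document
// also carries a customer-id "field", and every search is filtered by it.
public class SharedIndexWithFilter {
    private final Map<String, Set<Integer>> textPostings = new HashMap<>();     // text term -> doc ids
    private final Map<String, Set<Integer>> customerPostings = new HashMap<>(); // customer id -> doc ids

    public void add(String customerId, int docId, String text) {
        // The customer id is stored as one untokenized term, like a keyword field.
        customerPostings.computeIfAbsent(customerId, c -> new TreeSet<>()).add(docId);
        for (String term : text.toLowerCase().split("\\s+")) {
            textPostings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Same idea as "term query AND required customer-id clause": intersect the two posting sets.
    public Set<Integer> search(String customerId, String term) {
        Set<Integer> result = new TreeSet<>(textPostings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
        result.retainAll(customerPostings.getOrDefault(customerId, Collections.emptySet()));
        return result;
    }

    public static void main(String[] args) {
        SharedIndexWithFilter index = new SharedIndexWithFilter();
        index.add("acme", 1, "invoice overdue");
        index.add("globex", 2, "invoice paid");
        System.out.println(index.search("acme", "invoice"));   // [1]
        System.out.println(index.search("globex", "overdue")); // []
    }
}
```

Both layouts return the same hits per customer; the trade-off the replies discuss is operational (index size, update frequency, number of indexes to manage), not what a customer can see.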
You probably only want to use Ngrams for the text fields, leaving the
user name field untokenized. As for losing text field words shorter than
3 characters: consider letting them through, perhaps by
implementing a filter that passes longer words to an Ngram filter while
you just return the
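A rough sketch of the suggested filter logic, written over plain string lists rather than a Lucene TokenStream (the class name and the minGram/maxGram parameters are assumptions for illustration): words shorter than the minimum gram size are passed through unchanged, and only longer words are expanded into n-grams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the suggested filter on plain strings (not a Lucene TokenFilter):
// words shorter than minGram pass through untouched, longer words become n-grams.
public class ShortWordPassThrough {

    public static List<String> filter(List<String> words, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String word : words) {
            if (word.length() < minGram) {
                out.add(word); // keep the short word instead of losing it
                continue;
            }
            // Emit every substring of length minGram..maxGram, shortest grams first.
            for (int n = minGram; n <= maxGram; n++) {
                for (int i = 0; i + n <= word.length(); i++) {
                    out.add(word.substring(i, i + n));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("to", "lucene"), 3, 3));
    }
}
```

For example, filter(["to", "lucene"], 3, 3) yields to, luc, uce, cen, ene, so the two-letter word survives alongside the trigrams of the longer one.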