Re: Check to see if index is optimized

2005-01-07 Thread Mike Snare
Based on the method sent earlier, it looks like Lucene first checks to see if optimization is even necessary.

Re: Check to see if index is optimized

2005-01-07 Thread Mike Snare
> If an index has no deletions, it does not need to be optimized. You can
> find out if it has deletions with IndexReader.hasDeletions.

Is that true? An index that has just been created (with no deletions) can still have multiple segments that could be optimized. I'm not sure your statement is correct.
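The point being debated can be sketched in a few lines. This is a hypothetical model, not Lucene's actual API: the function name `needs_optimization` and its parameters are illustrative, but the logic captures the thread's claim that pending deletions alone are not the whole story, since a multi-segment index also benefits from optimization.

```python
# Hypothetical sketch (not Lucene's API): an index would benefit from
# optimization when it has more than one segment OR pending deletions.
# IndexReader.hasDeletions only answers the second question.

def needs_optimization(segment_count: int, has_deletions: bool) -> bool:
    """Return True if a (modeled) index would benefit from optimization."""
    return segment_count > 1 or has_deletions

# A freshly built index with several segments and no deletions still
# qualifies, which is the counterexample raised above:
print(needs_optimization(segment_count=3, has_deletions=False))  # True
# A single-segment index with no deletions is already optimal:
print(needs_optimization(segment_count=1, has_deletions=False))  # False
```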

Re: Search not working properly. Bug !!!!!!

2004-12-30 Thread Mike Snare
You appear to be searching for the word "Engineer" in the "name" field. Shouldn't this query be directed at the "designation" field? The only terms in the name field would be "Ebrahim", "Faisal", "John", and "Smith", wouldn't they?

On Thu, 30 Dec 2004 22:06:46 +0530, Mohamed Ebrahim Faisal <[EM

Re: retrieve tokens

2004-12-22 Thread Mike Snare
> But for the other issue on 'store lucene' vs 'store db': can anyone
> provide me with some field experience on size? The system I'm developing
> will provide searching through some 2000 PDFs, say some 200 pages each. I
> feed the plain text into Lucene on a Field.UnStored basis. I also st

Re: Indexing terms only

2004-12-22 Thread Mike Snare
Thanks for correcting me. I use the reader version -- hence my confusion. -Mike

On Wed, 22 Dec 2004 11:53:31 -0500, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Dec 22, 2004, at 11:36 AM, Mike Snare wrote:
> > Whether or not the text is stored in the index is a different co

Re: Indexing terms only

2004-12-22 Thread Mike Snare
I've never used the German analyzer, so I don't know what stop words it defines/uses. Someone else will have to answer that. Sorry.

On Wed, 22 Dec 2004 17:45:17 +0100, DES <[EMAIL PROTECTED]> wrote:
> I actually use Field.Text(String,String) to add documents to my index. Maybe
> I do not understa

Re: Indexing terms only

2004-12-22 Thread Mike Snare
Whether or not the text is stored in the index is a different concern than how it is analyzed. If you want the text to be indexed, and not stored, then use the Field.Text(String, String) method or the appropriate constructor when adding a field to the Document. You'll need to also store a referen
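The stored/indexed/tokenized distinction that this sub-thread turns on (and the correction Erik makes later in it) can be summarized in a small table. This is a Python model of my reading of the Lucene 1.4 Field factory methods, not Lucene code; the key subtlety is that Field.Text(String, String) *does* store the value, while Field.Text(String, Reader) does not. Verify against the Javadoc before relying on it.

```python
# Model (not Lucene source) of the Lucene 1.4 Field factory methods.
# Assumption to verify against the Javadoc: Text(String, String) stores
# the value; the Reader variant does not.
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldFlags:
    stored: bool
    indexed: bool
    tokenized: bool

FIELD_FACTORIES = {
    "Text(String, String)":    FieldFlags(stored=True,  indexed=True,  tokenized=True),
    "Text(String, Reader)":    FieldFlags(stored=False, indexed=True,  tokenized=True),
    "UnStored(String, String)": FieldFlags(stored=False, indexed=True,  tokenized=True),
    "Keyword(String, String)":  FieldFlags(stored=True,  indexed=True,  tokenized=False),
    "UnIndexed(String, String)": FieldFlags(stored=True, indexed=False, tokenized=False),
}

print(FIELD_FACTORIES["Text(String, String)"].stored)  # True
print(FIELD_FACTORIES["Text(String, Reader)"].stored)  # False
```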

Re: Relevance percentage

2004-12-20 Thread Mike Snare
I'm still new to Lucene, but wouldn't that be the coord()? My understanding is that the coord() is the fraction of the boolean query that matched a given document. Again, I'm new, so somebody else will have to confirm or deny... -Mike

On Mon, 20 Dec 2004 00:33:21 -0800 (PST), Gururaja H <[EMAI
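For what it's worth, the "fraction" reading matches how Lucene's DefaultSimilarity computes coord(overlap, maxOverlap): it returns overlap divided by maxOverlap. The sketch below is an illustration of that idea in Python, not Lucene source; the parameter names follow the Similarity signature.

```python
# Sketch of the idea behind Similarity.coord(overlap, maxOverlap): the
# fraction of a BooleanQuery's clauses that a document matched, used as a
# multiplicative score factor.  Illustration only, not Lucene code.

def coord(overlap: int, max_overlap: int) -> float:
    """Score factor rewarding documents that match more query clauses."""
    return overlap / max_overlap

# A document matching 2 of 4 clauses gets half the coordination credit:
print(coord(2, 4))  # 0.5
# A document matching every clause gets full credit:
print(coord(4, 4))  # 1.0
```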

Re: Why does the StandardTokenizer split hyphenated words?

2004-12-16 Thread Mike Snare
Absolutely, but -- correct me if I'm wrong -- it would give no higher ranking to half-baked and would take a good deal longer on large indices.

On Thu, 16 Dec 2004 20:03:27 +0100, Daniel Naber <[EMAIL PROTECTED]> wrote:
> On Thursday 16 December 2004 13:46, Mike Snare wrote:
>

Re: Why does the StandardTokenizer split hyphenated words?

2004-12-16 Thread Mike Snare
> Not if these words are spelling variations of the same concept, which
> doesn't seem unlikely.
>
> > In addition, why do we assume that a-1 is a "typical product name" but
> > a-b isn't?
>
> Maybe for "a-b", but what about English words like "half-baked"?

Perhaps that's the difference in think

Re: Why does the StandardTokenizer split hyphenated words?

2004-12-15 Thread Mike Snare
> a-1 is considered a typical product name that needs to be unchanged
> (there's a comment in the source that mentions this). Indexing
> "hyphen-word" as two tokens has the advantage that it can then be found
> with the following queries:
> hyphen-word (will be turned into a phrase query internally)
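The phrase-query behavior described above is easy to demonstrate. The sketch below is an illustration, not Lucene code: `tokenize` is a crude stand-in for a tokenizer that splits on hyphens, and `phrase_match` mimics the essence of a phrase query, requiring the terms to occur in consecutive positions.

```python
# Illustration (not Lucene code) of why indexing "hyphen-word" as two
# adjacent tokens lets a phrase query find it: the query text is split the
# same way, and the two terms must simply appear in consecutive positions.

def tokenize(text: str) -> list[str]:
    # Crude stand-in for a tokenizer that splits on hyphens and whitespace.
    return text.replace("-", " ").lower().split()

def phrase_match(doc_tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as consecutive tokens in `doc_tokens`."""
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase for i in range(len(doc_tokens) - n + 1))

doc = tokenize("the hyphen-word example")
print(phrase_match(doc, tokenize("hyphen-word")))  # True: query split identically
print(phrase_match(doc, tokenize("word hyphen")))  # False: order matters in a phrase
```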

Why does the StandardTokenizer split hyphenated words?

2004-12-15 Thread Mike Snare
I am writing a tool that uses Lucene, and I immediately ran into a problem searching for words that contain internal hyphens (dashes). After looking at the StandardTokenizer, I saw that it was because there is no rule that will match such tokens. Based on what I can tell from the source, every other
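The behavior being asked about can be approximated in a few lines. This is a rough model of the rule discussed later in the thread, not the actual JavaCC grammar: a hyphenated run is kept whole only when one of its parts contains a digit (the "product name" case like "a-1"), while purely alphabetic compounds like "half-baked" are split at the hyphen. The regex and function here are my own stand-ins.

```python
import re

# Rough model (not the StandardTokenizer grammar) of the hyphen handling
# discussed in this thread: keep a hyphenated run as one token only if a
# part contains a digit (product names like "a-1"); otherwise split it.

def tokenize(text: str) -> list[str]:
    tokens = []
    for run in re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text):
        parts = run.split("-")
        if len(parts) > 1 and any(any(c.isdigit() for c in p) for p in parts):
            tokens.append(run)        # kept whole: looks like a product name
        else:
            tokens.extend(parts)      # split at the hyphen
    return tokens

print(tokenize("a-1 half-baked"))  # ['a-1', 'half', 'baked']
```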