Re: CachingWrapperFilter: why cache per IndexReader?

2008-01-11 Thread Toke Eskildsen
On Tue, 2008-01-01 at 15:06 -0500, Mark Miller wrote: > Perhaps, in some esoteric case, multiple readers is the right idea > (monster, monster, super IO system, static index?? maybe...)...but > unless you have run into this case and have some data to show it, I > would stick with what the commun
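Toke's question is why the cache is keyed per IndexReader. A filter's cached bits are indexed by document number, and document numbers are only valid for the reader that assigned them. That makes one cache entry per reader the natural unit. The following is a minimal plain-Java sketch of that design and does not use any Lucene classes. The hypothetical `PerReaderCache` keeps one BitSet per reader in a `WeakHashMap`, so an entry can be garbage-collected once its reader is no longer referenced. It illustrates the idea and does not reproduce Lucene's CachingWrapperFilter.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-reader filter cache; an illustration, not Lucene's CachingWrapperFilter. */
public class PerReaderCache<R> {
    // Weak keys: once a reader is no longer referenced elsewhere, its entry can be collected.
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    private final Function<R, BitSet> compute;
    private int misses = 0;

    public PerReaderCache(Function<R, BitSet> compute) {
        this.compute = compute;
    }

    /** Returns the cached bits for this reader, computing them on first use. */
    public synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            misses++;
            cached = compute.apply(reader);
            cache.put(reader, cached);
        }
        return cached;
    }

    /** Number of times the bits had to be computed rather than served from the cache. */
    public synchronized int misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Stand-in "readers": only object identity matters as a cache key here.
        Object readerA = new Object();
        Object readerB = new Object();
        PerReaderCache<Object> cache = new PerReaderCache<>(r -> {
            BitSet bits = new BitSet();
            bits.set(0);
            bits.set(2);
            return bits;
        });
        cache.bits(readerA);
        cache.bits(readerA); // served from the cache
        cache.bits(readerB); // a different reader, so computed again
        System.out.println("misses = " + cache.misses()); // misses = 2
    }
}
```

With per-reader caching, opening several readers on the same index multiplies both the cache misses and the memory held. This is consistent with the usual advice to share a single reader.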

Question about Search formula

2008-01-11 Thread thrgroovyboy
Hi, When I search with Lucene, the scoring formula takes the total number of words in the document into account. For example, an indexed one-slide PowerPoint file containing the term "JAVA" is ranked as more relevant than a 50-page Word document on JAVA. This is a problem for me: the Word document on Java should be most r

RE: Design questions

2008-01-11 Thread spring
Hi, > You could even store all of the page offsets in your > meta-data document > in a special field if you wanted, then lazy-load that field > rather than > dynamically counting. How can I lazy load a field? > You'd have to be careful that your offsets > corresponded to the data *after* it

Re: Design questions

2008-01-11 Thread Erick Erickson
See below On Jan 11, 2008 9:36 AM, <[EMAIL PROTECTED]> wrote: > Hi, > > > > You could even store all of the page offsets in your > > meta-data document > > in a special field if you wanted, then lazy-load that field > > rather than > > dynamically counting. > > How can I lazy load a field? > See
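Erick's suggestion of keeping all page offsets in one stored field can be sketched in plain Java. The encoding used here (ascending page start offsets, space-separated) and the class `PageOffsets` are hypothetical. In Lucene 2.x the lazy loading itself would go through a `FieldSelector` passed to `IndexReader.document`, which is not shown. Once the offsets are decoded, finding the page that contains a given character offset is a binary search.

```java
import java.util.Arrays;

/** Hypothetical encoding of page start offsets in a single stored field value. */
public class PageOffsets {

    /** Encodes ascending page start offsets as a space-separated string. */
    public static String encode(int[] starts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < starts.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(starts[i]);
        }
        return sb.toString();
    }

    /** Decodes the stored string back into an int array. */
    public static int[] decode(String value) {
        String[] parts = value.trim().split("\\s+");
        int[] starts = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            starts[i] = Integer.parseInt(parts[i]);
        }
        return starts;
    }

    /**
     * Returns the 1-based page number containing the given character offset.
     * Assumes starts[0] == 0 and offset >= 0.
     */
    public static int pageFor(int[] starts, int offset) {
        int idx = Arrays.binarySearch(starts, offset);
        // Exact hit: the offset is the first character of page idx.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the page is insertionPoint - 1.
        int pageIndex = idx >= 0 ? idx : -idx - 2;
        return pageIndex + 1;
    }

    public static void main(String[] args) {
        String stored = encode(new int[] {0, 1800, 3650, 5200});
        int[] starts = decode(stored);
        System.out.println(stored + " -> page " + pageFor(starts, 4000)); // 0 1800 3650 5200 -> page 3
    }
}
```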

Re: Question about Search formula

2008-01-11 Thread Grant Ingersoll
Have a look at the Similarity class and also the Scoring section of the website (Documentation-> Scoring on the left hand side) This is a classic problem of dealing with TF/IDF and length normalization. Lucene makes general assumptions about what is best, but does allow you to tune as wel
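Here is a concrete illustration of the length-normalization trade-off. Lucene's DefaultSimilarity uses sqrt(freq) as the term-frequency factor and 1/sqrt(numTerms) as the field length norm. The plain-Java sketch below compares only those two factors for a short slide and a long document. The document sizes are made up, and `flatterNorm` is a hypothetical alternative. For a single-term query the remaining factors (idf, query norm) are the same for both documents, so they are left out.

```java
/** Compares the tf x lengthNorm part of the score for a short and a long document. */
public class LengthNormDemo {

    /** Term-frequency factor in the style of Lucene's DefaultSimilarity: sqrt(freq). */
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Default-style length norm: 1 / sqrt(numTerms). */
    public static double defaultNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Hypothetical flatter norm, 1 / numTerms^0.25, which penalizes long documents less. */
    public static double flatterNorm(int numTerms) {
        return 1.0 / Math.pow(numTerms, 0.25);
    }

    public static void main(String[] args) {
        // Made-up sizes: a slide with 20 terms mentioning "java" once,
        // and a long document with 15000 terms mentioning it 200 times.
        int slideTerms = 20, slideFreq = 1;
        int longTerms = 15000, longFreq = 200;

        System.out.printf("default: slide=%.3f long=%.3f%n",
            tf(slideFreq) * defaultNorm(slideTerms), tf(longFreq) * defaultNorm(longTerms));
        System.out.printf("flatter: slide=%.3f long=%.3f%n",
            tf(slideFreq) * flatterNorm(slideTerms), tf(longFreq) * flatterNorm(longTerms));
    }
}
```

With the default norm the slide scores higher, and with the flatter norm the long document does. In Lucene 2.x you would make a change like this by subclassing DefaultSimilarity and overriding `lengthNorm(String fieldName, int numTerms)`. Norms are computed at index time, so the index would need to be rebuilt afterwards.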

Retrieve the number of deleted documents

2008-01-11 Thread Shai Erera
Hi I didn't find a proper API on IndexWriter or IndexReader to retrieve the total number of deleted documents. Will IndexReader.maxDoc() - IndexReader.numDocs() give the correct result, or is this just a heuristic? Thanks, Shai

Lucene sorting case-sensitive by default?

2008-01-11 Thread Alex Wang
Hi All, I was searching my index with sorting on a field called "Label", which is not tokenized. Here is what came back: Extended Sites Catalog Asset Store Extended Sites Catalog Asset Store SALES Print Catalog 2 Print catalog test Test Print Catalog Test refresh catalog print test 3

Lucene 2.3 RC2 available for testing

2008-01-11 Thread Michael Busch
Hi Lucene Users, good news: we are planning to release Lucene 2.3 in about ten days from now! Lucene 2.3 will have significant performance improvements and various other new features. (see http://people.apache.org/~buschmi/staging_area/lucene_2_3/CHANGES.txt for a full list of new features and API

Re: Lucene sorting case-sensitive by default?

2008-01-11 Thread Tom Emerson
String fields are sorted using natural (lexicographic) order. For characters in the ASCII range this means uppercase letters will sort before lowercase letters (e.g., 'A' U+0041 sorts before 'a' U+0061). This behaviour is documented in the JavaDocs for org.apache.lucene.search.Sort. -tree On
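Here is a plain-Java illustration of Tom's point. It uses made-up labels and no Lucene code. `String.compareTo` compares UTF-16 code units, so in the ASCII range every uppercase letter sorts before every lowercase one.

```java
import java.util.Arrays;

/** Shows that String's natural order compares UTF-16 code units, so 'B' (U+0042) < 'a' (U+0061). */
public class NaturalOrderDemo {

    /** Returns a copy of the labels sorted with String.compareTo. */
    public static String[] sortNatural(String... labels) {
        String[] copy = labels.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortNatural("apple", "Banana", "cherry"))); // [Banana, apple, cherry]
    }
}
```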

Re: Prioritize new documents

2008-01-11 Thread Tom Emerson
You can utilize the CustomScoreQuery introduced in Lucene 2.2 to provide this type of functionality. This is quite straightforward to do and works really well. Since "recentness" is a function of the time the search was made, we store the appropriate date in an index field and use a CustomScoreQue
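Tom's exact formula is cut off in this excerpt, so the following only sketches the kind of arithmetic a custom score could apply. It assumes an exponential decay controlled by a half-life and a maximum boost, both hypothetical tuning parameters. The document's age is computed at search time, because recentness depends on when the query runs. In Lucene 2.2 this logic would typically live in an overridden `customScore` method of `CustomScoreQuery`, which is not shown here.

```java
/** Hypothetical recency boost: multiplies a relevance score by a factor that decays with age. */
public class RecencyBoost {

    /**
     * @param relevance base score from the text query
     * @param ageDays   age of the document at search time, in days (>= 0)
     * @param halfLife  days after which the extra boost has halved (assumed tuning knob)
     * @param maxBoost  extra weight given to a brand-new document (assumed tuning knob)
     */
    public static double score(double relevance, double ageDays, double halfLife, double maxBoost) {
        double decay = Math.pow(0.5, ageDays / halfLife); // 1.0 today, 0.5 after one half-life
        return relevance * (1.0 + maxBoost * decay);
    }

    public static void main(String[] args) {
        long dayMillis = 24L * 60 * 60 * 1000;
        long now = System.currentTimeMillis();
        long indexedAt = now - 10 * dayMillis; // pretend the stored date is ten days ago
        double ageDays = (now - indexedAt) / (double) dayMillis;
        System.out.println(score(1.0, ageDays, 30.0, 1.0));
    }
}
```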

RE: Retrieve the number of deleted documents

2008-01-11 Thread Steven A Rowe
Hi Shai, On 01/11/2008 at 7:42 AM, Shai Erera wrote: > Will IndexReader.maxDoc() - IndexReader.numDocs() give the > correct result? or this is just a heuristic? I think your expression gives the correct result - the abstract IndexReader.numDocs() method is implemented in SegmentReader as: pu
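The toy model below shows why Shai's expression works. It is an illustration and not Lucene's SegmentReader. `maxDoc` counts every document slot, including deleted documents that have not yet been merged away. `numDocs` subtracts the deleted slots, so the difference between the two is exactly the number of deletions.

```java
import java.util.BitSet;

/** Toy model of a segment: maxDoc slots, some marked deleted. Not Lucene's SegmentReader. */
public class ToySegment {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();

    public ToySegment(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Marks a document as deleted; deleting it twice is harmless because the bit is already set. */
    public void delete(int docId) {
        if (docId < 0 || docId >= maxDoc) throw new IllegalArgumentException("no such doc: " + docId);
        deleted.set(docId);
    }

    /** All slots, including deleted ones (which persist until a merge removes them). */
    public int maxDoc() {
        return maxDoc;
    }

    /** Live documents: every slot minus the deleted ones. */
    public int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment(10);
        seg.delete(3);
        seg.delete(7);
        seg.delete(7);
        System.out.println("deleted = " + (seg.maxDoc() - seg.numDocs())); // deleted = 2
    }
}
```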

Re: Retrieve the number of deleted documents

2008-01-11 Thread Shai Erera
Thanks I guess I should have looked in the code before asking those silly questions :-) I wonder why there isn't a specific API for that though ... On Jan 11, 2008 7:36 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > Hi Shai, > > On 01/11/2008 at 7:42 AM, Shai Erera wrote: > > Will IndexReader.max

RE: how do I get my own TopDocHitCollector?

2008-01-11 Thread Beard, Brian
Thanks for all this. We're doing warmup searching also, but just for some common date searches. The warmup would be a good place to add some pre-caching capability. I'll plan for this eventually and start with the partial cache for now. Thanks, Brian Beard -Original Message- From: Antony

Re: Lucene sorting case-sensitive by default?

2008-01-11 Thread Erick Erickson
I've often stored a special sort field that's lower-cased. On Jan 11, 2008 11:40 AM, Alex Wang <[EMAIL PROTECTED]> wrote: > Hi All, > > > > I was searching my index with sorting on a field called "Label" which is > not tokenized, here is what came back: > > > > Extended Sites Catalog Asset Store
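Here is Erick's approach sketched in plain Java. The idea is to keep the original label for display and sort on a lower-cased copy. In an index, that copy would be a second untokenized field. The field itself, the helper names, and the choice of `Locale.ROOT` are assumptions. Because the sort is stable, labels that differ only in case keep their original relative order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Sorts display labels by a lower-cased key, mimicking a separate lower-cased sort field. */
public class LowercaseSortKey {

    /** The value one would store in the hypothetical extra sort field. */
    public static String sortKey(String label) {
        return label.toLowerCase(Locale.ROOT);
    }

    /** Returns a copy sorted by the lower-cased key; the list sort is stable, so case-only ties keep input order. */
    public static List<String> sortCaseInsensitive(List<String> labels) {
        List<String> copy = new ArrayList<>(labels);
        copy.sort(Comparator.comparing(LowercaseSortKey::sortKey));
        return copy;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("apple", "Banana", "cherry", "Apple");
        System.out.println(sortCaseInsensitive(labels)); // [apple, Apple, Banana, cherry]
    }
}
```

For non-English text, a `java.text.Collator` may give more natural ordering than plain lower-casing.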

How to model hierarchy info to be searched related to a document

2008-01-11 Thread Roger Camargo
I'm trying to index information related to OLAP cubes, modeling each cube as a document. Each cube has the following information: ID - Unique identifier for the cube Name - Name of the cube Description - Description of the cube (There can be many dimensions per cube) Dimensi
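One common way to make a hierarchy searchable is offered here as an assumption, not as anything proposed in this thread. Each ancestor path of a dimension level is indexed as a separate untokenized value of one multi-valued field on the cube document. A term query for `Time/Year` then matches every cube whose hierarchy passes through that level. The helper below only generates those path values, and the dimension names it uses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Expands a hierarchy path into all of its ancestor paths, one value per level. */
public class HierarchyPaths {

    /** For example, [Time, Year, Quarter] becomes [Time, Time/Year, Time/Year/Quarter]. */
    public static List<String> prefixes(List<String> levels) {
        List<String> out = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (String level : levels) {
            if (path.length() > 0) path.append('/');
            path.append(level);
            out.add(path.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical dimension hierarchy for one cube.
        System.out.println(prefixes(List.of("Time", "Year", "Quarter"))); // [Time, Time/Year, Time/Year/Quarter]
    }
}
```

Level names that contain `/` would need escaping or a different separator.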

RE: Prioritize new documents

2008-01-11 Thread Chris Hostetter
: IMHO it would be nice if Lucene's Similarity formula took the : indexed-date of the document into account. Ideally as an optional : setting, where the user can provide a date field as well. It really wouldn't make sense to incorporate this into the Similarity class. : Some of the other searc