Transaction semantics in Document addition

2008-05-19 Thread Dino Korah
Hi All, I am dealing with a situation where a document could possibly have multiple attachments to it, and they are all added to the index under a document-id (not Lucene doc-id). Now if one of the attachments fails to get indexed due to failure of any subsystem like the text extraction module, I

Re: Transaction semantics in Document addition

2008-05-19 Thread Michael McCandless
Dino Korah wrote: Hi All, I am dealing with a situation where a document could possibly have multiple attachments to it, and they are all added to the index under a document-id (not Lucene doc-id). Now if one of the attachments fails to get indexed due to failure of any subsystem like the

Re: Version 2.3 Does Not Index/Digest All Document Tokens

2008-05-19 Thread Michael McCandless
Or, if it's hard to reduce this to a compact test, can you post the code you are using and describe where/how it finds a difference? I'd like to get to the bottom of this. Mike Grant Ingersoll wrote: Can you reduce this down to a unit test? Thanks, Grant On May 16, 2008, at 11:37 AM, D

Re: Transaction semantics in Document addition

2008-05-19 Thread N Hira
How about an attribute (fullyIndexed=true/false) to keep track of whether the indexing was successful? We used a similar attribute for a similar problem, but stored it in the accompanying database instead. -h - Original Message From: Michael McCandless <[EMAIL PROTECTED]> To: java-

RE: Transaction semantics in Document addition

2008-05-19 Thread Dino Korah
In your scenario, it might work, but I wonder how you generate hits, excluding the fullyindexed=false. -Original Message- From: N Hira [mailto:[EMAIL PROTECTED] Sent: 19 May 2008 18:31 To: java-user@lucene.apache.org Subject: Re: Transaction semantics in Document addition How about an at

Re: Transaction semantics in Document addition

2008-05-19 Thread N Hira
You could probably add a query filter, but we use that attribute to find the documents that need to be re-indexed... -h - Original Message From: Dino Korah <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, May 19, 2008 1:32:09 PM Subject: RE: Transaction semantics in D
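The flag idea from this thread can be illustrated without any Lucene calls. What follows is only a minimal in-memory sketch under stated assumptions: `FlagSketch`, `Entry`, `search`, and `needsReindex` are hypothetical names invented for illustration and are not part of Lucene or of the posters' code. It shows the two uses mentioned above: filtering hits on `fullyIndexed`, and using the same flag to find documents that need re-indexing.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagSketch {
    /** Hypothetical stand-in for one indexed document; not a Lucene class. */
    static final class Entry {
        final String documentId;    // application-level id, not a Lucene doc-id
        final String text;          // extracted text of the document and attachments
        final boolean fullyIndexed; // false if any attachment failed to index

        Entry(String documentId, String text, boolean fullyIndexed) {
            this.documentId = documentId;
            this.text = text;
            this.fullyIndexed = fullyIndexed;
        }
    }

    /** Returns ids of matching entries, skipping those not fully indexed. */
    static List<String> search(List<Entry> index, String term) {
        List<String> hits = new ArrayList<>();
        for (Entry e : index) {
            if (e.fullyIndexed && e.text.contains(term)) {
                hits.add(e.documentId);
            }
        }
        return hits;
    }

    /** Returns ids of entries whose indexing was incomplete. */
    static List<String> needsReindex(List<Entry> index) {
        List<String> pending = new ArrayList<>();
        for (Entry e : index) {
            if (!e.fullyIndexed) {
                pending.add(e.documentId);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<Entry> index = new ArrayList<>();
        index.add(new Entry("doc-1", "invoice report", true));
        index.add(new Entry("doc-2", "invoice draft", false));
        System.out.println("hits: " + search(index, "invoice"));
        System.out.println("to re-index: " + needsReindex(index));
    }
}
```

In a real index the flag would be a field on each document and the exclusion would be expressed as a filter or a required query clause; the sketch only models the bookkeeping.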

slow FieldCacheImpl.createValue

2008-05-19 Thread Alex
hi, I have a ValueSourceQuery that makes use of a stored field. The field contains roughly 27.27 million untokenized terms. The average length of each term is 8 digits. The first search always takes around 5 minutes, and it is due to the createValue function in the FieldCacheImpl. The search is e

Re: slow FieldCacheImpl.createValue

2008-05-19 Thread Anshum
Hey Alex, I guess you haven't tried warming up the engine before putting it to use. Though it is one of the simpler implementations, you could try warming up the engine first by sending a few searches and then putting it to use (adding it to the serving machine loop). You could also do a little bit of preproce
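The warm-up suggestion can be sketched as a lazily populated per-field cache that is touched once before serving begins. This is an illustrative model only: `WarmupSketch`, `valuesFor`, `warm`, and `loadCount` are hypothetical names, and the loader function stands in for whatever expensive work the first search triggers. It does not reproduce `FieldCacheImpl`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class WarmupSketch {
    private final Map<String, int[]> cache = new HashMap<>();
    private final Function<String, int[]> loader; // expensive per-field load (assumed)
    private int loads = 0;

    WarmupSketch(Function<String, int[]> loader) {
        this.loader = loader;
    }

    /** Returns cached values for a field, running the loader on first access only. */
    int[] valuesFor(String field) {
        int[] values = cache.get(field);
        if (values == null) {
            loads++;
            values = loader.apply(field);
            cache.put(field, values);
        }
        return values;
    }

    /** Pays the loading cost up front, before real searches arrive. */
    void warm(String... fields) {
        for (String field : fields) {
            valuesFor(field);
        }
    }

    /** Number of times the expensive loader has run. */
    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        WarmupSketch engine = new WarmupSketch(field -> new int[] {1, 2, 3});
        engine.warm("price");       // slow path happens here
        engine.valuesFor("price");  // served from the cache
        System.out.println("loader ran " + engine.loadCount() + " time(s)");
    }
}
```

Warming moves the cost rather than removing it: the first load still takes as long as before, it just happens before the searcher is put into service.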

RE: slow FieldCacheImpl.createValue

2008-05-19 Thread Alex
Hi, thanks for the reply. Yes, after the first slow search, subsequent searches have good performance. I guess the issue is why exactly createValue is taking so long, or whether it should take so long (4 to 5 minutes). Given roughly 27 million terms, each roughly 8 characters long and few other by