Occasional Hang in IndexWriter.close()

2008-04-18 Thread Stu Hood
Hello, I'm having an issue with the IndexWriter in Lucene 2.3.1. Specifically, the IndexWriter.close() method is non-deterministically hanging with the following stack: """ Thread 23044: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise) - j

lucene 2.3.1 QueryTermExtractor error

2008-04-18 Thread Peiran Song
Hi All, I recently upgraded Lucene to 1.9.1 and then to 2.3.1. The application compiles successfully but throws a runtime error: java.lang.NoSuchFieldError: prohibited org.apache.lucene.search.highlight.QueryTermExtractor.getTermsFromBooleanQuery(QueryTermExtractor.java:91)

Re: how to cache multivalued field using fieldcache.

2008-04-18 Thread Chris Lu
No. FieldCache is only for single-valued fields. You would need to use your own data structure to cache a multi-valued field. Or leave the index on disk and use a Solid State Disk for faster read access. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Applic
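The "own data structure" suggested above could look roughly like the following: a minimal sketch that keeps a list of values per document ID in memory. The class name MultiValueCache and its methods are hypothetical and not part of Lucene; how the values are read from the index is left out, and the cache would have to be rebuilt whenever the reader is reopened.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a Lucene class: one list of field values per doc ID.
public class MultiValueCache {
    private final List<List<String>> valuesByDoc;

    // maxDoc is the number of doc ID slots to reserve (assumed known up front).
    MultiValueCache(int maxDoc) {
        valuesByDoc = new ArrayList<List<String>>(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            valuesByDoc.add(new ArrayList<String>());
        }
    }

    // Record one more value of the field for the given document.
    void put(int docId, String value) {
        valuesByDoc.get(docId).add(value);
    }

    // All cached values for the document; empty if it has none.
    List<String> get(int docId) {
        return Collections.unmodifiableList(valuesByDoc.get(docId));
    }

    public static void main(String[] args) {
        MultiValueCache cache = new MultiValueCache(3);
        cache.put(1, "red");
        cache.put(1, "blue");
        System.out.println(cache.get(1));
        System.out.println(cache.get(0));
    }
}
```

A list per document is the simplest layout; for large indexes a flattened array with per-document offsets would use less memory, at the cost of a more involved build step.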

Re: differences in deleting docs using IndexWriter and IndexModifier?

2008-04-18 Thread Michael McCandless
I'll add that warning to the javadocs. And, to make matters worse, IndexModifier.docCount() will internally call IndexWriter.docCount() if it has a writer open, else, IndexReader.numDocs(). So IndexModifier itself can be inconsistent. Mike Ulf Dittmer wrote: Thanks for the explanation.

Re: WildCardQuery and TooManyClauses

2008-04-18 Thread Joe K
Thank you very much Brian, this works for me! Chose On Mon, Apr 14, 2008 at 8:58 PM, Beard, Brian <[EMAIL PROTECTED]> wrote: > You can use your approach w/ or w/o the filter. > >td = indexSearcher.search(query, filter, maxnumhits); > > You need to use a filter for the wildcards

RE: Word split problems

2008-04-18 Thread Max Metral
It's probably about 100,000 entries per "thing that it would care about at once". -Original Message- From: Karl Wettin [mailto:[EMAIL PROTECTED] Sent: Thursday, April 17, 2008 3:17 PM To: java-user@lucene.apache.org Subject: Re: Word split problems Max Metral skrev: > > Lululemon Athlet

Re: Pooled searcher (was: Solid State Drives vs. RAMDirectory)

2008-04-18 Thread Otis Gospodnetic
Not sure if this email got answered. That's most likely due to the synchronized isDeleted call: ./org/apache/lucene/index/SegmentReader.java: public synchronized boolean isDeleted(int n) { Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: K

Re: Lucene Proximity Searches

2008-04-18 Thread Otis Gospodnetic
Couldn't you construct a phrase query with ngrams a la "fo oo ob ba ar", augmented with individual ngrams that are required, a la: "fo oo ob ba ar" fo oo ob ba ar If there are ORs between terms, missing ngrams won't prevent a match. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - N
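As an illustration of the query shape described above, here is a small sketch that builds the query string from a single term. It assumes character bigrams and a query parser whose default operator is OR, so the bare ngrams after the phrase stay optional. The class and method names are made up for this example, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only builds a query string; parsing is left to the caller.
public class NgramQueryString {

    // Split a term into overlapping character bigrams: "foobar" -> fo, oo, ob, ba, ar.
    static List<String> bigrams(String term) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + 2 <= term.length(); i++) {
            grams.add(term.substring(i, i + 2));
        }
        return grams;
    }

    // Quoted phrase of all bigrams, followed by the same bigrams as separate terms.
    static String build(String term) {
        String joined = String.join(" ", bigrams(term));
        return "\"" + joined + "\" " + joined;
    }

    public static void main(String[] args) {
        System.out.println(build("foobar"));
    }
}
```

With OR between the clauses, a document that contains the full phrase matches both the phrase and the individual terms and scores higher, while a document missing some ngrams can still match on the rest. Terms shorter than two characters produce no ngrams and would need separate handling.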

Re: differences in deleting docs using IndexWriter and IndexModifier?

2008-04-18 Thread Ulf Dittmer
Thanks for the explanation. You're right, IndexReader reports the correct number of documents. That might be a worthwhile addition to the IndexModifier javadocs - that the IndexWriter method of the same name is not a drop-in replacement. Of course, that's moot if docCount gets deprecated anyway.

Re: Lucene Proximity Searches

2008-04-18 Thread Paul Elschot
Ana, Op Friday 18 April 2008 12:41:38 schreef Ana Rabade: > I am using ngrams and I need to force that a group of them are > together, but if any of them fails, I need that the document is also > scored. Perhaps you could help me to find the solution or give me a > reference of which changes I mus

Re: Lucene performance: benchmarktemplate.xml

2008-04-18 Thread Glen Newton
Hi Anshum, A reasonable question. Answer: 64 bit architecture running 64 bit Java VM. It is great! :-) > Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode) > OS Version: Linux OpenSUSE 10.2 (64-bit X86-64) If you have any other questions, please let me know. :-) -Glen On 18/0

Re: Document ids collected from HitCollector.collect and used in FieldCache..

2008-04-18 Thread Shailendra Mudgal
Thanks a lot Erick. I just wanted to confirm this. Regards, On Fri, Apr 18, 2008 at 7:34 PM, Erick Erickson <[EMAIL PROTECTED]> wrote: > They'll be in synch forever unless and until you *change* the index. Once > you do anything with an IndexWriter, you have to be very careful about > relying on

Re: Document ids collected from HitCollector.collect and used in FieldCache..

2008-04-18 Thread Erick Erickson
They'll be in synch forever unless and until you *change* the index. Once you do anything with an IndexWriter, you have to be very careful about relying on doc IDs. But remember that opening a searcher takes a snapshot of the index and that reader/searcher will NOT see changes. So you could think

Re: indexing multiple pages and proximity search

2008-04-18 Thread Erick Erickson
I think they'd have to be in the same document. So you could read all the pages, add them to, say, a field named "text", and add all the pages at once as a single document. Be aware that by default Lucene only indexes the first 10,000 terms, but that can be changed with IndexWriter.setMa

Re: differences in deleting docs using IndexWriter and IndexModifier?

2008-04-18 Thread Michael McCandless
I believe your docs are being deleted. It's just that IndexWriter.docCount() does not take deletions into account. That method matches IndexReader.maxDoc(), not IndexReader.numDocs(). If you open an IndexReader and call numDocs(), does it reflect the deletion? Really I think we should add "maxDoc()" and
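The difference between the two counts can be shown with a toy model. This is only a sketch of the bookkeeping, assuming deletions are recorded as marks against existing doc IDs until segments are merged; DocCounts is a hypothetical class, not Lucene code, and it does not model merging.

```java
import java.util.BitSet;

// Toy model of per-index document counts; not a Lucene class.
public class DocCounts {
    private int maxDoc = 0;                       // one more than the highest doc ID assigned
    private final BitSet deleted = new BitSet();  // doc IDs marked as deleted

    // Assign the next doc ID to a new document and return it.
    int addDocument() {
        return maxDoc++;
    }

    // Mark a document as deleted; its ID slot stays occupied.
    void deleteDocument(int docId) {
        if (docId >= 0 && docId < maxDoc) {
            deleted.set(docId);
        }
    }

    // Counts every assigned ID, including deleted documents.
    int maxDoc() {
        return maxDoc;
    }

    // Counts only live documents.
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    public static void main(String[] args) {
        DocCounts index = new DocCounts();
        index.addDocument();
        index.addDocument();
        index.addDocument();
        index.deleteDocument(1);
        System.out.println("maxDoc=" + index.maxDoc() + " numDocs=" + index.numDocs());
    }
}
```

In this model a count that behaves like maxDoc() stays at 3 after the delete, while numDocs() drops to 2, which is the discrepancy described in the thread.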

Re: Document ids collected from HitCollector.collect and used in FieldCache..

2008-04-18 Thread Shailendra Mudgal
Hi Erick, Thanks for your prompt reply. So if I refresh the searcher every hour and refresh this cache at the same time, is this going to work? I mean, will the document ids be in sync for that one hour? On Fri, Apr 18, 2008 at 2:02 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:

indexing multiple pages and proximity search

2008-04-18 Thread Chandan Tamrakar
Hi, I have a document, and each page of this document is extracted into a separate text file. For example, document abc.doc has abc_page1.txt, abc_page2.txt ... abc_pageN.txt. Is it possible to index them and still retain the Lucene proximity search, because technically it is a single document

Benjamin Sznajder/Haifa/Contr/IBM is out of the office.

2008-04-18 Thread Benjamin Sznajder
I will be out of the office starting 10/04/2008 and will not return until 04/05/2008.

differences in deleting docs using IndexWriter and IndexModifier?

2008-04-18 Thread Ulf Dittmer
Hello all- While adapting some code to use IndexWriter instead of IndexModifier (as indicated by the deprecation warnings), I stumbled upon an issue that I at first thought was a bug, but I'm sure it's only because I don't fully understand how Lucene works. Basically, I'm using the deleteDocument

Re: Lucene Proximity Searches

2008-04-18 Thread Ana Rabade
I am using ngrams and I need to force that a group of them are together, but if any of them fails, I need that the document is also scored. Perhaps you could help me to find the solution or give me a reference of which changes I must do. I am using SpanNearQuery, because the ngrams must be in order

Re: Does (LUCENE-831) "Complete overhaul of FieldCache API" provide fieldcache offloading to disk?

2008-04-18 Thread Michael Busch
Chris Hostetter wrote: : But then the FieldCache is just starting to feel a lot like column-stride : fields : (LUCENE-1231). that's what i've been thinking ... my goal with LUCENE-831 was to make it easier to manage FieldCache and hopefully the norms[] as well particularly in the case of reopen