RE: Lucene slow performance -- still broke

2013-03-20 Thread Scott Smith
Duh...it's supposed to be setMergeFactor(). Thanks, Scott

Re: Lucene slow performance -- still broke

2013-03-20 Thread Simon Willnauer
quick question: why on earth do you set lbsm.setMaxMergeDocs(10)? Once a segment holds 10 docs, do you really not want it merged anymore? I don't think you should set this at all. simon
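
A minimal sketch of the two knobs being mixed up in this thread, assuming the Lucene 4.1 API; the "dir" and "analyzer" arguments are placeholders, not code from the original poster. setMergeFactor() controls how many segments get merged at a time, while setMaxMergeDocs() caps the size of segments that remain eligible for merging, which is why setting it to 10 effectively freezes merging.

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.Version;

    class WriterSetup {
        // Sketch only: caller supplies the Directory and Analyzer.
        static IndexWriter openWriter(Directory dir, Analyzer analyzer) throws IOException {
            LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
            lbsm.setMergeFactor(10);      // merge ~10 segments at a time -- the intended knob
            // lbsm.setMaxMergeDocs(10); // would exclude any segment with more than 10 docs from merging

            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_41, analyzer);
            iwc.setMergePolicy(lbsm);
            return new IndexWriter(dir, iwc);
        }
    }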

RE: Lucene slow performance -- still broke

2013-03-20 Thread Scott Smith
First, I decided I wasn't comfortable doing closes on the IndexReader. So, I did what I hope is better. I create a singleton SearcherManager (out-of-the-box from the 4.1 release) and do acquire/releases. I assume that's more or less equivalent anyway. Second, it doesn't really matter as I am
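
For reference, a minimal sketch of the acquire/release pattern Scott describes, assuming Lucene 4.1 and a Directory supplied by the caller; the class and variable names are illustrative, not taken from his code.

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.SearcherFactory;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;

    class SearchService {
        private final SearcherManager mgr;

        SearchService(Directory dir) throws IOException {
            // One SearcherManager for the life of the application.
            mgr = new SearcherManager(dir, new SearcherFactory());
        }

        TopDocs search(Query q, int n) throws IOException {
            IndexSearcher searcher = mgr.acquire();   // never close the searcher yourself
            try {
                return searcher.search(q, n);
            } finally {
                mgr.release(searcher);                // hand it back instead of closing it
            }
        }

        void refresh() throws IOException {
            mgr.maybeRefresh();                       // pick up index changes periodically
        }
    }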

Re: high memory usage by indexreader

2013-03-20 Thread ash nix
Thanks Ian. The number of documents in the index is 381,153,828. The data set size is 1.9TB, and the index built from it is 290G. It is a single index. The following fields are indexed for each document: 1. Document id: a StoredField, generally around 128 chars or more. 2. Text fie
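
A rough sketch of the kind of document described above, assuming the Lucene 4.x document API; the field names and helper are hypothetical. Worth noting that a plain StoredField is stored for retrieval but not indexed, so an id that must be searchable would normally be a StringField instead.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.TextField;

    // Sketch only; "id" and "text" field names are illustrative.
    static Document buildDoc(String docId, String body) {
        Document doc = new Document();
        doc.add(new StoredField("id", docId));                 // stored for retrieval, not indexed
        // new StringField("id", docId, Field.Store.YES)       // alternative if the id must be searchable
        doc.add(new TextField("text", body, Field.Store.NO));  // analyzed full-text content
        return doc;
    }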

Re: high memory usage by indexreader

2013-03-20 Thread Ian Lea
Searching doesn't usually use that much memory, even on large indexes. What version of lucene are you on? How many docs in the index? What does a slow query look like (q.toString()) and what search method are you calling? Anything else relevant you forgot to tell us? Or google "lucene shardin
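
A hedged example of the sort of detail Ian is asking for: print the parsed query and time the search call. The "query" and "searcher" variables (org.apache.lucene.search.Query and IndexSearcher) are assumed to exist in the poster's code.

    // Sketch: log what the query actually parsed to, then measure the call.
    System.out.println("parsed query: " + query.toString());
    long start = System.nanoTime();
    TopDocs hits = searcher.search(query, 100);
    long elapsedMs = (System.nanoTime() - start) / 1000000L;
    System.out.println(hits.totalHits + " hits in " + elapsedMs + " ms");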

high memory usage by indexreader

2013-03-20 Thread ash nix
Hi Everybody, I have created a single compound index which is 250 Gigs in size. I open a single index reader to run simple boolean queries. The process is consuming a lot of memory and searches are painfully slow. It seems that I will have to create multiple indexes and have multiple index readers. Can an
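
One way to do what the poster is contemplating, sketched against the Lucene 4.x API with hypothetical paths: open a reader per index and search them together through a MultiReader.

    import java.io.File;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    // Sketch only; index paths are placeholders.
    DirectoryReader r1 = DirectoryReader.open(FSDirectory.open(new File("/data/index-1")));
    DirectoryReader r2 = DirectoryReader.open(FSDirectory.open(new File("/data/index-2")));
    IndexSearcher searcher = new IndexSearcher(new MultiReader(r1, r2));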

Re: Overall doc-count in TermStats, during flush...

2013-03-20 Thread Ravikumar Govindarajan
Thanks Simon for the quick update... We always have uniform docs with the same set of fields added, and that led to the confusion. -- Ravi

Re: Overall doc-count in TermStats, during flush...

2013-03-20 Thread Simon Willnauer
The BitSet basically counts how many documents have one or more values in this field. Some docs might not have values in this field. state.segmentInfo.getDocCount() is the # of docs in this segment but we are flushing a single field here. We pass down the cardinality here since we keep the statist
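
An illustrative sketch of the distinction Simon is drawing, not the actual flush code; "maxDoc" stands in for what state.segmentInfo.getDocCount() would return. The per-field bit set counts only documents that have at least one value in the field, which can be smaller than the segment's total doc count.

    import org.apache.lucene.util.FixedBitSet;

    // Sketch: maxDoc is the total number of docs in the segment.
    FixedBitSet docsWithField = new FixedBitSet(maxDoc);
    // ... call docsWithField.set(docId) for every doc that has at least one value in the field ...
    int docsWithValue = docsWithField.cardinality(); // the cardinality passed down for this field
    int docsInSegment = maxDoc;                      // may be larger than docsWithValue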