About lucene memory consumption

2014-06-27 Thread 308181687
Hi all, I found that the memory consumption of my Lucene server is abnormal, and "jmap -histo ${pid}" shows that byte[] instances consume almost all of the memory. Is there a memory leak in my app? Why are there so many byte[] instances? The following is the top of the jmap output: num

Searching on Large Indexes

2014-06-27 Thread Sandeep Khanzode
Hi, I have an index that runs into 200-300GB. It is not frequently updated. What are the best strategies for querying this index? 1.] Should I, at index time, split the content, e.g. with a hash-based partition, into multiple separate smaller indexes and aggregate the results programmatically? 2.] Sh
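
As a rough illustration of option 1 above, the following sketch routes each document to a partition by hashing its key and merges the per-partition hits into a single top-k list. It uses only the Java standard library; the class, method, and field names (ShardedSearchSketch, shardFor, mergeTopK, Hit) are hypothetical and are not part of Lucene or of the poster's application.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class ShardedSearchSketch {

    /** A scored hit as returned by one partition (illustrative type, not a Lucene class). */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Picks the partition for a document at index time from a hash of its key. */
    static int shardFor(String docKey, int numShards) {
        // floorMod keeps the result in [0, numShards) even for negative hash codes.
        return Math.floorMod(docKey.hashCode(), numShards);
    }

    /** Merges per-partition result lists and keeps the k highest-scoring hits, best first. */
    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        // Min-heap on score: the weakest retained hit is evicted first.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perShardHits) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll();
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }
}
```

One caveat with this kind of merge: scores computed by independent indexes are based on per-index term statistics, so they are only approximately comparable across partitions.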

Re: Searching on Large Indexes

2014-06-27 Thread Jigar Shah
Some points based on my experience. You could consider a SolrCloud implementation if you want to distribute your index over multiple servers. Use MMapDirectory locally for each Solr instance in the cluster. Run warm-up queries on server start-up, so that most of the documents will be cached; you will start sav

Can Lucene based application be made to work with Scaled Elastic Beanstalk environment on Amazon Web Services

2014-06-27 Thread Paul Taylor
Hi, I have a simple WAR-based web application that uses Lucene-created indexes to provide search results in an XML format. It works fine locally, but I want to deploy it using Elastic Beanstalk within Amazon Web Services. Problem 1 is that the WAR definition doesn't seem to provide a location for dat

Re: Searching on Large Indexes

2014-06-27 Thread Toke Eskildsen
On Fri, 2014-06-27 at 12:33 +0200, Sandeep Khanzode wrote: > I have an index that runs into 200-300GB. It is not frequently updated. "not frequently" means different things for different people. Could you give an approximate time span? If it is updated monthly, you might consider a full optimizati

RE: About lucene memory consumption

2014-06-27 Thread Uwe Schindler
Hi, The number of byte[] instances and the total size show that each byte[] is approx. 1024 bytes long. This is exactly the size used by RAMDirectory for allocated heap blocks. So the important question: Do you use RAMDirectory to hold your index? This is not recommended; it is better to use M
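
The per-instance size mentioned above comes from dividing the byte total by the instance count in the histogram row. A minimal sketch of that arithmetic follows, assuming the usual "jmap -histo" column order (num, #instances, #bytes, class name). The sample row is invented for illustration and is not taken from the thread; HistoAverageSketch and averageBytes are hypothetical names.

```java
final class HistoAverageSketch {

    /** Returns bytes per instance for one histogram row: num, #instances, #bytes, class name. */
    static long averageBytes(String histoRow) {
        String[] cols = histoRow.trim().split("\\s+");
        long instances = Long.parseLong(cols[1]);
        long bytes = Long.parseLong(cols[2]);
        return bytes / instances;
    }

    public static void main(String[] args) {
        // Made-up row: one million byte[] instances totalling 1,040,000,000 bytes.
        String row = "   1:       1000000     1040000000  [B";
        System.out.println(averageBytes(row)); // prints 1040
    }
}
```

The reported size per array includes the JVM's array header, so a result slightly above 1024 is still consistent with 1024-byte buffers.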

Re:RE: About lucene memory consumption

2014-06-27 Thread 308181687
Hi, Thanks very much for your reply. Because we need near-real-time search, we decided to use NRTCachingDirectory instead of MMapDirectory. The code to create the Directory is as follows: Directory indexDir = FSDirectory.open(new File(indexDirName)); NRTCachingDirectory cachedFSDir = new NR

Re:RE: About lucene memory consumption

2014-06-27 Thread Uwe Schindler
Could it be that you forgot to close older IndexReaders after getting a new NRT one? This would be a huge memory leak. I recommend using SearcherManager to handle real-time reopen correctly. Uwe On 27 June 2014 16:05:19 CEST, 308181687 <308181...@qq.com> wrote: >Hi, > Thanks very much for

Re: Can Lucene based application be made to work with Scaled Elastic Beanstalk environment on Amazon Web Services

2014-06-27 Thread Tri Cao
I would just use S3 as a data push mechanism. In your servlet's init(), you could download the index from S3 and unpack it to a local directory, then initialize your Lucene searcher to that directory. Downloading from S3 to EC2 instances is free, and 5 GB would take a minute or two. Also, if you p
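
The unpack step described above could look roughly like the following sketch, which uses only the Java standard library. It assumes the index was packaged as a zip archive (the message does not say which format) and that the caller already has an InputStream for the downloaded object; the S3 download itself is not shown. IndexUnpackSketch and unpack are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class IndexUnpackSketch {

    /** Unpacks a zip stream (for example, a downloaded index archive) into targetDir. */
    static void unpack(InputStream zipped, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries that would escape the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; the zip stream stays open for the next one.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

A servlet's init() could call unpack once and then open the Lucene searcher on the resulting directory, as the message suggests.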

Re:RE: About lucene memory consumption

2014-06-27 Thread 308181687
Hi Uwe, Actually, I have already used SearcherManager, and I reopen it every 5 seconds. I notice that the number of byte[] instances is twice the number of java.util.LinkedHashMap$Entry instances. It seems that there is a big LinkedHashMap instance that caches something.