Why do one big commit? You could do hard commits along the way but keep the existing searcher open (openSearcher=false), so the changes are not visible until the end.
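A rough sketch of what that could look like from the indexing client, in Python. It only builds the commit URLs and sends nothing. The base URL, the core name "collection1" and the batch size are made-up values for illustration; commit and openSearcher are the request parameters of the Solr update handler.

```python
from urllib.parse import urlencode


def commit_urls(base_url, total_docs, docs_per_commit):
    """Yield (docs_indexed_so_far, commit_url) pairs for an ingestion run.

    Every intermediate commit is a hard commit with openSearcher=false,
    so the old searcher keeps serving queries. Only the last commit
    opens a new searcher and makes the changes visible.
    """
    indexed = 0
    while indexed < total_docs:
        indexed = min(indexed + docs_per_commit, total_docs)
        is_last = indexed == total_docs
        params = {
            "commit": "true",
            "openSearcher": "true" if is_last else "false",
        }
        yield indexed, f"{base_url}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core URL and batch size, not taken from the thread.
    base = "http://localhost:8983/solr/collection1"
    for count, url in commit_urls(base, 250, 100):
        print(count, url)
```

The same effect can be had on the server side with autoCommit and openSearcher set to false in solrconfig.xml, which avoids issuing the intermediate commits from the client at all.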
Obviously a separate issue from the memory consumption discussion, but I thought I'd add it anyway.

Regards,
   Alex

On 05/09/2014 3:30 am, "Li, Ryan" <ryan...@sensis.com.au> wrote:

> Hi Shawn,
>
> Thanks for your reply.
>
> The memory setting of my Solr box is
>
> 12G physical memory.
> 4G for java (-Xmx4096m)
> The index size is around 4G in Solr 4.9, I think it was over 6G in Solr
> 4.0.
>
> I do think the RAM size of java is one of the reasons for this slowness.
> I'm doing one big commit, and when the ingestion process was 50% finished
> I could see the Solr server had already used over 90% of full memory.
>
> I'll try to assign more RAM to Solr Java. But from your experience, does
> 4G sound like a good number for Java heap size for my scenario? Is there
> any way to reduce memory usage during index time? (One thing I know of is
> to do a few commits instead of one commit.) My concern is that, given I
> have 12G in total, if I assign too much to the Solr server, I may not have
> enough for the OS to cache the Solr index files.
>
> I had a look at the Solr config file, but couldn't find anything that is
> obviously wrong. Just wondering which part of that config file would
> impact the index time?
>
> Thanks,
> Ryan
>
>
> One possible source of problems with that particular upgrade is the fact
> that stored field compression was added in 4.1, and termvector
> compression was added in 4.2. They are on by default and cannot be
> turned off. The compression is typically fast, but with very large
> documents like yours, it might result in pretty major computational
> overhead. It can also require additional java heap, which ties into
> what follows:
>
> Another problem might be RAM-related.
>
> If your java heap is very large, or just a little bit too small, there
> can be major performance issues from garbage collection. Based on the
> fact that the earlier version performed well, a too-small heap is more
> likely than a very large heap.
>
> If your index size is such that it can't be effectively cached by the
> amount of total RAM on the machine (minus the java heap assigned to
> Solr), that can cause performance problems. Your index size is likely
> to be several gigabytes, and might even reach double-digit gigabytes.
> Can you relate those numbers -- index size, java heap size, and total
> system RAM? If you can, it would also be a good idea to share your
> solrconfig.xml.
>
> Here's a wiki page that goes into more detail about possible performance
> issues. It doesn't mention the possible compression problem:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
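
For the heap versus OS cache trade-off discussed in the quoted thread, the arithmetic can be written down as a small Python sketch. The function name is made up for illustration, and the calculation is deliberately simplistic: it assumes everything outside the Java heap is available to the OS page cache, which overstates the real figure because the OS and other processes need memory too.

```python
def os_cache_headroom_gb(total_ram_gb, heap_gb, index_gb):
    """Return how much RAM is left once the heap and a fully cached index
    are accounted for. Zero or a negative value means the index cannot be
    fully cached by the OS.

    Simplifying assumption: all RAM outside the Java heap is usable as
    page cache.
    """
    page_cache_gb = total_ram_gb - heap_gb
    return page_cache_gb - index_gb


if __name__ == "__main__":
    # Numbers from the thread: 12G RAM, 4G heap, index around 4G.
    print(os_cache_headroom_gb(12, 4, 4))
    # Same box with a doubled heap (8G is a hypothetical value).
    print(os_cache_headroom_gb(12, 8, 4))
```

With the numbers from the thread there is room to raise the heap somewhat, but each extra gigabyte of heap comes straight out of what the OS can use to cache the index.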