Solrconfig.xml -> http://apaste.info/dsbv

Schema.xml -> http://apaste.info/67PI

This solrconfig.xml file has optimization enabled. I had another file which I 
can't locate at the moment, in which I defined a custom merge scheduler in 
order to disable optimization.

When I say 1000 segments, I mean that's the number I saw in the Solr UI. I 
assume there were many more files than that.

Thanks,
Yoni



-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Sunday, June 02, 2013 22:53
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing do to large incoming queue

On 6/2/2013 12:25 PM, Yoni Amir wrote:
> Hi Shawn and Shreejay, thanks for the response.
> Here is some more information:
> 1) The machine is a virtual machine on ESX server. It has 4 CPUs and 
> 8GB of RAM. I don't remember what CPU but something modern enough. It 
> is running Java 7 without any special parameters, and 4GB allocated 
> for Java (-Xmx)
> 2) After successful indexing, I have 2.5 Million documents, 117GB index size. 
> This is the size after it was optimized.
> 3) I plan to upgrade to 4.3 but haven't had time. 4.0 beta is what was 
> available at the time that we had a release deadline.
> 4) The setup uses master-slave replication, not Solr Cloud. The server that I 
> am discussing is the indexing server, and in these tests there were actually 
> no slaves involved, and virtually zero searches performed.
> 5) Attached is my configuration. I tried disabling the warm-up and opening 
> of searchers, but it didn't change anything. The commits are done by Solr, 
> using autocommit. The client sends the updates without a commit command.
> 6) I want to disable optimization, but when I disabled it, the OOME occurred 
> even faster. The number of segments reached around a thousand within an hour 
> or so. I don't know if it's normal or not, but at that point if I restarted 
> Solr it immediately took about 1GB of heap space just on start-up, instead of 
> the usual 50MB or so.
> 
> If I commit less frequently, don't I increase the risk of losing data, e.g., 
> if the power goes down, etc.?
> If I disable optimization, is it necessary to avoid such a large number of 
> segments? Is it possible?

Last part first: Losing data is much less of a risk with Solr 4.x, if you have 
enabled the updateLog.

We'll need some more info.  See the end of the message for specifics.

Right off the bat, I can tell you that with an index that's 117GB, you're going 
to need a LOT of RAM.

Each of my 4.2.1 servers has 42GB of index and about 37 million documents 
between all the index shards.  The web application never uses facets, which 
tend to use a lot of memory.  My index is a lot smaller than yours, and I need 
a 6GB heap; I see OOM errors if it's only 4GB.  You probably need at least an 
8GB heap, and possibly larger.

Beyond the amount of memory that Solr itself uses, for good performance you 
will also need a large amount of memory for OS disk caching.  Unless the server 
is using SSD, you need to allocate at least 64GB of real memory to the virtual 
machine.  If you've got your index on SSD, 32GB might be enough.  I've got 64GB 
total on my servers.

http://wiki.apache.org/solr/SolrPerformanceProblems
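
To put the numbers above side by side, here is a rough back-of-the-envelope sketch. It is not from the original exchange, and the function name `cacheable_fraction` is made up for illustration. It assumes that all RAM not given to the Java heap is available for OS disk caching, which is optimistic because the OS and other processes also need memory.

```python
def cacheable_fraction(total_ram_gb, heap_gb, index_gb):
    """Rough upper bound on the share of the index the OS disk cache can hold.

    Assumption: everything outside the Java heap is free for caching.
    """
    free_for_cache = max(total_ram_gb - heap_gb, 0)
    return min(free_for_cache / index_gb, 1.0)


# Figures quoted in this thread.
yoni = cacheable_fraction(total_ram_gb=8, heap_gb=4, index_gb=117)
shawn = cacheable_fraction(total_ram_gb=64, heap_gb=6, index_gb=42)
# The suggested sizing for the 117GB index: 64GB RAM with an 8GB heap.
suggested = cacheable_fraction(total_ram_gb=64, heap_gb=8, index_gb=117)

print(f"8GB VM, 4GB heap, 117GB index:   {yoni:.1%}")
print(f"64GB box, 6GB heap, 42GB index:  {shawn:.1%}")
print(f"64GB VM, 8GB heap, 117GB index:  {suggested:.1%}")
```

Under that assumption, the current 8GB virtual machine can cache only a few percent of the 117GB index, while the 64GB servers can hold their whole 42GB index in the cache.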

When you say that there are over 1000 segments, are you seeing 1000 files, or 
are there literally 1000 segments, giving you between 12000 and 15000 files?  
Even if your mergeFactor were higher than the default 10, that just shouldn't 
happen.
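
The arithmetic behind that question can be sketched as follows. This is only an illustration, not part of the original message. The 12 to 15 files per segment is the range implied by the figures above, and the helper names are hypothetical.

```python
# Range of files per segment implied by "1000 segments -> 12000 to 15000 files".
FILES_PER_SEGMENT = (12, 15)


def files_for_segments(segments, per_segment=FILES_PER_SEGMENT):
    """Return the (low, high) number of index files for a given segment count."""
    low, high = per_segment
    return segments * low, segments * high


def segments_for_files(files, per_segment=FILES_PER_SEGMENT):
    """Return the (low, high) number of segments that a given file count suggests."""
    low, high = per_segment
    return files // high, files // low


# Literally 1000 segments:
print(files_for_segments(1000))   # (12000, 15000)

# Only 1000 files in the index directory:
print(segments_for_files(1000))   # (66, 83)
```

So 1000 files on disk would mean well under a hundred segments, which is a very different situation from 1000 actual segments.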

Can you share your solrconfig.xml and schema.xml?  Use a paste website like 
http://apaste.info and share the URLs.

Thanks,
Shawn

