I'm not sure I understand why this is important. Too much memory would just be unused.
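As a sanity check on the numbers in a `jmap -heap` dump like the one below, the usage percentages can be recomputed directly from the raw byte counts. A minimal Python sketch, using the CMS old-generation figures from the dump below (nothing Solr-specific is assumed):

```python
# Recompute the old-generation usage percentage from the raw byte counts in a
# `jmap -heap` style dump (values taken from the CMS generation figures below).
cms_capacity_bytes = 16896360448  # "capacity = 16896360448 (16113.625MB)"
cms_used_bytes = 12452710200      # "used = 12452710200 (11875.829887390137MB)"

pct_used = cms_used_bytes / cms_capacity_bytes * 100
mb_free = (cms_capacity_bytes - cms_used_bytes) / (1024 * 1024)

# Matches the ~73.70% used / ~4237.8MB free reported in the dump.
print(f"CMS old gen: {pct_used:.2f}% used, {mb_free:.1f}MB free")
```

With a 16GB old generation sitting at ~74% used, a CMS concurrent cycle has several gigabytes of headroom, which is consistent with the full collection being short rather than minutes long.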
This is what the heap looks like now:

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 17179869184 (16384.0MB)
   NewSize          = 21757952 (20.75MB)
   MaxNewSize       = 283508736 (270.375MB)
   OldSize          = 65404928 (62.375MB)
   NewRatio         = 7
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 176160768 (168.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 255197184 (243.375MB)
   used     = 108828496 (103.78694152832031MB)
   free     = 146368688 (139.5880584716797MB)
   42.644865548359654% used
Eden Space:
   capacity = 226885632 (216.375MB)
   used     = 83498424 (79.63030242919922MB)
   free     = 143387208 (136.74469757080078MB)
   36.80198841326365% used
From Space:
   capacity = 28311552 (27.0MB)
   used     = 25330072 (24.156639099121094MB)
   free     = 2981480 (2.8433609008789062MB)
   89.46903370044849% used
To Space:
   capacity = 28311552 (27.0MB)
   used     = 0 (0.0MB)
   free     = 28311552 (27.0MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 16896360448 (16113.625MB)
   used     = 12452710200 (11875.829887390137MB)
   free     = 4443650248 (4237.795112609863MB)
   73.70054775005708% used
Perm Generation:
   capacity = 70578176 (67.30859375MB)
   used     = 37652032 (35.90777587890625MB)
   free     = 32926144 (31.40081787109375MB)
   53.347981109627995% used

Med venlig hilsen / Best regards

*John Nielsen*
Programmer

*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


On Thu, Nov 29, 2012 at 4:08 AM, Otis Gospodnetic
<otis.gospodne...@gmail.com> wrote:

> If this is caused by index segment merging, you should be able to see that
> very clearly on the Index report in SPM, where you would see sudden graph
> movement at the time of the spike, corresponding to CPU and disk activity.
> I think uncommenting that infostream in solrconfig would also show it.
>
> Otis
> --
> SOLR Performance Monitoring - http://sematext.com/spm
>
> On Nov 28, 2012 9:20 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
> > Am I reading this right? All you're doing on varnish1 is replicating to
> > it?
> > You're not searching or indexing? I'm sure I'm misreading this.....
> >
> > "The spike, which only lasts for a couple of minutes, sends the disks
> > racing." This _sounds_ suspiciously like segment merging, especially the
> > "disks racing" bit. Or possibly replication. Neither of which makes much
> > sense. But is there any chance that somehow multiple commits are being
> > issued? Of course, if varnish1 is a slave, that shouldn't be happening
> > either.
> >
> > And the whole bit about nothing going to the logs is just bizarre. I'm
> > tempted to claim hardware gremlins, especially if you see nothing similar
> > on varnish2. Or some other process is pegging the machine. All of which
> > is a way of saying "I have no idea"....
> >
> > Yours in bewilderment,
> > Erick
> >
> > On Wed, Nov 28, 2012 at 6:15 AM, John Nielsen <j...@mcb.dk> wrote:
> >
> > > I apologize for the late reply.
> > >
> > > The query load is more or less stable during the spikes. There are
> > > always fluctuations, but nothing of the order of magnitude that could
> > > explain this spike. In fact, the latest spike occurred last night when
> > > there was almost no one using it.
> > >
> > > To test a hunch of mine, I tried to deactivate all caches by commenting
> > > out all cache entries in solrconfig.xml. It still spikes, so I don't
> > > think it has anything to do with cache warming or hits/misses or
> > > anything of the sort.
> > >
> > > One interesting thing about the GC, though.
This is our latest spike with CPU load (this server has 8 cores, so a
> > > load higher than 8 is potentially troublesome):
> > >
> > > 2012.Nov.27 19:58:18  2.27
> > > 2012.Nov.27 19:57:17  4.06
> > > 2012.Nov.27 19:56:18  8.95
> > > 2012.Nov.27 19:55:17  19.97
> > > 2012.Nov.27 19:54:17  32.27
> > > 2012.Nov.27 19:53:18  1.67
> > > 2012.Nov.27 19:52:17  1.6
> > > 2012.Nov.27 19:51:18  1.77
> > > 2012.Nov.27 19:50:17  1.89
> > >
> > > This is what the GC was doing around that time:
> > >
> > > 2012-11-27T19:50:04.933+0100: [GC [PSYoungGen: 4777586K->277641K(4969216K)]
> > > 8887542K->4387693K(9405824K), 0.0856360 secs] [Times: user=0.54 sys=0.00,
> > > real=0.09 secs]
> > > 2012-11-27T19:50:30.785+0100: [GC [PSYoungGen: 4749769K->325171K(5068096K)]
> > > 8859821K->4435320K(9504704K), 0.0992160 secs] [Times: user=0.63 sys=0.00,
> > > real=0.10 secs]
> > > 2012-11-27T19:51:12.293+0100: [GC [PSYoungGen: 4911603K->306181K(5071168K)]
> > > 9021752K->4416617K(9507776K), 0.0957890 secs] [Times: user=0.62 sys=0.00,
> > > real=0.09 secs]
> > > 2012-11-27T19:51:52.817+0100: [GC [PSYoungGen: 4892613K->376175K(5075328K)]
> > > 9003049K->4486755K(9511936K), 0.1099830 secs] [Times: user=0.79 sys=0.01,
> > > real=0.11 secs]
> > > 2012-11-27T19:52:29.454+0100: [GC [PSYoungGen: 4972847K->271468K(4868160K)]
> > > 9083427K->4383520K(9304768K), 0.0699660 secs] [Times: user=0.48 sys=0.01,
> > > real=0.07 secs]
> > > 2012-11-27T19:53:08.176+0100: [GC [PSYoungGen: 4868140K->336421K(5090944K)]
> > > 8980192K->4448572K(9527552K), 0.0824350 secs] [Times: user=0.56 sys=0.01,
> > > real=0.08 secs]
> > > 2012-11-27T19:54:53.534+0100: [GC [PSYoungGen: 4950373K->340513K(5092864K)]
> > > 9062524K->4468215K(9529472K), 0.1016770 secs] [Times: user=0.71 sys=0.00,
> > > real=0.10 secs]
> > > 2012-11-27T19:55:02.906+0100: [GC [PSYoungGen: 4954465K->480488K(4952000K)]
> > > 9082167K->4684537K(9388608K), 0.1813290 secs] [Times: user=1.23 sys=0.09,
> > > real=0.19 secs]
> > > 2012-11-27T19:55:09.114+0100: [GC [PSYoungGen: 4951976K->560434K(5031936K)]
> > > 9156025K->5075285K(9547072K), 0.3511090 secs] [Times: user=2.32 sys=0.12,
> > > real=0.35 secs]
> > > 2012-11-27T19:55:09.465+0100: [Full GC [PSYoungGen: 560434K->0K(5031936K)]
> > > [PSOldGen: 4514851K->2793342K(5047296K)] 5075285K->2793342K(10079232K)
> > > [PSPermGen: 35285K->35285K(44864K)], 5.2310820 secs] [Times: user=5.23
> > > sys=0.00, real=5.23 secs]
> > > 2012-11-27T19:55:21.359+0100: [GC [PSYoungGen: 4471488K->87249K(4647168K)]
> > > 7264830K->2880592K(9694464K), 0.0426640 secs] [Times: user=0.27 sys=0.00,
> > > real=0.04 secs]
> > > 2012-11-27T19:55:35.399+0100: [GC [PSYoungGen: 4173969K->171306K(4839552K)]
> > > 6967312K->2964649K(9886848K), 0.0493570 secs] [Times: user=0.34 sys=0.00,
> > > real=0.05 secs]
> > > 2012-11-27T19:55:41.397+0100: [GC [PSYoungGen: 4258026K->156193K(4802368K)]
> > > 7051369K->2949536K(9849664K), 0.0643190 secs] [Times: user=0.37 sys=0.00,
> > > real=0.06 secs]
> > > 2012-11-27T19:55:55.609+0100: [GC [PSYoungGen: 4254753K->156028K(4826880K)]
> > > 7048096K->2949371K(9874176K), 0.0590410 secs] [Times: user=0.35 sys=0.00,
> > > real=0.06 secs]
> > > 2012-11-27T19:56:20.519+0100: [GC [PSYoungGen: 4254588K->413148K(4860288K)]
> > > 7047931K->3206491K(9907584K), 0.1189810 secs] [Times: user=0.91 sys=0.01,
> > > real=0.12 secs]
> > > 2012-11-27T19:56:56.649+0100: [GC [PSYoungGen: 4564892K->412801K(4866432K)]
> > > 7358235K->3206144K(9913728K), 0.1089870 secs] [Times: user=0.82 sys=0.00,
> > > real=0.11 secs]
> > > 2012-11-27T19:57:24.304+0100: [GC [PSYoungGen: 4564545K->434702K(4870592K)]
> > > 7357888K->3228045K(9917888K), 0.1172090 secs] [Times: user=0.86 sys=0.00,
> > > real=0.12 secs]
> > > 2012-11-27T19:57:41.489+0100: [GC [PSYoungGen: 4591822K->500171K(4657344K)]
> > > 7385165K->3293513K(9704640K), 0.1663600 secs] [Times: user=1.12 sys=0.00,
> > > real=0.16 secs]
> > > 2012-11-27T19:58:24.062+0100: [GC [PSYoungGen: 4657291K->453204K(4849536K)]
> > > 7450633K->3246547K(9896832K), 0.1099200 secs] [Times: user=0.83 sys=0.01,
> > > real=0.11 secs]
> > > 2012-11-27T19:58:46.081+0100: [GC [PSYoungGen: 4560404K->533101K(4640320K)]
> > > 7353747K->3326443K(9687616K), 0.1252060 secs] [Times: user=0.96 sys=0.00,
> > > real=0.13 secs]
> > > 2012-11-27T19:59:01.488+0100: [GC [PSYoungGen: 4640301K->480969K(4822720K)]
> > > 7433643K->3274312K(9870016K), 0.1495860 secs] [Times: user=0.99 sys=0.01,
> > > real=0.15 secs]
> > > 2012-11-27T19:59:25.155+0100: [GC [PSYoungGen: 4534281K->572295K(4625664K)]
> > > 7327624K->3365638K(9672960K), 0.1492270 secs] [Times: user=1.10 sys=0.01,
> > > real=0.15 secs]
> > > 2012-11-27T19:59:39.923+0100: [GC [PSYoungGen: 4625607K->486964K(4800640K)]
> > > 7418950K->3280306K(9847936K), 0.1244440 secs] [Times: user=0.94 sys=0.00,
> > > real=0.12 secs]
> > >
> > > So the GC was doing a full collection around that time, but that only
> > > accounts for around 5 secs, not the couple of minutes the spike lasts.
> > >
> > > I am really at a complete loss as to what this could be. Google has not
> > > given me any clues.
> > >
> > > On Fri, Nov 23, 2012 at 1:56 PM, Otis Gospodnetic
> > > <otis.gospodne...@gmail.com> wrote:
> > >
> > > > Strange indeed. What about query load/rates during that time? What
> > > > about GC? And does cache hit rate drop?
> > > > Otis
> > > >
> > > > On Nov 23, 2012 2:45 AM, "John Nielsen" <j...@mcb.dk> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > We are seeing a strange CPU spike on one of our solr4 servers which
> > > > > we are unable to explain. The spike, which only lasts for a couple
> > > > > of minutes, sends the disks racing. This happens a few times a day.
> > > > > This is what the load looks like:
> > > > >
> > > > > 2012.Nov.14 13:37:17  2.77
> > > > > 2012.Nov.14 13:36:17  3.65
> > > > > 2012.Nov.14 13:35:18  3.92
> > > > > 2012.Nov.14 13:34:17  3.95
> > > > > 2012.Nov.14 13:33:18  6.56
> > > > > 2012.Nov.14 13:32:17  10.79
> > > > > 2012.Nov.14 13:31:17  24.38
> > > > > 2012.Nov.14 13:30:17  63.35
> > > > > 2012.Nov.14 13:29:17  24.68
> > > > > 2012.Nov.14 13:28:17  2.44
> > > > > 2012.Nov.14 13:27:18  3.51
> > > > > 2012.Nov.14 13:26:17  5.26
> > > > > 2012.Nov.14 13:25:18  5.71
> > > > > 2012.Nov.14 13:24:17  2.7
> > > > >
> > > > > The problem is that out of a 3-minute spike, I get about 40 seconds
> > > > > of silence in the logs. This log usually adds about a thousand lines
> > > > > every second. Not being able to communicate with the server for this
> > > > > long breaks our use case.
> > > > >
> > > > > We have two servers, varnish01 and varnish02. We used to feed data
> > > > > to varnish02 and replicate it to varnish01, where the data is then
> > > > > read from. When we discovered this issue, we moved all traffic to
> > > > > varnish02, so that data is still being replicated to varnish01, but
> > > > > varnish01 otherwise gets zero traffic. The spike did not disappear.
> > > > >
> > > > > The spike we are seeing is on varnish01 only.
> > > > > Please note that our use case requires us to continuously feed
> > > > > large amounts of data from our main system, on the order of up to
> > > > > 1,000 records every minute. Our solrconfig.xml is attached.
> > > > >
> > > > > Has anyone seen this phenomenon before?
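To close the loop on the GC question in this thread: whether collector pauses can account for a multi-minute spike is easy to check mechanically by summing the wall-clock (`real=`) pause times in the GC log over the spike window. A minimal Python sketch, assuming the `-XX:+PrintGCDetails` style log format shown above (the sample lines are abridged from the log in this thread):

```python
import re

# Pull the wall-clock pause ("real=... secs") out of each GC log entry.
PAUSE_RE = re.compile(r"real=([\d.]+) secs\]")

def pause_seconds(log_text):
    """Return the list of real (wall-clock) GC pause durations, in seconds."""
    return [float(m) for m in PAUSE_RE.findall(log_text)]

sample = """\
2012-11-27T19:55:09.465+0100: [Full GC [PSYoungGen: 560434K->0K(5031936K)] \
[PSOldGen: 4514851K->2793342K(5047296K)] 5075285K->2793342K(10079232K) \
[PSPermGen: 35285K->35285K(44864K)], 5.2310820 secs] \
[Times: user=5.23 sys=0.00, real=5.23 secs]
2012-11-27T19:55:21.359+0100: [GC [PSYoungGen: 4471488K->87249K(4647168K)] \
7264830K->2880592K(9694464K), 0.0426640 secs] \
[Times: user=0.27 sys=0.00, real=0.04 secs]
"""

pauses = pause_seconds(sample)
print(f"total pause: {sum(pauses):.2f}s, longest: {max(pauses):.2f}s")
# -> total pause: 5.27s, longest: 5.23s
```

Run over the 19:54-19:58 window in the full log above, this confirms John's reading: the pauses sum to only a handful of seconds, so the GC alone cannot explain a multi-minute period of log silence.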