Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Matthew Von-Maszewski
Sean, I did some math based upon the app.config and LOG files. I am guessing that you are starting to thrash your file cache. This theory should be easy to prove or disprove. On that one node, change the cache_size and max_open_files to: cache_size 68435456, max_open_files 425. If I am
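The "math" above is not shown in the snippet. The sketch below is a rough way to see why cache_size and max_open_files trade off against each other on one node. It is a hypothetical helper and is not Basho's spreadsheet.

It makes two assumptions:
- As in Riak 1.x, every vnode is its own leveldb instance, so both settings apply once per vnode.
- `per_file_bytes` is the memory held by one open .sst file. You must measure or estimate it yourself, because the thread does not state it.

```python
# Rough per-node leveldb memory estimate (hypothetical helper).
# ASSUMPTION: each vnode is a separate leveldb instance, so cache_size and
# max_open_files are multiplied by the number of vnodes on the node.
# ASSUMPTION: per_file_bytes is an estimated cost of one open .sst file;
# the real figure depends on file size, key size and block_size.

MIB = 1024 * 1024
GIB = 1024 * MIB


def leveldb_memory_estimate(vnodes_per_node: int, cache_size: int,
                            max_open_files: int, per_file_bytes: int) -> int:
    """Estimated bytes of leveldb memory used on one node."""
    per_vnode = cache_size + max_open_files * per_file_bytes
    return vnodes_per_node * per_vnode


if __name__ == "__main__":
    # Hypothetical inputs: 13 vnodes on the node, 4 MiB per open file.
    total = leveldb_memory_estimate(13, 68435456, 425, 4 * MIB)
    print(f"{total / GIB:.1f} GiB")  # prints 22.4 GiB
```

If the estimate exceeds the RAM left over after the OS and Erlang VM take their share, the node is likely to start thrashing its file cache. Lowering either setting reduces the total.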

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Matthew Von-Maszewski
Sean, Also you mentioned concern about +S 6:6. 2i queries in 1.4 added sorting. Another heavy 2i user noticed that the sorting needs more CPU for Erlang. They were happier after removing the +S. And finally, those 2i queries that return millions of results … how long do those queries take

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Martin May
Hi Matthew, We applied this change to node 4, started it up, and it seems much happier (no crazy CPU). We’re going to keep an eye on it for a little while, and then apply this setting to all the other nodes as well. Is there anything we can do to prevent this scenario in the future, or should

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Sean McKibben
We need all the results right away anyway, so we don't paginate; once we get to 1.4.6+, being able to skip sorting ought to return some speed to us (and maybe we will leave +S at 6:6). With our small ring size and SSDs we see 3M keys returning in about 120 sec. While that case isn't rare, there
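As a back-of-the-envelope check, the figures quoted (3M keys in about 120 sec) work out to roughly 25,000 keys per second. The projection helper is hypothetical. It assumes query time scales linearly with the number of keys returned, which the thread does not claim.

```python
# Back-of-the-envelope 2i throughput from the figures quoted in the thread.
# ASSUMPTION: query time scales linearly with the number of keys returned;
# treat any projection as a rough guide only.


def keys_per_second(keys: int, seconds: float) -> float:
    """Average rate at which a 2i query returned keys."""
    return keys / seconds


def projected_seconds(keys: int, rate: float) -> float:
    """Time to return `keys` results at a fixed `rate` (keys/sec)."""
    return keys / rate


if __name__ == "__main__":
    rate = keys_per_second(3_000_000, 120)
    print(rate)                                # 25000.0
    print(projected_seconds(1_000_000, rate))  # 40.0
```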

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Matthew Von-Maszewski
Martin, Assuming your business continues to grow, this problem will come back under 1.4 … but not for a while. We can push the cache_size as far down as 8Mbytes to make room for a little more file cache space if needed. The manual tunings I gave you and the subsequent block_size tuning I
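A quick sketch of the trade-off Matthew describes: lowering cache_size releases memory per vnode that the file cache could use instead. The sketch makes two assumptions:
- "8Mbytes" is read as 8 MiB.
- The per-file cost is a hypothetical value you supply; it is not a figure from the thread.

The helper names are illustrative only.

```python
# Memory released per vnode by a smaller cache_size, and how many extra open
# files it could pay for (hypothetical helpers).
# ASSUMPTION: "8Mbytes" means 8 MiB; per_file_bytes is an estimate you supply.

MIB = 1024 * 1024


def freed_bytes(old_cache_size: int, new_cache_size: int) -> int:
    """Bytes per vnode released by lowering cache_size."""
    return old_cache_size - new_cache_size


def extra_open_files(old_cache_size: int, new_cache_size: int,
                     per_file_bytes: int) -> int:
    """Whole open files that fit in the memory released by a smaller cache."""
    return freed_bytes(old_cache_size, new_cache_size) // per_file_bytes


if __name__ == "__main__":
    # 68435456 is the cache_size suggested earlier in the thread.
    print(freed_bytes(68435456, 8 * MIB))                # 60046848
    print(extra_open_files(68435456, 8 * MIB, 4 * MIB))  # 14 at a hypothetical 4 MiB/file
```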

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Martin May
Matthew, Thanks for the help and suggestions, we really appreciate it. We’re planning on giving Riak 2.0 a shot as soon as it’s released, and are looking forward to the new features. Best, Martin On Jan 10, 2014, at 7:51 AM, Matthew Von-Maszewski matth...@basho.com wrote: Martin,

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-10 Thread Sean McKibben
Excellent and informative explanation, thank you very much. We’re very happy that our adjustments have returned the cluster to its normal operating parameters. Also glad that Riak 2 will be handling this stuff programmatically, as prior to your spreadsheet and explanation it was pure voodoo for

Re: Single node causing cluster to be extremely slow (leveldb)

2014-01-09 Thread Matthew Von-Maszewski
Sean, This could be anything from hardware to a leveldb block size problem to a single bad .sst file causing an infinite loop. Standard questions: - would you send in a copy of the app.config file? - would you describe the hardware characteristics of your node? - would you describe roughly the

Re 2: Single node causing cluster to be extremely slow (leveldb)

2014-01-09 Thread Matthew Von-Maszewski
P.S. Notes on vnode repair are here: https://github.com/basho/leveldb/wiki/repair-notes … ok that is actually a discussion that references this at its end https://gist.github.com/gburd/b88aee6da7fee81dc036 On Jan 9, 2014, at 9:33 PM, Sean McKibben grap...@graphex.com wrote: We have a 5