Sean,
I did some math based upon the app.config and LOG files. I am guessing that
you are starting to thrash your file cache.
This theory should be easy to prove / disprove. On that one node, change the
cache_size and max_open_files to:
cache_size 68435456
max_open_files 425
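
For reference, a minimal sketch of where those two values would sit in the node's app.config, assuming the stock Riak 1.4 layout in which leveldb options live in the eleveldb section. The numbers are the ones quoted above. Everything else in the section is assumed to stay as it is, and the node would presumably need a restart to pick up the change:

```erlang
%% app.config (excerpt), on the one affected node only.
%% Assumption: the leveldb options live in the eleveldb section,
%% as in a stock Riak 1.4 app.config.
{eleveldb, [
    %% ... existing settings (data_root etc.) left unchanged ...
    {cache_size, 68435456},
    {max_open_files, 425}
]},
```
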
If I am
Sean,
Also you mentioned concern about +S 6:6. 2i queries in 1.4 added sorting.
Another heavy 2i user noticed that the sorting needs more CPU for Erlang. They
were happier after removing the +S.
And finally, those 2i queries that return millions of results … how long do
those queries take
Hi Matthew,
We applied this change to node 4, started it up, and it seems much happier (no
crazy CPU). We’re going to keep an eye on it for a little while, and then apply
this setting to all the other nodes as well.
Is there anything we can do to prevent this scenario in the future, or should
We need all the results right away anyway, so we don't paginate, so
once we get to 1.4.6+, being able to skip sorting ought to return some
speed to us (and maybe we will leave +S at 6:6). With our small ring
size and SSDs we see 3M keys returning in about 120 sec. While that
case isn't rare, there
Martin,
Assuming your business continues to grow, this problem will come back under 1.4
… but not for a while. We can push the cache_size as far down as 8 MB to
make room for a little more file cache space if needed.
The manual tunings I gave you and the subsequent block_size tuning I
Matthew,
Thanks for the help and suggestions, we really appreciate it. We’re planning on
giving Riak 2.0 a shot as soon as it’s released, and are looking forward to the
new features.
Best,
Martin
On Jan 10, 2014, at 7:51 AM, Matthew Von-Maszewski matth...@basho.com wrote:
Martin,
Excellent and informative explanation, thank you very much. We’re very happy
that our adjustments have returned the cluster to its normal operating
parameters. Also glad that Riak 2 will be handling this stuff programmatically,
as prior to your spreadsheet and explanation it was pure voodoo for
Sean,
This could be anything from hardware to a leveldb block size problem to a
single bad .sst file causing an infinite loop.
Standard questions:
- would you send in a copy of the app.config file?
- would you describe the hardware characteristics of your node?
- would you describe roughly the
P.S. Notes on vnode repair are here:
https://github.com/basho/leveldb/wiki/repair-notes
… ok that is actually a discussion that references this at its end
https://gist.github.com/gburd/b88aee6da7fee81dc036
On Jan 9, 2014, at 9:33 PM, Sean McKibben grap...@graphex.com wrote:
We have a 5