Do you have active anti-entropy enabled ({anti_entropy, {on, []}}) in your app.config file?
If so, please check file count there also: {anti_entropy_data_dir, …
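For reference, a minimal sketch of the riak_kv entries in question (the data_dir path is illustrative, not necessarily yours):

{riak_kv, [
  {anti_entropy, {on, []}},
  {anti_entropy_data_dir, "/var/lib/riak/anti_entropy"}  %% illustrative path; check your own setting
]}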
Matthew
On Mar 1, 2014, at 1:54 PM, Bryan Hughes wrote:
> Howdy,
>
> I am having a problem with Riak crashing (note, this is not in productio
wrote:
> On 19 February 2014 03:18, Matthew Von-Maszewski wrote:
>> Riak 2.0 is coming. Hold your mass delete until then. The "bug" is within
>> Google's original leveldb architecture. Riak 2.0 sneaks around to get the
>> disk space freed.
>
> I'
oid overloading the cluster.
>
> Mixed feelings :S
>
>
>
> On 18 February 2014 15:45, Matthew Von-Maszewski wrote:
> Edgar,
>
> The first "concern" I have is that leveldb's delete does not free disk space.
> Others have executed mass delete operations
eleted data in Riak 2.0.
But that release is not quite ready for production usage.
What do you hope to achieve by the mass delete?
Matthew
On Feb 18, 2014, at 10:29 AM, Edgar Veiga wrote:
> Sorry, forgot that info!
>
> It's leveldb.
>
> Best regards
>
>
>
Which Riak backend are you using: bitcask, leveldb, multi?
Matthew
On Feb 18, 2014, at 10:17 AM, Edgar Veiga wrote:
> Hi all!
>
> I have a fairly trivial question regarding mass deletion on a riak cluster,
> but firstly let me give you just some context. My cluster is running with
> riak 1
P.S. I failed to include the link to the wiki page that discusses each of
Basho's optimizations for the Riak environment:
https://github.com/basho/leveldb/wiki
On Jan 27, 2014, at 8:49 PM, Matthew Von-Maszewski wrote:
> Basho's leveldb requirements have led to different
Basho's leveldb requirements have led to different optimizations. Facebook
has a captive hardware environment and usage case that does not match ours. I
am not saying their changes are better or worse, only different.
Basho needs:
- multiple databases running simultaneously: 6 to 64
- suppo
On Jan 27, 2014, at 3:06 PM, Elias Levy wrote:
> On Mon, Jan 27, 2014 at 11:57 AM, Matthew Von-Maszewski
> wrote:
>
> Google designed leveldb to always assume it was not cleanly shut down. If
> the startup can read the most recent MANIFEST file, leveldb cleans up the
On Jan 27, 2014, at 2:40 PM, Elias Levy wrote:
> Will Riak detect a LevelDB that was not cleanly shut down when it starts up
> and start repair on its own?
>
>
Google designed leveldb to always assume it was not cleanly shut down. If the
startup can read the most recent MANIFEST file, leve
o all the other nodes as well.
>
> Is there anything we can do to prevent this scenario in the future, or should
> the settings you suggested take care of that?
>
> Thanks,
> Martin
>
> On Jan 10, 2014, at 6:42 AM, Matthew Von-Maszewski wrote:
>
>> Sean,
>>
Sean,
Also, you mentioned concern about +S 6:6. 2i queries in 1.4 added "sorting".
Another heavy 2i user noticed that the sorting needs more CPU from Erlang. They
were happier after removing the +S.
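A hedged vm.args sketch of that change (6:6 is the value from this thread; yours may differ):

# +S 6:6   # remove or comment out this line to return to Erlang's default scheduler count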
And finally, those 2i queries that return "millions of results" … how long do
those queries tak
Sean,
I did some math based upon the app.config and LOG files. I am guessing that
you are starting to thrash your file cache.
This theory should be easy to prove / disprove. On that one node, change the
cache_size and max_open_files to:
cache_size 68435456
max_open_files 425
If I am correct
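In app.config terms, a minimal sketch of that eleveldb section with only those two values changed:

{eleveldb, [
  {cache_size, 68435456},
  {max_open_files, 425}
]}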
P.S. Notes on vnode repair are here:
https://github.com/basho/leveldb/wiki/repair-notes
… OK, that is actually a discussion that references this at its end:
https://gist.github.com/gburd/b88aee6da7fee81dc036
On Jan 9, 2014, at 9:33 PM, Sean McKibben wrote:
> We have a 5 node cluster using
Sean,
This could be anything from hardware to a leveldb block size problem to a
single bad .sst file causing an infinite loop.
Standard questions:
- would you send in a copy of the app.config file?
- would you describe the hardware characteristics of your node?
- would you describe roughly the
ces about changing those app.config values? My cluster
> has been running smoothly for the past 6 months and I don't want to start all over
> again :)
>
> Best Regards
>
>
> On 27 December 2013 18:56, Matthew Von-Maszewski wrote:
> Yes. Confirmed.
>
> There are optio
Yes. Confirmed.
There are options available in app.config to control how often this occurs and
how many vnodes rehash at once: defaults are every 7 days and two vnodes per
server at a time.
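A hedged sketch of the riak_kv settings involved (names as in the 1.4-era app.config; the values shown are the defaults just described):

{riak_kv, [
  {anti_entropy, {on, []}},
  {anti_entropy_expire, 604800000},   %% rehash interval: 7 days, in milliseconds
  {anti_entropy_concurrency, 2}       %% vnodes rehashing at once
]}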
Matthew Von-Maszewski
On Dec 27, 2013, at 13:50, Edgar Veiga wrote:
> Hi!
>
> I've
On Dec 21, 2013, at 4:12 AM, Damien Krotkine wrote:
> First option is to use leveldb as storage backend. And use an external
> script to expire (delete) keys that are too old (one of the secondary
> index being the timestamp, it's easy). However it seems that deleted
> keys in leveldb may not be
proves the stuff I can see..
>
> Any other ideas?
>
> Cheers
> Simon
>
> On Wed, 11 Dec 2013 15:37:03 -0500
> Matthew Von-Maszewski wrote:
>
>> The real Riak developers have suggested this might be your problem with
>> stats being stuck:
>>
>>
and
> PUT (more on PUT).. so I would like to know how fast the things are..
> but "status" isn't working.. argh...
>
> Cheers
> Simon
>
>
> On Wed, 11 Dec 2013 14:32:07 -0500
> Matthew Von-Maszewski wrote:
>
>> An additional thought: if
- helps: try setting sst_block_size to 32768 in app.config
- does not help: try removing +S from vm.args
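For the first option, a minimal app.config sketch:

{eleveldb, [
  {sst_block_size, 32768}   %% 32KB, as suggested above
]}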
Matthew
On Dec 11, 2013, at 1:58 PM, Simon Effenberg wrote:
> Hi Matthew,
>
> On Wed, 11 Dec 2013 18:38:49 +0100
> Matthew Von-Maszewski wrote:
>
>> Simon,
>>
e shared buffers cached
>>> Mem: 23999 23759 239 0 184 16183
>>> -/+ buffers/cache: 7391 16607
>>> Swap: 0 0 0
>>>
>>> We have 12 servers..
>>> datad
nto a spreadsheet.
Matthew Von-Maszewski
On Dec 11, 2013, at 7:09, Simon Effenberg wrote:
> Hi Matthew
>
> Memory: 23999 MB
>
> ring_creation_size, 256
> max_open_files, 100
>
> riak-admin status:
>
> memory_total : 276001360
> memory_processes : 1915063
RAM in 1.4. The memory accounting
for max_open_files changed in 1.4.
Matthew Von-Maszewski
On Dec 11, 2013, at 6:28, Simon Effenberg wrote:
> Hi Matthew,
>
> it took around 11 hours for the first node to finish the compaction. The
> second node has already been running for 12 hours and is
>
> see inline..
>
> On Tue, 10 Dec 2013 10:38:03 -0500
> Matthew Von-Maszewski wrote:
>
>> The sad truth is that you are not the first to see this problem. And yes,
>> it has to do with your 950GB per node dataset. And no, nothing to do but
>> sit through i
The sad truth is that you are not the first to see this problem. And yes, it
has to do with your 950GB per node dataset. And no, nothing to do but sit
through it at this time.
While I did extensive testing around upgrade times before shipping 1.4,
apparently there are data configurations I di
Change them. Great for 1G too. I use the settings on all my servers, Basho and
otherwise. None have 10g networks.
Matthew
Sent from my iPhone
> On Dec 3, 2013, at 4:48 AM, Ingo Rockel
> wrote:
>
> Hi,
>
> we just updated riak cluster to 1.4.2 from 1.3.0 and as suggested in the
> rolling
The Riak 1.4 upgrade reorganizes levels one and two. This likely
pushed data up into your level 5.
No, there currently is no control on compaction I/O.
Matthew
Sent from my iPhone
> On Dec 2, 2013, at 2:36 AM, Timo Gatsonides wrote:
>
>
> I have upgraded two nodes in my cluster f
Anita,
The "aggressive delete" feature should be coded by the end of this week. That
will make it a candidate for Riak 2.0. The 2.0 release date is not set.
Matthew
On Nov 11, 2013, at 11:40 PM, Anita Wong wrote:
> Happy to see Riak 2.0 being available.
> May I know if the issue that was f
Hmm … 128 partitions divided by 5 nodes is ~26 vnodes per server. AAE creates a
parallel number of vnodes, so your servers have ~52 vnodes each. 52 x 3,000 is
156,000 files … 156,000 > 65,536 ulimit. Sooner or later 65,536 will be too
small. But ... Now, the primary accounting method in 1.4.2 is memory s
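If the ulimit is the binding constraint, the usual remedy is raising the riak user's open-file limit; a hedged /etc/security/limits.conf sketch (values illustrative):

riak soft nofile 65536
riak hard nofile 200000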
log files. Which log
> file should I be looking in?
>
> Cheers
> Matt
>
>
> On 8 November 2013 11:50, Matthew Von-Maszewski wrote:
> Agreed
>
> Matthew Von-Maszewski
>
>
> On Nov 7, 2013, at 19:44, kzhang wrote:
>
> > Thanks!
> >
Agreed
Matthew Von-Maszewski
On Nov 7, 2013, at 19:44, kzhang wrote:
> Thanks!
>
> The cluster has been in production for 4 months. I found this in leveldb log
> file:
> 2013/10/25-12:43:10.678239 7fb895781700 compacted to: files[ 0 1 5 31 51 0 0
> ]
> 2013/10/29-16:57:2
Kathleen,
This is a tricky question. On a new database, the cache_size will help
performance. As your database grows, the file cache (max_open_files) becomes
more important than the cache_size because a miss in the file cache is much
more expensive (disk activity) than a miss in the block cac
inserting replies in original text
On Nov 7, 2013, at 12:49 PM, kzhang wrote:
> Thanks again!
>
> I have a few questions regarding the new spreadsheet.
>
> 'percent reserved', in the spreadsheet, it is set at 10%, we should really
> use 50%?
>
> 'vnode count', it is 'vnode count per server'?
Hmm, no.
I double checked the link. It has the spreadsheet for 1.2 (bad basho, bad bad
basho). Attached is the 1.4 spreadsheet.
Note there is a secondary column that computes sizing when anti_entropy is on
(default).
Matthew
leveldb_sizing_1.4.xls
On Nov 7, 2013
The 50% recommendation is to ensure there is memory available if this node has
to take over vnodes from a failing node. leveldb in Riak 1.4 has a static
memory allocation for each vnode. If you suddenly add a bunch of vnodes in a
failure scenario, this machine could run out of memory and also
Responses inline in '[ ]'.
On Oct 21, 2013, at 1:20 PM, Dave King wrote:
From the Riak LevelDB Page:
http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/
"Where server resources allow, the value of max_open_files should exceed the
count of .sst table files within the vnode'
question is what happens when that limit is reached
> on a node?
>
> On 10/18/2013 02:21 PM, Matthew Von-Maszewski wrote:
>> The user has the option of setting a default memory limit in the app.config
>> / riak.conf file (either absolute number or percentage of total system
>>
to manage itself?
> Or does it require human babysitting?
>
>
> Sent from my Verizon Wireless 4G LTE Smartphone
>
>
>
> ---- Original message ----
> From: Matthew Von-Maszewski
> Date: 10/18/2013 1:48 PM (GMT-05:00)
> To: Dave Martorana
> Cc: darren
wrote:
> Matthew,
>
> For we who don't quite understand, can you explain - does this mean
> mv-flexcache is a feature that just comes with 2.0, or is it something that
> will need to be turned on, etc?
>
> Thanks!
>
> Dave
>
>
> On Thu, Oct 17, 2013 at
n't riak smart enough to adjust itself to the available memory or
> lack thereof?
>
> No serious enterprise technology should just consume everything and crash.
>
>
> Sent from my Verizon Wireless 4G LTE Smartphone
>
>
>
> ---- Original message
Greetings,
The default config targets 5 servers and 16 to 32G of RAM. Yes, the app.config
needs some adjustment to achieve happiness for you:
- change ring_creation_size from 64 to 16 (remove the % from the beginning of
the line)
- add this line before "{data_root, }" in eleveldb section:
"{m
recommend value for Riak is zero.
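On Linux that is typically set via sysctl; a minimal sketch:

# /etc/sysctl.conf
vm.swappiness = 0
# apply without reboot: sysctl -p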
On Oct 16, 2013, at 4:28 PM, Alex Rice wrote:
> Just an informal poll- what is preferred for Linux vm.swappiness
> setting for Riak in a *cloud* environment? The default is 60 - a lot
> of stuff gets swapped out. This is good for OS disk cache.
>
> I am thinkin
John,
If you are using leveldb as Riak's backend, keep in mind that each vnode
requires a fixed amount of memory (set via max_open_files and cache_size in
app.config). Recently had one user attempt too many vnodes on a 4Gbyte
machine. He populated data to the point of memory overflow … then m
gist.github.com/gburd/b88aee6da7fee81dc036
> 3. wait for AAE or do a repair by hand with
> http://docs.basho.com/riak/1.3.1/cookbooks/Repairing-KV-Indexes/ to
> get all missing (because deleted) keys?
>
> Is this true what I'm saying?
>
> Cheers
> Simon
>
> On Thu, 2
> Thanks Matthew,
>>
>> but one thing I didn't get. If I delete a sst file.. should I delete
>> (by hand) the MANIFEST file to trigger a repair or is it done
>> automatically within Riak if it detects that a sst file which is
>> referenced in the MANIFEST is missing?
Guido,
The attached spreadsheet is best for 1.4. Not sure how I missed getting it
posted to the web-site.
Riak admin only reports memory usage as known by the Erlang virtual machine.
leveldb is outside Erlang's awareness.
The best memory measurement on a Linux machine is the /proc//status file. I t
longer retrieved for use
Matthew
On Sep 25, 2013, at 3:52 PM, Timo Gatsonides wrote:
>
>> Date: Wed, 25 Sep 2013 09:15:33 -0400
>> From: Matthew Von-Maszewski
>> ...
>>
>> You can continue to wait for the compactions to purge old data. If you are
>> r
Timo,
Your email again brought up the topic of fixing this issue within Basho's
leveldb. Previously there had always been a "bigger problem" and we did not
worry about tombstone (delete) purging. Today, we have all but one of the
"bigger problems" coded. I therefore have created a plan to tr
Matthew
On Sep 25, 2013, at 9:38 AM, Simon Effenberg wrote:
> On Wed, 25 Sep 2013 09:15:33 -0400
> Matthew Von-Maszewski wrote:
>
>> - run Riak repair on the vnode so that leveldb can create a MANIFEST that
>> matches the files remaining.
>
> what do you mean with
Timo,
Here is an important quote from the Google document mentioned in the prior
thread:
"They [compactions] also drop deletion markers if there are no higher
numbered levels that contain a file whose range overlaps the current key."
The quote is from here: http://leveldb.googlecode.com/sv
Timo,
Here are some rough concepts of leveldb:
- Compaction in leveldb is triggered by input activity, not by key age or key
delete.
- A key delete does not actually delete a key, it creates a "tombstone".
- Old values of a key only get purged if two versions of the key happen to
participate i
Shane,
Several points for your consideration:
- "cache_size" is broken in 1.3.x, fixed in 1.4.x. Any cache_size above the
default 8M can ruin performance because some idiot (me) put a linear walk in
the cache code. Reverted for 1.4 and cache_size tested to 2G.
- do you have AAE (active anti-
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2
> ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi
> flexpriority ept vpid cpufreq
> configuration: cores=4 enabledcores=4 threads=8
>
> Thanks,
>
> Guido.
>
> O
cores per socket + HT
>
> should it be +S 8:8 or +S 32:32 ?
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
>
> On Wed, Aug 14, 2013 at 4:14 AM, Matthew Von-Maszewski
ng, should I set +S to
> +S 4:4
>
> You hint that it doesn't matter, but I just wanted to trick you into
> explicitly saying something.
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Ap
** The following is copied from Basho's leveldb wiki page:
https://github.com/basho/leveldb/wiki/Riak-tuning-1
Summary:
leveldb has a higher read and write throughput in Riak if the Erlang scheduler
count is limited to half the number of CPU cores. Tests have demonstrated
improvements of 15%
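As a concrete, hedged example of that rule of thumb: on an 8-core server the vm.args line would be (value derived from the half-the-cores guidance, not stated in the wiki excerpt above):

+S 4:4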
Istvan,
"block_size" is not a "size", it is a threshold. Data is never split across
blocks. A single block contains one or more key/value pairs. leveldb starts a
new block only when the total size of all key/values in the current block
exceeds the threshold.
You must set block_size to a m
Try cutting your max open files in half. I am working from my iPad not my
workstation so my numbers are rough. Will get better ones to you in the
morning.
The math goes like this:
- vnode/partition heap usage is (4Mbytes * (max_open_files -10)) + 8Mbyte
- you have 18 vnodes per server (multi
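Plugging illustrative numbers into that formula (a hypothetical max_open_files of 150):

per-vnode heap = (4MB * (150 - 10)) + 8MB = 568MB
18 vnodes * 568MB = roughly 10.2GB of leveldb memory on the server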
going to swap
or OOM.
Matthew
On Jul 30, 2013, at 4:51 AM, Christian Rosnes
wrote:
>
>
> On Sun, Jul 28, 2013 at 10:08 PM, Matthew Von-Maszewski
> wrote:
>
> leveldb has two independent caches: file cache and data block cache. You
> have raised the data block cach
Christian,
leveldb has two independent caches: file cache and data block cache. You have
raised the data block cache from its default 8M to 256M per your earlier note.
I would recommend the following:
{max_open_files, 50}, %% 50 * 4Mbytes allocation for file cache
{cache_size, 10485760
Vladimir Shabanov wrote:
> I prefer second option since it will show are the corrupted blocks related to
> race condition. First option needs to be run for a long time to be completely
> sure that it really fixes the issue.
>
>
> 2013/7/26 Matthew Von-Maszewski
> Vladimir,
Dave,
Glad you are happy.
The truth is that you gained space via the backup/restore process. The data
formats of 1.3.1 and 1.4 are the same.
leveldb only removes dead / old key/values during its background compaction.
It could be days, even weeks in some cases, between when you write fresh
here it was
> found.
>
> Is it possible to somehow find source of those BLOCKS.bad files? I'm building
> Riak from sources, maybe it's possible to enable some additional logging to
> find what these BLOCKS.bad are?
>
>
> 2013/7/25 Matthew Von-Maszewski
> Vladimir
Vladimir,
I can explain what happened, but not how to correct the problem. The gentleman
that can walk you through a repair is tied up on another project, but he
intends to respond as soon as he is able.
We recently discovered / realized that Google's leveldb code does not check the
CRC of ea
eleveldb option parameter is {compression, false}
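A minimal sketch of where that lands in app.config (note: compression is applied per block at write time, so existing blocks should stay compressed until compaction rewrites them):

{eleveldb, [
  {compression, false}
]}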
On Jul 12, 2013, at 5:27 AM, Simon Effenberg wrote:
> Hi @list,
>
> is it somehow possible to disable the Snappy compression used in
> eLevelDB? I would like to try out ZFS compression but it's not that
> useful if snappy is used before.
>
> C
ould how I can run
> a function for all files in directory in Erlang? Running function manually
> for all partitions is too tiresome.
>
>
> 2013/6/13 Matthew Von-Maszewski
> Vladimir,
>
> Also, my colleague sent this to me after my first email. This is roughly the
&
more work to
isolate the bad input file. But I would be happy to be wrong and the above
work as is.
Matthew
On Jun 12, 2013, at 5:13 PM, Matthew Von-Maszewski wrote:
> Vladimir,
>
> I asked around the Basho chat room and you have a crash that has never been
> seen. This should be
Vladimir,
I asked around the Basho chat room and you have a crash that has never been
seen. This should be interesting.
The crash is happening during a compaction, specifically during the creation of
the bloom filter for a new .sst file. Maybe we can isolate the old file that
is feeding this co
leveldb_read_block_error info updated within issue #427.
Matthew
On Jun 7, 2013, at 9:22 AM, Brian Shumate wrote:
> Hi Shane,
>
> Thanks for the feedback! I've added an issue[0] to our basho_docs
> repository[1] to get this information into the documentation.
>
> If you have other suggestions
reposting reply originally at: https://news.ycombinator.com/item?id=5833414
As the person doing the github:basho/leveldb work, I agree with the above (very
professional reply, not something you expect to see on the Internet, thank
you). We are optimizing to our individual environments. I do need
Greg,
Wait. Couple serious environmental issues here:
- " results are close enough to LevelDB, Bitcask, …": Bitcask is always 1.5x
to 2x the performance of LevelDB. Bitcask has a constant throughput until its
first merge. Your comment states all the databases are close. Bitcask and
Leveld
Brian
The comments in the ycombinator.com link are correct about leveldb stalling
during writes. That is an unacceptable trait for Riak. We have a custom
branch available to everyone at github.com: "basho/leveldb". The branch
contains all code changes through 1.9 of Google's code. It also
Also, in riak_kv section of app.config, change:
{anti_entropy, {on, []}},
to
{anti_entropy, {off, []}},
Matthew
On Apr 11, 2013, at 10:24 PM, thefosk wrote:
> OS process not running. I think that the whole cluster crashes because the
> other nodes suddenly experienc
The fastest fix is to lower the cache_size in app.config to 8,388,608 and see
what happens.
Next is to lower the max_open_files to 30 in app.config.
7G RAM is tight.
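A minimal sketch of the eleveldb section with both suggestions applied:

{eleveldb, [
  {cache_size, 8388608},   %% 8MB
  {max_open_files, 30}
]}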
Matthew
On Apr 11, 2013, at 10:24 PM, thefosk wrote:
> OS process not running. I think that the whole cluster crashes becau
The "Latency" values are the amount of time it takes an operation to be sent to
riak and the response be received by Basho Bench. All values are from the view
point of Basho Bench as a client.
Yes, the 99% curve means that 99% of the request/receive operation cycles have
been below this valu
Paul,
There is a "tool" that will dump statistics for a given .sst file, including
compression ratio. It is really designed for in-house usage, so it can be a
pain to build (just being honest, and setting expectations).
1. you have to download and install libsnappy. Just having it embedded withi
resting =) In production, my average document size is
> probably about 2k and I have tens of millions and soon to be hundreds of
> millions of them.
>
> Thanks!
> -Ben
>
>
> On Wed, Apr 10, 2013 at 6:22 AM, Matthew Von-Maszewski
> wrote:
> Greetings Ben,
>
>
Greetings Ben,
Also, leveldb stores data in "levels". The very first storage level and the
runtime data recovery log are not compressed.
That said, I agree with Tom that you are most likely seeing Riak store 3 copies
of your data versus only one for mongodb. It is possible to dumb down Riak s
If you are not averse to building from source, you might like the leveldb
updates slated for 1.4 Riak. There is an emphasis on write throughput. The
best branch for trial would be mv-level-work2 in github's basho/leveldb
repository. I suspect you will find about 50% improvement in data inges
Ingo,
I have two guesses that might explain the symptoms:
- there is a bad drive in one of the nodes, or
- one or more nodes begins to use swap space during a compaction or 2i
iteration.
I might be able to describe / isolate the problem by examining the "LOG" files
produced by leveldb. Wou
t hardware co-located at two major data centers.
>
> Does this help?
>
> Cheers,
> Bryan
>
>
> On 3/6/13 4:14 PM, Matthew Von-Maszewski wrote:
>> Just curious, what is the typical size and the overall range of sizes for
>> your image data?
>>
>>
Just curious, what is the typical size and the overall range of sizes for your
image data?
Matthew
On Mar 6, 2013, at 6:08 PM, Bryan Hughes wrote:
> Hi Everyone,
>
> I am building a new 5 node cluster with 1.3.0 and am transitioning from
> Bitcask to LevelDB (or perhaps a Multi with LevelDB
client connections per server and with 6
> servers in a cluster probably not so much (but correct me if I'm wrong)
> interconnection links so I would use a bigger ulimit for sure but
> 30,000? :)
>
> Cheers,
> Simon
>
> On Mon, 4 Feb 2013 08:47:21 -0500
> Matthew Von
eduler whereas this post:
> http://riak-users.197444.n3.nabble.com/Riak-performance-problems-when-LevelDB-database-grows-beyond-16GB-tp4025608p4025622.html
> is talking about deadline for spinning disks (what we will have). So
> who is right or who is outdated?
>
> Thanks again f
> That would be the maximum amount of open files a server can handle
> (per vnode), am I right? But now, is this enough? Or how to calculate
> 50% temporary server loss (3 of 6) and how is the count of keys/values
> is taking into account? I'm somehow lost :(
>
> Cheers
>
First: Step 2 is talking about how many vnodes exist on a physical server. If
your ring size is 256, but you have 8 servers … then your vnode count for step
2 is 32.
Second: the 2048 is a constant forced by Google's leveldb implementation. It
is the portion of a file covered by a single blo
What version of Riak?
Likely you need to take the node offline and run repair.
Matthew
On Jan 11, 2013, at 4:50 AM, Shane McEwan wrote:
> G'day!
>
> I posted this to the LevelDB mailing list with little success. Apologies if
> you've already seen this from there.
>
> We've started getting
FYI: my theory of the moment (until LOG files arrive) is that maybe a couple
of the machines are using the operating system swap file during the list_keys
operation. That would explain everything. But maybe you have already ruled
that out?
Matthew
On Jan 8, 2013, at 2:53 PM, Parnell Spring
Parnell,
I confirmed with the Basho team that "list_keys" is a read only process. Yes,
some read operations would initiate compactions in Riak 1.1, but you have
1.2.1. I therefore suspect that there is a secondary issue.
Would you mind gathering the LOG files from one of the machines that y
Parnell,
Would appreciate some configuration info:
- what version of Riak are you running?
- would you copy/paste the eleveldb section of your app.config?
- how many vnodes and physical servers are you running?
- what is hardware? cpu, memory, disk arrays
- are you seeing the word "waiting" i
hese
> errors in application layer? automatically by riak?
>
> Alex
>
>
> On Mon, Nov 26, 2012 at 2:09 PM, Matthew Von-Maszewski
> wrote:
> Alex,
>
> The eleveldb backend creates a CRC for every item placed on the disk. You
> can activate the test of th
On 11/26/12 1:09 PM, "Matthew Von-Maszewski" wrote:
>
>
>> Alex,
>>
>> The eleveldb backend creates a CRC for every item placed on the disk.
>> You can activate the test of the CRC on every read by adding:
>>
>> {verify_checksums, true},
Alex,
The eleveldb backend creates a CRC for every item placed on the disk. You can
activate the test of the CRC on every read by adding:
{verify_checksums, true},
to the "{eleveldb " portion of app.config. With riak 1.2, you must manually
monitor each vnode directory for the lost/BLOCKS
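In context, a hedged sketch of the resulting section (the data_root path is illustrative):

{eleveldb, [
  {data_root, "/var/lib/riak/leveldb"},
  {verify_checksums, true}
]}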
rithm.
Matthew
On Nov 21, 2012, at 10:52 AM, wrote:
> -- Original message --
>> From: Matthew Von-Maszewski
>> Date: 22. 10. 2012
>> Subject: Re: Riak performance problems when LevelDB database grows beyond
>> 16GB
>> Jan,
>>
>> ...
>&g
See:
http://docs.basho.com/riak/latest/tutorials/choosing-a-backend/LevelDB/
Look for the section titled "Parameter Planning". It has the best content.
Keep in mind that leveldb maps most of its files into memory. So the RSS
(resident set size) is BOTH the memory you allocate via parameters a
on 'waiting'. Interesting.
> I'll send the merged log separately.
>
> Dave
>
> --
> Dave Lowell
> d...@connectv.com
>
> On Nov 14, 2012, at 10:43 AM, Matthew Von-Maszewski wrote:
>
>> Dave,
>>
>> Ok, heavy writes. Let's s
> of parallel processes generating those writes (so not a lot of flow control
> if Riak bogs down, at least not yet). It's probably pushing Riak fairly hard.
>
> Dave
>
> --
> Dave Lowell
> d...@connectv.com
>
> On Nov 14, 2012, at 8:51 AM, Matthew Von-Maszewski wr
Dave,
Just getting my head back into the game. Was away for a few days. Random
thought, maybe there is a hard drive with a read problem. That can cause
issues similar to this. 1.2.1 does NOT percolate the read errors seen in
leveldb to riak-admin (yes, that should start to happen in 1.3).
6 physical drives.
>
> Thanks again for looking into this.
>
> D
>
>
> On Tue, Nov 6, 2012 at 3:19 PM, Matthew Von-Maszewski
> wrote:
> Dietrich,
>
> I finally reviewed your LOG.all today. The basic analysis is:
>
> - you have a really fast disk subsys
running 9 physical nodes with a ring size
> of 64. Each with a 2.93Ghz 8-core Xeon and ~2 TB of RAID0 SSD storage across
> 6 physical drives.
>
> Thanks again for looking into this.
>
> D
>
>
> On Tue, Nov 6, 2012 at 3:19 PM, Matthew Von-Maszewski
> wrote:
>