Re: big cache vs. many partitions and replicas placement

2013-01-29 Thread Simon Effenberg
st 99.9 percent is distributed over different physical nodes? > > Cheers, > > Simon

Re: [ANNC] Riak 1.3.0 RC1

2013-01-30 Thread Simon Effenberg
On Wed, 30 Jan 2013 21:22:33 -0800 Joseph Blomstedt wrote: > The main issue is that Search data > doesn't include anything equivalent to vector clocks. If two replicas > have divergent data, who wins? so what happens if I have (because of a temporary split brain and a write to the key on both br

Parameter Planning (eleveldb)

2013-02-03 Thread Simon Effenberg
Hi, I'm not sure if I understand this all well enough to calculate the memory usage per file and other stuff. The webpage tells me some steps but I'm completely unsure if I understand all parameters. "Step 1: Calculate Available Working Memory" taking the example: leveldb_working_memory = 32G * (1 -
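For illustration, step 1 with the numbers from this thread (6 nodes with 32 GB RAM each; the 50% reservation for the OS and Erlang and the vnode count per node are assumptions, not recommendations):

    leveldb_working_memory = 32G * (1 - 0.50) = 16G
    vnode_working_memory   = leveldb_working_memory / vnodes_per_node
                           = 16G / 11 vnodes ≈ 1.45G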

Re: Parameter Planning (eleveldb)

2013-02-03 Thread Simon Effenberg
ck size. The block_size only governs the minimum written > (aggregate size of small values that must be written as one unit at minimum). > > Use 104Mbyte for your average sst file size. It is "good enough" > > > I am not following the question stream for S

Re: Parameter Planning (eleveldb)

2013-02-03 Thread Simon Effenberg
> What questions remain? > > Matthew > > > On Feb 3, 2013, at 5:44 PM, Simon Effenberg wrote: > > > Hi Matthew, > > > > thanks a lot! > > > > So now I have: > > > > 6 nodes each having 32GB RAM: > > > > vnode_workin

Re: Parameter Planning (eleveldb)

2013-02-05 Thread Simon Effenberg
line. Different people have told me they found each of them best for > spinning hard drives. However, there seems to be more on-line discussion in > recent months for using deadline for spinning and noop (plus other settings) > for SSD drives. Again, I feel your biggest gain is in not
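A hedged sketch of switching the scheduler at runtime on Linux (the device name sda is an assumption; the change does not survive a reboot unless persisted via the boot loader or udev):

    # the active scheduler is shown in brackets
    cat /sys/block/sda/queue/scheduler
    # deadline for spinning disks, noop for SSDs, per the discussion above
    echo deadline > /sys/block/sda/queue/scheduler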

Re: Parameter Planning (eleveldb)

2013-02-05 Thread Simon Effenberg
> On 05.02.2013 14:10, Matthew Von-Maszewski wrote: > > 30,000: So that you never have to think about it again. > > > > Matthew > > > > > > On Feb 5, 2013, at 3:54, Simon Effenberg wrote: > > > >> Hey Matthew, > >> > >> tha

How is the Java client handling a crashed node which comes back with an empty datadir?

2013-02-21 Thread Simon Effenberg
Hi list, while testing riak in our environment I reinstalled one riak server out of 2, and when it was back again it didn't know anything about its old cluster. So my situation was: 1 node knowing both, 1 node knowing only itself. The java client was writing into the nodes (both of them) so that

Re: [ANN] Riak 1.3 Released

2013-02-21 Thread Simon Effenberg
> Now, onto 1.4... > > The Basho Team > twitter.com/basho > > [1] http://basho.com/introducing-riak-1-3/ > > [2] http://info.basho.com/IntroToRiakMarch7.html > > [3] http://ricon.io/east.html

Re: [ANN] Riak 1.3 Released

2013-02-21 Thread Simon Effenberg
ulimit, you can easily modify the > riak script to not warn you. > > -Jared > > On Thu, Feb 21, 2013 at 9:00 AM, Simon Effenberg > wrote: > > > Cool! > > > > But now I see this WARNING: > > > > WARNING: ulimit -n is 1024; 4096 is the recommende
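On Linux the limit can be raised permanently in /etc/security/limits.conf; a sketch, assuming Riak runs as the riak user (65536 is an arbitrary value at or above the recommended 4096):

    riak soft nofile 65536
    riak hard nofile 65536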

after raising of n_val all keys exists multiple times in ?keys=true

2013-03-06 Thread Simon Effenberg
Hi, we changed the n_val of a bucket from 3 to 12. If we are now doing this: riak:8098/riak/config?keys=true or riak:8098/buckets/config/keys?keys=true we get some keys multiple times. Getting the content works well but we can't rely on the output (or have to sort/uniq the output). Is this a no
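Until the cause is clear, the listing can be de-duplicated client-side; a sketch assuming jq is installed (the HTTP key listing returns a JSON document with a "keys" array):

    curl -s 'http://localhost:8098/buckets/config/keys?keys=true' \
      | jq -r '.keys[]' | sort -u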

Re: after raising of n_val all keys exists multiple times in ?keys=true

2013-03-07 Thread Simon Effenberg
ak/config?keys=true > > or > > riak:8098/buckets/config/keys?keys=true > > > > we get some keys multiple times. Getting the content works > > well but we can't rely on the output (or have to sort/uniq the output). > > > > Is this a normal behavior or is it a b

Re: after raising of n_val all keys exists multiple times in ?keys=true

2013-03-07 Thread Simon Effenberg
again to 3 could be the problem. But I have no clue why this should be the case. Cheers, Simon On Thu, 7 Mar 2013 11:30:40 +0100 Simon Effenberg wrote: > Hi Mark, > > we have 12 Riak nodes running. The exact command for getting keys is: > > curl http://localhost:8098/buckets/c

Re: after raising of n_val all keys exists multiple times in ?keys=true

2013-03-07 Thread Simon Effenberg
Any idea what happened? We had to remove the riak db and start from scratch to get rid of the ghost keys... On Thu, 7 Mar 2013 11:35:12 +0100 Simon Effenberg wrote: > Now we see only 3 occurrences of the keys. So maybe the reduction of the > n_val could be the problem.. after we remov

stalled handoffs with riak 1.3.1 and eleveldb

2013-05-11 Thread Simon Effenberg
Hi list, this morning I did a "riak-admin transfers" on the riak machines and saw this: [root@kriak46-1:~]# riak-admin transfers Attempting to restart script through sudo -H -u riak 'riak@10.47.109.206' waiting to handoff 30 partitions 'riak@10.47.109.202' waiting to handoff 3 partitions 'riak@10

Re: stalled handoffs with riak 1.3.1 and eleveldb

2013-05-13 Thread Simon Effenberg
UTC =ERROR REPORT ** Node 'riak@10.47.109.205' not responding ** this "could" be the start of the problem and we have had some weird network issues between two DCs at this timeframe with some broken TCP connections. But it looks like Riak wasn't able to get o

Disable "snappy" compression within eLevelDB

2013-07-12 Thread Simon Effenberg
Hi @list, is it somehow possible to disable the Snappy compression used in eLevelDB? I would like to try out ZFS compression but it's not that useful if snappy is used before. Cheers Simon
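eleveldb does expose a compression flag in its app.config section; a minimal sketch (data_root is an example path, and whether your Riak release honors the flag should be verified):

    {eleveldb, [
        {data_root, "/var/lib/riak/leveldb"},
        {compression, false}
    ]}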

reducing n_val

2013-07-16 Thread Simon Effenberg
Hi @list, is it possible to reduce the n_val? I mean you can change it but what happens then?
- is old data removed or is it lying around? (or maybe deleted after some time thanks to AAE?)
- is it causing problems like:
  - reducing n_val from 3 to 2
  - update data
  - sleep 10 minutes
  - re
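For reference, n_val is a bucket property set over HTTP; a sketch with a hypothetical bucket name:

    curl -XPUT -H 'Content-Type: application/json' \
      -d '{"props":{"n_val":2}}' \
      'http://localhost:8098/buckets/mybucket/props'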

TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Simon Effenberg
Hi @list, I see sometimes logs talking about "hinted_handoff transfer of .. failed because of TCP recv timeout". Also riak-admin transfers shows me many handoffs (is it possible to give some insights about "how many" handoffs happened through "riak-admin status"?). - Is it a normal behavior to

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Simon Effenberg
ng to handoff 12 partitions
'riak@10.46.109.205' waiting to handoff 12 partitions
'riak@10.46.109.204' waiting to handoff 13 partitions
'riak@10.46.109.203' waiting to handoff 12 partitions
'riak@10.46.109.202' waiting to handoff 17 partitions
'riak@10.46.109

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Simon Effenberg
30:51.644 UTC [info] <0.8497.66>@riak_core_handoff_sender:start_fold:192 hinted_handoff transfer of riak_kv_vnode from 'riak@10.46.109.207' 713623846352979940529142984724747568191373312000 to 'riak@10.46.109.206' 713623846352979940529142984724747568191373312000 completed: sent 55 objects in 287

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Simon Effenberg
ng is > similar to what happened to us when we upgraded to 1.4. > > HTH, > > Guido. > > On 18/07/13 19:21, Simon Effenberg wrote: > > Hi @list, > > > > I see sometimes logs talking about "hinted_handoff transfer of .. failed > > because of TCP recv

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
-admin diag > and see the new recommended kernel parameters, also, on vm.args > uncomment the +zdbbl 32768 parameter, since what you are describing is > similar to what happened to us when we upgraded to 1.4. > > HTH, > > Guido. > > On 18/07/13 19:21, Simon Effenberg wrote:
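In vm.args that corresponds to uncommenting/adding the line below (value taken from the suggestion above; the unit is kilobytes):

    ## distribution buffer busy limit
    +zdbbl 32768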

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
other ones.. I need to have a look and probably try to enforce an "unlimited" process limit. Cheers Simon On Fri, 19 Jul 2013 09:24:07 +0200 Simon Effenberg wrote: > The +zdbbl parameter helped a lot but the hinted handoffs didn't > disappear completely. I have no more bus

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
adding the > following line to the vm.args file: > > +P 262144 > > Best regards, > > Christian > > > > On 19 Jul 2013, at 08:24, Simon Effenberg wrote: > > > The +zdbbl parameter helped a lot but the hinted handoffs didn't > > disappe

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
me case for Bitcask? Cheers Simon On Fri, 19 Jul 2013 10:25:05 +0200 Simon Effenberg wrote: > once again with the list included... argh > > Hey Christian, > > so it could also be an Erlang limit? I found out why my riak instances > are all having different processlimi

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
bytes
Max nice priority        0          0
Max realtime priority    0          0
Max realtime timeout     unlimited  unlimited  us
On Fri, 19 Jul 2013 16:08:44 +0200 Simon Effenberg wrote:

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
only after restarting the Riak instance on this node were the awaiting handoffs processed.. this is weird :( On Fri, 19 Jul 2013 15:55:43 +0200 Simon Effenberg wrote: > It looked good for some hours but now again we got > > 2013-07-19 13:27:07.800 UTC [error] >

Re: TCP recv timeout and handoffs almost all the time

2013-07-20 Thread Simon Effenberg
[{1404411729622664522961353393938303214200622678016,notfound}],[{{1398702738851840683437120250060505233655091691520,'riak@10.46.109.206'},primary},{{1404411729622664522961353393938303214200622678016,'riak@10.47.109.209'},primary},{{1410120720393488362485586537816101194746153664512,'riak@10.47.109.

find big object crashing riak

2013-08-28 Thread Simon Effenberg
Hi, we suddenly have (according to the riak stats) really big objects (100th percentile of object size) of 400MB to 900MB. We have no clue where they came from or how this could happen.. is it somehow possible to identify them easily? Cheers Simon
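A brute-force sketch for finding them: list the keys of a suspect bucket and HEAD each object, sorting by Content-Length. The bucket name config is an example; keys may need URL-encoding, and objects with siblings answer with a small 300 response, so this only approximates stored size:

    curl -s 'http://localhost:8098/buckets/config/keys?keys=true' \
      | jq -r '.keys[]' \
      | while read -r key; do
          size=$(curl -sI "http://localhost:8098/buckets/config/keys/$key" \
                 | awk 'tolower($1)=="content-length:"{print $2}' | tr -d '\r')
          echo "${size:-0} $key"
        done | sort -rn | head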

Re: find big object crashing riak

2013-08-28 Thread Simon Effenberg
://docs.basho.com/riak/latest/ops/running/stats-and-monitoring/ > > -- > Sam Elliott > Engineer > sam.elli...@basho.com > -- > > > On Wednesday, 28 August 2013 at 6:50PM, Simon Effenberg wrote: > > > Hi, > > > > we

Re: find big object crashing riak

2013-08-29 Thread Simon Effenberg
bject even if I search with the exact createdat_int index which is returned by a HEAD/GET request to the object itself. Any help is much appreciated.. Simon On Thu, 29 Aug 2013 07:26:31 +0200 Simon Effenberg wrote: > allow_multi is on and I looked into the graphs.. we have had some > sib

Re: find big object crashing riak

2013-08-29 Thread Simon Effenberg
it be possible to do the GET with an R=1 on the node with the big object so that it is not a huge problem? Cheers Simon > > Sam > > -- > Sam Elliott > Engineer > sam.elli...@basho.com > -- > > > On Thursday, 29 August 2013 at 3:18AM, Simon Effenberg wr

Re: find big object crashing riak

2013-08-29 Thread Simon Effenberg
equesting with an R=1 may not give you exactly what you want, as N requests > will be made, but only R will be waited-upon. > > Sam > > -- > Sam Elliott > Engineer > sam.elli...@basho.com > -- > > > On Thursday, 29 August 2013 at 8:53AM, Simon Effenberg w
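For completeness, such a read would look like this (bucket and key are placeholders; as noted above, all N replica requests are still issued, only R responses are waited for):

    curl -s 'http://localhost:8098/buckets/config/keys/BIGKEY?r=1'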

Re: find big object crashing riak

2013-08-29 Thread Simon Effenberg
Sadly one riak node after another is dying because of too little RAM.. any other idea besides the r_object approach would be cool.. On Thu, 29 Aug 2013 15:49:47 +0200 Simon Effenberg wrote: > Thanks! But could you explain in detail how you came to these numbers > from my previous mail?

Re: find big object crashing riak

2013-08-29 Thread Simon Effenberg
ystem, or allow you to write one yourself. > > Sam > > [1]: http://learnyousomeerlang.com/starting-out-for-real > [2]: https://github.com/basho/riak_kv > [3]: https://github.com/basho/riak_kv/blob/develop/src/riak_object.erl#L44-L51 > [4]: https://gist.github.com/lenary/63785

Re: Deleting data from LevelDB backend

2013-09-25 Thread Simon Effenberg
On Wed, 25 Sep 2013 09:15:33 -0400 Matthew Von-Maszewski wrote: > - run Riak repair on the vnode so that leveldb can create a MANIFEST that > matches the files remaining. What do you mean by this? Wait for AAE? Request each key to trigger read repair, or am I missing something? We are in a simil
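The KV partition repair referred to here is normally run from an attached console; a sketch (the partition index is a placeholder, and riak_kv_vnode:repair/1 should be confirmed to exist in your release):

    $ riak attach
    1> riak_kv_vnode:repair(Partition).  %% Partition = the vnode index, a large integer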

Re: Deleting data from LevelDB backend

2013-09-26 Thread Simon Effenberg
uite reasonable by 1.3 and 1.4. > > A discussion about repair can be found here: > > https://github.com/basho/leveldb/wiki/repair-notes > > Matthew > > > On Sep 25, 2013, at 9:38 AM, Simon Effenberg > wrote: > > > On Wed, 25 Sep 2013 09:15:33 -0400 >

Re: Deleting data from LevelDB backend

2013-09-26 Thread Simon Effenberg
hu, 26 Sep 2013 12:53:40 +0200 Simon Effenberg wrote: > Thanks Matthew, > > but one thing I didn't get. If I delete a sst file.. should I delete > (by hand) the MANIFEST file to trigger a repair or is it done > automatically within Riak if it detects that a sst file which is >

Re: Deleting data from LevelDB backend

2013-09-26 Thread Simon Effenberg
ttps://gist.github.com/gburd/b88aee6da7fee81dc036 > > This repair sequence is different than the one you are quoting. > > > You do not need to manually delete the MANIFEST file. The repair sequence > will do that anyway. > > Matthew > > > On Sep 26, 2013, at 7:04 AM, Si

Re: Ownership question.

2013-09-27 Thread Simon Effenberg

Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Simon Effenberg
Hi @list, I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after upgrading the first node (out of 12) this node seems to do many merges. The sst_* directories change in size "rapidly" and the node has a disk utilization of 100% all the time. I know that there is something like
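To watch the churn while this runs, something along these lines (the leveldb data path is an assumption):

    iostat -x 5                                        # per-device utilization
    watch -n 60 'du -sh /var/lib/riak/leveldb/*/sst_*' # sst level sizes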

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Simon Effenberg
e compaction cascades, but that > is not going to help you today. > > Matthew > > On Dec 10, 2013, at 10:26 AM, Simon Effenberg > wrote: > > > Hi @list, > > > > I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after > > upgrading the fi

Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Simon Effenberg

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Simon Effenberg
nes through the levels to > free up disk space much more quickly (especially if you perform block deletes > every now and then). > > Matthew > > > On Dec 10, 2013, at 10:44 AM, Simon Effenberg > wrote: > > > Hi Matthew, > > > > see inline.. > >

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
node_put_fsm_time_median : 1614
node_put_fsm_time_95 : 8789
node_put_fsm_time_99 : 38258
node_put_fsm_time_100 : 384372
any clue why this could/should be? Cheers Simon On Tue, 10 Dec 2013 17:21:07 +0100 Simon Effenberg wrote: > Hi Matthew, > > thanks!.. that answers my questions! > > Cheers >

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
in each node? What are > your settings for max_open_files and cache_size in the app.config file? > Maybe this is as simple as leveldb using too much RAM in 1.4. The memory > accounting for max_open_files changed in 1.4. > > Matthew Von-Maszewski > > > On Dec 11, 2

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
r your leveldb file cache is thrashing (opening and closing > multiple files per request). > > How many servers do you have and do you use Riak's active anti-entropy > feature? I am going to plug all of this into a spreadsheet. > > Matthew Von-Maszewski > > > On

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
f response time for put (avg over all nodes):
before upgrade: 60ms
after upgrade: 1548ms
but this is only because 2 of 12 nodes are on 1.4.2 and are really slow (17000ms). Cheers, Simon On Wed, 11 Dec 2013 13:45:56 +0100 Simon Effenberg wrote: > Sorry I forgot t

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
hat can I do? Cheers Simon PS: transfers output:
'riak@10.46.109.202' waiting to handoff 17 partitions
'riak@10.46.109.201' waiting to handoff 19 partitions
(these are the 1.4.2 nodes) On Wed, 11 Dec 2013 14:39:58 +0100 Simon Effenberg wrote: > Also some side notes: >

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
nd normally we access mostly new/active/hot data and not all the old ones.. besides this we have a job doing a 2i query every 5mins and another one doing this maybe once an hour.. both don't work while the upgrade is ongoing (2i isn't working). Cheers Simon > > Matthew > >

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
RL_FULLSWEEP_AFTER' => 0,
+ # Force the erlang VM to use SMP
+ '-smp' => 'enable',
+ #
Cheers Simon > > Matthew > > > > On Dec 11, 2013, at 9:48 AM, Simon Effenberg > wrote:

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
> the Erlang layer. > > Summary:
- try increasing max_open_files to 170
- helps: try setting sst_block_size to 32768 in app.config
- does not help: try removing +S from vm.args
> Matthew > > On Dec 11, 2013, at 1:58 PM, Simon Effenberg > wrote: >
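Put into app.config terms, the first two suggestions would look roughly like this (values straight from the summary above; option names as in the 1.4 eleveldb section):

    {eleveldb, [
        {max_open_files, 170},
        {sst_block_size, 32768}
    ]}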

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
s included in the upcoming 1.4.4 maintenance release (which is > overdue so I am not going to bother guessing when it will actually arrive). > > Matthew > > On Dec 11, 2013, at 2:47 PM, Simon Effenberg > wrote: > > > I will do.. > > > > but one other thing:

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
the stats > process. It will then be automatically started, without the stuck data. > > exit(whereis(riak_core_stat_calc_sup), kill), profit(). > > Matthew > > On Dec 11, 2013, at 4:50 PM, Simon Effenberg > wrote: > > > So I think I have no real cha
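The one-liner is meant to be pasted into an attached console, e.g.:

    $ riak attach
    1> exit(whereis(riak_core_stat_calc_sup), kill).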

restarted node tells me about to handoff partitions (changing format to v1)

2013-12-16 Thread Simon Effenberg
Hi, after an upgrade from 1.3.1 to 1.4.2 I changed the object format from v0 to v1 and began a rolling restart. But restarting the first node shows me some weird outputs: 'riak@10.46.109.201' waiting to handoff 132 partitions this is the "restarted" node and the partitions is switching between 1

Re: Load balancer

2014-02-05 Thread Simon Effenberg
Hi, I think the only "safe" way is to fetch a key out of a bucket. Otherwise a fresh reinstalled node with a fresh installed riak without any data would accept _writes_ to it even before it is in the cluster. So I would always do the "fetch key" check. (The backend has to be up as well for that,
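A health check along those lines, as a sketch (bucket and key names are hypothetical; curl -f fails on any non-2xx answer, so a fresh, empty node returning 404 fails the check):

    curl -sf -o /dev/null "http://$NODE:8098/buckets/health/keys/canary" \
      && echo up || echo down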

Re: RIAK 1.4.6 - Mass key deletion

2014-07-20 Thread Simon Effenberg
etrics > regarding riak or even the servers where it's running? > > Best regards!