batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
Hi - The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb. The default size is 5kb and according to the comments in the yaml file, it is used to log WARN on any batch size exceeding this value in kilobytes. It says caution should be taken on increasing the size of this
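The behaviour described in the yaml comments comes down to a size comparison. Below is a minimal sketch of that comparison, assuming the threshold is checked against the batch's serialized size in bytes. The function name `exceeds_warn_threshold` and the sample sizes are hypothetical, and this is not Cassandra's actual code.

```python
# Hypothetical illustration of the warn check; not Cassandra's implementation.
BATCH_SIZE_WARN_THRESHOLD_IN_KB = 5  # default value quoted in the post above


def exceeds_warn_threshold(batch_size_bytes,
                           threshold_kb=BATCH_SIZE_WARN_THRESHOLD_IN_KB):
    """Return True when a batch is larger than the warn threshold."""
    return batch_size_bytes > threshold_kb * 1024


if __name__ == "__main__":
    # Sample batch sizes in bytes (made up for the example).
    for size in (1024, 5 * 1024, 6 * 1024):
        status = "WARN" if exceeds_warn_threshold(size) else "ok"
        print(size, status)
```

Raising `threshold_kb` only moves the point at which the warning is logged. It does not change how the batch itself is processed.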

Re: batch_size_warn_threshold_in_kb

2014-12-11 Thread Ryan Svihla
on, this helps flag those cases of misuse. On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller wrote: > > Hi – > > The cassandra.yaml file has property called *batch_size_warn_threshold_in_kb. > * > > The default size is 5kb and according to the comments in the yaml file, it >

Re: batch_size_warn_threshold_in_kb

2014-12-11 Thread Shane Hansen
roperty called *batch_size_warn_threshold_in_kb. > * > > The default size is 5kb and according to the comments in the yaml file, it > is used to log WARN on any batch size exceeding this value in kilobytes. It > says caution should be taken on increasing the size of this threshold as it

RE: batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
@cassandra.apache.org Subject: Re: batch_size_warn_threshold_in_kb Nothing magic, just put in there based on experience. You can find the story behind the original recommendation here https://issues.apache.org/jira/browse/CASSANDRA-6487 Key reasoning for the desire comes from Patrick McFadden: "Yes

Re: batch_size_warn_threshold_in_kb

2014-12-11 Thread Jens Rantil
g those cases of misuse. > On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller > wrote: >> >> Hi – >> >> The cassandra.yaml file has property called *batch_size_warn_threshold_in_kb. >> * >> >> The default size is 5kb and according to the comments in the ya

Re: batch_size_warn_threshold_in_kb

2014-12-12 Thread Ryan Svihla
ohammed > > > > *From:* Ryan Svihla [mailto:rsvi...@datastax.com] > *Sent:* Thursday, December 11, 2014 12:56 PM > *To:* user@cassandra.apache.org > *Subject:* Re: batch_size_warn_threshold_in_kb > > > > Nothing magic, just put in there based on experience. Y

Re: batch_size_warn_threshold_in_kb

2014-12-12 Thread Ryan Svihla
up for debate." >> >> It's totally changeable, however, it's there in no small part because so >> many people confuse the BATCH keyword as a performance optimization, this >> helps flag those cases of misuse. >> >> On Thu, Dec 11, 2014 at 2:43 PM, M

Re: batch_size_warn_threshold_in_kb

2014-12-12 Thread Jonathan Haddad
> > > *From:* Ryan Svihla [mailto:rsvi...@datastax.com] > *Sent:* Thursday, December 11, 2014 12:56 PM > *To:* user@cassandra.apache.org > *Subject:* Re: batch_size_warn_threshold_in_kb > > > > Nothing magic, just put in there based on experience. You can find the > s

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jack Krupansky
, 2014 12:58 PM To: user@cassandra.apache.org ; Ryan Svihla Subject: Re: batch_size_warn_threshold_in_kb The really important thing to really take away from Ryan's original post is that batches are not there for performance. The only case I consider batches to be useful for is when you absolut

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
>> >> >> In addition, Patrick is saying that he does not recommend more than 100 >> mutations per batch. So why not warn users just on the # of mutations in a >> batch? >> >> >> >> Mohammed >> >> >> >> *From:* Ryan Svi

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
ore clear statement of intent and > non-intent for BATCH. > > -- Jack Krupansky > > *From:* Jonathan Haddad > *Sent:* Friday, December 12, 2014 12:58 PM > *To:* user@cassandra.apache.org ; Ryan Svihla > *Subject:* Re: batch_size_warn_threshold_in_kb > > The really important thing

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Ryan Svihla
Ryan, >>> >>> Thanks for the quick response. >>> >>> >>> >>> I did see that jira before posting my question on this list. However, I >>> didn’t see any information about why 5kb+ data will cause instability. 5kb >>> or e

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Ryan Svihla
verhead - in fact it'll do the >>> opposite. If you're trying to do that, instead perform many async >>> queries. The overhead of batches in cassandra is significant and you're >>> going to hit a lot of problems if you use them excessively (timeou
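The advice to "perform many async queries" instead of one large batch can be sketched as follows. This is only an illustration: `write_row` is a hypothetical stand-in for a single-statement write sent through a driver, and `max_in_flight` is an assumed tuning knob, not a setting named in the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def write_row(row):
    """Hypothetical stand-in for one single-statement write via a driver."""
    return ("written", row["id"])


def write_all_async(rows, max_in_flight=8):
    """Send each row as its own request, with several requests in flight."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(write_row, row) for row in rows]
        # Collect results in submission order; a failed write raises here.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(write_all_async([{"id": i} for i in range(5)]))
```

Each write is an independent request, so no single coordinator has to handle the whole set of mutations the way it would for one large batch.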

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
connections, and to have that be dynamic based >> on overall cluster load. >> >> I would also note that the example in the spec has multiple inserts with >> different partition key values, which flies in the face of the admonition >> to refrain from using server-sid

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Ryan Svihla
verall cluster load. >>> >>> I would also note that the example in the spec has multiple inserts with >>> different partition key values, which flies in the face of the admonition >>> to refrain from using server-side distribution of requests. >>> >>> At a minimum the CQL

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
atch”, which is >>>> simply a way to collect “batches” of operations in the client/driver and >>>> then let the driver determine what degree of batching and asynchronous >>>> operation is appropriate. >>>> >>>> It might also be nice t

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
ptimize performance. Using batches to optimize performance is usually not >>>>> successful, as described in Using and misusing batches section. For >>>>> information about the fastest way to load data, see "Cassandra: Batch >>>>> loading without th

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
formance is usually not >>>>> successful, as described in Using and misusing batches section. For >>>>> information about the fastest way to load data, see "Cassandra: Batch >>>>> loading without the Batch keyword."” >>>>> >>

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
er >>>>>> coordinator/replicas. However, because of the distributed nature of >>>>>> Cassandra, spread requests across nearby nodes as much as possible to >>>>>> optimize performance. Using batches to optimize performance is usually >>>>>>

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
t with your >>>>>>> statement. >>>>>>> >>>>>>> See: >>>>>>> https://cassandra.apache.org/doc/cql3/CQL.html >>>>>>> >>>>>>> I see the spec as gospel – if it’s not accurate, let’s propose a >

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
ps between the client and the server (and sometimes >>>>>>> between the server coordinator and the replicas) when batching multiple >>>>>>> updates.” Is the spec inaccurate? I mean, it seems in conflict with your >>>>>>> stateme

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Jonathan Haddad
>>>>> trying to lump queries together to reduce network & server overhead - >>>>>>>> in >>>>>>>> fact it'll do the opposite”, but I would note that the CQL3 spec says “ >>>>>>>> The BATCH statement ... serves

Re: batch_size_warn_threshold_in_kb

2014-12-15 Thread Eric Stevens
>>>> Jonathan says “It is absolutely not going to help you if you're >>>>>>>> trying to lump queries together to reduce network & server overhead - >>>>>>>> in >>>>>>>> fact it'll do the opposite”, but I w

Re: batch_size_warn_threshold_in_kb

2014-12-15 Thread Jonathan Haddad
>> have 100 servers, and perform a mutation on 100 partitions, you could >>>>>>>> have >>>>>>>> a coordinator that's >>>>>>>> >>>>>>>> 1) talking to every machine in the cluster and >>

Re: batch_size_warn_threshold_in_kb

2014-12-16 Thread Eric Stevens
tax.com/dev/blog/cassandra-2-1-now- >>>>>>>> over-50-faster) which massively helps performance. It provides >>>>>>>> the benefit of batches but without the coordinator overhead. >>>>>>>> >>>>>>>> Ca