Good to know, thanks Peter. I'm worried about client-to-node latency if I have to issue 20,000 individual queries, but that makes it clear that batching in smaller slices is at least the right approach.
On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford <psanf...@retailnext.net> wrote:

> On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma <jer...@barchart.com> wrote:
>
>> The big problem seems to have been requesting a large number of row keys
>> combined with a large number of named columns in a query. 20K rows with 20K
>> columns destroyed my cluster. Splitting it into slices of 100 sequential
>> queries fixed the performance issue.
>>
>> When updating 20K rows at a time, I saw a different issue -
>> BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
>> that issue.
>>
>> Is there any documentation on this? Obviously these limits will vary by
>> cluster capacity, but for new users it would be great to know that you can
>> run into problems with large queries, and how they present themselves when
>> you hit them. The errors I saw are pretty opaque, and took me a couple days
>> to track down.
>
> The first thing that comes to mind is the Multiget section on the Datastax
> anti-patterns page:
> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets
>
> -psanford
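For anyone finding this thread later, here is a minimal sketch of the slicing approach described above: rather than one multiget over 20K row keys, split the key list into fixed-size chunks and issue one query per chunk. The `fetch_rows` callable is a hypothetical stand-in for whatever client call you actually use, and the slice size of 100 is just the value that worked here, not a universal limit — tune it to your cluster.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def multiget_in_slices(keys, fetch_rows, slice_size=100):
    """Call `fetch_rows` once per slice of keys and merge the results.

    `fetch_rows` is a placeholder for the real client multiget; it takes a
    list of keys and returns a dict of key -> row.
    """
    results = {}
    for key_slice in chunked(keys, slice_size):
        results.update(fetch_rows(key_slice))
    return results

# Demo with a dummy fetcher standing in for the real client call:
keys = ["row-%d" % i for i in range(20000)]
calls = []
def fake_fetch(key_slice):
    calls.append(len(key_slice))          # record each slice-sized query
    return {k: None for k in key_slice}   # pretend every key was found

rows = multiget_in_slices(keys, fake_fetch, slice_size=100)
print(len(rows), len(calls))  # 20000 rows fetched across 200 queries
```

The same pattern (with a larger slice, e.g. 1000) applies to the batched updates that were triggering the BrokenPipeException.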