Good to know, thanks Peter. I am worried about client-to-node round-trip
latency if I have to do 20,000 individual queries, but that page makes it
clear that batching in smaller slices is at least a good idea.
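
For anyone who finds this thread later, the slicing approach looks roughly
like the sketch below (DataStax Python driver; the keyspace, table, and
column names are placeholders, and the chunk sizes are just what happened
to work on my cluster):

    from cassandra.cluster import Cluster

    # Slice sizes that worked for me; tune these for your own cluster.
    READ_CHUNK = 100
    WRITE_CHUNK = 1000

    def chunks(items, size):
        """Yield successive fixed-size slices of a list."""
        for i in range(0, len(items), size):
            yield items[i:i + size]

    cluster = Cluster(['127.0.0.1'])          # your contact points
    session = cluster.connect('my_keyspace')  # placeholder keyspace

    select = session.prepare('SELECT * FROM my_table WHERE id IN ?')
    update = session.prepare('UPDATE my_table SET val = ? WHERE id = ?')

    def fetch_rows(keys):
        # One bounded IN query per slice instead of one huge multiget.
        rows = []
        for chunk in chunks(keys, READ_CHUNK):
            rows.extend(session.execute(select, [chunk]))
        return rows

    def update_rows(pairs):
        # Write in windows of WRITE_CHUNK in-flight requests, waiting
        # for each window to complete before starting the next one.
        for chunk in chunks(pairs, WRITE_CHUNK):
            futures = [session.execute_async(update, (val, key))
                       for key, val in chunk]
            for f in futures:
                f.result()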


On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford <psanf...@retailnext.net>
wrote:

> On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma <jer...@barchart.com>
> wrote:
>
>> The big problem seems to have been requesting a large number of row keys
>> combined with a large number of named columns in a single query. 20K rows
>> with 20K columns destroyed my cluster. Splitting it into slices of 100
>> keys per query, issued sequentially, fixed the performance issue.
>>
>> When updating 20K rows at a time, I saw a different issue:
>> BrokenPipeException from all nodes. Splitting the updates into slices of
>> 1000 fixed that issue.
>>
>> Is there any documentation on this? Obviously these limits will vary by
>> cluster capacity, but for new users it would be great to know that you can
>> run into problems with large queries, and how they present themselves when
>> you hit them. The errors I saw are pretty opaque, and took me a couple days
>> to track down.
>>
>>
> The first thing that comes to mind is the Multiget section on the Datastax
> anti-patterns page:
> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets
>
>
>
> -psanford
>
>
>