Re: really bad select performance

David Leimbach Thu, 05 Apr 2012 21:15:35 -0700

But now, when you set the value to 0, that index row will get very wide as
it collects every completed item.  You may want to consider deleting the
indexed column from completed rows instead of setting it to 0.
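
To make the difference concrete, here is a toy in-memory model in Python. It is not Cassandra code and makes no claim about how the built-in index is actually implemented. The class IndexedTable and its methods are made up for illustration; the only assumption is that an index groups row keys under each distinct column value.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical toy model of a column family with one indexed column."""

    def __init__(self):
        self.rows = {}                 # row key -> indexed value
        self.index = defaultdict(set)  # indexed value -> set of row keys

    def set_value(self, key, value):
        self.delete_value(key)         # drop any old index entry first
        self.rows[key] = value
        self.index[value].add(key)

    def delete_value(self, key):
        # Removing the column also removes the key from its index row.
        if key in self.rows:
            old = self.rows.pop(key)
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]

    def index_row_width(self, value):
        return len(self.index.get(value, ()))


def run(mark_done):
    """Queue 1000 rows across values 1..50, finish them all, report width of 0."""
    table = IndexedTable()
    for i in range(1000):
        table.set_value(f"row{i}", 1 + i % 50)
    for i in range(1000):
        mark_done(table, f"row{i}")
    return table.index_row_width(0)


width_when_set_to_zero = run(lambda t, k: t.set_value(k, 0))
width_when_deleted = run(lambda t, k: t.delete_value(k))
```

In this model, marking rows done by writing 0 leaves all 1000 keys in the single index row for 0, while deleting the column leaves that index row empty.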


Cassandra is not a great queue to use with the built-in indexes.  You could
write your own index here and potentially do better.
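
A rough sketch of what a hand-written index for a queue might look like, again as plain Python rather than anything Cassandra-specific. The names QueueIndex, enqueue, peek and complete are hypothetical, as is the choice of 50 buckets and CRC32 for spreading keys. In a real cluster each bucket would presumably be a row in a separate index column family, which this sketch does not attempt to show.

```python
import zlib

NUM_BUCKETS = 50  # hypothetical; mirrors the 1..50 spread used in this thread


class QueueIndex:
    """Hypothetical hand-rolled index: pending row keys spread over buckets."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.buckets = [dict() for _ in range(num_buckets)]

    def _bucket_for(self, row_key):
        # Stable hash so a key always maps to the same bucket.
        return zlib.crc32(row_key.encode("utf-8")) % len(self.buckets)

    def enqueue(self, row_key):
        self.buckets[self._bucket_for(row_key)][row_key] = True

    def peek(self, limit=1):
        # Walk the buckets in order and stop as soon as enough keys are found.
        found = []
        for bucket in self.buckets:
            for row_key in bucket:
                found.append(row_key)
                if len(found) == limit:
                    return found
        return found

    def complete(self, row_key):
        # Delete the entry rather than flagging it, so no bucket keeps growing.
        self.buckets[self._bucket_for(row_key)].pop(row_key, None)

    def widest_bucket(self):
        return max(len(bucket) for bucket in self.buckets)
```

Because completed items are deleted from their bucket instead of being moved to a "done" value, the index only ever holds pending work, and no single bucket has to carry the full history.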

On Thursday, April 5, 2012, Chris Hart wrote:

> Thanks for all the help, everyone.  The values were meant to be binary.  I
> ended up making the possible values between 0 and 50 instead of just 0 or 1.
> That way no single index row gets that wide.  I now run queries for
> everything from 1 to 50 to get 'queued' items and set the value to 0 when
> I'm done (I will never query for row_loaded = 0).  It's unfortunate
> Cassandra doesn't delegate the query execution to a node that had the index
> row on it, but rather tries to move the entire index row to the node that
> is queried.
>
> -Chris
>
> ----- Original Message -----
> From: "David Leimbach" <leim...@gmail.com>
> To: user@cassandra.apache.org
> Sent: Monday, April 2, 2012 8:51:46 AM
> Subject: Re: really bad select performance
>
>
> This is all very hypothetical, but I've been bitten by this before.
>
> Does row_loaded happen to be a binary or boolean value? If so the
> secondary index generated by Cassandra will have at most 2 rows, and
> they'll be REALLY wide if you have a lot of entries. Since Cassandra
> doesn't distribute columns over rows, those potentially very wide index
> rows, and their replicas, must live in SSTables in their entirety on the
> nodes that own them (and their replicas).
>
>
> Even though you limit 1, I'm not sure what "behind the scenes" things
> Cassandra does. I've received advice to avoid the built in secondary
> indexes in Cassandra for some of these reasons. Also if row_loaded is meant
> to implement some kind of queuing behavior, it could be the wrong problem
> space for Cassandra as a result of all of the above.
>
> On Sat, Mar 31, 2012 at 12:22 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>
>
>
> Is there anything in the logs when you run the queries?
>
>
> Try turning the logging up to DEBUG on the node that fails to return and
> see what happens. You will see it send messages to other nodes and do work
> itself.
>
> One thing to note, a query that uses secondary indexes runs on a node for
> each token range. So it will use more than CL number of nodes.
>
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
>
> On 30/03/2012, at 11:52 AM, Chris Hart wrote:
>
>
>
> Hi,
>
> I have the following cluster:
>
> Address       DC            Rack  Status  State   Load     Owns    Token
>                                                                     136112946768375385385349842972707284580
> <ip address>  MountainView  RAC1  Up      Normal  1.86 GB  20.00%  0
> <ip address>  MountainView  RAC1  Up      Normal  2.17 GB  33.33%  56713727820156410577229101238628035242
> <ip address>  MountainView  RAC1  Up      Normal  2.41 GB  33.33%  113427455640312821154458202477256070485
> <ip address>  Rackspace     RAC1  Up      Normal  3.9 GB   13.33%  136112946768375385385349842972707284580
>
> The following query runs quickly on all nodes except 1 MountainView node:
>
> select * from Access_Log where row_loaded = 0 limit 1;
>
> There is a secondary index on row_loaded. The query usually doesn't
> complete (but sometimes does) on the bad node and returns very quickly on
> all other nodes. I've tried upping the rpc timeout to a full minute
> (rpc_timeout_in_ms: 60000) in the yaml, but it still often doesn't complete
> in a minute. It seems just as likely to complete and takes about the same
> amount of time whether the limit is 1, 100 or 1000.
>
>
> Thanks for any help,
> Chris
>
>
>
