It is possible this is CPU bound. In 2.1 we have optimised the comparison
of clustering columns (CASSANDRA-5417
<https://issues.apache.org/jira/browse/CASSANDRA-5417>), but in 2.0 it is
quite expensive. So for a large row with several million comparisons to
perform (to merge, filter, etc.) it could be a significant proportion of
the cost. Note that the costs for a given query are all bound by a single
core; there is no parallelism, since the assumption is that we are serving
more queries at once than there are cores. (In general, Cassandra is not
designed to serve workloads consisting of single large queries, at least
not yet.)
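
As a rough sanity check on the CPU-bound theory, here is a back-of-the-envelope
calculation that uses only the numbers from the cqlsh trace quoted below. The
20 MB payload is the approximate figure mentioned in the thread, and treating
the whole merge-to-read interval as per-cell work is an assumption, not
something the trace itself proves.

```python
# Figures copied from the cqlsh trace quoted in this thread.
# source_elapsed values are in microseconds.
MERGE_START_US = 86_522        # "Merging data from memtables and 3 sstables"
READ_DONE_US = 7_722_425       # "Read 193311 live and 0 tombstoned cells"
REQUEST_DONE_US = 12_244_832   # "Request complete"
LIVE_CELLS = 193_311
PAYLOAD_MB = 20                # approximate result size quoted in the thread


def summarize():
    """Derive rough per-cell cost and end-to-end throughput from the trace."""
    merge_read_us = READ_DONE_US - MERGE_START_US
    return {
        # Time between starting the merge and finishing the read.
        "merge_and_read_s": merge_read_us / 1e6,
        # Time between finishing the read and completing the request.
        "after_read_s": (REQUEST_DONE_US - READ_DONE_US) / 1e6,
        # Assumption: the whole merge/read interval is spread over live cells.
        "per_cell_us": merge_read_us / LIVE_CELLS,
        # Approximate payload divided by total elapsed time.
        "throughput_mb_per_s": PAYLOAD_MB / (REQUEST_DONE_US / 1e6),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

That works out to roughly 39.5 microseconds per live cell and about 1.6 MB/s
end to end, which would be consistent with the time going to per-cell work
rather than to waiting on storage, although the trace alone cannot confirm it.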

On Thu, Sep 18, 2014 at 7:29 AM, Mohammed Guller <moham...@glassbeam.com>
wrote:

>  Chris,
>
> I agree that reading 250k rows is a bit excessive and that breaking up the
> partition would help reduce the query time. That part is well understood.
> The part that we can’t figure out is why read time did not change when we
> switched from a slow Network Attached Storage (AWS EBS) to local SSD.
>
>
>
> One possibility is that the read is not bound by disk i/o, but it is not
> cpu or memory bound either. So where is it spending all that time? Another
> possibility is that even though it is returning only 193311 cells, C* reads
> the entire partition, which may have a lot more cells. But even in that
> case reading from a local SSD should have been a lot faster than reading
> from non-provisioned EBS.
>
>
>
> Mohammed
>
>
>
> *From:* Chris Lohfink [mailto:clohf...@blackbirdit.com]
> *Sent:* Wednesday, September 17, 2014 7:17 PM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: no change observed in read latency after switching from
> EBS to SSD storage
>
>
>
> "Read 193311 live and 0 tombstoned cells "
>
>
>
> is your killer. Returning 250k rows is a bit excessive; you should really
> page this in smaller chunks. What client are you using to access the data?
> This partition (a, b, c, d, e, f) may be too large as well (you can check
> the maximum partition size in the output of *nodetool cfstats*). It may be
> worth including g in the partition key to break it up more, but I don't
> know enough about your data model.
>
>
>
> ---
>
> Chris Lohfink
>
>
>
> On Sep 17, 2014, at 4:53 PM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
>
>
>
>   Thank you all for your responses.
>
>
>
> Alex –
>
>   Instance (ephemeral) SSD
>
>
>
> Ben –
>
> The query reads data from just one partition. If disk i/o is the
> bottleneck, then in theory, if reading from EBS takes 10 seconds, it
> should take a lot less when reading the same amount of data from local SSD.
> My question is not about why it is taking 10 seconds, but why the read
> time is the same for both EBS (network attached storage) and local SSD.
>
>
>
> Tony –
>
> If the data was cached in memory, then a read should not take 10 seconds
> just for 20MB of data.
>
>
>
> Rob –
>
> Here is the schema, query, and trace. I masked the actual column names to
> protect the innocents :)
>
>
>
> create table dummy(
>   a   varchar,
>   b   varchar,
>   c   varchar,
>   d   varchar,
>   e   varchar,
>   f   varchar,
>   g   varchar,
>   h   timestamp,
>   i   int,
>   non_key1   varchar,
>   ...
>   non_keyN   varchar,
>   PRIMARY KEY ((a, b, c, d, e, f), g, h, i)
> ) WITH CLUSTERING ORDER BY (g ASC, h DESC, i ASC)
>
>
>
> SELECT h, non_key100, non_key200 FROM dummy WHERE a='aaaa' AND b='bbbbbb'
> AND c='ccc' AND d='dd' AND e='eeeeeeeeeeee' AND f='ffffffffff' AND
> g='ggggggggg' AND h>='2014-09-10T00:00:00' AND h<='2014-09-10T23:40:41';
>
>
>
> The above query returns around 250,000 CQL rows.
>
>
>
> cqlsh trace:
>
>
>
> activity | timestamp | source | source_elapsed
> -------------------------------------------------------------------------------------
> execute_cql3_query | 21:57:16,830 | 10.10.100.5 | 0
> Parsing query; | 21:57:16,830 | 10.10.100.5 | 673
> Preparing statement | 21:57:16,831 | 10.10.100.5 | 1602
> Executing single-partition query on event | 21:57:16,845 | 10.10.100.5 | 14871
> Acquiring sstable references | 21:57:16,845 | 10.10.100.5 | 14896
> Merging memtable tombstones | 21:57:16,845 | 10.10.100.5 | 14954
> Bloom filter allows skipping sstable 1049 | 21:57:16,845 | 10.10.100.5 | 15090
> Bloom filter allows skipping sstable 989 | 21:57:16,845 | 10.10.100.5 | 15146
> Partition index with 0 entries found for sstable 937 | 21:57:16,845 | 10.10.100.5 | 15565
> Seeking to partition indexed section in data file | 21:57:16,845 | 10.10.100.5 | 15581
> Partition index with 7158 entries found for sstable 884 | 21:57:16,898 | 10.10.100.5 | 68644
> Seeking to partition indexed section in data file | 21:57:16,899 | 10.10.100.5 | 69014
> Partition index with 20819 entries found for sstable 733 | 21:57:16,916 | 10.10.100.5 | 86121
> Seeking to partition indexed section in data file | 21:57:16,916 | 10.10.100.5 | 86412
> Skipped 1/6 non-slice-intersecting sstables, included 0 due to tombstones | 21:57:16,916 | 10.10.100.5 | 86494
> Merging data from memtables and 3 sstables | 21:57:16,916 | 10.10.100.5 | 86522
> Read 193311 live and 0 tombstoned cells | 21:57:24,552 | 10.10.100.5 | 7722425
> Request complete | 21:57:29,074 | 10.10.100.5 | 12244832
>
>
>
>
>
> Mohammed
>
>
>
> *From:* Alex Major [mailto:al3...@gmail.com <al3...@gmail.com>]
> *Sent:* Wednesday, September 17, 2014 3:47 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: no change observed in read latency after switching from
> EBS to SSD storage
>
>
>
> When you say you moved from EBS to SSD, do you mean the EBS HDD drives to
> EBS SSD drives? Or instance SSD drives? The m3.large only comes with 32GB
> of instance based SSD storage. If you're using EBS SSD drives then network
> will still be the slowest thing so switching won't likely make much of a
> difference.
>
>
>
> On Wed, Sep 17, 2014 at 6:00 AM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
>
> Rob,
>
> The 10 seconds latency that I gave earlier is from CQL tracing. Almost 5
> seconds out of that was taken up by the “merge memtable and sstables” step.
> The remaining 5 seconds are from “read live and tombstoned cells.”
>
>
>
> I too first thought that maybe disk is not the bottleneck and Cassandra is
> serving everything from cache, but in that case, it should not take 10
> seconds for reading just 20MB data.
>
>
>
> Also, I narrowed down the query to limit it to a single partition read and
> I ran the query in cqlsh running on the same node. I turned on tracing,
> which shows that all the steps got executed on the same node. htop shows
> that CPU and memory are not the bottlenecks. Network should not come into
> play since the cqlsh is running on the same node.
>
>
>
> Is there any performance tuning parameter in the cassandra.yaml file for
> large reads?
>
>
>
> Mohammed
>
>
>
> *From:* Robert Coli [mailto:rc...@eventbrite.com]
> *Sent:* Tuesday, September 16, 2014 5:42 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: no change observed in read latency after switching from
> EBS to SSD storage
>
>
>
> On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
>
> Does anyone have insight as to why we don't see any performance impact on
> the reads going from EBS to SSD?
>
>
>
> What does it say when you enable tracing on this CQL query?
>
>
>
> 10 seconds is a really long time to access anything in Cassandra. There
> is, generally speaking, a reason why the default timeouts are lower than
> this.
>
>
>
> My conjecture is that the data in question was previously being served
> from the page cache and is now being served from SSD. You have, in
> switching from EBS-plus-page-cache to SSD, successfully proved that SSD and
> RAM are both very fast. There is also a strong suggestion that whatever
> access pattern you are using is not bounded by disk performance.
>
>
>
> =Rob
>
>
>
