Just to make things less clear, if you have one row that you are continually
writing to, it may end up spread out over several SSTables. Compaction helps
here to reduce the number of files that must be accessed, so long as it can
keep up. But if you want to read column X and the row is fragmented over 5
SSTables, then each one must be accessed.
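
You can check whether compaction is keeping up by looking at the pending
compactions; depending on your version that is nodetool compactionstats, or
the CompactionExecutor pending count in nodetool tpstats (host below is just
an example):

    nodetool -h 127.0.0.1 compactionstats
    nodetool -h 127.0.0.1 tpstats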

 https://issues.apache.org/jira/browse/CASSANDRA-2319  is open to try and 
reduce the number of seeks. 

For now, take a look at nodetool cfhistograms to see how many SSTables are
read for your queries.
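
For example (the keyspace and column family names below are just
placeholders, substitute your own):

    nodetool -h 127.0.0.1 cfhistograms MyKeyspace MyColumnFamily

The SSTables column of the output is a histogram of how many SSTables were
touched per read.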

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jun 2011, at 04:50, Peter Schuller wrote:

>> As far as I know, to read a single column cassandra will deserialize a
>> bunch of them and then pick the correct one (64KB of data right?)
> 
> Assuming the default setting of 64kb, the average amount deserialized
> given random column access should be 8 kb (not true with row cache,
> but with large rows presumably you don't have row cache).
> 
>> Would it be faster to have a row for each id I want to translate? This
>> would make keycache less effective, but the amount of data read should
>> be smaller.
> 
> It depends on what bottlenecks you're optimizing for. A key is
> "expensive" in the sense that (1) it increases the size of the bloom
> filters for the column family, (2) it increases the memory cost of
> index sampling, and (3) it increases the total data size (typically)
> because the row key is duplicated in both the index and data files.
> 
> The cost of deserializing the same data repeatedly is CPU. So if
> you're nowhere near bottlenecking on disk and the memory trade-off is
> reasonable, it may be a suitable optimization. However, consider that
> unless you're doing order preserving partitioning, accessing those
> rows will be effectively random w.r.t. the locations on disk you're
> reading from so you're adding a lot of overhead in terms of disk I/O
> unless your data set fits comfortably in memory.
> 
> -- 
> / Peter Schuller
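
For reference, the 64kb value mentioned above is the column_index_size_in_kb
setting in cassandra.yaml, shown here with its default:

    column_index_size_in_kb: 64

Lowering it means less data is deserialized per column read from a large row,
at the cost of a larger column index.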
