> As far as I know, to read a single column cassandra will deserialize a
> bunch of them and then pick the correct one (64KB of data right?)

Assuming the default setting of 64 kb, the average amount deserialized
given random column access should be about 32 kb, i.e. half an index
block (not true with the row cache, but with large rows you presumably
don't have the row cache enabled).
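
To make that arithmetic concrete, here's a rough sketch. It assumes the
column index points at roughly 64 kb boundaries within the row
(column_index_size_in_kb) and that the column you're after is uniformly
distributed within its block; treat it as an illustration, not a
guarantee of what Cassandra does internally:

    # Back-of-envelope: expected data deserialized per random column read.
    # Assumes the target column is uniformly distributed within a 64 kb
    # index block, so on average you scan half a block to reach it.
    INDEX_BLOCK_KB = 64              # column_index_size_in_kb (default)
    expected_kb = INDEX_BLOCK_KB / 2.0
    print(expected_kb)               # -> 32.0 kb on average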

> Would it be faster to have a row for each id I want to translate? This
> would make keycache less effective, but the amount of data read should
> be smaller.

It depends on what bottlenecks you're optimizing for. A key is
"expensive" in the sense that it (1) increases the size of the bloom
filters for the column family, (2) increases the memory cost of
index sampling, and (3) increases the total data size (typically),
because the row key is duplicated in both the index and data files.
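
As a very rough illustration of what (1) and (2) cost per row: the
numbers below (bits per key, sampling interval, bytes per sample) are
assumptions for the sake of the sketch, not defaults you should rely
on; plug in your own figures:

    # Hypothetical per-row memory overhead for N rows (illustrative only).
    N = 100_000_000                  # number of rows after the split
    bloom_bits_per_key = 10          # assumed; depends on target fp rate
    index_interval = 128             # assumed index sampling interval
    bytes_per_index_sample = 64      # assumed: sampled key + offset + overhead
    bloom_mb = N * bloom_bits_per_key / 8 / 1024**2
    sample_mb = N / index_interval * bytes_per_index_sample / 1024**2
    print("bloom: ~%.0f MB, index samples: ~%.0f MB" % (bloom_mb, sample_mb))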

The cost of deserializing the same data repeatedly is CPU. So if
you're nowhere near bottlenecking on disk and the memory trade-off is
reasonable, it may be a suitable optimization. However, consider that
unless you're doing order-preserving partitioning, accessing those
rows will be effectively random w.r.t. the locations on disk you're
reading from, so you're adding a lot of overhead in terms of disk I/O
unless your data set fits comfortably in memory.

-- 
/ Peter Schuller
