> As far as I know, to read a single column cassandra will deserialize a
> bunch of them and then pick the correct one (64KB of data right?)
Assuming the default setting of 64kb, the average amount deserialized
given random column access should be 8 kb (not true with row cache,
but with large rows presumably you don't have row cache).

> Would it be faster to have a row for each id I want to translate? This
> would make keycache less effective, but the amount of data read should
> be smaller.

It depends on what bottlenecks you're optimizing for. A key is
"expensive" in the sense that it (1) increases the size of the bloom
filters for the column family, (2) increases the memory cost of index
sampling, and (3) increases the total data size (typically) because
the row key is duplicated in both the index and data files.

The cost of deserializing the same data repeatedly is CPU. So if
you're nowhere near bottlenecking on disk and the memory trade-off is
reasonable, it may be a suitable optimization.

However, consider that unless you're doing order preserving
partitioning, accessing those rows will be effectively random w.r.t.
the locations on disk you're reading from, so you're adding a lot of
overhead in terms of disk I/O unless your data set fits comfortably
in memory.

-- 
/ Peter Schuller
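To put rough numbers on that trade-off, here is a minimal
back-of-envelope sketch in Python (not Cassandra code). The per-key
bloom filter and index sampling figures below are illustrative
assumptions, not Cassandra's actual constants, so substitute
measurements from your own cluster; the 8 kb read figure is the one
quoted above for the default 64 kb column index setting.

    # Back-of-envelope comparison: one wide row vs. one row per id.
    # All per-key costs are ASSUMED illustrative values, not Cassandra
    # constants -- measure your own cluster before deciding.

    def row_per_id_memory_overhead(num_ids,
                                   bloom_bits_per_key=10,      # assumed
                                   index_sample_interval=128,  # assumed
                                   bytes_per_index_sample=64): # assumed
        """Extra memory (bytes) paid for giving every id its own row key."""
        bloom_bytes = num_ids * bloom_bits_per_key / 8
        sample_bytes = (num_ids / index_sample_interval) * bytes_per_index_sample
        return bloom_bytes + sample_bytes

    def wide_row_read_bytes(column_index_size_kb=64):
        """Average bytes deserialized per random single-column read from a
        wide row, per the 8 kb figure quoted above for the 64 kb default."""
        return column_index_size_kb * 1024 / 8

    if __name__ == "__main__":
        ids = 10_000_000
        print("extra memory for row-per-id: %.1f MB"
              % (row_per_id_memory_overhead(ids) / 1e6))
        print("avg bytes deserialized per wide-row read: %d"
              % wide_row_read_bytes())

Under those assumed figures, 10 million ids work out to roughly 17-18
MB of extra memory for the row-per-id layout, against about 8 kb
deserialized per random column read from the single wide row.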