Yes, I understand that. However, what I'm trying to understand is the
internal structure of the partition index. When a record associated with a
given partition key is updated, we have two different records with different
timestamps. There is a chance that these two records end up in two
different SSTables (at least until compaction eventually merges them
into one SSTable). What does the partition index look like in such a case?
For the same key, we have two different records in different SSTables. How
does the partition index store such information? Can it have repeated
partition keys with different disk offsets pointing to different SSTables?
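
To make the scenario concrete, here is a toy sketch of the picture I'm
starting to form from Jonathan's reply below. Please correct me if it is
wrong. It assumes each immutable SSTable carries its own partition index
covering only its own data file, so the same key would appear once per
SSTable rather than being repeated inside a single index, and the read
merges the hits by timestamp. The names (SSTable, read, key_cache) are made
up for illustration and are not Cassandra's actual code. The seek counting
follows Jan's rule of thumb quoted further down, and the sketch assumes an
SSTable that does not contain the key is skipped without any disk seek.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class SSTable:
    """Toy model of one immutable SSTable: a data file plus its own index."""
    name: str
    # Data "file": a list of (partition_key, write_timestamp, value) records.
    data: List[Tuple[str, int, str]]
    # Partition index: partition_key -> offset into THIS SSTable's data only.
    index: Dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        # Built once when the SSTable is written and never modified afterwards.
        self.index = {key: off for off, (key, _, _) in enumerate(self.data)}


def read(
    key: str,
    sstables: List[SSTable],
    key_cache: FrozenSet[Tuple[str, str]] = frozenset(),
) -> Tuple[Optional[str], int]:
    """Return (newest value for key, number of simulated disk seeks)."""
    newest: Optional[Tuple[int, str]] = None
    seeks = 0
    for table in sstables:
        offset = table.index.get(key)
        if offset is None:
            # Assumption: this SSTable is ruled out in memory, so no seek.
            continue
        # One seek for the data, plus one for the index on a key cache miss.
        seeks += 1 if (table.name, key) in key_cache else 2
        _, timestamp, value = table.data[offset]
        if newest is None or timestamp > newest[0]:
            newest = (timestamp, value)  # the latest write wins
    return (newest[1] if newest else None), seeks


# The same key written twice, landing in two different SSTables.
old = SSTable("sstable-1", [("user42", 100, "v1"), ("user7", 100, "a")])
new = SSTable("sstable-2", [("user42", 200, "v2")])

print(read("user42", [old, new]))  # ('v2', 4)
```

In this model nothing is ever rewritten in an existing index. The update
only creates a new SSTable with its own entry for the key, and the cost
shows up at read time as extra seeks until compaction merges the two.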

On Tue, Mar 21, 2017 at 10:09 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> The partition index is never updated, as sstables are immutable.
>
> On Tue, Mar 21, 2017 at 9:40 AM preetika tyagi <preetikaty...@gmail.com>
> wrote:
>
>> Thank you Jan & Jeff for the responses. That was really useful.
>>
>> Jan - I have one follow-up question. When the data is spread over more
>> than one SSTable in case of updates, as you mentioned, we will need two
>> seeks per SSTable (one for the partition index and another for the SSTable
>> itself). I'm curious to know how the partition index is structured
>> internally. I was assuming it to be a table of <key, disk offset> pairs.
>> If the same key is updated several times, how is that recorded in the
>> partition index?
>>
>> Thanks,
>> Preetika
>>
>> On Mon, Mar 20, 2017 at 10:37 PM, <j.kes...@enercast.de> wrote:
>>
>> Hi,
>>
>>
>>
>> You're right: one seek with a hit in the partition key cache and two if not.
>>
>>
>>
>> That's the theory, but there are two things to mention:
>>
>>
>>
>> First, you need two seeks per SSTable, not per entire read. So if your data
>> is spread over multiple SSTables on disk, you obviously need more than two
>> reads. Think of frequently updated partition keys: in combination with
>> memory pressure you can easily end up with many SSTables (OK, they will be
>> compacted at some point in the future).
>>
>>
>>
>> Second, there could be fragmentation on disk which leads to seeks during
>> sequential reads.
>>
>>
>>
>> Jan
>>
>>
>>
>>
>>
>>
>> *From: *preetika tyagi <preetikaty...@gmail.com>
>> *Sent: *Monday, March 20, 2017 21:18
>> *To: *user@cassandra.apache.org
>> *Subject: *question on maximum disk seeks
>>
>>
>>
>> I'm trying to understand the maximum number of disk seeks required in a
>> read operation in Cassandra. I looked at several online articles including
>> this one: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html
>>
>> As per my understanding, two disk seeks are required in the worst case.
>> One is for reading the partition index and another is to read the actual
>> data from the compressed partition. The location of the data in compressed
>> partitions is obtained from the compression offset map (which is stored
>> in memory). Am I on the right track here? Will there ever be a case when
>> more than 1 disk seek is required to read the data?
>>
>> Thanks,
>>
>> Preetika
>>
>>
>>
>>
>>
>>
