Great articles, I had not found those before!

SSTable index: yes, I mean the column index.

I would like to understand how many disk seeks might be required to find a
column in a single SSTable.

I am assuming a positive bloom filter result on the row key. Now Cassandra
needs to find out whether the given SSTable contains the column name, and
this might require a few disk seeks:
1) Check the key cache; if found, go to 5)
2) Read all row keys from disk in order to find the one we need (binary search)
3) The found row key contains the disk offset to its column index
4) Read the column index for our row key from disk. The index also contains
a bloom filter on column names
5) Use the bloom filter on column names to find out whether this SSTable
might contain our column
6) Read the column to finally make sure that it exists

As I understand it, in the worst case we can have three disk seeks (2, 4, 6)
per SSTable in order to check whether it contains the given column. Is that
correct?
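
To make the counting concrete, here is a minimal sketch of the steps listed
above. It only models my assumptions about the read path, not Cassandra's
actual code; the function name count_seeks and both parameters are
hypothetical, and the row-key bloom filter is assumed to have already
returned positive.

```python
def count_seeks(key_cached: bool, column_bloom_positive: bool) -> int:
    """Count disk seeks for one SSTable under the six steps assumed above.

    Hypothetical model only: it mirrors the numbered list, not Cassandra's
    real implementation.
    """
    seeks = 0
    # Step 1: a key cache hit skips steps 2-4 entirely.
    if not key_cached:
        seeks += 1  # Step 2: read row keys from disk and binary-search them
        # Step 3: the row key entry yields the column index offset (no seek)
        seeks += 1  # Step 4: read the column index and its column bloom filter
    # Step 5: the column bloom filter check is assumed to cost no seek.
    if column_bloom_positive:
        seeks += 1  # Step 6: read the column itself to confirm it exists
    return seeks


if __name__ == "__main__":
    for cached in (True, False):
        for positive in (True, False):
            print(cached, positive, count_seeks(cached, positive))
```

Under this model the worst case (no key cache hit, positive column bloom
filter) gives the three seeks from steps 2, 4 and 6, and a key cache hit
with a negative column bloom filter gives none.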

I would expect that the sorted row keys (from point 2) already contain a
bloom filter for their columns. But the bloom filter is stored together with
the column index, is that correct?


Cheers,
Maciej

On Fri, Aug 17, 2012 at 12:06 AM, aaron morton <aa...@thelastpickle.com> wrote:

> What about SSTable index,
>
> Not sure what you are referring to there. Each row in an SSTable has
> a bloom filter and may have an index of columns. This is not cached.
>
> See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ or
> http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance
>
>  and Metadata?
>
> This is the metadata we hold in memory for every open SSTable
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/08/2012, at 7:34 PM, Maciej Miklas <mac.mik...@gmail.com> wrote:
>
> Hi all,
>
> The bloom filter for row keys is always in RAM. What about the SSTable
> index and metadata?
>
> Is it cached by Cassandra, or does it rely on memory-mapped files?
>
>
> Thanks,
> Maciej
>
>
>
