[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189299#comment-13189299
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2392 at 1/19/12 7:18 PM:
---------------------------------------------------------------------

bq. Will do. The initial idea was to save some disk space, as the keys in some 
cases can be really long, and the index seeks were not that bad in my 
initial tests, but I will save it in v2.

I'm thinking here from the I/O perspective: if we just read one file 
sequentially, page cache read-ahead works for us and populates the cache with 
useful data. If you read from two files and do random I/O on one of them, 
that leads to slower I/O plus a page cache populated with useless data, which 
could cost performance when the node finishes start-up and starts to serve reads. 
Index intervals are almost always big enough that the space taken by keys is 
negligible compared to the I/O benefit it would give us.
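
To make the single-file argument concrete, here is a minimal sketch of a summary file that stores each sampled key inline next to its index position, so start-up needs only one buffered, front-to-back read. The class name, the Entry type and the on-disk layout (entry count, then key length, key bytes and position per entry) are assumptions made for illustration only; this is not the format used by the attached patches.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: keys are stored inline, so loading never seeks into a second file.
public class SummaryFileSketch {

    // One sampled index entry: the row key and its offset in the index file.
    static final class Entry {
        final String key;
        final long indexPosition;

        Entry(String key, long indexPosition) {
            this.key = key;
            this.indexPosition = indexPosition;
        }
    }

    // Assumed layout: [int count] then, per entry, [int keyLength][key bytes][long indexPosition].
    static void write(Path file, List<Entry> entries) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(entries.size());
            for (Entry e : entries) {
                byte[] key = e.key.getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);
                out.write(key);
                out.writeLong(e.indexPosition);
            }
        }
    }

    // Reads the whole file in one sequential pass, which is the access
    // pattern that OS read-ahead is designed to help.
    static List<Entry> read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            List<Entry> entries = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] key = new byte[in.readInt()];
                in.readFully(key);
                long position = in.readLong();
                entries.add(new Entry(new String(key, StandardCharsets.UTF_8), position));
            }
            return entries;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("summary-sketch", ".db");
        write(file, Arrays.asList(new Entry("key0", 0L), new Entry("key128", 4096L)));
        for (Entry e : read(file)) {
            System.out.println(e.key + " -> " + e.indexPosition);
        }
        Files.delete(file);
    }
}
```

The alternative discussed above would store only positions and look each key up in the index file, which saves the key bytes but turns the load into random reads against a second file.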
 
bq. I am not sure how saving dataPosition will help, as we only have a summary 
entry every 128 keys or more, so how will we mark a boundary with it? For example, 
each row is 100MB big.

Oh yes, you are right, we really need all boundary information from segmented 
files, my bad.

                
      was (Author: xedin):
    bq. I am not sure how saving dataPosition will help as we only have 
summaries between 128Keys or more and how will we mark a boundary with it? For 
example each row is 100MB big.

Oh yes, you are right, we really need all boundary information from segmented 
files, my bad.

                  
> Saving IndexSummaries to disk
> -----------------------------
>
>                 Key: CASSANDRA-2392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Goffinet
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-re-factor-first-and-last.patch, 
> 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk.patch
>
>
> For nodes with millions of keys, doing rolling restarts that take over 10 
> minutes per node can be painful if you have a 100-node cluster. All of our time 
> is spent on doing index summary computations on startup. It would be great if 
> we could save those to disk as well. Our indexes are quite large.
