[ https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189299#comment-13189299 ]
Pavel Yaskevich edited comment on CASSANDRA-2392 at 1/19/12 7:18 PM:
---------------------------------------------------------------------

bq. Will do. The initial idea was to save some disk space, as the keys in some cases can be really long, and the seeks into the index were not that bad in my initial tests, but I will save it in v2.

I'm thinking here from the I/O perspective. If we read just one file sequentially, the page cache read-ahead works for us and populates the cache with useful data. If we read from two files and do random I/O on one of them, we get slower I/O plus a page cache populated with useless data, which could cost performance once the node finishes start-up and starts to serve reads. Index intervals are almost always big enough that the space taken by the keys is negligible compared to the I/O benefit this would give us.

bq. I am not sure how saving dataPosition will help, as we only have summaries every 128 keys or more; how would we mark a boundary with it? For example, say each row is 100MB.

Oh yes, you are right, we really need all the boundary information from the segmented files, my bad.

> Saving IndexSummaries to disk
> -----------------------------
>
>                 Key: CASSANDRA-2392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Goffinet
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-re-factor-first-and-last.patch, 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk.patch
>
>
> For nodes with millions of keys, doing rolling restarts that take over 10 minutes per node can be painful if you have a 100-node cluster.
> All of our time is spent on index summary computations on startup. It would be great if we could save those to disk as well. Our indexes are quite large.
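The single-file layout argued for in the comment can be sketched in Java (Cassandra's own language). Sampled keys are stored inline next to their primary-index positions, followed by the segment boundary offsets, so loading at start-up is one buffered, sequential read with no seeks into the index. Everything here is a hypothetical illustration and not Cassandra's actual IndexSummary format or API: the class and record names, the interval constant of 128, the byte layout, and the meaning of "boundaries" as data-file offsets where mmap segments begin. Keys are also compared as plain strings, whereas Cassandra orders rows by partitioner token.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Cassandra's real IndexSummary format or API.
final class SummaryFileSketch {
    // Assumed sampling interval: one summary entry per 128 index entries.
    static final int INDEX_INTERVAL = 128;

    // One sampled key plus its offset in the primary index file.
    record Entry(String key, long indexPosition) {}

    // What start-up needs: the samples plus the (hypothetical) data-file
    // offsets at which each mmap segment begins.
    record Loaded(List<Entry> entries, long[] boundaries) {}

    // Keep every INDEX_INTERVAL-th key of the sorted primary index.
    static List<Entry> sample(List<String> keys, List<Long> positions) {
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += INDEX_INTERVAL) {
            out.add(new Entry(keys.get(i), positions.get(i)));
        }
        return out;
    }

    // Layout: [n] then n x ([len][key bytes][indexPosition]), then [m] then m x [boundary].
    // Keys are stored inline so loading never has to seek into the index.
    static void write(File f, List<Entry> entries, long[] boundaries) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            out.writeInt(entries.size());
            for (Entry e : entries) {
                byte[] k = e.key().getBytes(StandardCharsets.UTF_8);
                out.writeInt(k.length);
                out.write(k);
                out.writeLong(e.indexPosition());
            }
            out.writeInt(boundaries.length);
            for (long b : boundaries) {
                out.writeLong(b);
            }
        }
    }

    // One buffered, front-to-back pass over a single file.
    static Loaded read(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            int n = in.readInt();
            List<Entry> entries = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                byte[] k = new byte[in.readInt()];
                in.readFully(k);
                entries.add(new Entry(new String(k, StandardCharsets.UTF_8), in.readLong()));
            }
            long[] boundaries = new long[in.readInt()];
            for (int i = 0; i < boundaries.length; i++) {
                boundaries[i] = in.readLong();
            }
            return new Loaded(entries, boundaries);
        }
    }

    // Index position of the last sampled key <= target (-1 if none); a
    // reader would then scan the primary index forward from there.
    static long indexPositionFor(List<Entry> entries, String target) {
        int lo = 0, hi = entries.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entries.get(mid).key().compareTo(target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : entries.get(found).indexPosition();
    }
}
```

Storing only index positions would shrink the file, but recovering each sampled key would then need a seek into the primary index, which is the random I/O pattern the comment wants to avoid. Writing the boundaries into the same file reflects the conclusion that the segment boundary information cannot be derived from the sampled entries alone.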