>> upgradesstables re-writes every sstable to have the same contents in the
>> newest format.

Agree. In the world of compaction, and excluding upgrades, having older sstables is expected.
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/11/2012, at 11:45 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Tue, Nov 20, 2012 at 5:23 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> My understanding of the compaction process was that since data files keep
>> continuously merging we should not have data files with very old last
>> modified timestamps
>>
>> It is perfectly OK to have very old SSTables.
>>
>> But performing an upgradesstables did decrease the number of data files and
>> removed all the data files with the old timestamps.
>>
>> upgradesstables re-writes every sstable to have the same contents in the
>> newest format.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 19/11/2012, at 4:57 PM, Ananth Gundabattula <agundabatt...@gmail.com>
>> wrote:
>>
>> Hello Aaron,
>>
>> Thanks a lot for the reply.
>>
>> Looks like the documentation is confusing. Here is the link I am referring
>> to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
>>
>>> It does not disable compaction.
>> As per the above url, "After running a major compaction, automatic minor
>> compactions are no longer triggered, frequently requiring you to manually
>> run major compactions on a routine basis." (Just before the heading Tuning
>> Column Family compression in the above link)
>>
>> With respect to the replies below:
>>
>>> it creates one big file, which will not be compacted until there are (by
>>> default) 3 other very big files.
>> This is for the minor compaction and major compaction should theoretically
>> result in one large file irrespective of the number of data files initially?
>>
>>> This is not something you have to worry about. Unless you are seeing
>>> 1,000's of files using the default compaction.
>>
>> Well my worry has been because of the large amount of node movements we have
>> done in the ring. We started off with 6 nodes and increased the capacity to
>> 12 with disproportionate increases every time which resulted in a lot of
>> clean of data folders except system, run repair and then a cleanup with an
>> aborted attempt in between.
>>
>> There were some data.db files older by more than 2 weeks and were not
>> modified since then. My understanding of the compaction process was that
>> since data files keep continuously merging we should not have data files
>> with very old last modified timestamps (assuming there is a good amount of
>> writes to the table continuously). I did not have a for sure way of telling
>> if everything is alright with the compaction looking at the last modified
>> timestamps of all the data.db files.
>>
>>> What are the compaction issues you are having ?
>> Your replies confirm that the timestamps should not be an issue to worry
>> about. So I guess I should not be calling them issues any more. But
>> performing an upgradesstables did decrease the number of data files and
>> removed all the data files with the old timestamps.
>>
>> Regards,
>> Ananth
>>
>> On Mon, Nov 19, 2012 at 6:54 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>>
>>> As per DataStax documentation, a manual compaction forces the admin to
>>> start compaction manually and disables the automated compaction (at least for
>>> major compactions but not minor compactions)
>>>
>>> It does not disable compaction.
>>> it creates one big file, which will not be compacted until there are (by
>>> default) 3 other very big files.
>>>
>>> 1. Does a nodetool stop compaction also force the admin to manually run
>>> major compaction (i.e. disable automated major compactions?)
>>>
>>> No.
>>> Stop just stops the current compaction.
>>> Nothing is disabled.
>>>
>>> 2. Can a node restart reset the automated major compaction if a node gets
>>> into a manual mode compaction for whatever reason ?
>>>
>>> Major compaction is not automatic. It is the manual nodetool compact
>>> command.
>>> Automatic (minor) compaction is controlled by min_compaction_threshold and
>>> max_compaction_threshold (for the default compaction strategy).
>>>
>>> 3. What is the ideal number of SSTables for a table in a keyspace (I
>>> mean are there any indicators as to whether my compaction is alright or not
>>> ?)
>>>
>>> This is not something you have to worry about.
>>> Unless you are seeing 1,000's of files using the default compaction.
>>>
>>> For example, I have seen SSTables on the disk more than 10 days old
>>> wherein there were other SSTables belonging to the same table but much
>>> younger than the older SSTables (
>>>
>>> No problems.
>>>
>>> 4. Does an upgradesstables fix any compaction issues ?
>>>
>>> What are the compaction issues you are having ?
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 18/11/2012, at 1:18 AM, Ananth Gundabattula <agundabatt...@gmail.com>
>>> wrote:
>>>
>>> We have a cluster running Cassandra 1.1.4. On this cluster,
>>>
>>> 1. We had to move the nodes around a bit when we were adding new nodes
>>> (there was quite a good amount of node movement)
>>>
>>> 2. We had to stop compactions during some of the days to save some disk
>>> space on some of the nodes when they were running very very low on disk
>>> space. (via nodetool stop COMPACTION)
>>>
>>> As per DataStax documentation, a manual compaction forces the admin to
>>> start compaction manually and disables the automated compaction (at least for
>>> major compactions but not minor compactions)
>>>
>>> Here are the questions I have regarding compaction:
>>>
>>> 1. Does a nodetool stop compaction also force the admin to manually run
>>> major compaction (i.e. disable automated major compactions?)
>>>
>>> 2. Can a node restart reset the automated major compaction if a node gets
>>> into a manual mode compaction for whatever reason ?
>>>
>>> 3. What is the ideal number of SSTables for a table in a keyspace (I
>>> mean are there any indicators as to whether my compaction is alright or not
>>> ?). For example, I have seen SSTables on the disk more than 10 days old
>>> wherein there were other SSTables belonging to the same table but much
>>> younger than the older SSTables (The node movement and repair and cleanup
>>> happened between the older SSTables and the new SSTables being
>>> touched/modified)
>>>
>>> 4. Does an upgradesstables fix any compaction issues ?
>>>
>>> Regards,
>>> Ananth
>>>
>>
>
> "it is perfectly OK to have old sstables."
>
> Except for the fact that you can not repair and join new nodes until
> the cluster is all on the same version, all on the same files.
>
> Your gc_grace_time defaults to 10 days. This means that if you don't
> repair every node every 10 days something wonky can happen if you do
> deletes.
>
> Also in the past there was an issue if you upgraded from 0.8.X to
> 1.0.X. 1.0.X did not read some 0.8.X bloom filter files correctly. So
> you could get bad reads until you upgraded tables.
>
> These factors cause me to upgrade sstables as soon as possible after an
> upgrade.
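
For anyone who wants to look at the two things discussed in this thread (data files with old last-modified timestamps, and the 10-day repair window), here is a rough Python sketch. It is not part of Cassandra or nodetool. The helper names `old_data_files` and `repair_overdue` are made up for illustration, and it assumes that SSTable data files end in `-Data.db` and that you record the time of your last repair yourself. As noted above, an old file on its own is not a problem; the listing is only informational.

```python
import time
from pathlib import Path

# 10 days, the default grace period mentioned in the thread (assumed here).
GC_GRACE_SECONDS = 10 * 24 * 3600


def old_data_files(data_dir, max_age_days, now=None):
    """Return (path, age_in_days) for *-Data.db files older than max_age_days.

    Hypothetical helper: walks data_dir recursively and compares each file's
    last-modified time against `now` (defaults to the current time).
    """
    now = time.time() if now is None else now
    results = []
    for path in Path(data_dir).rglob("*-Data.db"):
        age_days = (now - path.stat().st_mtime) / 86400.0
        if age_days > max_age_days:
            results.append((str(path), age_days))
    # Oldest files first.
    return sorted(results, key=lambda item: item[1], reverse=True)


def repair_overdue(last_repair_epoch, now=None, gc_grace_seconds=GC_GRACE_SECONDS):
    """True if more than gc_grace_seconds have passed since the last repair.

    Hypothetical helper: last_repair_epoch is a Unix timestamp that the
    operator tracks; Cassandra is not queried for it here.
    """
    now = time.time() if now is None else now
    return (now - last_repair_epoch) > gc_grace_seconds
```

Something like `old_data_files("/var/lib/cassandra/data", 14)` would list data files untouched for more than two weeks, assuming that is where your data directory lives.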