Re: Query regarding SSTable timestamps and counts

aaron morton Tue, 20 Nov 2012 17:19:09 -0800

>> upgradesstables re-writes every sstable to have the same contents in the
>> newest format.
Agree. 
 In the world of compaction, and excluding upgrades, having older sstables is 
expected.
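
To illustrate why an old or very large sstable can sit untouched, here is a minimal sketch of size-tiered bucketing. It is a simplification, not Cassandra's actual code: the function names are hypothetical, and the 0.5x to 1.5x size window and the 4/32 thresholds are assumed defaults for min_compaction_threshold and max_compaction_threshold.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of roughly similar size.

    Assumption: a file joins a bucket when its size is within
    bucket_low..bucket_high times the bucket's current average size.
    """
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No bucket of similar size exists yet, so start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32):
    """Return the buckets that hold enough files to trigger a minor compaction."""
    return [
        bucket[:max_threshold]
        for bucket in size_tiered_buckets(sizes_mb)
        if len(bucket) >= min_threshold
    ]


# Four small files qualify; the single big file left by a major compaction
# waits until three more files of similar size exist.
print(compaction_candidates([10, 11, 12, 9, 5000]))
print(compaction_candidates([5000, 4800, 5200, 5100]))
```

Under these assumptions the big file is only picked up once its bucket reaches the minimum threshold, so its last-modified timestamp can stay old for a long time.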


Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/11/2012, at 11:45 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Tue, Nov 20, 2012 at 5:23 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> My understanding of the compaction process was that since data files keep
>> continuously merging we should not have data files with very old last
>> modified timestamps
>> 
>> It is perfectly OK to have very old SSTables.
>> 
>> But performing an upgradesstables did decrease the number of data files and
>> removed all the data files with the old timestamps.
>> 
>> upgradesstables re-writes every sstable to have the same contents in the
>> newest format.
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 19/11/2012, at 4:57 PM, Ananth Gundabattula <agundabatt...@gmail.com>
>> wrote:
>> 
>> Hello Aaron,
>> 
>> Thanks a lot for the reply.
>> 
>> Looks like the documentation is confusing. Here is the link I am referring
>> to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
>> 
>> 
>>> It does not disable compaction.
>> As per the above url, " After running a major compaction, automatic minor
>> compactions are no longer triggered, frequently requiring you to manually
>> run major compactions on a routine basis." ( Just before the heading Tuning
>> Column Family compression in the above link)
>> 
>> With respect to the replies below :
>> 
>> 
>>> it creates one big file, which will not be compacted until there are (by
>>> default) 3 other very big files.
>> This is for the minor compaction and major compaction should theoretically
>> result in one large file irrespective of the number of data files initially?
>> 
>>> This is not something you have to worry about. Unless you are seeing
>>> 1,000's of files using the default compaction.
>> 
>> Well my worry has been because of the large amount of node movements we have
>> done in the ring. We started off with 6 nodes and increased the capacity to
>> 12 with disproportionate increases every time, which meant repeatedly
>> cleaning out the data folders (except system), running repair and then a
>> cleanup, with an aborted attempt in between.
>> 
>> There were some data.db files that were more than 2 weeks old and had not
>> been modified since then. My understanding of the compaction process was
>> that since data files keep continuously merging we should not have data
>> files with very old last modified timestamps (assuming there is a good
>> amount of writes to the table continuously). I did not have a sure way of
>> telling whether everything is alright with the compaction by looking at the
>> last modified timestamps of all the data.db files.
>> 
>>> What are the compaction issues you are having ?
>> Your replies confirm that the timestamps should not be an issue to worry
>> about. So I guess I should not be calling them issues any more.  But
>> performing an upgradesstables did decrease the number of data files and
>> removed all the data files with the old timestamps.
>> 
>> 
>> 
>> Regards,
>> Ananth
>> 
>> 
>> On Mon, Nov 19, 2012 at 6:54 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>> 
>>> As per datastax documentation, a manual compaction forces the admin to
>>> start compaction manually and disables the automated compaction (at least for
>>> major compactions but not minor compactions )
>>> 
>>> It does not disable compaction.
>>> it creates one big file, which will not be compacted until there are (by
>>> default) 3 other very big files.
>>> 
>>> 
>>> 1. Does a nodetool stop compaction also force the admin to manually run
>>> major compaction ( I.e. disable automated major compactions ? )
>>> 
>>> No.
>>> Stop just stops the current compaction.
>>> Nothing is disabled.
>>> 
>>> 2. Can a node restart reset the automated major compaction if a node gets
>>> into a manual mode compaction for whatever reason ?
>>> 
>>> Major compaction is not automatic. It is the manual nodetool compact
>>> command.
>>> Automatic (minor) compaction is controlled by min_compaction_threshold and
>>> max_compaction_threshold (for the default compaction strategy).
>>> 
>>> 3. What is the ideal  number of SSTables for a table in a keyspace ( I
>>> mean are there any indicators as to whether my compaction is alright or not
>>> ? )
>>> 
>>> This is not something you have to worry about.
>>> Unless you are seeing 1,000's of files using the default compaction.
>>> 
>>> For example, I have seen SSTables on the disk more than 10 days old
>>> wherein there were other SSTables belonging to the same table but much
>>> younger than the older SSTables.
>>> 
>>> No problems.
>>> 
>>> 4. Does running upgradesstables fix any compaction issues ?
>>> 
>>> What are the compaction issues you are having ?
>>> 
>>> 
>>> Cheers
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>> 
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 18/11/2012, at 1:18 AM, Ananth Gundabattula <agundabatt...@gmail.com>
>>> wrote:
>>> 
>>> 
>>> We have a cluster  running cassandra 1.1.4. On this cluster,
>>> 
>>> 1. We had to move the nodes around a bit  when we were adding new nodes
>>> (there was quite a good amount of node movement )
>>> 
>>> 2. We had to stop compactions on some days to save disk space on some
>>> of the nodes when they were running very low on disk space
>>> (via nodetool stop COMPACTION).
>>> 
>>> 
>>> As per datastax documentation, a manual compaction forces the admin to
>>> start compaction manually and disables the automated compaction (at least for
>>> major compactions but not minor compactions )
>>> 
>>> 
>>> Here are the questions I have regarding compaction:
>>> 
>>> 1. Does a nodetool stop compaction also force the admin to manually run
>>> major compaction ( I.e. disable automated major compactions ? )
>>> 
>>> 2. Can a node restart reset the automated major compaction if a node gets
>>> into a manual mode compaction for whatever reason ?
>>> 
>>> 3. What is the ideal  number of SSTables for a table in a keyspace ( I
>>> mean are there any indicators as to whether my compaction is alright or not
>>> ? )  . For example, I have seen SSTables on the disk more than 10 days old
>>> wherein there were other SSTables belonging to the same table but much
>>> younger than the older SSTables ( The node movement and repair and cleanup
>>> happened between the older SSTables and the new SSTables being
>>> touched/modified)
>>> 
>>> 4. Does running upgradesstables fix any compaction issues ?
>>> 
>>> Regards,
>>> Ananth
>>> 
>>> 
>> 
>> 
> 
> "it is perfectly OK to have old sstables."
> 
> Except for the fact that you can not repair and join new nodes until
> the whole cluster is on the same version and on the same sstable file format.
> 
> Your gc_grace_seconds defaults to 10 days. This means that if you don't
> repair every node every 10 days something wonky can happen if you do
> deletes.
> 
> Also in the past there was an issue if you upgraded from 0.8.X to
> 1.0.X. 1.0.X did not read some 0.8.X bloom filter files correctly. So
> you could get bad reads until you upgraded the sstables.
> 
> These factors cause me to upgrade sstables as soon as possible after an 
> upgrade.
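
The repair window mentioned above can be checked with a small script. This is only a sketch: the function name, the node names and the repair timestamps are hypothetical, and it assumes you record yourself when each node last completed a repair. The 10 day value is the gc_grace default cited in the thread.

```python
from datetime import datetime, timedelta

# Default gc_grace as cited in the thread: 10 days (864000 seconds).
GC_GRACE = timedelta(days=10)


def nodes_overdue_for_repair(last_repair, now, gc_grace=GC_GRACE):
    """Return the nodes whose last completed repair is older than gc_grace.

    last_repair maps a node name to the datetime its last repair finished
    (hypothetical bookkeeping, not something this script reads from Cassandra).
    """
    return sorted(
        node for node, finished in last_repair.items()
        if now - finished > gc_grace
    )


# Hypothetical example data.
last_repair = {
    "node1": datetime(2012, 11, 15),
    "node2": datetime(2012, 11, 5),
}
print(nodes_overdue_for_repair(last_repair, now=datetime(2012, 11, 20)))
```

Any node this reports has gone longer than gc_grace without a repair, which is the situation where deletes can misbehave.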
