I looked at the source code for GNU tar, and it looks for a change in the ctime (the inode change time, which is not a creation time) or (more likely) a change in the size.
This seems very strange to me. I would think that creating a snapshot would cause a flush, and that once the SSTables are written, hardlinks would be created and the SSTables wouldn't be written to after that.

Our solution is to wait 5 minutes and retry the tar if an error occurs. This isn't ideal, but it's the best I could come up with. :-/

Thanks Jeff & others for your responses.

- Max

> On May 25, 2018, at 5:05pm, Elliott Sims <elli...@backblaze.com> wrote:
>
> I've run across this problem before - it seems like GNU tar interprets
> changes in the link count as changes to the file, so if the file gets
> compacted mid-backup it freaks out even if the file contents are unchanged.
> I worked around it by just using bsdtar instead.
>
> On Thu, May 24, 2018 at 6:08 AM, Nitan Kainth <nitankai...@gmail.com> wrote:
> Jeff,
>
> Shouldn't a snapshot get a consistent state of the sstables? A -tmp file
> shouldn't impact the backup operation, right?
>
> Regards,
> Nitan K.
> Cassandra and Oracle Architect/SME
> Datastax Certified Cassandra expert
> Oracle 10g Certified
>
> On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> In versions before 3.0, sstables were written with a -tmp filename and
> copied/moved to the final filename when complete. This changes in 3.0 - we
> write into the file with the final name, and have a journal/log to let us
> know when it's done/final/live.
>
> Therefore, you can no longer just watch for a -Data.db file to be created and
> uploaded - you have to watch the log to make sure it's not being written.
>
> On Wed, May 23, 2018 at 2:18 PM, Max C. <mc_cassan...@core43.com> wrote:
> Hi Everyone,
>
> We’ve noticed a few times in the last few weeks that when we’re doing
> backups, tar has complained with messages like this:
>
> tar: /var/lib/cassandra/data/mars/test_instances_by_test_id-6a9440a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db: file changed as we read it
>
> Any idea what might be causing this?
>
> We’re running Cassandra 3.0.8 on RHEL 7. Here’s rough pseudocode of our
> backup process:
>
> <cronjob set to fire same script at same time on all nodes>
> SNAPSHOT_NAME=backup_YYYYMMDD_HHMMSS
> nodetool snapshot -t $SNAPSHOT_NAME
>
> for each keyspace
>   - dump schema to "schema.cql"
>   - tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_YYYYMMDD_HHMMSS.tgz
>     schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
>
> nodetool clearsnapshot -t $SNAPSHOT_NAME
>
> Thanks.
>
> - Max
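
For anyone who lands on this thread later: the wait-and-retry workaround described at the top of this message could be sketched roughly like this. This is only an illustration, not the script we actually run. The function name tar_with_retry and the variables TAR_CMD, RETRY_DELAY and MAX_ATTEMPTS are made up for the example, and the limit of 3 attempts is an assumption. It relies on GNU tar exiting with status 1 when a file changed while being archived, and treats any other non-zero status as fatal.

```shell
#!/usr/bin/env bash
# Sketch only: retry the archive step when GNU tar reports a changed file.
# TAR_CMD, RETRY_DELAY and MAX_ATTEMPTS are hypothetical knobs for this example.

TAR_CMD="${TAR_CMD:-tar}"            # archiver to invoke
RETRY_DELAY="${RETRY_DELAY:-300}"    # seconds between attempts (5 minutes, as in the thread)
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"    # assumed upper bound on attempts

# Usage: tar_with_retry <archive> <path>...
tar_with_retry() {
    local archive="$1"
    shift
    local attempt=1 status=0
    while :; do
        # Each attempt rewrites the archive from scratch.
        "$TAR_CMD" -czf "$archive" "$@" && status=0 || status=$?
        # GNU tar exits 1 when files changed while being archived.
        # Success (0) and fatal errors (2 and up) are returned immediately.
        if [ "$status" -ne 1 ] || [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
            return "$status"
        fi
        echo "tar exited 1 (attempt $attempt), retrying in ${RETRY_DELAY}s" >&2
        sleep "$RETRY_DELAY"
        attempt=$((attempt + 1))
    done
}
```

In the backup loop, the plain tar -czf call would become tar_with_retry with the same archive name and paths, and nodetool clearsnapshot would run only after it returns. Setting TAR_CMD=bsdtar would be one way to try Elliott's suggestion instead, although the exit-status check above is based on GNU tar's documented behavior and may not carry over.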