Re: Snapshots are not working after upgrading to 4.15.0

Andrei Mikhailovsky Thu, 17 Jun 2021 03:31:04 -0700

Hi Suresh,

This is what I answered earlier regarding the db tables:


    The table snapshots has NULL under the removed column in all snapshots that
    I've removed. The table snapshot_store_ref has no such column, but the state
    shown as Destroyed.
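
For anyone wanting to reproduce this check, here is a minimal sketch of the query. It runs against an in-memory SQLite stand-in rather than the real CloudStack database. The table names and the 'status', 'removed', 'state' and 'store_role' columns are the ones mentioned in this thread; the 'snapshot_id' join column, the row values and the helper names find_stuck_snapshots / build_demo_db are assumptions made up for illustration only.

```python
import sqlite3


def find_stuck_snapshots(conn):
    """Return (id, status, store_state) rows for snapshots whose store
    reference is Destroyed while snapshots.removed is still NULL."""
    query = """
        SELECT s.id, s.status, r.state
        FROM snapshots AS s
        JOIN snapshot_store_ref AS r ON r.snapshot_id = s.id  -- join column is an assumption
        WHERE s.removed IS NULL
          AND r.state = 'Destroyed'
        ORDER BY s.id
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build a toy database holding only the columns named in the thread.
    All rows are invented demo data, not values from a real deployment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE snapshots (id INTEGER PRIMARY KEY, status TEXT, removed TEXT);
        CREATE TABLE snapshot_store_ref (snapshot_id INTEGER, store_role TEXT, state TEXT);
        INSERT INTO snapshots VALUES (1, 'Destroyed', NULL);
        INSERT INTO snapshots VALUES (2, 'Destroyed', NULL);
        INSERT INTO snapshots VALUES (3, 'Destroyed', '2021-06-01 00:00:00');
        INSERT INTO snapshot_store_ref VALUES (1, 'Image', 'Destroyed');
        INSERT INTO snapshot_store_ref VALUES (2, 'Image', 'Destroyed');
        INSERT INTO snapshot_store_ref VALUES (3, 'Image', 'Destroyed');
    """)
    return conn


if __name__ == "__main__":
    for row in find_stuck_snapshots(build_demo_db()):
        print(row)
```

The same SELECT should be adaptable to the real database, but the column names need to be checked against the actual schema first.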


I've also run some more checks on the SSVM itself, and it looks OK:


root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh
================================================
First DNS server is  192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server
================================================
Good: DNS resolves cloudstack.apache.org
================================================
nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point
================================================
Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250
================================================
Good: Java process is running
================================================
Tests Complete. Look for ERROR or WARNING above.


The management server does show errors like these, without any further details:

2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete snapshot: 55183 from storage
2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Destroyed via DestroyRequested
2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete snapshot: 84059 from storage
2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Destroyed via DestroyRequested
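
To get a list of affected snapshot IDs out of a larger management log, something along these lines may help. It is only a sketch: the two regular expressions are written against the exact message wording shown in the excerpt above, and the function name summarize is made up for illustration. Whitespace is collapsed first so that hard-wrapped log text, as pasted into an email, still matches.

```python
import re
from collections import Counter

# Patterns based on the message wording in the log excerpt; other
# CloudStack versions may word these messages differently (assumption).
DELETE_FAILED = re.compile(r"Failed to delete snapshot: (\d+) from storage")
NO_TRANSITION = re.compile(
    r"Unable to transition to a new state from (\w+) via (\w+)"
)


def summarize(log_text):
    """Return (snapshot_ids, transitions) found in the given log text.

    snapshot_ids: snapshot IDs the scavenger failed to delete, in log order.
    transitions:  Counter of (from_state, event) pairs that were rejected.
    """
    # Collapse newlines and runs of spaces so wrapped entries are matched.
    flat = " ".join(log_text.split())
    snapshot_ids = [int(m) for m in DELETE_FAILED.findall(flat)]
    transitions = Counter(NO_TRANSITION.findall(flat))
    return snapshot_ids, transitions


if __name__ == "__main__":
    import sys

    ids, transitions = summarize(sys.stdin.read())
    print("snapshots not deleted:", ids)
    for (state, event), count in transitions.items():
        print(f"{count} x no transition from {state} via {event}")
```

It reads the log from standard input, so the management server log file can simply be redirected into it.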


Regarding bug 4797: I can't really comment, as the report has very few
technical details (no management log errors, etc.). But essentially, at a high
level, the snapshots are not deleted from the backend in my case, just as in
bug 4797.


TBH, I am very surprised that a bug in such an important function of ACS
slipped through testing for the 4.15.0 release and that, despite being
discovered over 3 months ago, it hasn't been scheduled for a fix in the 4.15.1
bug-fix release. Does that sound right to you? I think this issue should be
revisited and corrected, as it will fill up the secondary storage and
ultimately cause all sorts of issues with the creation of snapshots.

Andrei


----- Original Message -----
> From: "Suresh Anaparti" <suresh.anapa...@shapeblue.com>
> To: "users" <users@cloudstack.apache.org>
> Sent: Thursday, 17 June, 2021 11:16:59
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Have you checked the 'status' and 'removed' timestamp in snapshots table, and
> 'state' in snapshot_store_ref table for these snapshots.
> 
> Similar issue logged (by Ed, as mentioned in his email) here:
> https://github.com/apache/cloudstack/issues/4797. Is it the same issue?
> 
> Regards,
> Suresh
> 
>On 17/06/21, 2:18 PM, "Andrei Mikhailovsky" <and...@arhont.com.INVALID> wrote:
> 
>    Hi Suresh, Please see below the answers to your questions.
> 
>    
> 
> 
> ----- Original Message -----
>    > From: "Suresh Anaparti" <suresh.anapa...@shapeblue.com>
>    > To: "users" <users@cloudstack.apache.org>
>    > Sent: Thursday, 17 June, 2021 06:36:27
>    > Subject: Re: Snapshots are not working after upgrading to 4.15.0
> 
>    > Hi Andrei,
>    > 
>    > Can you check if the storage garbage collector is enabled or not in your env
>    > (specified using the global setting 'storage.cleanup.enabled'). If it is
>    > enabled, check the interval & delay setting: 'storage.cleanup.interval' and
>    > 'storage.cleanup.delay', and see the logs to confirm cleanup is performed or
>    > not.
> 
>    storage.cleanup.enabled is true
>    storage.cleanup.interval is 3600
>    storage.cleanup.delay is 360086400
> 
>    > 
>    > Also, check the snapshot status / state in snapshots & snapshot_store_ref tables
>    > for the snapshots that are not deleted during the cleanup. Is 'removed'
>    > timestamp set for them in snapshots table?
>    > 
> 
> 
>    The table snapshots has NULL under the removed column in all snapshots that I've
>    removed. The table snapshot_store_ref has no such column, but the state shown
>    as Destroyed.
> 
> 
> 
> 
>    > Regards,
>    > Suresh
>    > 
>    >On 16/06/21, 9:46 PM, "Andrei Mikhailovsky" <and...@arhont.com.INVALID> wrote:
>    > 
>    >    Hello,
>    > 
>    >    I've done some more investigation and indeed, the snapshots were not taken
>    >    because the secondary storage was over 90% used. I have started cleaning some
>    >    of the older volumes and noticed another problem. After removing snapshots,
>    >    they do not seem to be removed from the secondary storage. I've removed all
>    >    snapshots over 24 hours ago and it looks like the disk space hasn't been freed
>    >    up at all.
>    > 
>    >    Looks like there are issues with snapshotting function after all.
>    > 
>    >    Andrei
>    > 
>    > 
>    > 
>    >    
>    > 
>    > 
>    > ----- Original Message -----
>    >    > From: "Harikrishna Patnala" <harikrishna.patn...@shapeblue.com>
>    >    > To: "users" <users@cloudstack.apache.org>
>    >    > Sent: Tuesday, 8 June, 2021 03:33:57
>    >    > Subject: Re: Snapshots are not working after upgrading to 4.15.0
>    > 
>    >    > Hi Andrei,
>    >    > 
>    >    > Can you check the following things and let us know?
>    >    > 
>    >    > 
>    >    >  1.  Can you try creating a new volume and then create snapshot of that, to check
>    >    >  if this an issue with old entries
>    >    >  2.  For the snapshots which are failing can you check if you are seeing any
>    >    >  error messages like this "Can't find an image storage in zone with less than".
>    >    >  This is to check if secondary storage free space check failed.
>    >    >  3.  For the snapshots which are failing and if it is delta snapshot can you
>    >    >  check if its parent's snapshot entry exists in "snapshot_store_ref" table with
>    >    >  'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is
>    >    >  to find the secondary storage where the parent snapshot backup is located.
>    >    > 
>    >    > Regards,
>    >    > Harikrishna
>    >    > ________________________________
>    >    > From: Andrei Mikhailovsky <and...@arhont.com.INVALID>
>    >    > Sent: Monday, June 7, 2021 7:00 PM
>    >    > To: users <users@cloudstack.apache.org>
>    >    > Subject: Snapshots are not working after upgrading to 4.15.0
>    >    > 
>    >    > Hello everyone,
>    >    > 
>    >    > I am having an issue with volume snapshots since I've upgraded to 4.15.0. None
>    >    > of the volumes are being snapshotted regardless if the snapshot is initiated
>    >    > manually or from the schedule. The strange thing is that if I manually take the
>    >    > snapshot, the GUI shows Success status, but the Storage>Snapshots show an Error
>    >    > status. Here is what I see in the management server logs:
>    >    > 
>    >    > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
>    >    > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove job-86143 from job monitoring
>    >    > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
>    >    > com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
>    >    > at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
>    >    > at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
>    >    > at com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
>    >    > at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
>    >    > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
>    >    > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
>    >    > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
>    >    > at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
>    >    > at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>    >    > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>    >    > at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>    >    > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>    >    > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>    >    > at java.base/java.lang.Thread.run(Thread.java:829)
>    >    > 2021-06-07 13:55:20,152 DEBUG [c.c.s.s.SnapshotManagerImpl] (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Backing up of snapshot failed, for snapshot with ID 53531, left with 2 more attempts
>    >    > 
>    >    > 
>    >    > I've checked and the Secondary storage is configured and visible in the GUI. I
>    >    > can also mount it manually from the management server and a couple of host
>    >    > servers that I've tested. In addition, I can successfully upload an ISO image
>    >    > and that registers just fine and I can create new VMs using the newly uploaded
>    >    > ISO image.
>    >    > 
>    >    > I've had no such problems with 4.13.x ACS, so the issue seems to have been
>    >    > introduced after doing the upgrade to 4.15.0.
>    >    > 
>    >    > Could you please let me know how do I fix the issue?
>    >    > 
>    >    > Cheers
>    >    > 
>    >    > andrei
