Hi Andrei,

I appreciate all the effort and help in narrowing down this issue. It
does look similar, and it probably is related to bug #4797.
That bug has been waiting for a fix for some time, and I perfectly
understand why you are not happy.

I am speaking for myself here, and I am not the Release Manager (RM) of
4.15.1.0, but in my view this does not necessarily need to block
4.15.1.0.

Fixing it has proven a bit tricky; it also requires manual tests with
different environment configurations and some time to debug and develop.
I myself had no time to fix it for 4.15.1.0, and thus decided not to hold
4.15.1.0 back, as that would mean many users missing several other bug
fixes because of this one.

To give some context: I work for a hosting company that has been
contributing bug fixes and new features for a long time.
We even fix bugs that do not impact us directly (e.g. issues that affect
storage systems or hypervisors we do not use).
This means that I, as a contributor, sometimes have less time for some
tasks than for others.

With that said, I will be re-checking this issue soon(ish), but I cannot
guarantee that I will be able to land a fix in time for 4.15.1.0.
If any contributor has time to fix it, I would be happy to help with
review and testing.
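For anyone who wants to dig in meanwhile, a quick way to spot affected rows
is to cross-check the two tables discussed further down in this thread. This
is only a sketch, not a tested query: I am assuming here that
snapshot_store_ref links back to snapshots via a snapshot_id column, so
please adjust the join to match the actual schema of your deployment:

```sql
-- Sketch: list snapshots that show as Destroyed on the image store but
-- were never flagged as removed in the snapshots table (removed IS NULL),
-- which is the mismatch reported in this thread.
-- Assumption: snapshot_store_ref.snapshot_id references snapshots.id.
SELECT s.id, s.name, s.status, r.state, r.store_role
FROM snapshots s
JOIN snapshot_store_ref r ON r.snapshot_id = s.id
WHERE s.removed IS NULL
  AND r.state = 'Destroyed'
  AND r.store_role = 'Image';
```

Rows returned by something like this would be the candidates that the
storage scavenger keeps retrying (and failing) to delete.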

Best regards,
Gabriel.

On Thu, 17 Jun 2021 at 07:31, Andrei Mikhailovsky
<and...@arhont.com.invalid> wrote:

> Hi Suresh,
>
> This is what I've answered on the db tables:
>
>     The table snapshots has NULL in the removed column for all snapshots
>     that I've removed. The table snapshot_store_ref has no such column,
>     but the state is shown as Destroyed.
>
>
> I've done some more checking under the ssvm itself, which looks ok:
>
>
> root@s-2536-VM:/usr/local/cloud/systemvm#
> /usr/local/cloud/systemvm/ssvm-check.sh
> ================================================
> First DNS server is  192.168.169.254
> PING 192.168.169.254 (192.168.169.254): 56 data bytes
> 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
> 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
> --- 192.168.169.254 ping statistics ---
> 2 packets transmitted, 2 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
> Good: Can ping DNS server
> ================================================
> Good: DNS resolves cloudstack.apache.org
> ================================================
> nfs is currently mounted
> Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
> Good: Can write to mount point
> ================================================
> Management server is 192.168.169.13. Checking connectivity.
> Good: Can connect to management server 192.168.169.13 port 8250
> ================================================
> Good: Java process is running
> ================================================
> Tests Complete. Look for ERROR or WARNING above.
>
>
> The management server does show errors like these, without any further
> details:
>
> 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> snapshot: 55183 from storage
> 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update
> state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a
> new state from Destroyed via DestroyRequested
> 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> snapshot: 84059 from storage
> 2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update
> state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a
> new state from Destroyed via DestroyRequested
>
>
> Regarding bug 4797: I can't really comment, as it has very little
> technical detail (no management log errors, etc.). But essentially, at a
> high level, the snapshots are not deleted from the backend in my case,
> just like in bug 4797.
>
>
> TBH, I am very much surprised that a bug in such an important function of
> ACS slipped through the testing for the 4.15.0 release and, despite being
> discovered over 3 months ago, hasn't been scheduled for a fix in the
> 4.15.1 bug-fix release. Does that sound right to you? I think this issue
> should be revisited and corrected, as it will fill up the secondary
> storage and ultimately cause all sorts of issues with the creation of
> snapshots.
>
> Andrei
>
>
> ----- Original Message -----
> > From: "Suresh Anaparti" <suresh.anapa...@shapeblue.com>
> > To: "users" <users@cloudstack.apache.org>
> > Sent: Thursday, 17 June, 2021 11:16:59
> > Subject: Re: Snapshots are not working after upgrading to 4.15.0
>
> > Hi Andrei,
> >
> > Have you checked the 'status' and 'removed' timestamp in the snapshots
> > table, and the 'state' in the snapshot_store_ref table for these
> > snapshots?
> >
> > Similar issue logged (by Ed, as mentioned in his email) here:
> > https://github.com/apache/cloudstack/issues/4797. Is it the same issue?
> >
> > Regards,
> > Suresh
> >
> >On 17/06/21, 2:18 PM, "Andrei Mikhailovsky" <and...@arhont.com.INVALID>
> wrote:
> >
> >    Hi Suresh, Please see below the answers to your questions.
> >
> >
> >
> >
> > ----- Original Message -----
> >    > From: "Suresh Anaparti" <suresh.anapa...@shapeblue.com>
> >    > To: "users" <users@cloudstack.apache.org>
> >    > Sent: Thursday, 17 June, 2021 06:36:27
> >    > Subject: Re: Snapshots are not working after upgrading to 4.15.0
> >
> >    > Hi Andrei,
> >    >
> >    > Can you check if the storage garbage collector is enabled or not
> >    > in your env (specified using the global setting
> >    > 'storage.cleanup.enabled'). If it is enabled, check the interval &
> >    > delay settings: 'storage.cleanup.interval' and
> >    > 'storage.cleanup.delay', and see the logs to confirm whether
> >    > cleanup is performed or not.
> >
> >    storage.cleanup.enabled is true
> >    storage.cleanup.interval is 3600
> >    storage.cleanup.delay is 360086400
> >
> >    >
> >    > Also, check the snapshot status / state in snapshots &
> snapshot_store_ref tables
> >    > for the snapshots that are not deleted during the cleanup. Is
> 'removed'
> >    > timestamp set for them in snapshots table?
> >    >
> >
> >
> >    The table snapshots has NULL in the removed column for all snapshots
> >    that I've removed. The table snapshot_store_ref has no such column,
> >    but the state is shown as Destroyed.
> >
> >
> >
> >
> >    > Regards,
> >    > Suresh
> >    >
> >    >On 16/06/21, 9:46 PM, "Andrei Mikhailovsky"
> <and...@arhont.com.INVALID> wrote:
> >    >
> >    >    Hello,
> >    >
> >    >    I've done some more investigation and indeed, the snapshots were
> not taken
> >    >    because the secondary storage was over 90% used. I have started
> cleaning some
> >    >    of the older volumes and noticed another problem. After removing
> snapshots,
> >    >    they do not seem to be removed from the secondary storage. I've
> removed all
> >    >    snapshots over 24 hours ago and it looks like  the disk space
> hasn't been freed
> >    >    up at all.
> >    >
> >    >    Looks like there are issues with snapshotting function after all.
> >    >
> >    >    Andrei
> >    >
> >    >
> >    >
> >    >
> >    >
> >    >
> >    > ----- Original Message -----
> >    >    > From: "Harikrishna Patnala" <harikrishna.patn...@shapeblue.com
> >
> >    >    > To: "users" <users@cloudstack.apache.org>
> >    >    > Sent: Tuesday, 8 June, 2021 03:33:57
> >    >    > Subject: Re: Snapshots are not working after upgrading to
> 4.15.0
> >    >
> >    >    > Hi Andrei,
> >    >    >
> >    >    > Can you check the following things and let us know?
> >    >    >
> >    >    >
> >    >    >  1.  Can you try creating a new volume and then creating a
> >    >    >  snapshot of that, to check if this is an issue with old
> >    >    >  entries?
> >    >    >  2.  For the snapshots which are failing can you check if you
> are seeing any
> >    >    >  error messages like this "Can't find an image storage in zone
> with less than".
> >    >    >  This is to check if secondary storage free space check failed.
> >    >    >  3.  For the snapshots which are failing and if it is delta
> snapshot can you
> >    >    >  check if its parent's snapshot entry exists in
> "snapshot_store_ref" table with
> >    >    >  'parent_snapshot_id' of the current snapshot with
> 'store_role' "Image". This is
> >    >    >  to find the secondary storage where the parent snapshot
> backup is located.
> >    >    >
> >    >    > Regards,
> >    >    > Harikrishna
> >    >    > ________________________________
> >    >    > From: Andrei Mikhailovsky <and...@arhont.com.INVALID>
> >    >    > Sent: Monday, June 7, 2021 7:00 PM
> >    >    > To: users <users@cloudstack.apache.org>
> >    >    > Subject: Snapshots are not working after upgrading to 4.15.0
> >    >    >
> >    >    > Hello everyone,
> >    >    >
> >    >    > I am having an issue with volume snapshots since I've upgraded
> to 4.15.0. None
> >    >    > of the volumes are being snapshotted regardless if the
> snapshot is initiated
> >    >    > manually or from the schedule. The strange thing is that if I
> manually take the
> >    >    > snapshot, the GUI shows Success status, but the
> Storage>Snapshots show an Error
> >    >    > status. Here is what I see in the management server logs:
> >    >    >
> >    >    > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> >    >    > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143)
> (logid:be34ce01) Done
> >    >    > executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
> >    >    > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> >    >    > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143)
> (logid:be34ce01) Remove
> >    >    > job-86143 from job monitoring
> >    >    > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
> >    >    > (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to
> copy snapshot
> >    >    > com.cloud.utils.exception.CloudRuntimeException: can not find
> an image stores
> >    >    > at
> >    >    >
> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
> >    >    > at
> >    >    >
> org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
> >    >    > at
> >    >    >
> com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
> >    >    > at
> >    >    >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> >    >    > at
> >    >    >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> >    >    > at
> >    >    >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> >    >    > at
> >    >    >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> >    >    > at
> >    >    >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> >    >    > at
> >    >    >
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> >    >    > at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> >    >    > at
> >    >    >
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> >    >    > at
> >    >    >
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> >    >    > at
> >    >    >
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> >    >    > at java.base/java.lang.Thread.run(Thread.java:829)
> >    >    > 2021-06-07 13:55:20,152 DEBUG [c.c.s.s.SnapshotManagerImpl]
> >    >    > (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Backing
> up of snapshot
> >    >    > failed, for snapshot with ID 53531, left with 2 more attempts
> >    >    >
> >    >    >
> >    >    > I've checked and the Secondary storage is configured and
> visible in the GUI. I
> >    >    > can also mount it manually from the management server and a
> couple of host
> >    >    > servers that I've tested. In addition, I can successfully
> upload an ISO image
> >    >    > and that registers just fine and I can create new VMs using
> the newly uploaded
> >    >    > ISO image.
> >    >    >
> >    >    > I've had no such problems with 4.13.x ACS, so the issue seems
> to have been
> >    >    > introduced after doing the upgrade to 4.15.0.
> >    >    >
> >    >    > Could you please let me know how I can fix this issue?
> >    >    >
> >    >    > Cheers
> >    >    >
> >    >    > andrei
>
