We are still having this problem and we cannot figure out what to do. I already sent the logs as a download. Can I do anything else to help?
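
For reference, making the journals persistent and exporting them came down to the following. This is only a sketch assuming stock systemd on CentOS 7; the --since date is just an example:

<― snip ―>
# journald stores the journal persistently once /var/log/journal exists
# (alternatively, set Storage=persistent in /etc/systemd/journald.conf)
mkdir -p /var/log/journal
systemctl restart systemd-journald

# export the libvirtd messages for the affected time range
journalctl -u libvirtd --since "2015-06-04 00:00:00" > libvirtd-journal.log
<― snip ―>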
On 04/06/15 17:08, "Soeren Malchow" <soeren.malc...@mcon.net> wrote:

>Hi,
>
>I would send those, but unfortunately we did not think about the journals
>getting deleted after a reboot.
>
>I just made the journals persistent on the servers, and we are trying to
>trigger the error again. Last time we only got halfway through the VMs
>when removing the snapshots, so there is a good chance that it comes up
>again.
>
>Also, libvirt logs to the journal, not to libvirtd.log. I would send the
>journal directly to you and Eric via our data exchange servers.
>
>Soeren
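>
>PS: In case debug output in a plain file is easier to work with than the
>journal, something like the following in /etc/libvirt/libvirtd.conf should
>do it. This is only a sketch; the settings are the standard libvirt
>logging options, but the exact filter list is our assumption:
>
><― snip ―>
># log_level: 1 = debug, 2 = info, 3 = warning, 4 = error
>log_level = 1
># keep the noisiest categories at warning level
>log_filters="1:qemu 1:libvirt 3:object 3:json 3:event"
># send everything to a file instead of the journal
>log_outputs="1:file:/var/log/libvirt/libvirtd.log"
><― snip ―>
>
>libvirtd has to be restarted afterwards (systemctl restart libvirtd).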
>
>On 04/06/15 16:17, "Adam Litke" <ali...@redhat.com> wrote:
>
>>On 04/06/15 13:08 +0000, Soeren Malchow wrote:
>>>Hi Adam, hi Eric,
>>>
>>>We had this issue again a few minutes ago.
>>>
>>>One machine went down exactly the same way as described. The machine
>>>had only one snapshot, and it was the only snapshot that was removed.
>>>Before that, in the same script run, we deleted the snapshots of 15
>>>other VMs, some without snapshots, some with 1 and some with several.
>>>
>>>Can I provide anything from the logs that helps?
>>
>>Let's start with the libvirtd.log on that host. It might be rather
>>large, so we may need to find a creative place to host it.
>>
>>>Regards
>>>Soeren
>>>
>>>On 03/06/15 18:07, "Soeren Malchow" <soeren.malc...@mcon.net> wrote:
>>>
>>>>Hi,
>>>>
>>>>This is not happening every time. The last time I saw it, a script was
>>>>running, and something like the 9th and the 23rd VM had a problem. It
>>>>is not always the same VMs, and it is not about the OS (it happens for
>>>>Windows and Linux alike).
>>>>
>>>>And as I said, it also happened when I tried to remove the snapshots
>>>>sequentially. Here is the code, with the original indentation restored
>>>>(I know it is probably not the most elegant way, but I am not a
>>>>developer):
>>>>
>>>><― snip ―>
>>>>
>>>># "api" and Connect() come from the surrounding script (ovirtsdk);
>>>># "time" is imported there as well
>>>>print("Snapshot deletion")
>>>>try:
>>>>    time.sleep(300)
>>>>    Connect()
>>>>    vms = api.vms.list()
>>>>    for vm in vms:
>>>>        print("Deleting snapshots for %s" % vm.name)
>>>>        snapshotlist = vm.snapshots.list()
>>>>        for snapshot in snapshotlist:
>>>>            if snapshot.description != "Active VM":
>>>>                time.sleep(30)
>>>>                snapshot.delete()
>>>>                try:
>>>>                    # poll until the snapshot is no longer locked
>>>>                    while (api.vms.get(name=vm.name)
>>>>                           .snapshots.get(id=snapshot.id)
>>>>                           .snapshot_status == "locked"):
>>>>                        print("Waiting for snapshot %s on %s deletion to finish"
>>>>                              % (snapshot.description, vm.name))
>>>>                        time.sleep(60)
>>>>                except Exception:
>>>>                    # the snapshot disappears from the list once the
>>>>                    # deletion has finished
>>>>                    print("Snapshot %s does not exist anymore"
>>>>                          % snapshot.description)
>>>>        print("Snapshot deletion for %s done" % vm.name)
>>>>    print("Deletion of snapshots done")
>>>>    api.disconnect()
>>>>except Exception as e:
>>>>    print("Something went wrong when deleting the snapshots\n%s" % str(e))
>>>>
>>>><― snip ―>
>>>>
>>>>Cheers
>>>>Soeren
>>>>
>>>>On 03/06/15 15:20, "Adam Litke" <ali...@redhat.com> wrote:
>>>>
>>>>>On 03/06/15 07:36 +0000, Soeren Malchow wrote:
>>>>>>Dear Adam,
>>>>>>
>>>>>>First we were using a python script that was working on 4 threads and
>>>>>>therefore removing 4 snapshots at a time throughout the cluster; that
>>>>>>still caused problems.
>>>>>>
>>>>>>Now I took the snapshot removal out of the threaded part and I am
>>>>>>just looping through each snapshot on each VM one after another, even
>>>>>>with "sleeps" in between, but the problem remains.
>>>>>>
>>>>>>But I am getting the impression that it is a problem with the number
>>>>>>of snapshots that are deleted in a certain time. If I delete manually
>>>>>>and one after another (meaning every 10 min or so), I do not have
>>>>>>problems; if I delete manually and do several at once, and on one VM
>>>>>>the next one right after the previous one finished, the risk seems to
>>>>>>increase.
>>>>>
>>>>>Hmm. In our lab we extensively tested removing a snapshot for a VM
>>>>>with 4 disks. This means 4 block jobs running simultaneously. Less
>>>>>than 10 minutes later (closer to 1 minute) we would remove a second
>>>>>snapshot for the same VM (again involving 4 block jobs). I guess we
>>>>>should rerun this flow on a fully updated CentOS 7.1 host to see about
>>>>>local reproduction. Your case seems much simpler than this, though.
>>>>>Is this happening every time or intermittently?
>>>>>
>>>>>>I do not think it is the number of VMs, because we had this on hosts
>>>>>>with only 3 or 4 VMs running.
>>>>>>
>>>>>>I will try restarting libvirt and see what happens.
>>>>>>
>>>>>>We are not using RHEL 7.1, only CentOS 7.1.
>>>>>>
>>>>>>Is there anything else we can look at when this happens again?
>>>>>
>>>>>I'll defer to Eric Blake for the libvirt side of this. Eric, would
>>>>>enabling debug logging in libvirtd help to shine some light on the
>>>>>problem?
>>>>>
>>>>>--
>>>>>Adam Litke
>>
>>--
>>Adam Litke

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users