On Mon, Oct 16, 2017 at 4:51 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:
> That's the problem: at that time nobody had restarted the server.
Please provide the engine log from this time so we can see whether the restart was triggered by the engine.

>
> Is there any scenario in which the hypervisor is restarted by the engine?
>
> Cheers
>
> Erekle
>
>
>
> On 10/16/2017 04:45 PM, Piotr Kliczewski wrote:
>>
>> Erekle,
>>
>> For the time period you mentioned I do not see anything wrong on the
>> vdsm side except for a restart at 2017-10-15 16:28:50,993+0200. It looks
>> like a manual restart.
>> The engine log starts at 2017-10-16 03:49:04,092+02, so I am not able to
>> say whether there was anything else besides the heartbeat issue caused
>> by the restart.
>>
>> The restart was the cause of "connection reset by peer" on the MOM side.
>>
>> Thanks,
>> Piotr
>>
>> On Mon, Oct 16, 2017 at 4:21 PM, Erekle Magradze
>> <erekle.magra...@recogizer.de> wrote:
>>>
>>> Hi Piotr,
>>>
>>> I've restarted the vdsm daemon on certain nodes several times; that
>>> could be the reason.
>>>
>>> The failure I mentioned happened yesterday from 15:00 to 17:00.
>>>
>>> Cheers
>>>
>>> Erekle
>>>
>>>
>>>
>>> On 10/16/2017 04:13 PM, Piotr Kliczewski wrote:
>>>>
>>>> Erekle,
>>>>
>>>> In the logs you provided I see:
>>>>
>>>> IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/6d52512e-1c02-4509-880a-bf57cbad4bdf/mastersd/dom_md/inbox
>>>>
>>>> and
>>>>
>>>> StorageDomainMasterError: Error validating master storage domain: ('MD read error',)
>>>>
>>>> which seems to be the cause of vdsm being killed by sanlock, which in
>>>> turn caused the "connection reset by peer".
>>>>
>>>> After the vdsm restart, storage looks good.
>>>>
>>>> @Nir can you take a look?
>>>>
>>>> Thanks,
>>>> Piotr
>>>>
>>>> On Mon, Oct 16, 2017 at 3:59 PM, Erekle Magradze
>>>> <erekle.magra...@recogizer.de> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> The issue is the following: after installing oVirt 4.1 on three
>>>>> nodes with GlusterFS as storage, the oVirt engine reported failed
>>>>> events with the following message:
>>>>>
>>>>> VDSM hostname command GetStatsVDS failed: Connection reset by peer
>>>>>
>>>>> After that, oVirt tried to fence the affected host and it was
>>>>> excluded from production; luckily I am not running any VMs on it yet.
>>>>>
>>>>> The logs are attached; don't be surprised by the hostnames :)
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>> Cheers
>>>>>
>>>>> Erekle
>>>>>
>>>>>
>>>>> On 10/16/2017 03:37 PM, Dafna Ron wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Can you please tell us what issue you are actually facing? :)
>>>>> It would be easier to debug an issue than an error message that can
>>>>> be caused by several things.
>>>>>
>>>>> Also, can you provide the engine and the vdsm logs?
>>>>>
>>>>> thank you,
>>>>> Dafna
>>>>>
>>>>>
>>>>> On 10/16/2017 02:30 PM, Erekle Magradze wrote:
>>>>>
>>>>> It was a typo in the failure message;
>>>>>
>>>>> this is what I was getting:
>>>>>
>>>>> VDSM hostname command GetStatsVDS failed: Connection reset by peer
>>>>>
>>>>>
>>>>> On 10/16/2017 03:21 PM, Erekle Magradze wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> It's getting clear now; indeed, the momd service is disabled:
>>>>>
>>>>> ● momd.service - Memory Overcommitment Manager Daemon
>>>>>    Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
>>>>>    Active: inactive (dead)
>>>>>
>>>>> mom-vdsm is enabled and running.
>>>>>
>>>>> ● mom-vdsm.service - MOM instance configured for VDSM purposes
>>>>>    Loaded: loaded (/usr/lib/systemd/system/mom-vdsm.service; enabled; vendor preset: enabled)
>>>>>    Active: active (running) since Mon 2017-10-16 15:14:35 CEST; 1min 3s ago
>>>>>  Main PID: 27638 (python)
>>>>>    CGroup: /system.slice/mom-vdsm.service
>>>>>            └─27638 python /usr/sbin/momd -c /etc/vdsm/mom.conf
>>>>>
>>>>> The reason I started digging into MOM problems is the following
>>>>> problem:
>>>>>
>>>>>
>>>>> VDSM hostname command GetStatsVDSThanks failed: Connection reset by peer
>>>>>
>>>>> which is causing fencing of the node where the failure happens.
>>>>> What could be the reason for the GetStatsVDS failure?
>>>>>
>>>>> Best Regards
>>>>> Erekle
>>>>>
>>>>>
>>>>> On 10/16/2017 03:11 PM, Martin Sivak wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> how do you start MOM? MOM is supposed to talk to vdsm; we do not talk
>>>>> to libvirt directly. The line you posted comes from vdsm, and vdsm is
>>>>> telling you it can't talk to MOM.
>>>>>
>>>>> Which MOM service is enabled? There are two, momd and mom-vdsm, and
>>>>> the second one is the one that should be enabled.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Martin Sivak
>>>>>
>>>>>
>>>>> On Mon, Oct 16, 2017 at 3:04 PM, Erekle Magradze
>>>>> <erekle.magra...@recogizer.de> wrote:
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> Thanks for the answer. Unfortunately this warning message persists;
>>>>> does it mean that MOM cannot communicate with libvirt? How critical
>>>>> is it?
>>>>>
>>>>> Best
>>>>>
>>>>> Erekle
>>>>>
>>>>>
>>>>>
>>>>> On 10/16/2017 03:03 PM, Martin Sivak wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> it is just a warning; there is nothing you have to solve unless it
>>>>> does not resolve itself within a minute or so. If it happens only once
>>>>> or twice after a vdsm or MOM restart, then you are fine.
>>>>>
>>>>> Best regards
>>>>>
>>>>> --
>>>>> Martin Sivak
>>>>> SLA / oVirt
>>>>>
>>>>> On Mon, Oct 16, 2017 at 2:44 PM, Erekle Magradze
>>>>> <erekle.magra...@recogizer.de> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> after running
>>>>>
>>>>> systemctl status vdsm
>>>>>
>>>>> I see that it is running, with these messages at the end:
>>>>>
>>>>> Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available.
>>>>> Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available, KSM stats will be missing.
>>>>> Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated in favor of ping2 and confirmConnectivity
>>>>>
>>>>> How critical is it, and how can I solve that warning?
>>>>>
>>>>> I am using libvirt.
>>>>>
>>>>> Cheers
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users@ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>> --
>>>>> Recogizer Group GmbH
>>>>>
>>>>> Dr.rer.nat. Erekle Magradze
>>>>> Lead Big Data Engineering & DevOps
>>>>> Rheinwerkallee 2, 53227 Bonn
>>>>> Tel: +49 228 29974555
>>>>>
>>>>> E-Mail erekle.magra...@recogizer.de
>>>>> Web: www.recogizer.com
>>>>>
>>>>> Recogizer on LinkedIn https://www.linkedin.com/company-beta/10039182/
>>>>> Follow us on Twitter https://twitter.com/recogizer
>>>>>
>>>>> -----------------------------------------------------------------
>>>>> Recogizer Group GmbH
>>>>> Managing Directors: Oliver Habisch, Carsten Kreutze
>>>>> Commercial register: Amtsgericht Bonn HRB 20724
>>>>> Registered office: Bonn; VAT ID: DE294195993
>>>>>
>>>>> This e-mail contains confidential and/or legally protected information.
>>>>> If you are not the intended recipient or have received this e-mail in
>>>>> error, please notify the sender immediately and delete this e-mail.
>>>>> Unauthorized copying and unauthorized distribution of this e-mail and
>>>>> the information it contains are not permitted.
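Martin's rule of thumb in the thread (the "MOM not available" warning only matters if it does not resolve itself within a minute or so) can be checked against the journal. The following is a minimal Python sketch, assuming vdsmd journal lines in the format of the excerpt quoted above (as printed by `systemctl status` or `journalctl -u vdsmd`). The function `mom_warning_persists`, the 60-second grace period, and the default year 2017 are hypothetical choices for illustration, not part of vdsm or MOM.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for journal "short" output lines such as:
#   Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available.
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ vdsmd\[\d+\]: (?P<msg>.*)$"
)


def mom_warning_persists(lines, year=2017, grace=timedelta(seconds=60)):
    """Return True if "MOM not available" warnings span more than `grace`.

    Journal short lines carry no year, so `year` (an assumption) is supplied
    to build comparable datetime objects.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or "MOM not available" not in match.group("msg"):
            continue
        # Normalise whitespace so space-padded days (e.g. "Oct  6") also parse.
        ts = " ".join(match.group("ts").split())
        stamps.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
    if not stamps:
        return False
    # Rough signal: time between the first and the last warning seen.
    return max(stamps) - min(stamps) > grace


if __name__ == "__main__":
    # The three lines quoted in the thread (hostname anonymised as in the original).
    sample = [
        "Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available.",
        "Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available, KSM stats will be missing.",
        "Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated in favor of ping2 and confirmConnectivity",
    ]
    print(mom_warning_persists(sample))  # False: both warnings share one timestamp
```

The check only looks at the span between the first and last warning, so isolated warnings from two separate restarts hours apart would also be reported as persistent; restricting the input to a time window around a single restart avoids that.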