Been reinstalling to stock CentOS 6.5 last night, all successful. Until roughly midnight GMT, 2 out of 4 hosts were showing the same errors.
Any more suggestions?

On Sat, Feb 22, 2014 at 8:57 PM, Nir Soffer <nsof...@redhat.com> wrote:

> ----- Original Message -----
> > From: "Johan Kooijman" <m...@johankooijman.com>
> > To: "Nir Soffer" <nsof...@redhat.com>
> > Cc: "users" <users@ovirt.org>
> > Sent: Wednesday, February 19, 2014 2:34:36 PM
> > Subject: Re: [Users] Nodes lose storage at random
> >
> > Messages: https://t-x.dignus.nl/messages.txt
> > Sanlock: https://t-x.dignus.nl/sanlock.log.txt
>
> We can see in /var/log/messages that sanlock failed to write to the ids
> lockspace [1], which after 80 seconds [2] caused vdsm to lose its host id
> lease. In this case, sanlock kills vdsm [3], which dies after 11 retries
> [4]. Then vdsm is respawned [5]. This is expected.
>
> We don't know why sanlock failed to write to the storage, but in [6] the
> kernel tells us that the nfs server is not responding. Since the nfs server
> is accessible from other machines, it means you have some issue with this
> host.
>
> Later the machine reboots [7], and the nfs server is still not accessible.
> Then you have a lot of WARN_ON call traces [8] that look related to network
> code.
>
> We can see that you are not running the most recent kernel [7]. We
> experienced various nfs issues during the 6.5 beta.
>
> I would try to get help from kernel folks about this.
>
> [1] Feb 18 10:47:46 hv5 sanlock[14753]: 2014-02-18 10:47:46+0000 1251833 [21345]: s2 delta_renew read rv -202 offset 0 /rhev/data-center/mnt/10.0.24.1:_santank_ovirt-data/e9f70496-f181-4c9b-9ecb-d7f780772b04/dom_md/ids
>
> [2] Feb 18 10:48:35 hv5 sanlock[14753]: 2014-02-18 10:48:35+0000 1251882 [14753]: s2 check_our_lease failed 80
>
> [3] Feb 18 10:48:35 hv5 sanlock[14753]: 2014-02-18 10:48:35+0000 1251882 [14753]: s2 kill 19317 sig 15 count 1
>
> [4] Feb 18 10:48:45 hv5 sanlock[14753]: 2014-02-18 10:48:45+0000 1251892 [14753]: dead 19317 ci 3 count 11
>
> [5] Feb 18 10:48:45 hv5 respawn: slave '/usr/share/vdsm/vdsm' died, respawning slave
>
> [6] Feb 18 10:57:36 hv5 kernel: nfs: server 10.0.24.1 not responding, timed out
>
> [7]
> Feb 18 11:03:01 hv5 kernel: imklog 5.8.10, log source = /proc/kmsg started.
> Feb 18 11:03:01 hv5 kernel: Linux version 2.6.32-358.18.1.el6.x86_64 (mockbu...@c6b10.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 28 17:19:38 UTC 2013
>
> [8]
> Feb 18 18:29:53 hv5 kernel: ------------[ cut here ]------------
> Feb 18 18:29:53 hv5 kernel: WARNING: at net/core/dev.c:1759 skb_gso_segment+0x1df/0x2b0() (Not tainted)
> Feb 18 18:29:53 hv5 kernel: Hardware name: X9DRW
> Feb 18 18:29:53 hv5 kernel: igb: caps=(0x12114bb3, 0x0) len=1596 data_len=0 ip_summed=0
> Feb 18 18:29:53 hv5 kernel: Modules linked in: ebt_arp nfs fscache auth_rpcgss nfs_acl bonding softdog ebtable_nat ebtables bnx2fc fcoe libfcoe libfc scsi_transport_fc scsi_tgt lockd sunrpc bridge ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_physdev ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack xt_multiport ip6table_filter ip6_tables ext4 jbd2 8021q garp stp llc sha256_generic cbc cryptoloop dm_crypt aesni_intel cryptd aes_x86_64 aes_generic vhost_net macvtap macvlan tun kvm_intel kvm sg sb_edac edac_core iTCO_wdt iTCO_vendor_support ioatdma shpchp dm_snapshot squashfs ext2 mbcache dm_round_robin sd_mod crc_t10dif isci libsas scsi_transport_sas 3w_sas ahci ixgbe igb dca ptp pps_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
> Feb 18 18:29:53 hv5 kernel: Pid: 5462, comm: vhost-5458 Not tainted 2.6.32-358.18.1.el6.x86_64 #1
> Feb 18 18:29:53 hv5 kernel: Call Trace:
> Feb 18 18:29:53 hv5 kernel: <IRQ> [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8106e4d6>] ? warn_slowpath_fmt+0x46/0x50
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa020bd62>] ? igb_get_drvinfo+0x82/0xe0 [igb]
> Feb 18 18:29:53 hv5 kernel: [<ffffffff81448e7f>] ? skb_gso_segment+0x1df/0x2b0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff81449260>] ? dev_hard_start_xmit+0x1b0/0x530
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8146773a>] ? sch_direct_xmit+0x15a/0x1c0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8144d0c0>] ? dev_queue_xmit+0x3b0/0x550
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa04af65c>] ? br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa04af6e8>] ? br_forward_finish+0x58/0x60 [bridge]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa04af79a>] ? __br_forward+0xaa/0xd0 [bridge]
> Feb 18 18:29:53 hv5 kernel: [<ffffffff81474f34>] ? nf_hook_slow+0x74/0x110
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa04af81d>] ? br_forward+0x5d/0x70 [bridge]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa04b0609>] ? br_handle_frame_finish+0x179/0x2a0 [bridge]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa04b08da>] ? br_handle_frame+0x1aa/0x250 [bridge]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa0331690>] ? pit_timer_fn+0x0/0x80 [kvm]
> Feb 18 18:29:53 hv5 kernel: [<ffffffff81448929>] ? __netif_receive_skb+0x529/0x750
> Feb 18 18:29:53 hv5 kernel: [<ffffffff81448bea>] ? process_backlog+0x9a/0x100
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8144d453>] ? net_rx_action+0x103/0x2f0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff810770b1>] ? __do_softirq+0xc1/0x1e0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
> Feb 18 18:29:53 hv5 kernel: <EOI> [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8144d8d8>] ? netif_rx_ni+0x28/0x30
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa02b7749>] ? tun_sendmsg+0x229/0x4ec [tun]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa037bcf5>] ? handle_tx+0x275/0x5e0 [vhost_net]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa037c095>] ? handle_tx_kick+0x15/0x20 [vhost_net]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa037955c>] ? vhost_worker+0xbc/0x140 [vhost_net]
> Feb 18 18:29:53 hv5 kernel: [<ffffffffa03794a0>] ? vhost_worker+0x0/0x140 [vhost_net]
> Feb 18 18:29:53 hv5 kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Feb 18 18:29:53 hv5 kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
> Feb 18 18:29:53 hv5 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> Feb 18 18:29:53 hv5 kernel: ---[ end trace 2ae4b3142333fe7d ]---
>

--
Met vriendelijke groeten / With kind regards,
Johan Kooijman

T +31(0) 6 43 44 45 27
F +31(0) 162 82 00 01
E m...@johankooijman.com
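
For anyone wanting to check the same sequence on their own hosts: the events Nir points at in [1] to [6] can be pulled out of /var/log/messages with a few regular expressions. The following is only a rough Python sketch, not something shipped with vdsm or sanlock. The patterns are copied from the log excerpts quoted in this thread and may not match other sanlock or kernel versions. The names parse_events, print_timeline and SAMPLE are made up for illustration, and the year is assumed to be 2014 because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Patterns for the events referenced as [1]-[6] in the quoted analysis.
# They are modelled on the excerpts in this thread only (assumption:
# other sanlock/kernel versions may word these messages differently).
EVENTS = [
    ("renewal_error", re.compile(r"sanlock\[\d+\]: .* delta_renew \w+ rv (-?\d+)")),
    ("lease_failed", re.compile(r"sanlock\[\d+\]: .* check_our_lease failed (\d+)")),
    ("kill_sent", re.compile(r"sanlock\[\d+\]: .* kill (\d+) sig (\d+) count (\d+)")),
    ("process_dead", re.compile(r"sanlock\[\d+\]: .* dead (\d+) ci \d+ count (\d+)")),
    ("vdsm_respawn", re.compile(r"respawn: slave '\S+' died")),
    ("nfs_timeout", re.compile(r"kernel: nfs: server (\S+) not responding")),
]

# Classic syslog prefix, e.g. "Feb 18 10:47:46 " (no year in the timestamp).
SYSLOG_TS = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def parse_events(lines, year=2014):
    """Return (timestamp, event_name, captured_fields) for matching lines."""
    events = []
    for line in lines:
        ts_match = SYSLOG_TS.match(line)
        if not ts_match:
            continue
        # %b depends on the locale; this assumes English month names.
        ts = datetime.strptime(f"{year} {ts_match.group(1)}", "%Y %b %d %H:%M:%S")
        for name, pattern in EVENTS:
            found = pattern.search(line)
            if found:
                events.append((ts, name, found.groups()))
                break
    return events


def print_timeline(events):
    """Print each event with its offset in seconds from the first event."""
    if not events:
        return
    start = events[0][0]
    for ts, name, details in events:
        offset = int((ts - start).total_seconds())
        print(f"+{offset:>5}s  {ts:%H:%M:%S}  {name:<14} {' '.join(details)}")


# Log lines [1]-[6] from this thread, unwrapped to one line each.
SAMPLE = [
    "Feb 18 10:47:46 hv5 sanlock[14753]: 2014-02-18 10:47:46+0000 1251833 "
    "[21345]: s2 delta_renew read rv -202 offset 0 "
    "/rhev/data-center/mnt/10.0.24.1:_santank_ovirt-data/"
    "e9f70496-f181-4c9b-9ecb-d7f780772b04/dom_md/ids",
    "Feb 18 10:48:35 hv5 sanlock[14753]: 2014-02-18 10:48:35+0000 1251882 "
    "[14753]: s2 check_our_lease failed 80",
    "Feb 18 10:48:35 hv5 sanlock[14753]: 2014-02-18 10:48:35+0000 1251882 "
    "[14753]: s2 kill 19317 sig 15 count 1",
    "Feb 18 10:48:45 hv5 sanlock[14753]: 2014-02-18 10:48:45+0000 1251892 "
    "[14753]: dead 19317 ci 3 count 11",
    "Feb 18 10:48:45 hv5 respawn: slave '/usr/share/vdsm/vdsm' died, "
    "respawning slave",
    "Feb 18 10:57:36 hv5 kernel: nfs: server 10.0.24.1 not responding, "
    "timed out",
]

if __name__ == "__main__":
    print_timeline(parse_events(SAMPLE))
```

To run it against a real log, pass an open file such as open("/var/log/messages") to parse_events instead of SAMPLE. Lines that match none of the patterns are simply skipped.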