Hi,

So, on Thursday the worst-case scenario occurred. All hosts in the 4-node 
cluster we've been having these issues with went non-responsive and started 
looping through various states. Spread across these hosts are the 45 guests for 
which we have the 45 storage domains. As we have a responsibility to the end 
users, we had to make the decision to stop trying to bring this cluster online 
and scrap it, based on the information you've provided. We've now split the 
cluster in half and created two clusters with the guests spread between them 
(around 20 on each). I've also started presenting a few 2 TB storage domains 
and am migrating the guest disks from their individual storage domains onto 
grouped, shared domains.

This immediately halves the number of storage domains on each cluster, and 
will reduce it further as we consolidate the storage. We obviously still have 
the same number of guest disks, so we will still have a large number of 
logical volumes; we just reduce the number of physical volumes presented to 
each host (and the number of storage domains within oVirt). We'll just have to 
see whether that improves things.

Thanks for your assistance and focus on the problem, and I'm glad we helped 
squash at least one bug. I would have liked to actually get to the bottom of 
the problem with that specific cluster, but events took a turn for the worse 
and forced our hand.

At the moment the clusters are both behaving but it's early days yet. We 
haven't changed any of the iSCSI settings on the new clusters but we have kept 
the modified monitor.py.

Regards,
Mark

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
