#33785: cannot create new machines in ganeti cluster
-------------------------------------------------+-------------------------
 Reporter:  anarcat                              |          Owner:  anarcat
     Type:  defect                               |         Status:  assigned
 Priority:  High                                 |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Major                                |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by anarcat):

 some feedback from a ganeti maintainer:

 {{{
 03:40:48 <apoikos> failure reasons: FailMem: 1, FailN1: 4
 03:41:18 <apoikos> part indicates that there's no N+1 redundancy, probably due to not enough memory being available on the cluster to accommodate it
 03:42:05 <apoikos> You can try a manual allocation, or passing flags like --ignore-soft-errors and --no-capacity-checks to hail
 [...]
 10:36:12 <apoikos> I doubt rebalancing will fix it
 10:36:31 <apoikos> The thing is, the whole htools logic was built around Xen which does hard commit on memory
 [...]
 10:37:08 <apoikos> That's the -14GB of RAm you're seeing
 10:37:11 <anarchat> so what you're saying is that i *am* effectively using too much memory
 10:37:13 <anarchat> oh weird
 10:37:24 <anarchat> like the memory use from /proc doesn't match what ganeti expects?
 10:37:28 <apoikos> no, I'm saying you're using less memory than Ganeti thinks
 10:37:34 <apoikos> exactly
 10:37:41 <apoikos> because KVMs VSZ != RSS
 [...]
 10:38:17 <apoikos> Let's say it computes the worst-case scenario
 10:38:46 <apoikos> And in the worst-case scenario, where each instance will indeed use all of its configured memory and KSM won't save you, you don't have N+1
 10:39:08 <apoikos> As for the 162GB of disk, these are probably your root LVs, if they live on the same LVM VG as the Ganeti instance disks
 10:39:39 <anarchat> well there's also a secondary VG (vg_ganeti_hdd) for spinning rust that we don't see in gnt-node-list
 10:39:48 <anarchat> i wonder if that's related
 10:39:52 <apoikos> nope
 10:40:08 <apoikos> If your primary VG has anything else than Ganeti VMs on it, you'll see that message
 10:40:20 <anarchat> darn
 10:40:27 <anarchat> so i'd need to rebuild my nodes to fix this
 10:40:34 <apoikos> the good news is, you can tell ganeti to ignore specific LVs using gnt-cluster modify --reserved-lvs
 10:40:41 <anarchat> oh cool
 10:41:19 <anarchat> so i'd ignore what... vg_ganeti/root and vg_ganeti/swap i guess
 10:41:29 <apoikos> I guess
 10:41:50 <apoikos> The option --reserved-lvs specifies a list (comma-separated) of logical volume group names (regular expressions) that will be ignored by the cluster verify operation
 10:41:53 <anarchat> i alreayd have lvm reserved volumes: vg_ganeti/root, vg_ganeti/swap
 10:42:19 <anarchat> oh but maybe i have extra LVs on those nodes, that's true
 10:43:47 <anarchat> on fsn-node-03 and fsn-node-05, but not fsn-node-04
 }}}

 they also noted [https://github.com/ganeti/ganeti/issues/1399 upstream issue 1399] which is that the Sinst field is incorrect in `gnt-node list`.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33785#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
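
For reference, the "worst-case scenario" N+1 reasoning from the IRC log can be sketched roughly as below. This is a simplified model and not Ganeti's actual code: it assumes each instance has one primary and one secondary node and that only configured memory counts (actual RSS and KSM savings are ignored). The function `n1_failures`, the `Instance` type, the node names and all numbers are hypothetical, made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    primary: str      # node currently running the instance
    secondary: str    # node that would take over if the primary fails
    memory_mib: int   # configured memory, not what the KVM process really uses


def n1_failures(free_mib, instances):
    """Return (secondary, failed_primary, needed_mib, free_mib) per shortfall.

    For every secondary node, add up the configured memory of the instances
    it would inherit if one given primary node died, and compare that sum
    with the free memory reported for the secondary.
    """
    # needed[secondary][primary] = memory landing on `secondary` if `primary` dies
    needed = defaultdict(lambda: defaultdict(int))
    for inst in instances:
        needed[inst.secondary][inst.primary] += inst.memory_mib

    failures = []
    for secondary, by_primary in sorted(needed.items()):
        for primary, mib in sorted(by_primary.items()):
            if mib > free_mib[secondary]:
                failures.append((secondary, primary, mib, free_mib[secondary]))
    return failures


# Hypothetical two-node cluster: node-a has too little free memory
# to absorb vm1 if node-b fails.
free_mib = {"node-a": 4096, "node-b": 16384}
instances = [
    Instance("vm1", "node-b", "node-a", 8192),
    Instance("vm2", "node-a", "node-b", 8192),
]

result = n1_failures(free_mib, instances)
for secondary, primary, mib, free in result:
    print(f"{secondary}: needs {mib} MiB if {primary} fails, only {free} MiB free")
```

Because the check uses configured memory, it can report a shortfall even when the nodes look mostly idle in /proc, which matches the "you're using less memory than Ganeti thinks" remark above.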