> From: Paul Heinlein [mailto:[email protected]]
> Sent: Saturday, January 11, 2014 10:55 AM
>
> Likewise with a complex entity like Rackspace. My experience is completely
> different than Ed's: the VMs that I manage there have had superb storage
> characteristics. It's the networking bandwidth that's more often the issue.

Just try sticking some scripts into cron. Here's one that produces failures for us:

    # Having witnessed more than enough failures of ntpd, run ntpdate via cron instead. It's consistently reliable.
    # Once per hour, at a random predetermined minute different for each system, set the time.
    * * * * * /usr/bin/test $(( 0x`/usr/bin/hostid` % 60 )) -eq `/bin/date '+%M'` && /usr/sbin/ntpdate north-america.pool.ntp.org &> /dev/null

We have this on a dozen machines. For several days, they all work fine. And then we get, from a different random machine each time, a cron failure email saying "/bin/date: command not found", or the same for /usr/bin/test, or /usr/bin/hostid, or any other one of those commands. The machine has to be rebooted to make the problem go away.

I tracked it down to an I/O error recorded in the system log. The only explanation can be a fault in the storage backend, plus caching that makes it keep failing on subsequent calls.

Speaking of which, you would expect, for a tiny script that runs every minute, that the whole thing would be cached on first run and storage never needed again. I don't have any explanation for that behavior.

Also, if there's something bad in the kernel cache, it should affect all processes trying the same thing. But if I log in manually on the failing machine and run the commands by hand, they work fine. I don't have an explanation for that behavior either.

Like I said before - the lack of other people complaining about the problem doesn't mean other people aren't experiencing the problem. You probably are too. You're just not bothering to detect it. I suggest monitoring the syslog in general, and I/O errors in particular.

_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
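
For anyone who wants to try the hourly ntpdate job from the message above, here is a rough sketch of the same per-host minute check moved into a small script. The function names and the --run flag are made up for illustration and are not part of the original crontab. Putting the arithmetic in a script also sidesteps crontab's special handling of %, which on Vixie-style cron has to be written as \% in the command field.

```shell
#!/bin/sh
# Sketch only: host_minute, is_my_minute and --run are hypothetical names.

# Map a hostid string (hex, as printed by hostid(1)) to a minute 0..59.
host_minute() {
    echo $(( 0x$1 % 60 ))
}

# Succeed only if the given minute (e.g. "08" from date '+%M')
# is the slot assigned to the given hostid.
is_my_minute() {
    # test(1) compares decimal integers, so a leading zero like "08" is fine.
    [ "$(host_minute "$1")" -eq "$2" ]
}

# Only act when called with --run, so the functions can be sourced
# and exercised without touching the clock.
if [ "${1:-}" = "--run" ]; then
    if is_my_minute "$(/usr/bin/hostid)" "$(/bin/date '+%M')"; then
        /usr/sbin/ntpdate north-america.pool.ntp.org >/dev/null 2>&1
    fi
fi
```

Assuming the script were saved as /usr/local/sbin/hourly-ntpdate.sh (a hypothetical path), the crontab entry would shrink to:

    * * * * * /usr/local/sbin/hourly-ntpdate.sh --run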

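
The suggestion to watch syslog for I/O errors could be sketched roughly as below. The match string 'I/O error' and the default log path /var/log/messages are assumptions (they fit kernel messages such as "end_request: I/O error" and "Buffer I/O error on device" on RHEL-style systems); the function name and the --check flag are made up for illustration.

```shell
#!/bin/sh
# Sketch only: count_io_errors and --check are hypothetical names.

# Count lines on stdin that mention an I/O error (case-insensitive).
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_io_errors() {
    grep -c -i 'I/O error' || true
}

# With --check [logfile], report and exit non-zero if any I/O errors
# are present, so cron mails the output.
if [ "${1:-}" = "--check" ]; then
    logfile="${2:-/var/log/messages}"   # assumed default location
    n=$(count_io_errors < "$logfile")
    if [ "$n" -gt 0 ]; then
        echo "$n I/O error line(s) found in $logfile"
        exit 1
    fi
fi
```

Run from cron, this only produces mail when something matches. It rescans the whole file on each run, so it keeps reporting old entries until the log rotates.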