> From: Paul Heinlein [mailto:[email protected]]
> Sent: Saturday, January 11, 2014 10:55 AM
> 
> Likewise with a complex entity like Rackspace. My experience is completely
> different than Ed's: the VMs that I manage there have had superb storage
> characteristics. It's the networking bandwidth that's more often the issue.

Just try sticking some scripts into cron. Here's one that produces failures for 
us:

# Having witnessed more than enough failures of ntpd, run ntpdate via cron instead.  It's consistently reliable.
# Once per hour, at a fixed minute derived from each system's hostid, set the time.
# (cron treats an unescaped % as a newline, so each % is written as \%.)
*  *  *  *  *   /usr/bin/test $(( 0x`/usr/bin/hostid` \% 60 )) -eq `/bin/date '+\%M'` && /usr/sbin/ntpdate north-america.pool.ntp.org > /dev/null 2>&1
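
For anyone who wants to try this, the minute test can also live in a small wrapper script instead of the crontab line, which sidesteps cron's special handling of % altogether. This is only a sketch: the function names and the wrapper idea are my own assumptions, not what we actually run.

```shell
#!/bin/sh
# Hypothetical helper functions for a wrapper script called from cron
# once a minute. The names are illustrative assumptions.

# Map a hex hostid (as printed by /usr/bin/hostid) to a minute, 0-59.
minute_for_hostid() {
    echo $(( 0x$1 % 60 ))
}

# Succeed only when the current minute (second argument, as printed by
# date '+%M') equals the minute derived from the hostid (first argument).
is_sync_minute() {
    now=${2#0}        # strip one leading zero so "08" compares as 8
    now=${now:-0}     # a bare "0" would otherwise become empty
    [ "$(minute_for_hostid "$1")" -eq "$now" ]
}

# Usage from the wrapper's main body:
#   is_sync_minute "$(/usr/bin/hostid)" "$(/bin/date '+%M')" \
#       && /usr/sbin/ntpdate north-america.pool.ntp.org > /dev/null 2>&1
```

The crontab entry then only has to call the wrapper every minute, with no arithmetic or backticks in the crontab itself.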

We have this on a dozen machines.  For several days, they all work fine.  And
then we get, from a different machine each time, a cron failure email saying
"/bin/date: command not found", or the same for /usr/bin/test, or
/usr/bin/hostid, or any one of those commands at random.  The machine has to
be rebooted to make the problem go away.  I tracked it down to I/O errors
recorded in the system log.  The only explanation I can see is a fault in the
storage backend, plus caching that makes it keep failing on subsequent calls.

Speaking of which, you would expect a tiny script that runs every minute to be
cached in full on its first run, with storage never needed again.  I don't
have any explanation for why it isn't.  Also, if there's something bad in the
kernel cache, it should affect every process trying the same thing.  But if I
log in on the failing machine and run the commands by hand, they work fine.
I don't have an explanation for that behavior either.

As I said before, the lack of other people complaining about the problem
doesn't mean other people aren't experiencing it.  You probably are too.
You're just not bothering to detect it.  I suggest monitoring syslog in
general, and I/O errors in particular.
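
As a starting point for that monitoring, here is a rough sketch of a check that could run from cron and mail you when the log mentions I/O errors. The function names are made up for illustration, the log path in the usage line is an assumption (it varies by distribution), and the match string is just the generic phrase "I/O error", so adjust it to whatever your kernel actually logs.

```shell
#!/bin/sh
# Hypothetical syslog check; names and the log path are assumptions.

# Print the number of lines in the given file that mention an I/O error.
count_io_errors() {
    grep -ci 'I/O error' "$1" || true
}

# Print a warning and return non-zero if the log has any I/O error lines.
# A missing or unreadable file is treated as zero matches.
check_io_errors() {
    n=$(count_io_errors "$1")
    if [ "${n:-0}" -gt 0 ]; then
        echo "WARNING: $n I/O error line(s) in $1"
        return 1
    fi
    return 0
}

# Usage (path is an assumption):
#   check_io_errors /var/log/messages
```

Since cron mails any output, running the check from cron means you only hear from it when something matched.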