Some random thoughts from our experience with HP, Juniper, and Cisco:
- 99% of our infrastructure has redundancy built in (multiple servers
in a cluster, multiple switches in an A/B HA setup, etc.).
- For servers where the redundancy exists but is "suboptimal" (for
instance, we've got multiple DNS servers, but dealing with a dead DNS server
can be annoying on so many levels), we do 24x7x4, and try to keep some
"commonly failed" spare parts on-hand to swap ourselves while we wait for HP to
deliver the replacement part (hard drives are a good example).
- For most other servers, we do either 13x5x4 or NBD, depending on
what's more cost-effective and available (for a while, you couldn't get a
3-year NBD contract at purchase time, but since we switched to 4-year
contracts, we can get NBD up front).
- For people who mentioned the "4 hours to show up, not to fix" issue,
two things:
1.) Know what you're getting. Someone here (not me, and I can't remember
who) can tell the story of the $VENDOR support contract that guaranteed an
employee would be on-site within 4 hours. At 3h55m, a guy who could only be
described as "Farmer Ted" showed up in coveralls and muddy shoes, simply to
look at the machine and say "Yep, lemme escalate this". "Farmer Ted", an
employee of $VENDOR, was on contract simply to meet the response commitment
and nothing more. When he got a call, he'd come in out of the fields he was
working (no joke) and head off to $VENDOR's client's site.
2.) HP, at least, also offers "CTR" (Call-to-Repair) service.
This is sold in, I believe, 4- and 6-hour commitments. It's more expensive, but
it's basically their commitment to have your problem FIXED within that
time period. In practice, they end up pre-staging cold hardware at a depot
somewhat close to you so that even if you suffer a total failure, you're back
up and running within the committed number of hours.
- For our network gear, where the failure rate is fairly low, the
hardware cost is moderate, and the support-contract costs are high, we tend to
go with the "bare minimum" support level needed to make sure we can get
code upgrades and cross-ship dead hardware, and we keep cold spares of the
hardware on-site. If we suffer a switch failure (it's happened), we just swap
out old-for-new and drop the old switch's config on the new one (configs are
backed up every 30 minutes automatically in our environment). We've got a
dozen-odd switches per data center, and the support-contract costs alone for
all of them could pay for a brand-new switch, so it was more cost-effective to
do it this way, especially once you get into Year Two.
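As a rough illustration of the cold-spare arithmetic above, here is a minimal Python sketch. Every number in it is a made-up placeholder (not our pricing, and not any vendor's), and the function names are hypothetical; plug in your own quotes before drawing conclusions.

```python
def cumulative_costs(num_switches, full_support_per_switch_per_year,
                     minimal_support_per_switch_per_year,
                     spare_switch_cost, num_spares, years):
    """Return two lists of cumulative cost, one entry per year.

    full:   premium support contract on every switch, no spares.
    spares: bare-minimum support on every switch plus cold spares
            bought up front.
    """
    full, spares = [], []
    for year in range(1, years + 1):
        full.append(num_switches * full_support_per_switch_per_year * year)
        spares.append(num_spares * spare_switch_cost
                      + num_switches * minimal_support_per_switch_per_year * year)
    return full, spares


def break_even_year(full, spares):
    """First year in which the cold-spare approach is no more expensive."""
    for year, (f, s) in enumerate(zip(full, spares), start=1):
        if s <= f:
            return year
    return None


# Hypothetical figures: 12 switches, 500/year premium support each,
# 100/year minimal support each, one cold spare costing 5000.
full, spares = cumulative_costs(12, 500, 100, 5000, 1, 4)
for year, (f, s) in enumerate(zip(full, spares), start=1):
    print(f"Year {year}: full support {f}, minimal support + spare {s}")
print("Break-even year:", break_even_year(full, spares))
```

With these placeholder figures the spare costs slightly more in the first year and is cheaper from the second year on. The sketch ignores the labor of doing the swap yourself and the cost of the downtime while you do it.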
HTH,
D
_______________________________________________
Tech mailing list
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/