Some random thoughts from our experience with HP, Juniper, and Cisco:

- 99% of our infrastructure has redundancy built in (multiple servers in a cluster, multiple switches in an A/B HA setup, etc.).
- For servers where the redundancy exists but is "suboptimal" (for instance, we've got multiple DNS servers, but dealing with a dead DNS server can be annoying on so many levels), we do 24x7x4 and try to keep some commonly-failed spare parts on hand to swap ourselves while we wait for HP to deliver the replacement part (hard drives are a good example).

- For most other servers, we do either 13x5x4 or NBD, depending on what's more cost-effective and available (for a while you couldn't get a 3-year NBD contract at purchase time, but since we switched to 4-year contracts, we can).

- For the people who mentioned the "4 hours to show up, not to fix" issue, two things:

  1.) Know what you're getting. Someone here (not me, and I can't remember who) can tell the story of the $VENDOR support contract that guaranteed an employee would be on-site within 4 hours. At 3h55m, a guy who could only be described as "Farmer Ted" showed up in coveralls and muddy shoes, simply to look at the machine and say "Yep, lemme escalate this." "Farmer Ted", an employee of $VENDOR, was on contract simply to meet that commitment and nothing more. When he'd get a call, he'd come in out of the fields he was working (no joke) and head off to $VENDOR's client's site.

  2.) HP, at least, also offers "CTR" (Call-to-Repair) service, sold in, I believe, 4- and 6-hour commitments. It's more expensive, but it's their commitment to have your problem FIXED within that time period. In practice that means they pre-stage cold hardware at a depot reasonably close to you, so that even if you suffer a total failure, you're back up and running within XX hours.

- For our network gear, where the failure rate is fairly low, the hardware cost is moderate, and the support-contract costs are high, we tend to go with the bare-minimum support level needed to get code upgrades and cross-ship dead hardware, and we keep cold spares of the hardware on-site. If we suffer a switch failure (it's happened), we just swap old for new and drop the old switch's config onto the new one (configs are backed up automatically every 30 minutes in our environment; a rough sketch of that kind of backup job is in the P.S. below). We've got a dozen-odd switches per data center, and the support-contract costs alone for all of them could pay for a brand-new switch, so it was more cost-effective to do it this way, especially once you get into Year Two (back-of-the-envelope math also below).

HTH,
D
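P.S. For the curious, the "backed up every 30 minutes" bit doesn't need anything fancy. Below is a minimal sketch of that kind of job, not our actual tooling: it assumes the Python netmiko library, SSH-reachable switches, and placeholder hostnames, credentials, and paths, and you'd run it out of cron every half hour.

    #!/usr/bin/env python3
    # Hypothetical sketch of a periodic switch-config backup.
    # Hostnames, credentials, paths, and device_type below are
    # placeholders -- adjust for your own environment.
    import os
    from datetime import datetime
    from netmiko import ConnectHandler

    SWITCHES = ["sw-a01.example.net", "sw-b01.example.net"]  # placeholder names
    BACKUP_DIR = "/var/backups/switch-configs"               # placeholder path

    def backup(host: str) -> None:
        # Pull the running config over SSH.
        conn = ConnectHandler(
            device_type="cisco_ios",            # e.g. hp_procurve, juniper, ...
            host=host,
            username=os.environ["SW_USER"],     # credentials from the environment
            password=os.environ["SW_PASS"],
        )
        config = conn.send_command("show running-config")
        conn.disconnect()

        # Write it out with a timestamp so you keep history.
        os.makedirs(BACKUP_DIR, exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d-%H%M")
        with open(f"{BACKUP_DIR}/{host}-{stamp}.cfg", "w") as fh:
            fh.write(config)

    if __name__ == "__main__":
        for host in SWITCHES:
            backup(host)   # cron runs this every 30 minutes

With the configs on disk, restoring onto a cold spare is just pasting (or pushing) the latest file onto the replacement switch.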
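P.P.S. The math behind the cold-spare decision is just as simple. The numbers below are made up purely for illustration (they are not our real pricing), but they show why Year Two is the tipping point:

    # Back-of-the-envelope comparison: per-switch support contracts vs. one
    # on-site cold spare. Every figure here is hypothetical.
    SWITCH_PRICE = 3000.0        # one new switch, bought up front as a cold spare
    CONTRACT_PER_SWITCH = 250.0  # yearly support contract, per switch
    SWITCHES = 12                # "a dozen-odd switches per data-center"

    for year in range(1, 5):
        contracts = CONTRACT_PER_SWITCH * SWITCHES * year   # cumulative contract spend
        print(f"Year {year}: contracts ${contracts:,.0f} vs cold spare ${SWITCH_PRICE:,.0f}")
    # With these numbers, one year of contracts already equals the price of a
    # spare switch, and by Year Two the contracts cost twice as much.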