On 05-Feb-18 12:01, Clem Cole wrote:
>
> But marketing never accepted because of the failover issue for
> clusters.
>
> I never understood that. My argument was that nobody was going to
> *knowingly* put a $1M cluster at risk with a $100 PCI card. We
> could have just stated in the SPD that the Adaptec chip set was
> supported on single (small) systems such as Workstations, DS10, DS20
> etc... But I lost that war.
>

The *word* you left out was probably the issue.

It is trivially easy to add a workstation to a cluster, and neither VMS nor Tru64 verifies that hardware meets requirements when a node joins a cluster. So it's not easy to dismiss the scenario that someone buys a workstation that is not intended for cluster use; then circumstances change and it turns up in your cluster. And it "just works" for a long time, until you hit the corner case.

In your $M enterprise, stuff gets passed around and information gets lost as ownership changes at the periphery. (The way things moved about on the ZK engineering clusters is typical. Despite attempts to control, people needed to do their jobs & configuration limits were ignored/fudged.)

*We just didn't make adding a node to a cluster difficult and mysterious enough.*

Plus, profit is usually a percentage of user cost. More cost => more profit. (Assuming you make the sale.)

So product management's conservatism is understandable, given the risk that the SPD won't be re-read when the function of a node changes, and the resulting data corruption being laid at DEC's feet.

Engineers aren't known for reading the instructions - and IT people who are under-staffed and under pressure less so. SPDs are even less appealing - they tend to be read at initial purchase - and subsequently only when the finger pointing starts. And that's after customer services has spent a lot of time and money diagnosing the problem.

These days, we have gates with names like "network admission control"; they won't allow a VPN or Wireless client to connect to a network unless software is up-to-date. Something along those lines that also included hardware and firmware would be a useful addition to clusters - assuming you could do everything quickly enough to prevent cluster transition times from becoming unacceptable.

It's non-trivial; the nasty cluster cases have to do with multi-ported hardware, so you need to check firmware revisions & bus configurations on all ports for compatibility. With all the permutations of the controllers being on stand-alone systems, cluster nodes not yet joined, joined cluster nodes, and redundant controllers on the same node. And interconnects: CI, NI, MC, DSSI, SCSI. And hot swap, which can upgrade or downgrade a controller on the fly.

So, the counter-argument becomes "how much engineering should be invested in allowing a customer to save $100 on the cost of a PCI card?" And the easy answer is one of "none" or "it's not a priority". Ship only cluster-capable hardware, and "problem solved".

Not all engineering problems are best solved with engineering solutions. But I'll grant that the engineering would be a lot more fun :-)

An imperfect analogy would be selling cars without windshield wipers to people who promise that they never drive in the rain. It's in the nature of things that someday the rain will come. Or the car will be passed on. Of course, missing wipers are a lot more obvious than what kind and revision of a PCI card is buried in a cardcage :-)

A better analogy is an exercise left to the reader.
_______________________________________________
Simh mailing list
[email protected]
http://mailman.trailing-edge.com/mailman/listinfo/simh
