On 10/5/19 3:35 PM, Hal Murray wrote:

jim...@earthlink.net said:
There is *great* resistance to changing any assembly and workmanship
standard - nobody wants to be the person who says "we don't need to do
*that* anymore" and then a disaster happens, and one of the potential causes
is "you didn't do *that*"

It is entirely possible that the original rationale and explanation is no
longer valid.

There is also a risk of troubles because you are still doing *that*.

Do the people who maintain the rules occasionally look around to see if a
better way has been developed?


TL;DR: Yes, but...


Sore point there, since my job these days is managing what are called Risk Class D missions, for which some of the (perceived) risk is that you don't have to follow all the process that is typical for Class A, B, and C missions. And I've had in-flight failures where the spacecraft was lost (ouch!). Then there's the question of "should we have followed some process that we didn't follow?" The idea is that process is expensive, and knowing acceptance of risk allows you to do things you could not otherwise do.


NASA divides missions into risk classes (NPR 8705.4), ranging from A down to D, based on "consequences of failure", "national significance", "difficulty of reflight", or cost. Class A is human or multibillion flagship; Class B is things like Mars rovers; Class C is the "less than two year missions that cost less than $100M" kind of thing; Class D is "ok if it fails".

There is an enormous amount of "standard practice" for NASA missions - often derived from long experience, or, perhaps, from some "bad day" and a process/rule gets created that says "we're not going to do that again".

It's important to know that NASA, in general, does not do "reliability calculations" in a MIL-HDBK-217 way - there's no stacking up of individual part reliabilities to get an estimated system MTBF. This is historical - NASA typically builds "just one unit" (maybe 2 or 3) - so there's no chance to do life testing and build up statistics. I think (Jim's opinion) that when they started coming up with process, the part/assembly reliability data had huge variances, so the resulting MTBF predictions spanned a wide range, or worse yet, said "failure is certain". There's also the problem that parts reliability probably isn't the dominant factor for reliability - it's design (is that wire under tension, causing it to break with thermal cycles?) or workmanship (not in a good/bad sense, but in a variability sense).
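
For readers who haven't seen the "stacking up" style of calculation, here is a minimal sketch of a parts-count prediction with a simple series model (any part failure fails the system, constant failure rates). The part list, quantities, and failure rates are invented purely for illustration; they are not taken from MIL-HDBK-217 or from any NASA document. The point is only to show how a wide spread in the per-part numbers turns into a wide spread in predicted MTBF.

```python
import math

# HYPOTHETICAL failure rates in failures per million hours (FPMH).
# Each row: (part type, quantity, optimistic rate, pessimistic rate).
# These numbers are made up for illustration only.
PARTS = [
    ("resistor", 200, 0.001, 0.01),
    ("ceramic capacitor", 150, 0.002, 0.05),
    ("connector", 20, 0.01, 0.5),
    ("microcircuit", 30, 0.05, 2.0),
]

MISSION_HOURS = 2 * 8766  # roughly two years


def system_failure_rate(parts, pessimistic):
    """Series model: the system failure rate is the sum of all part rates."""
    return sum(qty * (hi if pessimistic else lo) for _, qty, lo, hi in parts)


def mtbf_hours(fpmh):
    """Convert a failure rate in FPMH to mean time between failures."""
    return 1e6 / fpmh


def mission_reliability(fpmh, hours):
    """Probability of zero failures over the mission (exponential model)."""
    return math.exp(-fpmh * hours / 1e6)


if __name__ == "__main__":
    for label, pessimistic in (("optimistic", False), ("pessimistic", True)):
        lam = system_failure_rate(PARTS, pessimistic)
        print(f"{label:12s} rate={lam:7.2f} FPMH  "
              f"MTBF={mtbf_hours(lam):10.0f} h  "
              f"R(mission)={mission_reliability(lam, MISSION_HOURS):.3f}")
```

With inputs this uncertain, the two predictions differ by more than an order of magnitude in MTBF, which is the situation described above. Note that this model says nothing about design or workmanship problems.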

So there are tons of process to try and drive the variability of workmanship down - you don't just tighten a fastener, you torque it to a specified level, determined (in theory) by the design loads, etc., and someone witnesses the torquing to make sure nobody forgot to install the bolts. Mistakes happen - the system tends to get paperwork heavy - and disasters have occurred because someone ignored the evidence of their hands/eyes and trusted the paper - NOAA-N Prime is the case in point. Interestingly, these are called "process escapes" - and there's a huge amount of work (multiple work years, even for things where nothing bad happened) that goes into determining why someone did something that's outside the process - was it just a bad day? Is the process itself inconvenient or incomplete? Etc. There is an intense amount of contemplation about changing the process - typically it was created because of a single bad event (NASA just doesn't build that many things), it addressed the causes of that event, and it appears to be a "good idea" for the future. It then becomes part of the "received wisdom of the ages" and everyone does it - until some event triggers a reevaluation.

In general, the system is set up so that it's easier to just "do the standard thing" than to get a waiver to not do it. Getting the waiver typically requires that you *prove* in some sense that it won't increase risk, or that you've somehow backed yourself into a corner and there's no way to get the job done without it. The latter is the "willing acceptance of risk" and there are a lot of people who have to sign off on it. The NASA administrator does NOT want to sit in front of Congress explaining why a $500M mission was lost because a waiver was issued to not do something. "You mean, sir, that we saved a few hours' labor and it cost us $500M?" You don't get to say "There were 10,000 things, each individually a good idea, but if we did them all, the mission would have cost $1B, and you only gave us $500M."

For a Class D mission, there is a formal process (at JPL, anyway) where you go through the roughly 700 "Design Principles" and "Flight Project Practices" and identify which ones you will comply with, which you won't, and which are "comply with intent, but adjusted for this mission". The DP and FPP are high level documents that describe "stuff you should do" - things like "you should have no more than 30% CPU loading at PDR", "You shouldn't discharge the batteries more than X%".

The result of this process (which takes a few months) is a list of blanket waivers - for instance, maybe you don't need to have independent people do a worst case analysis or parts stress analysis of all your circuits - you trust in the experience of the engineer doing the design, and they do some informal analysis (a spreadsheet of voltage rating vs voltage it sees in the circuit). A big one is getting waivers to not have inspection and test at ALL levels of integration - you can assemble the whole thing, test it as a whole, at the risk of discovering a problem late in the project. For instance, you assume all the transistors are good from the mfr, and that the board is correctly assembled by the automated fab, so you don't need electrical test. You plug the board in, and if the system doesn't work, you have a spare you can swap in. On the other hand, if it takes 6 months to dismantle the spacecraft and extract the failed board, you probably won't get the waiver. My spacecraft were easy - you could assemble or disassemble them into their component assemblies in less than a day - so it wasn't schedule risk, it was "will we break something by handling it" - little teeny connectors are fragile.
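
The "spreadsheet of voltage rating vs voltage it sees in the circuit" can be as simple as the sketch below. The reference designators, voltages, and the 60% derating limit are all hypothetical values chosen for the example; a real project would take the limit from whatever derating guideline it has agreed to follow.

```python
from dataclasses import dataclass

# ASSUMPTION: a single 60% voltage derating limit for every part.
# Real derating rules vary by part type; this value is illustrative only.
DERATING_LIMIT = 0.6


@dataclass
class PartStress:
    ref: str          # reference designator (hypothetical)
    rated_v: float    # manufacturer's voltage rating
    applied_v: float  # worst voltage the designer expects in the circuit

    @property
    def stress_ratio(self) -> float:
        return self.applied_v / self.rated_v


def over_stressed(parts, limit=DERATING_LIMIT):
    """Return the parts whose applied/rated voltage ratio exceeds the limit."""
    return [p for p in parts if p.stress_ratio > limit]


# Made-up rows standing in for the designer's spreadsheet.
ROWS = [
    PartStress("C1", rated_v=16.0, applied_v=5.0),
    PartStress("C2", rated_v=10.0, applied_v=8.4),
    PartStress("Q1", rated_v=40.0, applied_v=28.0),
]

if __name__ == "__main__":
    for p in ROWS:
        flag = "OVER LIMIT" if p.stress_ratio > DERATING_LIMIT else "ok"
        print(f"{p.ref}: {p.applied_v} V on a {p.rated_v} V part "
              f"({p.stress_ratio:.0%}) {flag}")
```

This is the informal version: one engineer, one table, no independent reviewer. The waiver is about skipping the independent analysis, not about skipping the check altogether.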

Ultimately, whether your mission succeeds or fails, but especially if it fails, we go back and look at all those waivers and over a period of years, we decide, hmm, maybe we should change that because technology has changed. Each time someone goes through the Class D process, the FPP and DP get looked at, and if everyone is getting exempted from some requirement, and has good reasons, then there's a rule change. But it's slow.

And where there is a problem, maybe a new rule will be created - with the large number of SmallSats (cubesats and slightly larger) being done these days, you wind up with physical properties that are outside the "traditional" experience. A 10-foot-long flexible antenna sticking out of a 1000 kg spacecraft is mechanically very different from that same antenna sticking out of a 5 kg spacecraft.

And there will need to be new processes to deal with swarms and massive constellations - NASA is used to flying one spacecraft, maybe 2 (MER) - if there's a failure, it's a big deal. You convene a Failure Review Board (FRB), you identify Corrective Actions, etc. If you fly 100 spacecraft to perform a function, and one fails, and the function is still performed, meeting all requirements, is it a big deal? Maybe it's just that the spacecraft have 90% reliability, and you planned for that by launching 100 when you need 50 to make your measurement. Are you going to convene an FRB for each failure? Or are you going to say - oh yeah, that is an expected failure mode, we know it's random and not a common design flaw among all 100, move on.
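
To put rough numbers on the "launch 100, need 50" example, here is a small sketch using the binomial distribution. It assumes every spacecraft fails independently with the same probability, which is exactly the "random and not a common design flaw" case; a shared design flaw would break that assumption and make these numbers far too optimistic. The function names are mine, not from any NASA process.

```python
from math import comb


def prob_at_least(needed, launched, p_success):
    """P(at least `needed` of `launched` spacecraft work), independent failures."""
    return sum(
        comb(launched, k) * p_success**k * (1 - p_success) ** (launched - k)
        for k in range(needed, launched + 1)
    )


def min_fleet_size(needed, p_success, confidence):
    """Smallest fleet giving at least `confidence` chance of `needed` working.

    Assumes 0 < p_success <= 1 and confidence < 1, otherwise the loop
    may never terminate.
    """
    launched = needed
    while prob_at_least(needed, launched, p_success) < confidence:
        launched += 1
    return launched


if __name__ == "__main__":
    p = prob_at_least(50, 100, 0.9)
    print(f"P(>= 50 of 100 work at 90% each) = {p:.12f}")
    n = min_fleet_size(50, 0.9, 0.99)
    print(f"Fleet size for a 99% chance of 50 working: {n}")
```

Under the independence assumption, 100 spacecraft at 90% each gives an enormous margin over the 50 needed, so a handful of random failures is expected behavior rather than a surprise. The hard part is the input: you still need test or flight data to justify the 90% figure and the claim that failures are independent.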

With a move to "statistics" instead of "build it perfect" - there will be process changes - but there will need to be test data to back up the statistics.

