Re: [swift-evolution] Preconditions aborting process in server scenarios [was: Throws? and throws!]

John McCall via swift-evolution Wed, 18 Jan 2017 10:48:26 -0800

> On Jan 18, 2017, at 9:06 AM, Joe Groff via swift-evolution 
> <swift-evolution@swift.org> wrote:
>> On Jan 18, 2017, at 12:04 AM, Rien via swift-evolution 
>> <swift-evolution@swift.org> wrote:
>> 
>>> 
>>> On 18 Jan 2017, at 08:54, Jonathan Hull via swift-evolution 
>>> <swift-evolution@swift.org> wrote:
>>> 
>>> 
>>>> On Jan 17, 2017, at 7:13 PM, Dave Abrahams <dabrah...@apple.com> wrote:
>>>> 
>>>> 
>>>> on Tue Jan 17 2017, Jonathan Hull <jhull-AT-gbis.com> wrote:
>>>> 
>>>>> Bringing it back towards the initial post, what if there was a
>>>>> separation from true needs-to-take-down-the-entire-system trapping and
>>>>> things like out-of-bounds and overflow errors which could stop at
>>>>> thread/actor bounds (or in some cases even be recovered)?
>>>>> 
>>>>> The latter were the ones I was targeting with my proposal.  They live
>>>>> in this grey area, because honestly, they should be throwing errors if
>>>>> not for the performance overhead and usability issues.  
>>>> 
>>>> I fundamentally disagree with that statement.  There is value in
>>>> declaring certain program behaviors illegal, and in general for things
>>>> like out-of-bounds access and overflow no sensible recovery (where
>>>> “recovery” means something that would allow the program to continue
>>>> reliably) is possible.  
>>> 
>>> I think we do fundamentally disagree.  I know I come from a very different 
>>> background (Human-Computer Interaction & Human Factors) than most people 
>>> here, and I am kind of the odd man out, but I have never understood this 
>>> viewpoint for anything but the most severe cases where the system itself is 
>>> in danger of being compromised (certainly not for an index out of bounds).  
>>> In my mind “fail fast” is great for iterating in development builds, but 
>>> once you are deploying, the user’s needs should come ahead of the 
>>> programmer’s.
>>> 
>>> Shouldn’t a system be as robust as possible
>> 
>> Yes
>> 
>>> and try to minimize the fallout from any failure point?
>> 
>> That is in direct conflict with the robustness.
>> Once an error is detected that is not handled by the immediate code, it must 
>> be assumed that a worst-case scenario happened. And further damage to the 
>> user can only be prevented by bringing down the app, even if that means 
>> losing all work in progress.
>> 
>> A compromised system must be prevented from accessing any resources. Once a 
>> system is compromised, the risk to the user is *much* higher than simply 
>> losing work in progress. He might lose his job, career, etc.
> 
> That's certainly true of code that makes unaudited use of `unsafe` constructs 
> that can violate safety without any checking. It's my hope that our normal 
> safety checks are thorough and fire early enough that your subprocess would 
> crash before system-wide compromise happens. In an "actor" or similar model, 
> even if we decide we don't want to pay for unwinding to fully clean up after 
> the crashed actor, that crash could still at least be noted by a coordinator 
> actor, which in your server situation could handle the problem by not 
> accepting any new connections and letting its existing connections finish 
> before restarting the process, or in an iOS-like mobile situation could  
> trigger serialization of the current user state so that the process can be 
> transparently killed and restarted. In either situation, perhaps we'd want to 
> "taint" actors that use unsafe constructs so that their failure can't be 
> recovered at all.


This seems like basically the right approach to me.  It means we don't make any 
effort to "clean up" the failing actor — essentially, it's treated as if it 
were just deadlocked — which means we don't pay the pervasive code-size costs 
of unwinding.  That's even fairly likely to leave the process in a state that 
can still be usefully debugged (as opposed to unwinding stacks, which 
completely destroys the execution context).  But there's still an opportunity 
to react and try to wind up other tasks.
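
The coordinator behaviour described above (no unwinding, stop accepting new connections, let existing ones finish, then restart) can be sketched as a small state machine. This is a hypothetical illustration only: no such actor API existed in Swift at the time, and the names `Coordinator`, `workerDidFail(abandonedConnections:)` and the connection counting are assumptions. The failed worker is treated as if it were deadlocked, so its in-flight connections are simply written off rather than cleaned up.

```swift
// Hypothetical sketch: none of these names are a real Swift API.
// Assumption: when a worker traps, the runtime reports it to this
// coordinator instead of unwinding, so the worker is treated as if it
// were deadlocked and its in-flight connections are written off.
enum CoordinatorState: Equatable {
    case accepting       // normal operation
    case draining        // a worker failed; refuse new connections
    case readyToRestart  // remaining connections finished; restart the process
}

struct Coordinator {
    private(set) var state: CoordinatorState = .accepting
    private(set) var openConnections = 0

    /// Accepts a new connection unless a failure has been observed.
    mutating func acceptConnection() -> Bool {
        guard state == .accepting else { return false }
        openConnections += 1
        return true
    }

    /// A healthy worker finished serving one connection.
    mutating func connectionFinished() {
        precondition(openConnections > 0, "no open connection to finish")
        openConnections -= 1
        checkDrained()
    }

    /// A worker trapped. Nothing is cleaned up: the connections it held
    /// will never finish, so they are simply dropped from the count.
    mutating func workerDidFail(abandonedConnections: Int) {
        precondition(abandonedConnections >= 0 && abandonedConnections <= openConnections)
        openConnections -= abandonedConnections
        if state == .accepting { state = .draining }
        checkDrained()
    }

    /// Moves to readyToRestart once draining has no connections left.
    private mutating func checkDrained() {
        if state == .draining && openConnections == 0 {
            state = .readyToRestart
        }
    }
}

var coordinator = Coordinator()
_ = coordinator.acceptConnection()
_ = coordinator.acceptConnection()
_ = coordinator.acceptConnection()
coordinator.workerDidFail(abandonedConnections: 1)
print(coordinator.acceptConnection())  // false: no new connections while draining
coordinator.connectionFinished()
coordinator.connectionFinished()
print(coordinator.state)               // readyToRestart
```

Because the failed worker's stack is never unwound, a real system along these lines could still attach a debugger to it while the coordinator winds the rest of the process down.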

I'm not sure it makes any sense to call out actors that have used unsafe 
constructs as somehow specially unrecoverable.  If the concern is that the 
unsafe code may corrupt the other actors, well, that's true, but (1) that 
implies that you have to forbid recovery if *any* actor has used unsafe 
constructs, since low-level corruption can be passed between actors when they 
communicate normally, and (2) that's equally true of all sorts of high-level 
corruption that don't depend on unsafe constructs, and which the failing 
assertion may be the first indication of.
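
Point (1) can be illustrated with a hypothetical taint tracker, assuming that taint flows along every message an actor sends after it has used unsafe constructs. An actor that never touched unsafe code still ends up unrecoverable once tainted data reaches it, which is why per-actor tainting tends to collapse into "no recovery if any actor is tainted". `TaintTracker` and its methods are invented for this sketch.

```swift
// Hypothetical sketch of per-actor "taint" tracking; not a real Swift API.
// Assumption: taint spreads along any message sent by an actor after it
// has used unsafe constructs, because corrupted data can ride along.
struct TaintTracker {
    private(set) var tainted: Set<String> = []

    /// An actor executed unsafe code (e.g. unchecked pointer access).
    mutating func markUsedUnsafe(_ name: String) {
        tainted.insert(name)
    }

    /// Record a message; a tainted sender taints its receiver.
    mutating func recordMessage(from sender: String, to receiver: String) {
        if tainted.contains(sender) {
            tainted.insert(receiver)
        }
    }

    /// Recovery after a failure is allowed only for untainted actors.
    func canRecover(from failed: String) -> Bool {
        !tainted.contains(failed)
    }
}

var tracker = TaintTracker()
tracker.markUsedUnsafe("parser")
tracker.recordMessage(from: "parser", to: "session")
tracker.recordMessage(from: "session", to: "logger")
print(tracker.canRecover(from: "logger"))   // false: taint arrived via "session"
print(tracker.canRecover(from: "metrics"))  // true: never received tainted data
```

Point (2) is outside what such a tracker can see: high-level corruption produced entirely by safe code leaves the tainted set empty, so `canRecover` would still report true.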

John.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
