Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-09 Thread westmail24--- via dev-security-policy
Hello, 
Will D-TRUST be removed in the future, or is this the last Chinese warning? :)

Andrew.
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-05 Thread Eric Mill via dev-security-policy
On Wed, Dec 5, 2018 at 2:36 AM Fotis Loukos via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On 4/12/18 8:30 μ.μ., Ryan Sleevi via dev-security-policy wrote:
> > On Tue, Dec 4, 2018 at 5:02 AM Fotis Loukos <
> me+mozdevsecpol...@fotisl.com>
>
> As far as I can tell, if no quantifiers are used in a proposition
> written in the English language, then it is assumed to be a universal
> proposition. If it were particular, then sentences such as "numbers are
> bigger than 10" and "cars are blue" would be true, since there are some
> numbers bigger than 10 and there are some cars that are blue. My
> knowledge of the inner workings of the English grammar is not that good,
> but at least this is what applies in Greek and in cs/logic (check
> http://www.cs.colostate.edu/~cs122/.Fall14/tutorials/tut_2.php for
> example). If I am mistaken, then it was error on my side.
>

Formally, yes, but in practice, there is ambiguity. For example, you can
say "elderly people vote for X political party", and it doesn't have to
mean that 100.0% of elderly people vote for that party for that to be a
reasonably accurate statement, if by and large that population has a clear
trend.
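(In the formal notation Fotis appeals to, the two readings differ as:

```latex
% Universal reading: every elderly person votes for party X.
\forall x\,\bigl(\mathrm{Elderly}(x) \rightarrow \mathrm{Votes}_X(x)\bigr)
% Existential reading: at least one elderly person votes for party X.
\exists x\,\bigl(\mathrm{Elderly}(x) \land \mathrm{Votes}_X(x)\bigr)
```

The colloquial sentence asserts neither: it is a generic, closer to "most" than to either quantifier.)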

That's not to agree or disagree with Ryan's statement, just noting that
people do sometimes have to characterize groups, and that any
characterization of a large enough group will usually not apply to all of
its members.

I know I personally belong to a number of demographic groups whose behavior
as a group doesn't match mine as an individual, and when people criticize
those demographic groups, I try not to take it as a personal attack.

-- Eric


Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-05 Thread Wayne Thayer via dev-security-policy
On Wed, Dec 5, 2018 at 3:48 AM Dimitris Zacharopoulos via
dev-security-policy  wrote:

> On 5/12/2018 10:02 π.μ., Fotis Loukos wrote:
>
> > The proposal was apparently to further restrict the ability of CAs to
> > make exceptions on their own, by requiring all such exceptions to go
> > through the public forums where the root programs can challenge or even
> > deny a proposed exception, after hearing the case by case arguments for
> > why an exception should be granted.
> >
> effectively 'legalizing' BR violations after browsers' consent (granting
> an exception). Two paragraphs earlier you stated that you never proposed
> making an extended revocation legal.
> >
> > Regards,
> > Fotis
>
> You missed one of Jakob's important points. This usually happens when you
> copy-paste specific sentences that change the meaning of a whole
> conversation.
>
> "
>
> But only if one ignores the
> reality that such exceptions currently happen with little or no
> oversight."
>
I am particularly troubled by the proposal that exceptions be granted by
Mozilla as part of some recognized process. There is a huge difference
between this and the current process in which CAs may choose to take
exceptions as explicit violations. Even if the result is the same, granting
exceptions transfers the risk from the CA to Mozilla. We then are
responsible for assessing the potential impact, and if we get it wrong,
it's our fault. Please, let's not go there. As has been stated, if there is
really no risk to violating a requirement, then it's reasonable to make a
case for removing that requirement.

- Wayne


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-05 Thread Jakob Bohm via dev-security-policy

On 05/12/2018 01:05, Nick Lamb wrote:
> On Tue, 4 Dec 2018 14:55:47 +0100
> Jakob Bohm via dev-security-policy
>  wrote:
> 
>> Oh, so you meant "CA issuance systems and protocols with explicit
>> automation features" (as opposed to e.g. web server systems or
>> operating systems or site specific subscriber automation systems).
>> That's why I asked.
> 
> Yes. These systems exist, have existed for some time, and indeed now
> appear to make up a majority of all issuance.
> 

I didn't doubt that automation systems exist; I was thoroughly confused 
when, a few messages back, you wrote a reference to "these systems" 
without stating which systems.

>> And note that this situation started with an OV certificate, not a DV
>> certificate.  So more than domain ownership needs to be validated.
> 
> Fortunately it is neither necessary nor usual to insist upon fresh
> validations for Organisational details for each issuance. Cached
> validations can be re-used for a period specified in the BRs although
> in some cases a CA might choose tighter constraints.
> 

However, an OV or EV issuance often involves substantially different 
choices for domain validation and especially for validating the CSR-to-
subscriber-identity relationship than the choices made for robotic DV 
issuance systems, even when the organizational identity validation is 
cached.  For example, I know of at least one CA where the process 
involves a subscriber representative signing a paper form with a 
printout of the CSR (as one of multiple steps).

>> You have shown that ONE system, which you happen to like, can avoid
>> that weakness, IF you ignore some other issues.  You have not shown
>> that requiring subscribers to do this for any and all combinations of
>> validation systems and TLS server systems they encounter won't have
>> this weakness.
> 
> Yes, an existence proof. Subscribers must of course choose trade-offs
> that they're comfortable with. That might mean accepting that your web
> site could become unavailable for a period of several days at short
> notice, or that you can't safely keep running Microsoft IIS 6.0 even
> though you'd prefer not to upgrade. What I want to make clear is that
> offering automation without write access to the private key is not only
> theoretically conceivable, it's actually easy enough that a bunch of
> third party clients do it today because it was simpler than whatever
> else they considered.

Existence proof is good for refuting a claim that something doesn't 
exist.  It does nothing to prove that it is the only good thing.

Nothing I wrote has any relationship to Microsoft software specifics 
(except for my brief reply to your own aside about another Microsoft 
technology).

You have yet to point out any non-ACME client that organizations can 
use to automate the renewal and replacement of OV and EV certificates 
without write access to the private key, thus I cannot validate your 
claims that there are "a bunch of third party clients" doing that.
You have only made some claims about what would be theoretically 
possible for the ACME HTTP-01 protocol.

(You mention cPanel below, more there).
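(For readers unfamiliar with the protocol being discussed: under ACME HTTP-01, the client proves control of a hostname by publishing a key authorization derived from the challenge token and its ACME *account* key; the certificate's private key is never involved. A minimal stdlib-only sketch of that construction, per RFC 8555 §8.1 and RFC 7638, with illustrative placeholder values:

```python
import base64
import hashlib
import json

def b64url(data: bytes) -> str:
    # ACME uses unpadded base64url encoding throughout (RFC 8555 section 6.1).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def key_authorization(token: str, account_jwk: dict) -> str:
    # keyAuthorization = token || "." || base64url(JWK thumbprint), where the
    # thumbprint (RFC 7638) is SHA-256 over the JWK's required members,
    # serialized with lexicographically sorted keys and no whitespace.
    canonical = json.dumps(account_jwk, sort_keys=True, separators=(",", ":"))
    return token + "." + b64url(hashlib.sha256(canonical.encode()).digest())

# Illustrative values only -- a real client receives the token from the CA
# and derives the JWK from its own account key.
jwk = {"kty": "EC", "crv": "P-256", "x": "example-x", "y": "example-y"}
print(key_authorization("example-token", jwk))
```

The point of the sketch is structural: nothing in the validation exchange touches the subscriber's TLS private key.)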

> 
>> I made no such claim.  I was saying that your hypothetical that
>> all/most validation systems have the properties of ACME and that
>> all/most TLS servers allow certificate replacement without access to
>> the private key storage represents an idealized scenario different
>> from practical reality.
> 
> Subscribers must choose for themselves, in particular it does not
> constitute an excuse as to why they need more time to react. Choices
> have consequences, if you choose a process you know can't be done in a
> timely fashion, it won't be done in a timely fashion and you'll go
> off-line.

The choice of validation protocol is one made by the CA, subscribers 
have little influence except where a CA happens to offer more than 
one validation method or where multiple CAs are otherwise equal in 
terms of the subscriber's selection criteria.

Outside of the pressure this community makes on CAs, there is very 
little reason why subscribers should expect that CAs suddenly revoke 
their certificate for entirely CA-internal reasons.  Therefore it is 
unreasonable to expect the general population of site-owning 
organizations to plan on the basis that this is a risk worth 
planning for.

> 
>> And the paragraph I quoted says to not do that unless you are using a
>> HSM, which very few subscribers do.
> 
> It says it only recommends doing this for a _renewal_ if you have an
> HSM. But a scheduled _renewal_ already provides sufficient notice for
> you to replace keys and make a fresh CSR at your leisure if you so
> choose. Which is why you were talking about unscheduled events.
> 
> If you have a different reference which says what you originally
> claimed, I await it.
> 

Now you are going off on a huge tangent about the detailed specifics 
of that particular document and its choice of words.  The document was 

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-05 Thread Dimitris Zacharopoulos via dev-security-policy

On 5/12/2018 10:02 π.μ., Fotis Loukos wrote:

On 4/12/18 8:29 μ.μ., Dimitris Zacharopoulos via dev-security-policy wrote:

Fotis,

You have quoted only one part of my message which doesn't capture the
entire concept.

I would appreciate it if you mentioned how exactly I distorted your
proposal, and which meaning-changing parts of your message I missed.


I never claimed that you "distorted" my proposal. I said that it didn't 
capture the entire concept.




CAs that mis-issue and must revoke these mis-issued certificates,
already violated the BRs. Delaying revocation for more than what the BRs
require, is also a violation. There was never doubt about that. I never
proposed that "extended revocation" would somehow "not be considered a
BR violation" or "make it legal".

You explicitly mentioned that there were voices during the SC6 ballot
discussion that wanted to extend the 5 days to something more (*extend*
the 5 days), as you also explicitly mentioned that this is not a
theoretical discussion.


This was mentioned in the context of a very long thread and you have 
taken a piece of it which changes the meaning of the entire concept. I 
explained what the entire concept was. Jakob summarized the proposal 
correctly.



I tried to highlight in this discussion that there were real cases in
m.d.s.p. where the revocation was delayed in practice. However, the
circumstances of these extended revocations remain unclear. Yet, the
community didn't ask for more details. Seeing this repeated, was the
reason I suggested that more disclosure is necessary for CAs that
require more time to revoke than the BRs require. At the very minimum,
it would help the community understand in more detail the circumstances
why a CA asks for more time to revoke.

I refer you to Ryan's email. Do you really believe that this is
something not expected from CAs?


I think Jakob made an accurate summary.

You contradict what you said two paragraphs earlier. Jakob explicitly
mentioned:

The proposal was apparently to further restrict the ability of CAs to
make exceptions on their own, by requiring all such exceptions to go
through the public forums where the root programs can challenge or even
deny a proposed exception, after hearing the case by case arguments for
why an exception should be granted.

effectively 'legalizing' BR violations after browsers' consent (granting
an exception). Two paragraphs earlier you stated that you never proposed
making an extended revocation legal.

Regards,
Fotis


You missed one of Jakob's important points. This usually happens when you 
copy-paste specific sentences that change the meaning of a whole 
conversation.


"

But only if one ignores the
reality that such exceptions currently happen with little or no
oversight."


My previous response to you tries to re-summarize the concept in a more 
accurate way. Please use that if you want to refer to the concept of my 
proposal and not particular pieces from a huge thread.



Dimitris.




Dimitris.



On 4/12/2018 8:00 μ.μ., Fotis Loukos via dev-security-policy wrote:

Hello,

On 4/12/18 4:30 μ.μ., Jakob Bohm via dev-security-policy wrote:

Hello to you too.

It seems that you are both misunderstanding what the proposal was.

The proposal was apparently to further restrict the ability of CAs to
make exceptions on their own, by requiring all such exceptions to go
through the public forums where the root programs can challenge or even
deny a proposed exception, after hearing the case by case arguments for
why an exception should be granted.


Can you please point me to the exact place where this is mentioned?

The initial proposal is the following:

Mandating that CAs disclose revocation situations that exceed the 5-day
requirement with some risk analysis information, might be a good place
to start.

I see nothing related to public discussion and root programs challenging
or denying the proposed exception.

In a follow-up email, Dimitris mentions the following:

The reason for requiring disclosure is meant as a first step for
understanding what's happening in reality and collect some meaningful
data by policy. [...] If, for example, m.d.s.p. receives 10 or 20
revocation exception cases within a 12-month period and none of them is
convincing to the community and module owners to justify the exception,
the policy can be updated with clear rules about the risk of distrust if
the revocation doesn't happen within 5 days.

In this proposal it is clear that the CA will *disclose* and not ask for
permission for extending the 24h/5 day period, and furthermore it
accepts the fact that these exceptions may not be later accepted by the
community, which may lead to changing the policy.



A better example would be that if someone broke their leg for some
reason, and therefore wants to delay payment of a debt by a short while,
they should be able to ask for it, and the request would be considered
on its merits, not based on a hard-nosed principle of never granting any extensions.

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-05 Thread Fotis Loukos via dev-security-policy
On 4/12/18 8:29 μ.μ., Dimitris Zacharopoulos via dev-security-policy wrote:
> Fotis,
> 
> You have quoted only one part of my message which doesn't capture the
> entire concept.

I would appreciate it if you mentioned how exactly I distorted your
proposal, and which meaning-changing parts of your message I missed.

> 
> CAs that mis-issue and must revoke these mis-issued certificates,
> already violated the BRs. Delaying revocation for more than what the BRs
> require, is also a violation. There was never doubt about that. I never
> proposed that "extended revocation" would somehow "not be considered a
> BR violation" or "make it legal".

You explicitly mentioned that there were voices during the SC6 ballot
discussion that wanted to extend the 5 days to something more (*extend*
the 5 days), as you also explicitly mentioned that this is not a
theoretical discussion.

> 
> I tried to highlight in this discussion that there were real cases in
> m.d.s.p. where the revocation was delayed in practice. However, the
> circumstances of these extended revocations remain unclear. Yet, the
> community didn't ask for more details. Seeing this repeated, was the
> reason I suggested that more disclosure is necessary for CAs that
> require more time to revoke than the BRs require. At the very minimum,
> it would help the community understand in more detail the circumstances
> why a CA asks for more time to revoke.

I refer you to Ryan's email. Do you really believe that this is
something not expected from CAs?

> 
> I think Jakob made an accurate summary.

You contradict what you said two paragraphs earlier. Jakob explicitly
mentioned:

The proposal was apparently to further restrict the ability of CAs to
make exceptions on their own, by requiring all such exceptions to go
through the public forums where the root programs can challenge or even
deny a proposed exception, after hearing the case by case arguments for
why an exception should be granted.

effectively 'legalizing' BR violations after browsers' consent (granting
an exception). Two paragraphs earlier you stated that you never proposed
making an extended revocation legal.

Regards,
Fotis

> 
> 
> Dimitris.
> 
> 
> 
> On 4/12/2018 8:00 μ.μ., Fotis Loukos via dev-security-policy wrote:
>> Hello,
>>
>> On 4/12/18 4:30 μ.μ., Jakob Bohm via dev-security-policy wrote:
>>> Hello to you too.
>>>
>>> It seems that you are both misunderstanding what the proposal was.
>>>
>>> The proposal was apparently to further restrict the ability of CAs to
>>> make exceptions on their own, by requiring all such exceptions to go
>>> through the public forums where the root programs can challenge or even
>>> deny a proposed exception, after hearing the case by case arguments for
>>> why an exception should be granted.
>>>
>> Can you please point me to the exact place where this is mentioned?
>>
>> The initial proposal is the following:
>>
>> Mandating that CAs disclose revocation situations that exceed the 5-day
>> requirement with some risk analysis information, might be a good place
>> to start.
>>
>> I see nothing related to public discussion and root programs challenging
>> or denying the proposed exception.
>>
>> In a follow-up email, Dimitris mentions the following:
>>
>> The reason for requiring disclosure is meant as a first step for
>> understanding what's happening in reality and collect some meaningful
>> data by policy. [...] If, for example, m.d.s.p. receives 10 or 20
>> revocation exception cases within a 12-month period and none of them is
>> convincing to the community and module owners to justify the exception,
>> the policy can be updated with clear rules about the risk of distrust if
>> the revocation doesn't happen within 5 days.
>>
>> In this proposal it is clear that the CA will *disclose* and not ask for
>> permission for extending the 24h/5 day period, and furthermore it
>> accepts the fact that these exceptions may not be later accepted by the
>> community, which may lead to changing the policy.
>>
>>
>>> A better example would be that if someone broke their leg for some
>>> reason, and therefore wants to delay payment of a debt by a short while,
>>> they should be able to ask for it, and the request would be considered
>>> on its merits, not based on a hard-nosed principle of never granting any
>>> extensions.
>> I think that the proper analogy is if someone broke their leg, and
>> therefore wants to delay payment of a bank debt, he should be able to
>> delay it without notifying the bank in time, but after he has decided
>> that he is fine and he can walk, he can go to the bank and explain to them
>> why he delayed the payment. I do not consider this a good practice.
>>
>>> Now because CAs making exceptions can be technically considered against
>>> the letter of the BRs, specifying how exceptions should be reviewed
>>> would constitute an admission by the community that exceptions might be
>>> ok in some cases.  Thus from a purely legalistic perspective it 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-04 Thread Nick Lamb via dev-security-policy
On Tue, 4 Dec 2018 14:55:47 +0100
Jakob Bohm via dev-security-policy
 wrote:

> Oh, so you meant "CA issuance systems and protocols with explicit
> automation features" (as opposed to e.g. web server systems or
> operating systems or site specific subscriber automation systems).
> That's why I asked.

Yes. These systems exist, have existed for some time, and indeed now
appear to make up a majority of all issuance.

> And note that this situation started with an OV certificate, not a DV
> certificate.  So more than domain ownership needs to be validated.

Fortunately it is neither necessary nor usual to insist upon fresh
validations for Organisational details for each issuance. Cached
validations can be re-used for a period specified in the BRs although
in some cases a CA might choose tighter constraints.

> You have shown that ONE system, which you happen to like, can avoid
> that weakness, IF you ignore some other issues.  You have not shown
> that requiring subscribers to do this for any and all combinations of
> validation systems and TLS server systems they encounter won't have
> this weakness.

Yes, an existence proof. Subscribers must of course choose trade-offs
that they're comfortable with. That might mean accepting that your web
site could become unavailable for a period of several days at short
notice, or that you can't safely keep running Microsoft IIS 6.0 even
though you'd prefer not to upgrade. What I want to make clear is that
offering automation without write access to the private key is not only
theoretically conceivable, it's actually easy enough that a bunch of
third party clients do it today because it was simpler than whatever
else they considered.
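That property can be made concrete: a renewal agent only needs read access to a pre-generated CSR (which contains the public key, not the private one) and write access to the certificate slot; the key file never enters its dataflow. A hedged, stdlib-only sketch -- `fetch_certificate` stands in for whatever CA protocol is used, and all names are illustrative:

```python
from pathlib import Path
from typing import Callable

def renew(csr_path: Path, cert_path: Path,
          fetch_certificate: Callable[[bytes], bytes]) -> None:
    """Replace the served certificate without ever touching the private key.

    The agent reads a CSR, submits it to the CA via `fetch_certificate`,
    and writes back the issued certificate. Nothing here opens, or needs
    filesystem permission to open, the key file at all.
    """
    csr = csr_path.read_bytes()
    cert_path.write_bytes(fetch_certificate(csr))
```

Run the agent under an account with no permission on the key file and the separation is enforced by the OS, not merely by convention.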

> I made no such claim.  I was saying that your hypothetical that
> all/most validation systems have the properties of ACME and that
> all/most TLS servers allow certificate replacement without access to
> the private key storage represents an idealized scenario different
> from practical reality.

Subscribers must choose for themselves, in particular it does not
constitute an excuse as to why they need more time to react. Choices
have consequences, if you choose a process you know can't be done in a
timely fashion, it won't be done in a timely fashion and you'll go
off-line.

> And the paragraph I quoted says to not do that unless you are using a
> HSM, which very few subscribers do.

It says it only recommends doing this for a _renewal_ if you have an
HSM. But a scheduled _renewal_ already provides sufficient notice for
you to replace keys and make a fresh CSR at your leisure if you so
choose. Which is why you were talking about unscheduled events.

If you have a different reference which says what you originally
claimed, I await it.

> It is not a convenience of scheduling.  It is a security best
> practice, called out (as the first example found) in that particular
> NIST document.

If that was indeed their claimed security best practice the NIST
document would say you must replace keys every time you replace
certificates, for which it would need some sort of justification, and
there isn't one. But it doesn't - it recommends you _renew_ once per
year‡, and that you should change keys when you _renew_, which is to
say, once per year.

‡ Technically this document is written to be copy-pasted into a three
ring binder for an organisation, so you can just write in some other
amount of time instead of <1> year. As with other documents of
this sort it will not achieve anything on its own.
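On that reading, the recommendation reduces to a small decision rule. As a sketch -- this paraphrases the interpretation argued above, not the NIST document's normative text:

```python
def generate_new_key_pair(event: str, key_in_hsm: bool) -> bool:
    """Decide whether to mint a fresh key pair, per the reading above.

    event: "renewal" for a scheduled renewal, "reissue" for an unscheduled
    replacement (e.g. revocation over a CA-side syntax error).
    """
    if event == "renewal":
        # At a scheduled renewal there is ample notice, so rotate the key
        # unless it already lives in an HSM.
        return not key_in_hsm
    # An unscheduled reissue carries no rekey recommendation in this
    # reading; reusing the existing key and CSR keeps replacement fast.
    return False
```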

> Which has absolutely no bearing on the rule that keys stored outside
> an HSM should (as a best practice) be changed on every reissue.  It
> would be contradictory if part B says not to reuse keys, and part C
> then prescribes an automation method violating that.

There is no such rule listed in that NIST document. The rule you've
cited talks about renewals, but a reissue is not a renewal. There was
nothing wrong with the expiry date for the certificate, that's not why
it was replaced.

There are however several recommendations which contradict this idea
that it's OK to have processes which take weeks to act, such as:

"System owners MUST maintain the ability to replace all certificates on
their systems within <2> days to respond to security incidents"

"Private keys, and the associated certificates, that have the
capability of being directly accessed by an administrator MUST be
replaced within <30> days of reassignment or <5> days of termination of
that administrator"


The NIST document also makes many other recommendations that - like the
one year limit - won't be followed by most real organisations; such as a
requirement to add CAA records, to revoke all their old certificates
a short time after they're replaced, the insistence on automation for
adding keys to "SSL inspection" type capabilities or the prohibition of
all wildcards.

> So it is real.

Oh yes, doing things that are a bad idea is very real. That is, after
all, why we're discussing this at all.


Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-04 Thread Ryan Sleevi via dev-security-policy
On Tue, Dec 4, 2018 at 1:29 PM Dimitris Zacharopoulos via
dev-security-policy  wrote:

> I tried to highlight in this discussion that there were real cases in
> m.d.s.p. where the revocation was delayed in practice. However, the
> circumstances of these extended revocations remain unclear. Yet, the
> community didn't ask for more details.


The expectation is that there will already be a discussion about this. At
the worst case, this discussion will be delayed until the audit
qualifications come in - the absence of audit qualifications in such
situations would be incredibly damning. It sounds like you believe this is
not, in fact, a requirement today, and it may be possible to clarify that
already.

Do you think the language in
https://wiki.mozilla.org/CA/Responding_To_An_Incident is sufficient, or do
you feel it's ambiguous as to whether or not a failure to abide by the BRs
constitutes "an incident"?

As to the second half, the community not asking for details, as a member of
this community, you can and should feel empowered to ask the details you
feel are relevant. Do you believe that something about the handling of this
makes it inappropriate for you to ask questions you believe are relevant?


> Seeing this repeated, was the
> reason I suggested that more disclosure is necessary for CAs that
> require more time to revoke than the BRs require.


It's not at all clear how this result is linked to the remarks you make
above. Above, your remark seems to focus on CAs not disclosing in a timely
fashion, nor disclosing the circumstances. The former is a violation of the
existing requirements, the latter is something you can and should inquire
on if you feel is relevant. It's unclear what is "more" about the existing
disclosure, and certainly, the framing used in this statement implies that
the issue is time, but seemingly acknowledges we don't have data to support
that.


> At the very minimum,
> it would help the community understand in more detail the circumstances
> why a CA asks for more time to revoke.
>

I think there's an equally flawed assumption here - which is that CAs
should be asking for exceptions to policies. I don't think this is at all a
reasonable model - and the one time it did happen (with respect to SHA-1)
was one that caused a lot of pain and harm overall. I think it should be
uncontroversial to suggest that "exceptions" don't exempt a CA from audit
qualifications - certainly, neither the professional standards behind the
ETSI audit criteria nor the standards behind the WebTrust would allow a CA
to argue an event is not a qualification solely because Mozilla "granted an
exception".

Instead, the concept of "exceptions" is one of asking the community whether
or not they will agree to ignore, a priori, a matter of non-compliance. In a
world without "exceptions", the CA will take the qualification, and will
need to disclose (as part of an Incident Report and, later, the audit
report) the nature behind the incident, the facts, and those details. In
determining ongoing trust, the community will take a holistic look at the
incidents and qualifications, whether sufficient detail was presented, and
what the patterns and issues are.

This is a healthy system, whereas introducing "exceptions" and agreement, a
priori, to exclude certain facts from consideration is not. For one, it
prevents the determination and establishment of patterns - granting
exceptions as "one-offs" can (and demonstrably does) lead to patterns of
misissuance, and asking the community to overlook those patterns because it
agreed to overlook the specific events is very much an unreasonable, and
harmful, request. This is similar to the harm of creating "tiers" of
misissuance, as both acts seek to legitimize some forms of non-compliance,
without concrete data, which then collectively erodes the very notion of
compliance to begin with.

Thus, if we disabuse the notion that some CAs have, or worse, have promoted
to their subscribers - that browsers can, do, and will grant promises to
overlook certain areas of non-compliance - then the proposal itself goes
away. That's because the existing mechanisms - for disclosure and detail
gathering - function, and the community can and will consider those facts
when holistically considering the CA. It may be that some forms of
misissuance are so egregious that no CA should ever attempt (e.g. granting
an unconstrained CA), and it may be that other forms are considered
holistically as part of patterns, but the CA is ultimately going to be
gambling, and that's all the more reason that a CA shouldn't violate in the
first place.

If (some) CAs do feel the requirements are overly burdensome, then
proposing changes is not unreasonable - but it MUST be accompanied by
concrete and meaningful data. Absent that, it leads to the harmful problems
I discuss above, and thus is not worth the time spent or electrons wasted
on the discussion. However, if (most) CAs systemically provide data, then
we can have 

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-04 Thread Ryan Sleevi via dev-security-policy
On Tue, Dec 4, 2018 at 5:02 AM Fotis Loukos 
wrote:

> An initial comment is that statements such as "I disagree that CAs are
> "doing their best" to comply with the rules." because some CAs are
> indeed not doing their best is simply a fallacy in Ryan's argumentation,
> the fallacy of composition. Dimitris does not represent all CAs, and I'm
> pretty sure that you are aware of this Ryan. Generalizations and the
> distinction of two teams, our team (the browsers) and their team (the
> CAs), where by default our team are the good guys and their team are
> malicious is plain demagoguery. Since you like extreme examples, please
> note that generalizations (we don't like a member of a demographic thus
> all people from that demographic are bad) have led humanity to
> committing atrocities, let's not go down that road, especially since I
> know you Ryan and you're definitely not that type of person.


I appreciate you breaking this down. I think it's important to respond to
the remark, because there is a substantive bit of this criticism that I
think meaningfully affects this conversation, and it's worth diving into.

Broadly speaking, it seems the interpretation of the first remark 'CAs are
"doing their best"' can be interpreted as "(Some) CAs are doing their best"
or "(All) CAs are doing their best". You rightfully point out that Dimitris
does not represent all CAs, but that lack of representation can't be
assumed to mean the statement could not possibly be meant as all CAs - that
could have been the intent, and is a valid interpretation. Similarly, in
the criticism, it seems the interpretation for 'I disagree that CAs are
"doing their best"' can be interpreted as "I disagree that (some) CAs are
doing their best", "I disagree that (all) CAs are doing their best", or "I
disagree that (any) CAs are doing their best".

While I doubt that any of these interpretations are likely to be seen as
supporting genocide, they do underscore an issue: Ambiguity about whether
we're talking about some CAs or all CAs. When we speak about policy
requirements, whether in the CA/Browser Forum or here, it's necessary in
the framing to consider all CAs in aggregate. Dimitris proposed a
distinction between "good" CAs and "bad" CAs, on the basis that flexibility
is needed for "good" CAs, while my counter-argument is that such
flexibility is easily abused by "bad" CAs, and when "bad" CAs are the
majority, there's no longer the distinction between "good" and "bad".
Policies that propose ambiguity, flexibility and trust, whether through
validation methods or revocation decisions, fundamentally rest on the
assumption that all entities with that flexibility will use the flexibility
"correctly." Codifying what that means removes the flexibility, and thus is
incompatible with flexibility - so if there exists the possibility of
abuse, it has to be dealt with by avoiding ambiguity and flexibility, and
removing trust where it's "misused".

This isn't a fallacy of composition - it's the fundamental risk assessment
that others on this thread have proposed. The risk of a single bad CA
spoiling the bunch, as it were, which is absolutely the case in a public
trust ecosystem, is such that it cannot afford considerations of
flexibility for the 'good' CAs. It's equally telling that the distinction
between 'bad' CAs and 'good CAs' are "Those that are not following the
rules" vs "Those that are", rather than the far more desirable "Those that
are doing the bare minimum required of the rules" and "Those that are going
above and beyond". If it truly were the latter case, one could imagine more
flexibility being possible, but when we are at a state where there are
literally CAs routinely failing to abide by the core minimum, then in any
conversation about granting more trust it is necessary and critical to
consider "all CAs" when we talk about what "CAs are doing", just as we
already assume that negative discussions about removing trust necessarily
begin with "some CAs" when we talk about what "CAs are doing".
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-04 Thread Dimitris Zacharopoulos via dev-security-policy

Fotis,

You have quoted only one part of my message, which doesn't capture the
entire concept.


CAs that mis-issue and must revoke the mis-issued certificates have
already violated the BRs. Delaying revocation beyond what the BRs
require is also a violation. There was never any doubt about that. I never
proposed that "extended revocation" would somehow "not be considered a
BR violation" or "make it legal".


I tried to highlight in this discussion that there were real cases in 
m.d.s.p. where the revocation was delayed in practice. However, the 
circumstances of these extended revocations remain unclear. Yet, the 
community didn't ask for more details. Seeing this repeated was the
reason I suggested that more disclosure is necessary for CAs that 
require more time to revoke than the BRs require. At the very minimum, 
it would help the community understand in more detail the circumstances 
why a CA asks for more time to revoke.


I think Jakob made an accurate summary.


Dimitris.



On 4/12/2018 8:00 p.m., Fotis Loukos via dev-security-policy wrote:

> Hello,
>
> On 4/12/18 4:30 p.m., Jakob Bohm via dev-security-policy wrote:

>> Hello to you too.
>>
>> It seems that you are both misunderstanding what the proposal was.
>>
>> The proposal was apparently to further restrict the ability of CAs to
>> make exceptions on their own, by requiring all such exceptions to go
>> through the public forums where the root programs can challenge or even
>> deny a proposed exception, after hearing the case by case arguments for
>> why an exception should be granted.


> Can you please point me to the exact place where this is mentioned?
>
> The initial proposal is the following:
>
> Mandating that CAs disclose revocation situations that exceed the 5-day
> requirement with some risk analysis information, might be a good place
> to start.
>
> I see nothing related to public discussion and root programs challenging
> or denying the proposed exception.
>
> In a follow-up email, Dimitris mentions the following:
>
> The reason for requiring disclosure is meant as a first step for
> understanding what's happening in reality and collect some meaningful
> data by policy. [...] If, for example, m.d.s.p. receives 10 or 20
> revocation exception cases within a 12-month period and none of them is
> convincing to the community and module owners to justify the exception,
> the policy can be updated with clear rules about the risk of distrust if
> the revocation doesn't happen within 5 days.
>
> In this proposal it is clear that the CA will *disclose* and not ask for
> permission for extending the 24h/5 day period, and furthermore he
> accepts the fact that these exceptions may not be later accepted by the
> community, which may lead to changing the policy.



>> A better example would be that if someone broke their leg for some
>> reason, and therefore wants to delay payment of a debt by a short while,
>> they should be able to ask for it, and the request would be considered
>> on its merits, not based on a hard-nosed principle of never granting any
>> extensions.

> I think that the proper analogy is if someone broke their leg, and
> therefore wants to delay payment of a bank debt, he should be able to
> delay it without notifying the bank in time, but after he has decided
> that he is fine and he can walk, he can go to the bank and explain to
> them why he delayed the payment. I do not consider this a good practice.


>> Now because CAs making exceptions can be technically considered against
>> the letter of the BRs, specifying how exceptions should be reviewed
>> would constitute an admission by the community that exceptions might be
>> ok in some cases.  Thus from a purely legalistic perspective it would
>> constitute a weakening of the rules.  But only if one ignores the
>> reality that such exceptions currently happen with little or no
>> oversight.

> Please see above, there is no review in the original proposal.


>> As for doing risk assessments and reporting, no deep thinking and no
>> special logging of considerations is needed when revoking as quickly
>> as possible, well within the current 24 hour and 5 day deadlines (as
>> applicable), which hopefully constitutes the vast majority of revocations.

> So, is deep thinking needed in the rest of the cases? If yes, how do you
> think that a CA will be able to do this risk assessment and how can root
> store operators decide on this within 24h in order to extend this
> period? If no, would you trust such a risk assessment?
>
> Regards,
> Fotis



>> On 04/12/2018 11:02, Fotis Loukos wrote:

>>> Hello everybody,
>>> First of all, I would like to note that I am writing as an individual
>>> and my opinion does not necessarily represent the opinion of my employer.
>>>
>>> An initial comment is that statements such as "I disagree that CAs are
>>> "doing their best" to comply with the rules." because some CAs are
>>> indeed not doing their best is simply a fallacy in Ryan's argumentation,
>>> the fallacy of composition. Dimitris does not represent all CAs, and I'm
>>> pretty sure that you are aware of this Ryan. Generalizations and the
>>> distinction of two 

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-04 Thread Fotis Loukos via dev-security-policy
Hello,

On 4/12/18 4:30 p.m., Jakob Bohm via dev-security-policy wrote:
> Hello to you too.
> 
> It seems that you are both misunderstanding what the proposal was.
> 
> The proposal was apparently to further restrict the ability of CAs to 
> make exceptions on their own, by requiring all such exceptions to go 
> through the public forums where the root programs can challenge or even 
> deny a proposed exception, after hearing the case by case arguments for 
> why an exception should be granted.
> 

Can you please point me to the exact place where this is mentioned?

The initial proposal is the following:

Mandating that CAs disclose revocation situations that exceed the 5-day
requirement with some risk analysis information, might be a good place
to start.

I see nothing related to public discussion and root programs challenging
or denying the proposed exception.

In a follow-up email, Dimitris mentions the following:

The reason for requiring disclosure is meant as a first step for
understanding what's happening in reality and collect some meaningful
data by policy. [...] If, for example, m.d.s.p. receives 10 or 20
revocation exception cases within a 12-month period and none of them is
convincing to the community and module owners to justify the exception,
the policy can be updated with clear rules about the risk of distrust if
the revocation doesn't happen within 5 days.

In this proposal it is clear that the CA will *disclose* and not ask for
permission for extending the 24h/5 day period, and furthermore he
accepts the fact that these exceptions may not be later accepted by the
community, which may lead to changing the policy.


> A better example would be that if someone broke their leg for some 
> reason, and therefore wants to delay payment of a debt by a short while, 
> they should be able to ask for it, and the request would be considered 
> on its merits, not based on a hard-nosed principle of never granting any 
> extensions.

I think that the proper analogy is if someone broke their leg, and
therefore wants to delay payment of a bank debt, he should be able to
delay it without notifying the bank in time, but after he has decided
that he is fine and he can walk, he can go to the bank and explain to them
why he delayed the payment. I do not consider this a good practice.

> 
> Now because CAs making exceptions can be technically considered against 
> the letter of the BRs, specifying how exceptions should be reviewed 
> would constitute an admission by the community that exceptions might be 
> ok in some cases.  Thus from a purely legalistic perspective it would 
> constitute a weakening of the rules.  But only if one ignores the 
> reality that such exceptions currently happen with little or no 
> oversight.

Please see above, there is no review in the original proposal.

> 
> As for doing risk assessments and reporting, no deep thinking and no 
> special logging of considerations is needed when revoking as quickly 
> as possible, well within the current 24 hour and 5 day deadlines (as 
> applicable), which hopefully constitutes the vast majority of revocations.

So, is deep thinking needed in the rest of the cases? If yes, how do you
think that a CA will be able to do this risk assessment and how can root
store operators decide on this within 24h in order to extend this
period? If no, would you trust such a risk assessment?

Regards,
Fotis

> 
> 
> On 04/12/2018 11:02, Fotis Loukos wrote:
>> Hello everybody,
>> First of all, I would like to note that I am writing as an individual
>> and my opinion does not necessarily represent the opinion of my employer.
>>
>> An initial comment is that statements such as "I disagree that CAs are
>> "doing their best" to comply with the rules." because some CAs are
>> indeed not doing their best is simply a fallacy in Ryan's argumentation,
>> the fallacy of composition. Dimitris does not represent all CAs, and I'm
>> pretty sure that you are aware of this Ryan. Generalizations and the
>> distinction of two teams, our team (the browsers) and their team (the
>> CAs), where by default our team are the good guys and their team are
>> malicious is plain demagoguery. Since you like extreme examples, please
>> note that generalizations (we don't like a member of a demographic thus
>> all people from that demographic are bad) have led humanity to
>> committing atrocities, let's not go down that road, especially since I
>> know you Ryan and you're definitely not that type of person.
>>
>> I believe that the arguments presented by Dimitris are simply red
>> herrings. Whether there is a blackout period, the CA lost internet
>> connectivity or a 65 character OU does not pose a risk to relying
>> parties is a form of ignoratio elenchi, a fallacy identified even by
>> Aristotle thousands of years ago. Using the same deductive reasoning,
>> someone could argue that if a person was scammed in participating in a
>> ponzi scheme and lost all his fortune, he can steal someone else's 

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-04 Thread Jakob Bohm via dev-security-policy
Hello to you too.

It seems that you are both misunderstanding what the proposal was.

The proposal was apparently to further restrict the ability of CAs to 
make exceptions on their own, by requiring all such exceptions to go 
through the public forums where the root programs can challenge or even 
deny a proposed exception, after hearing the case by case arguments for 
why an exception should be granted.

A better example would be that if someone broke their leg for some 
reason, and therefore wants to delay payment of a debt by a short while, 
they should be able to ask for it, and the request would be considered 
on its merits, not based on a hard-nosed principle of never granting any 
extensions.

Now because CAs making exceptions can be technically considered against 
the letter of the BRs, specifying how exceptions should be reviewed 
would constitute an admission by the community that exceptions might be 
ok in some cases.  Thus from a purely legalistic perspective it would 
constitute a weakening of the rules.  But only if one ignores the 
reality that such exceptions currently happen with little or no 
oversight.

As for doing risk assessments and reporting, no deep thinking and no 
special logging of considerations is needed when revoking as quickly 
as possible, well within the current 24 hour and 5 day deadlines (as 
applicable), which hopefully constitutes the vast majority of revocations.


On 04/12/2018 11:02, Fotis Loukos wrote:
> Hello everybody,
> First of all, I would like to note that I am writing as an individual
> and my opinion does not necessarily represent the opinion of my employer.
> 
> An initial comment is that statements such as "I disagree that CAs are
> "doing their best" to comply with the rules." because some CAs are
> indeed not doing their best is simply a fallacy in Ryan's argumentation,
> the fallacy of composition. Dimitris does not represent all CAs, and I'm
> pretty sure that you are aware of this Ryan. Generalizations and the
> distinction of two teams, our team (the browsers) and their team (the
> CAs), where by default our team are the good guys and their team are
> malicious is plain demagoguery. Since you like extreme examples, please
> note that generalizations (we don't like a member of a demographic thus
> all people from that demographic are bad) have led humanity to
> committing atrocities, let's not go down that road, especially since I
> know you Ryan and you're definitely not that type of person.
> 
> I believe that the arguments presented by Dimitris are simply red
> herrings. Whether there is a blackout period, the CA lost internet
> connectivity or a 65 character OU does not pose a risk to relying
> parties is a form of ignoratio elenchi, a fallacy identified even by
> Aristotle thousands of years ago. Using the same deductive reasoning,
> someone could argue that if a person was scammed in participating in a
> ponzi scheme and lost all his fortune, he can steal someone else's money.
> 
> The true point of the argument is whether CAs should be allowed to break
> the BRs based on their own risk analysis. So, what is a certificate?
> It's more or less an assertion. And making an assertion is equally
> important as revoking it. As Ryan correctly mentioned, if this becomes a
> norm, why shouldn't CAs be allowed to make a risk analysis and decide
> that they will break the BRs in making the assertion too, effectively
> issuing certificates with their own validation methods? Where would this
> lead us? Who would be able to trust the WebPKI afterwards? Are we
> looking into making it the wild west of the internet?
> 
> In addition, do you think that CAs should be audited regarding their
> criteria for their risk analysis?
> 
> Furthermore, this poses a great risk for the CAs too. If this becomes a
> practice, how can CAs be assured that the browsers won't make a risk
> analysis and decide that an issuance made in accordance to all the
> requirements in the BRs is a misissuance? Until now, we have seen that
> browsers have distrusted CAs based on concrete evidence of misissuances.
> Do you think Dimitris that they should be allowed to distrust CAs based
> on some risk analysis?
> 
> Regards,
> Fotis
> 
> 
> On 30/11/18 6:13 p.m., Ryan Sleevi via dev-security-policy wrote:
>> On Fri, Nov 30, 2018 at 4:24 AM Dimitris Zacharopoulos wrote:
>>
>>>
>>>
>>> On 30/11/2018 1:49 a.m., Ryan Sleevi wrote:
>>>
>>>
>>>
>>> On Thu, Nov 29, 2018 at 4:03 PM Dimitris Zacharopoulos via
>>> dev-security-policy  wrote:
>>>
>>>> I didn't want to hijack the thread so here's a new one.
>>>>
>>>>
>>>> Times and circumstances change.
>>>
>>>
>>> You have to demonstrate that.
>>>
>>>
>>> It's self-proved :-)
>>>
>>
>> This sort of glib reply shows a lack of good-faith effort to meaningfully
>> engage. It's like forcing the discussion every minute, since, yanno, "times
>> and circumstances have changed".
>>
>> I gave you concrete reasons why saying something like this is a
>> 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-04 Thread Jakob Bohm via dev-security-policy

On 04/12/2018 13:36, Nick Lamb wrote:

> On Tue, 4 Dec 2018 07:56:12 +0100
> Jakob Bohm via dev-security-policy
>  wrote:
>
>> Which systems?


> As far as I'm aware, any of the automated certificate issuance
> technologies can be used here; ACME is the one I'm most familiar with
> because it is going through IETF standardisation and so we get to see
> not only the finished system but all the process and discussion.



Oh, so you meant "CA issuance systems and protocols with explicit
automation features" (as opposed to e.g. web server systems or operating
systems or site specific subscriber automation systems).  That's why I
asked.

And note that this situation started with an OV certificate, not a DV
certificate.  So more than domain ownership needs to be validated.


>> I prefer not to experiment with live certificates.  Anyway, this was
>> never intended to focus on the specifics of ACME, since OV issuance
>> isn't ACME anyway.


> The direction of the thread was: Excuses for why a subscriber can't
> manage to replace certificates in a timely fashion. Your contribution
> was a claim that automated deployment has poor operational security
> because:
>
> "it necessarily grants read/write access to the certificate data
> (including private key) to an automated, online, unsupervised system."
>
> I've cleanly refuted that, showing that in a real, widely used system
> neither read nor write access to the private key is needed to perform
> automated certificate deployment. You do not need to like this, but to
> insist that something false is "necessarily" true is ludicrous.



You have shown that ONE system, which you happen to like, can avoid that
weakness, IF you ignore some other issues.  You have not shown that
requiring subscribers to do this for any and all combinations of
validation systems and TLS server systems they encounter won't have this
weakness.


>> So returning to the typical, as-specified-in-the-BRs validation
>> challenges.  Those generally either do not include the CSR in the
>> challenge, or do so in a manner that would involve active checking
>> rather than just trivial concatenation.  These are the kind of
>> challenges that require the site owner to consider IF they are in a
>> certificate request process before responding.


> I _think_ this means you still didn't grasp how ACME works, or even how
> one would in general approach this problem. The CSR needs to go from
> the would-be subscriber to the CA, it binds the SANs to the key pair,
> proving that someone who knows the private key wanted a certificate for
> these names. ACME wants to bind the names back to the would-be
> subscriber, proving that whoever this is controls those names, and so
> is entitled to such a certificate. It uses _different_ keys for that
> precisely so that it doesn't need the TLS private key.


It means ACME is of very little relevance to OV and EV certificates from
most/all current OV and EV CAs.



> But most centrally the Baseline Requirements aren't called the "Ideal
> Goals" but only the "Baseline Requirements" for a reason. If a CA
> approaches them as a target to be aimed for, rather than as a bare
> minimum to be exceeded, we're going to have a problem. Accordingly the
> Ten Blessed Methods aren't suggestions for how an ideal CA should
> validate control of names, they're the very minimum you must do to
> validate control of names. ACME does more, frankly any CA should be
> aiming to do more.


I made no such claim.  I was saying that your hypothetical that all/most
validation systems have the properties of ACME and that all/most TLS
servers allow certificate replacement without access to the private key
storage represents an idealized scenario different from practical
reality.




>> See for example NIST SP 1800-16B Prelim Draft 1, Section 5.1.4 which
>> has this to say:
>>
>>   "... It is possible to renew a certificate with the same public and
>>   private keys (i.e., not rekeying during the renewal process).
>>   However, this is only recommended when the private key is contained
>>   with a hardware security module (HSM) validated to Federal
>> Information Processing Standards (FIPS) Publication 140-2 Level 2 or
>> above"


> Just before that sentence the current draft says:
>
> "It is important to note that the validity period of a certificate is
> different than the cryptoperiod of the public key contained in the
> certificate and the corresponding private key."


And the paragraph I quoted says to not do that unless you are using a
HSM, which very few subscribers do.



> Quite so. Thus, the only reason to change both at the same time is as I
> said, a convenience of scheduling, NIST does not claim that creating
> certificates has any actual impact on the cryptoperiod, they just want
> organisations to change their keys frequently and "on renewal" is a
> convenient time to schedule such a change.


It is not a convenience of scheduling.  It is a security best practice,
called out (as the first example found) in that particular NIST
document.



> Moreover, this is (a draft of) Volume B of NIST's guidance. There is an
> entire 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-04 Thread Nick Lamb via dev-security-policy
On Tue, 4 Dec 2018 07:56:12 +0100
Jakob Bohm via dev-security-policy
 wrote:

> Which systems?

As far as I'm aware, any of the automated certificate issuance
technologies can be used here; ACME is the one I'm most familiar with
because it is going through IETF standardisation and so we get to see
not only the finished system but all the process and discussion.

> I prefer not to experiment with live certificates.  Anyway, this was 
> never intended to focus on the specifics of ACME, since OV issuance 
> isn't ACME anyway.

The direction of the thread was: Excuses for why a subscriber can't
manage to replace certificates in a timely fashion. Your contribution
was a claim that automated deployment has poor operational security
because:

"it necessarily grants read/write access to the certificate data
(including private key) to an automated, online, unsupervised system."

I've cleanly refuted that, showing that in a real, widely used system
neither read nor write access to the private key is needed to perform
automated certificate deployment. You do not need to like this, but to
insist that something false is "necessarily" true is ludicrous.

> So returning to the typical, as-specified-in-the-BRs validation 
> challenges.  Those generally either do not include the CSR in the 
> challenge, or do so in a manner that would involve active checking 
> rather than just trivial concatenation.  These are the kind of 
> challenges that require the site owner to consider IF they are in a 
> certificate request process before responding.

I _think_ this means you still didn't grasp how ACME works, or even how
one would in general approach this problem. The CSR needs to go from
the would-be subscriber to the CA, it binds the SANs to the key pair,
proving that someone who knows the private key wanted a certificate for
these names. ACME wants to bind the names back to the would-be
subscriber, proving that whoever this is controls those names, and so
is entitled to such a certificate. It uses _different_ keys for that
precisely so that it doesn't need the TLS private key.
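This separation of keys can be made concrete with a short sketch (the function names and example values below are invented for illustration; this is not code from any ACME implementation). Per RFC 8555, an HTTP-01 challenge response is built from the challenge token and a thumbprint of the ACME *account* key's public JWK; the TLS private key of the certificate being requested never enters into it:

```python
import base64
import hashlib
import json

def jwk_thumbprint(jwk: dict) -> str:
    # RFC 7638: SHA-256 over the JWK's required members (here the RSA set),
    # serialized with sorted keys and no whitespace, base64url, no padding.
    required = {k: jwk[k] for k in ("e", "kty", "n")}
    digest = hashlib.sha256(
        json.dumps(required, separators=(",", ":"), sort_keys=True).encode()
    ).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

def key_authorization(token: str, account_jwk: dict) -> str:
    # RFC 8555 section 8.1: the provisioned challenge response is
    # token "." thumbprint(account key) -- no TLS private key involved.
    return f"{token}.{jwk_thumbprint(account_jwk)}"

# A made-up RSA account public key (public values only):
example_jwk = {"kty": "RSA", "n": "example-modulus", "e": "AQAB"}
print(key_authorization("token123", example_jwk))
```

(An EC account key would use a different required-member set; the point is only that the authorization derives from the account key, which is public material plus an account-key signature, not from the certificate's key pair.)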

But most centrally the Baseline Requirements aren't called the "Ideal
Goals" but only the "Baseline Requirements" for a reason. If a CA
approaches them as a target to be aimed for, rather than as a bare
minimum to be exceeded, we're going to have a problem. Accordingly the
Ten Blessed Methods aren't suggestions for how an ideal CA should
validate control of names, they're the very minimum you must do to
validate control of names. ACME does more, frankly any CA should be
aiming to do more.

> See for example NIST SP 1800-16B Prelim Draft 1, Section 5.1.4 which
> has this to say:
> 
>   "... It is possible to renew a certificate with the same public and 
>   private keys (i.e., not rekeying during the renewal process). 
>   However, this is only recommended when the private key is contained 
>   with a hardware security module (HSM) validated to Federal
> Information Processing Standards (FIPS) Publication 140-2 Level 2 or
> above"

Just before that sentence the current draft says:

"It is important to note that the validity period of a certificate is
different than the cryptoperiod of the public key contained in the
certificate and the corresponding private key."

Quite so. Thus, the only reason to change both at the same time is as I
said, a convenience of scheduling, NIST does not claim that creating
certificates has any actual impact on the cryptoperiod, they just want
organisations to change their keys frequently and "on renewal" is a
convenient time to schedule such a change.
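The two renewal styles being discussed can be sketched with the `cryptography` package (an assumed dependency; the helper name and subject below are invented). Renewing without rekeying signs a fresh CSR with the existing key; rekeying generates a new key pair first:

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

def make_csr(key, common_name="www.example.com"):
    # A CSR binds the requested name to whichever key pair signs it.
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, common_name)])
    return x509.CertificateSigningRequestBuilder().subject_name(name).sign(
        key, hashes.SHA256()
    )

existing_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Renewal without rekeying: the new CSR reuses the existing key pair
# (which NIST SP 1800-16B recommends only when the key lives in an HSM).
renewal_csr = make_csr(existing_key)

# Rekeying during renewal: generate a fresh key pair, then build the CSR.
new_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
rekey_csr = make_csr(new_key)

# Only the rekeyed CSR carries a different public key.
same = renewal_csr.public_key().public_numbers()
assert same == existing_key.public_key().public_numbers()
assert rekey_csr.public_key().public_numbers() != same
```

Either CSR can be submitted to the CA; the certificate's validity period is set by the CA regardless, which is the distinction between validity period and cryptoperiod that the draft draws.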

Moreover, this is (a draft of) Volume B of NIST's guidance. There is an
entire volume, Volume C, about the use of automation, to be published
later. I have no idea what that will say, but I doubt it will begin by
insisting that you need read-write access to private keys to do
something people are already doing today without such access.


> I am referring to the very real facts that:
> 
> - Many "config GUI only" systems request certificate import as
> PKCS#12 files or similar.

This is a real phenomenon, and encourages a lot of bad practices we've
discussed previously on m.d.s.policy. It even manages to make the
already confusing (for lay persons) question of what's "secret" and what
is not yet more puzzling, with IMNSHO minimal gains to show for it. Use
of PKCS#12 in this way can't be deprecated quickly enough for my liking.
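The asymmetry described here is easy to demonstrate (again with the `cryptography` package, using throwaway names, a throwaway password, and a self-signed certificate standing in for a CA-issued one): a PKCS#12 bundle cannot be constructed without the private key, so .p12-based import flows always move key material together with the certificate, while the certificate alone is public:

```python
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.serialization import pkcs12
from cryptography.x509.oid import NameOID

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "www.example.com")])
now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)            # self-signed stand-in for a CA-issued cert
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=30))
    .sign(key, hashes.SHA256())
)

# The certificate alone is public material: safe to hand to a deploy agent.
cert_pem = cert.public_bytes(serialization.Encoding.PEM)

# A PKCS#12 bundle, by contrast, cannot be built without the private key,
# so import-by-.p12 workflows always transfer key material:
bundle = pkcs12.serialize_key_and_certificates(
    b"example", key, cert, None,
    serialization.BestAvailableEncryption(b"changeit"),
)
```

Anyone holding `bundle` and its password holds the private key, which is exactly the operational-security cost of "config GUI only" import paths.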

[ This is also related to the Windows ecosystem in which there's a
pretence kept up that private keys aren't accessible once imported,
which of course isn't mechanically true since those keys are needed by
the system for it to work. So bad guys can ignore the documentation
saying its impossible and just read the keys out of RAM with a trivial
program, but good guys can't get back their own private keys.
A true masterpiece of security engineering, presumably from the same
people who invented the LANMAN password hash. ]

> - Many open source TLS servers 

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-12-04 Thread Fotis Loukos via dev-security-policy
Hello everybody,
First of all, I would like to note that I am writing as an individual
and my opinion does not necessarily represent the opinion of my employer.

An initial comment is that statements such as "I disagree that CAs are
"doing their best" to comply with the rules." because some CAs are
indeed not doing their best is simply a fallacy in Ryan's argumentation,
the fallacy of composition. Dimitris does not represent all CAs, and I'm
pretty sure that you are aware of this Ryan. Generalizations and the
distinction of two teams, our team (the browsers) and their team (the
CAs), where by default our team are the good guys and their team are
malicious is plain demagoguery. Since you like extreme examples, please
note that generalizations (we don't like a member of a demographic thus
all people from that demographic are bad) have led humanity to
committing atrocities, let's not go down that road, especially since I
know you Ryan and you're definitely not that type of person.

I believe that the arguments presented by Dimitris are simply red
herrings. Whether there is a blackout period, the CA lost internet
connectivity or a 65 character OU does not pose a risk to relying
parties is a form of ignoratio elenchi, a fallacy identified even by
Aristotle thousands of years ago. Using the same deductive reasoning,
someone could argue that if a person was scammed in participating in a
ponzi scheme and lost all his fortune, he can steal someone else's money.

The true point of the argument is whether CAs should be allowed to break
the BRs based on their own risk analysis. So, what is a certificate?
It's more or less an assertion. And making an assertion is equally
important as revoking it. As Ryan correctly mentioned, if this becomes a
norm, why shouldn't CAs be allowed to make a risk analysis and decide
that they will break the BRs in making the assertion too, effectively
issuing certificates with their own validation methods? Where would this
lead us? Who would be able to trust the WebPKI afterwards? Are we
looking into making it the wild west of the internet?

In addition, do you think that CAs should be audited regarding their
criteria for their risk analysis?

Furthermore, this poses a great risk for the CAs too. If this becomes a
practice, how can CAs be assured that the browsers won't make a risk
analysis and decide that an issuance made in accordance to all the
requirements in the BRs is a misissuance? Until now, we have seen that
browsers have distrusted CAs based on concrete evidence of misissuances.
Do you think Dimitris that they should be allowed to distrust CAs based
on some risk analysis?

Regards,
Fotis


On 30/11/18 6:13 p.m., Ryan Sleevi via dev-security-policy wrote:
> On Fri, Nov 30, 2018 at 4:24 AM Dimitris Zacharopoulos wrote:
> 
>>
>>
>> On 30/11/2018 1:49 a.m., Ryan Sleevi wrote:
>>
>>
>>
>> On Thu, Nov 29, 2018 at 4:03 PM Dimitris Zacharopoulos via
>> dev-security-policy  wrote:
>>
>>> I didn't want to hijack the thread so here's a new one.
>>>
>>>
>>> Times and circumstances change.
>>
>>
>> You have to demonstrate that.
>>
>>
>> It's self-proved :-)
>>
> 
> This sort of glib reply shows a lack of good-faith effort to meaningfully
> engage. It's like forcing the discussion every minute, since, yanno, "times
> and circumstances have changed".
> 
> I gave you concrete reasons why saying something like this is a
> demonstration of a weak and bad-faith argument. If you would like to
> meaningfully assert this, you would need to demonstrate what circumstances
> have changed in such a way as to warrant a rediscussion of something that
> gets 'relitigated' regularly - and, in fact, was something discussed in the
> CA/Browser Forum for the past two years. Just because you're unsatisfied
> with the result and now we're in a month that ends in "R" doesn't mean time
> and circumstances have changed meaningfully to support the discussion.
> 
> Concrete suggestions involved a holistic look at _all_ revocations, since
> the discussion of exceptions is relevant to know whether we are discussing
> something that is 10%, 1%, .1%, or .01%. Similarly, having the framework
> in place to consistently and objectively measure that helps us assess
> whether any proposals for exceptions would change that "1%" from being
> exceptional to seeing "10%" or "100%" being claimed as exceptional under
> some new regime.
> 
> In the absence of that, it's an abusive and harmful act.
> 
> 
>> I already mentioned that this is separate from the incident report (of the
>> actual mis-issuance). We have repeatedly seen post-mortems that say that
>> for some specific cases (not all of them), the revocation of certificates
>> will require more time.
>>
> 
> No. We've seen the claim it will require more time, frequently without
> evidence. However, I do think you're not understanding - there is nothing
> preventing CAs from sharing details, for all revocations they do, about the
> factors they considered, and the 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-03 Thread Jakob Bohm via dev-security-policy
On 04/12/2018 05:38, Nick Lamb wrote:
> On Tue, 4 Dec 2018 01:39:05 +0100
> Jakob Bohm via dev-security-policy
>  wrote:
> 
>> A few clarifications below
>> Interesting.  What is that hole?
> 
> I had assumed that you weren't aware that you could just use these
> systems as designed. Your follow-up clarifies that you believe doing
> this is unsafe. I will endeavour to explain why you're mistaken.
> 

Which systems?

> But also I specifically endorse _learning by doing_. Experiment for
> yourself with how easy it is to achieve auto-renewal with something like
> ACME, try to request renewals against a site that's configured for
> "stateless renewal" but with a new ("bad guy") key instead of your real
> ACME account keys.
> 

I prefer not to experiment with live certificates.  Anyway, this was 
never intended to focus on the specifics of ACME, since OC issuance 
isn't ACME anyway.

So returning to the typical, as-specified-in-the-BRs validation 
challenges.  Those generally either do not include the CSR in the 
challenge, or do so in a manner that would involve active checking 
rather than just trivial concatenation.  These are the kind of 
challenges that require the site owner to consider IF they are in a 
certificate request process before responding.

> 
>> It certainly needs the ability to change private keys (as reusing
>> private keys for new certificates is bad practice and shouldn't be
>> automated).
> 
> In which good practice document can I read that private keys should be
> replaced earlier than their ordinary lifetime if new certificates are
> minted during that lifetime? Does this document explain how its authors
> imagine the new certificate introduces a novel risk?
> 
> [ This seems like breakthrough work to me, it implies a previously
> unimagined weakness in, at least, RSA ]
> 

Aligning key and certificate lifetime is generally good practice.

See for example NIST SP 1800-16B Prelim Draft 1, Section 5.1.4 which has 
this to say:

  "... It is possible to renew a certificate with the same public and 
  private keys (i.e., not rekeying during the renewal process). 
  However, this is only recommended when the private key is contained 
  with a hardware security module (HSM) validated to Federal Information 
  Processing Standards (FIPS) Publication 140-2 Level 2 or above"

And the operations I discuss are unlikely to purchase an expensive HSM 
that isn't even future proof. (I have checked leading brands of end site 
HSMs, and they barely go beyond current recommended key strengths).

> You must understand that bad guys can, if they wish, construct an
> unlimited number of new certificates corresponding to an existing key,
> silently. Does this too introduce an unacceptable risk ? If not, why is
> the risk introduced if a trusted third party mints one or more further
> certificates ?
> 
> No, I think the problem here is with your imaginary "bad practice".
> You have muddled the lifetime of the certificate (which relates to the
> decay in assurance of subject information validated and to other
> considerations) with the lifetime of the keys, see below.
> 
>> By definition, the strength of public keys, especially TLS RSA
>> signing keys used with PFS suites, involves a security tradeoff
>> between the time that attackers have to break/factor the public key
>> and the slowness of handling TLS connections with current generation
>> standard hardware and software.
> 
> This is true.
> 
>> The current WebPKI/BR tradeoff/compromise is set at 2048 bit keys
>> valid for about 24 months.
> 
> Nope. The limit of 825 days (not "about 24 months") is for leaf
> certificate lifetime, not for keys. It's shorter than it once was not
> out of concern about bad guys breaking 2048-bit RSA but because of
> concern about algorithmic agility and the lifetime of subject
> information validation, mostly the former.

825 days = 24 months plus ~94 days of slop; in practice, CAs map this to 
payment for 2 years' validity plus some allowance for overlap during 
changeover.

> 
> Subscribers are _very_ strongly urged to choose shorter, not longer
> lifetimes, again not because we're worried about 2048-bit RSA (you will
> notice there's no exemption for 4096-bit keys) but because of agility
> and validation.
> 
> But choosing new keys every time you get a new certificate is
> purely a mechanical convenience of scheduling, not a technical necessity
> - like a fellow who schedules an appointment at the barber each time he
> receives a telephone bill, the one thing has nothing to do with the
> other.
> 

See above NIST quote.

> 
>> It requires write access to the private keys, even if the operators
>> might not need to see those keys, many real world systems don't allow
>> granting "install new private key" permission without "see new
>> private key" permission and "choose arbitrary private key" permission.
>>
>> Also, many real world systems don't allow installing a new
>> certificate for an existing key without reinstalling the 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-03 Thread Nick Lamb via dev-security-policy
On Tue, 4 Dec 2018 01:39:05 +0100
Jakob Bohm via dev-security-policy
 wrote:

> A few clarifications below
> Interesting.  What is that hole?

I had assumed that you weren't aware that you could just use these
systems as designed. Your follow-up clarifies that you believe doing
this is unsafe. I will endeavour to explain why you're mistaken.

But also I specifically endorse _learning by doing_. Experiment for
yourself with how easy it is to achieve auto-renewal with something like
ACME, try to request renewals against a site that's configured for
"stateless renewal" but with a new ("bad guy") key instead of your real
ACME account keys.


> It certainly needs the ability to change private keys (as reusing
> private keys for new certificates is bad practice and shouldn't be
> automated).

In which good practice document can I read that private keys should be
replaced earlier than their ordinary lifetime if new certificates are
minted during that lifetime? Does this document explain how its authors
imagine the new certificate introduces a novel risk?

[ This seems like breakthrough work to me, it implies a previously
unimagined weakness in, at least, RSA ]

You must understand that bad guys can, if they wish, construct an
unlimited number of new certificates corresponding to an existing key,
silently. Does this too introduce an unacceptable risk ? If not, why is
the risk introduced if a trusted third party mints one or more further
certificates ?

No, I think the problem here is with your imaginary "bad practice".
You have muddled the lifetime of the certificate (which relates to the
decay in assurance of subject information validated and to other
considerations) with the lifetime of the keys, see below.

> By definition, the strength of public keys, especially TLS RSA
> signing keys used with PFS suites, involves a security tradeoff
> between the time that attackers have to break/factor the public key
> and the slowness of handling TLS connections with current generation
> standard hardware and software.

This is true.

> The current WebPKI/BR tradeoff/compromise is set at 2048 bit keys
> valid for about 24 months.

Nope. The limit of 825 days (not "about 24 months") is for leaf
certificate lifetime, not for keys. It's shorter than it once was not
out of concern about bad guys breaking 2048-bit RSA but because of
concern about algorithmic agility and the lifetime of subject
information validation, mostly the former.

Subscribers are _very_ strongly urged to choose shorter, not longer
lifetimes, again not because we're worried about 2048-bit RSA (you will
notice there's no exemption for 4096-bit keys) but because of agility
and validation.

But choosing new keys every time you get a new certificate is
purely a mechanical convenience of scheduling, not a technical necessity
- like a fellow who schedules an appointment at the barber each time he
receives a telephone bill, the one thing has nothing to do with the
other.


> It requires write access to the private keys, even if the operators
> might not need to see those keys, many real world systems don't allow
> granting "install new private key" permission without "see new
> private key" permission and "choose arbitrary private key" permission.
> 
> Also, many real world systems don't allow installing a new
> certificate for an existing key without reinstalling the matching
> private key, simply because that's the interface.
> 
> Traditional military encryption systems are built without these 
> limitations, but civilian systems are often not.

Nevertheless.

I'm sure there's a system out there somewhere which requires you to
provide certificates on a 3.5" floppy disk. But that doesn't mean
issuing certificates can reasonably be said to require a 3.5" floppy
disk, it's just those particular systems.

> This is why good CAs send out reminder e-mails in advance.  And why 
> one should avoid CAs that use that contact point for infinite spam 
> about new services.

They do say that insanity consists of doing the same thing over and
over and expecting different results.

> The scenario is "Bad guy requests new cert, CA properly challenges 
> good guy at good guy address, good guy responds positively without 
> reference to old good guy CSR, CA issues for bad guy CSR, bad guy 
> grabs new cert from anywhere and matches to bad guy private key, 
> bad guy does actual attack".

You wrote this in response to me explaining exactly why this scenario
won't work in ACME (or any system which wasn't designed by idiots -
though having read their patent filings the commercial CAs on the whole
may be taken as idiots to my understanding)

I did make one error though, in using the word "signature" when this
data is not a cryptographic signature, but rather a "JWK Thumbprint".

When "good guy responds positively" that positive response includes
a Thumbprint corresponding to their ACME public key. When they're
requesting issuance this works fine because they use their ACME keys
for 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-03 Thread Jakob Bohm via dev-security-policy
A few clarifications below

On 30/11/2018 10:48, Nick Lamb wrote:
> On Wed, 28 Nov 2018 22:41:37 +0100
> Jakob Bohm via dev-security-policy
>  wrote:
> 
>> I blame those standards for forcing every site to choose between two
>> unfortunate risks, in this case either the risks prevented by those
>> "pinning" mechanisms and the risks associated with having only one
>> certificate.
> 
> HTTP Public Key Pinning (HPKP) is deprecated by Google and is widely
> considered a failure because it acts as a foot-gun and (more seriously
> but less likely in practice) enables sites to be held to ransom by bad
> guys.
> 
> Mostly though, what I want to focus on is a big hole in your knowledge
> of what's available today, which I'd argue is likely significant in
> that probably most certificate Subscribers don't know about it, and
> that's something the certificate vendors could help to educate them
> about and/or deliver products to help them use.
> 

Interesting.  What is that hole?

>> Automating certificate deployment (as you often suggest) lowers
>> operational security, as it necessarily grants read/write access to
>> the certificate data (including private key) to an automated, online,
>> unsupervised system.
> 
> No!
> 
> This system does not need access to private keys. Let us take ACME as
> our example throughout, though nothing about what I'm describing needs
> ACME per se, it's simply a properly documented protocol for automation
> that complies with CA/B rules.

It certainly needs the ability to change private keys (as reusing private 
keys for new certificates is bad practice and shouldn't be automated).

This means that some part of the overall automated system needs the ability 
to generate fresh keys, sign CSRs, and cause servers to switch to those new 
keys.

And because this discussion entails triggering all that at an out-of-schedule 
time, having a "CSR pre-generation ceremony" every 24 months (the normal 
reissue schedule for EV certs) will provide limited ability to handle 
out-of-schedule certificate replacement (because it is also bad practice to 
have private keys with a design lifetime of 24 months lying around for 48 
months prior to planned expiry).


> 
> The ACME CA expects a CSR, signed with the associated private key, but
> it does not require that this CSR be created fresh during validation +
> issuance. A Subscriber can as they wish generate the CSR manually,
> offline and with full supervision. The CSR is a public document
> (revealing it does not violate any cryptographic assumptions). It is
> entirely reasonable to create one CSR when the key pair is minted and
> replace it only in a scheduled, predictable fashion along with the keys
> unless a grave security problem occurs with your systems.
> 
> ACME involves a different private key, possessed by the subscriber/
> their agent only for interacting securely with ACME, the ACME client
> needs this key when renewing, but it doesn't put the TLS certificate key
> at risk.
> 
> Certificates are public information by definition. No new risk there.
> 

By definition, the strength of public keys, especially TLS RSA signing 
keys used with PFS suites, involves a security tradeoff between the 
time that attackers have to break/factor the public key and the slowness 
of handling TLS connections with current generation standard hardware and 
software.

The current WebPKI/BR tradeoff/compromise is set at 2048 bit keys valid 
for about 24 months.



> 
>> Allowing multiple persons to replace the certificates also lowers
>> operational security, as it (by definition) grants multiple persons
>> read/write access to the certificate data.
> 
> Again, certificates themselves are public information and this does not
> require access to the private keys.

It requires write access to the private keys, even if the operators might 
not need to see those keys, many real world systems don't allow granting 
"install new private key" permission without "see new private key" 
permission and "choose arbitrary private key" permission.

Also, many real world systems don't allow installing a new certificate 
for an existing key without reinstalling the matching private key, simply 
because that's the interface.

Traditional military encryption systems are built without these 
limitations, but civilian systems are often not.


> 
>> Under the current and past CA model, certificate and private key
>> replacement is a rare (once/2 years) operation that can be done
>> manually and scheduled weeks in advance, except for unexpected
>> failures (such as a CA messing up).
>   
> This approach, which has been used at some of my past employers,
> inevitably results in systems where the certificates expire "by
> mistake". Recriminations and insistence that lessons will be learned
> follow, and then of course nothing is followed up and the problem
> recurs.
> 
> It's a bad idea, a popular one, but still a bad idea.

This is why good CAs send out reminder e-mails in advance.  And why 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-12-01 Thread Eric Mill via dev-security-policy
On Wed, Nov 28, 2018 at 4:41 PM Jakob Bohm via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On 27/11/2018 00:54, Ryan Sleevi wrote:
> > On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
> > dev-security-policy@lists.mozilla.org> wrote:
> >
> >> 2. Being critical from a society perspective (e.g. being the contact
> >> point for a service to help protect the planet), doesn't mean that
> the
> >> people running such a service can be expected to be IT superstars
> >> capable of dealing with complex IT issues such as unscheduled
> >> certificate replacement due to no fault of their own.
> >>
> >
> > That sounds like an operational risk the site (knowingly) took. Solutions
> > for automation exist, as do concepts such as "hiring multiple people"
> > (having a NOC/SOC). I see nothing to argue that a single person is
> somehow
> > the risk here.
> >
>
> The number of people in the world who can do this is substantially
> smaller than the number of sites that might need them.  We must
> therefore, by necessity, accept that some such sites will not hire such
> people, or worse multiple such people for their own exclusive use.
>
> Automating certificate deployment (as you often suggest) lowers
> operational security, as it necessarily grants read/write access to
> the certificate data (including private key) to an automated, online,
> unsupervised system.
>

Respectfully, this isn't accurate. Automated certificate deployment and
rotation is a best practice for high-functioning enterprises, and can be
done without exposing general read/write access to other systems. I've seen
automated certificate rotation implemented in several federal government
agencies, and (maybe more importantly) have seen many more agencies let
their certificates expire and impact the security of public services due to
a lack of automation.

Nick already described how the ACME protocol can be automated without
exposing the TLS private key, but more generally, organizations can use
scoped permissioning to grant individual components only the specific
access they need to accomplish their job. As an example, customers of
Amazon Web Services can use the IAM permissions framework to establish
granular permissions that mitigate the impact of component compromise.
Enterprises relying on self-managed infrastructure are free to implement a
similar system.
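To sketch what that kind of scoping looks like in practice: the action names
below are real AWS Certificate Manager permissions, but the policy itself is a
hypothetical illustration, not taken from any deployment discussed here. A
certificate-rotation component granted only this policy can import and inspect
certificates, and nothing else.

```python
import json

# Hypothetical least-privilege IAM policy for a certificate-rotation
# component: it may import and list/describe certificates in ACM, but has
# no other access, so its compromise does not expose unrelated systems.
rotation_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "acm:ImportCertificate",
                "acm:DescribeCertificate",
                "acm:ListCertificates",
            ],
            "Resource": "*",
        }
    ],
}

policy_document = json.dumps(rotation_policy, indent=2)
print(policy_document)
```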

For a government example of automated certificate issuance, see
https://cloud.gov/docs/services/cdn-route/, which is a FedRAMPed service
whose security authorization is signed off on by the Departments of Defense
and Homeland Security.

Societally important organizations who don't specialize in technology
(which is most of them), or for whatever reason can't feasibly automate
their certificate operations, should definitely be relying on
infrastructure managed by third parties which do specialize in this
technology, be it basic site hosting like Squarespace or more sophisticated
cloud services.

In other words, no organization has an excuse to not be able to rotate a
certificate given 5 days' notice. The fact that many large organizations
continue to have a problem with this doesn't make it any more excusable.

-- Eric


> Allowing multiple persons to replace the certificates also lowers
> operational security, as it (by definition) grants multiple persons
> read/write access to the certificate data.
>
> Under the current and past CA model, certificate and private key
> replacement is a rare (once/2 years) operation that can be done
> manually and scheduled weeks in advance, except for unexpected
> failures (such as a CA messing up).
>
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-11-30 Thread Ryan Sleevi via dev-security-policy
On Fri, Nov 30, 2018 at 4:24 AM Dimitris Zacharopoulos 
wrote:

>
>
> On 30/11/2018 1:49 π.μ., Ryan Sleevi wrote:
>
>
>
> On Thu, Nov 29, 2018 at 4:03 PM Dimitris Zacharopoulos via
> dev-security-policy  wrote:
>
>> I didn't want to hijack the thread so here's a new one.
>>
>>
>> Times and circumstances change.
>
>
> You have to demonstrate that.
>
>
> It's self-proved :-)
>

This sort of glib reply shows a lack of good-faith effort to meaningfully
engage. It's like forcing the discussion every minute, since, yanno, "times
and circumstances have changed".

I gave you concrete reasons why saying something like this is a
demonstration of a weak and bad-faith argument. If you would like to
meaningfully assert this, you would need to demonstrate what circumstances
have changed in such a way as to warrant a rediscussion of something that
gets 'relitigated' regularly - and, in fact, was something discussed in the
CA/Browser Forum for the past two years. Just because you're unsatisfied
with the result and now we're in a month that ends in "R" doesn't mean time
and circumstances have changed meaningfully to support the discussion.

Concrete suggestions involved a holistic look at _all_ revocations, since
the discussion of exceptions is relevant to know whether we are discussing
something that is 10%, 1%, .1%, or .01%. Similarly, having the framework
in place to consistently and objectively measure that helps us assess
whether any proposals for exceptions would change that "1%" from being
exceptional to seeing "10%" or "100%" being claimed as exceptional under
some new regime.

In the absence of that, it's an abusive and harmful act.


> I already mentioned that this is separate from the incident report (of the
> actual mis-issuance). We have repeatedly seen post-mortems that say that
> for some specific cases (not all of them), the revocation of certificates
> will require more time.
>

No. We've seen the claim it will require more time, frequently without
evidence. However, I do think you're not understanding - there is nothing
preventing CAs from sharing details, for all revocations they do, about the
factors they considered, and the 'exceptional' cases to the customers,
without requiring any BR violations (of the 24 hour / 5 day rule). That CAs
don't do this only undermines any validity of the argument you are making.

There is zero legitimate reason to normalize aberrant behaviour.


> Even the underscore revocation deadline creates problems for some large
> organizations as Jeremy pointed out. I understand the compatibility
> argument and CAs are doing their best to comply with the rules but you are
> advocating there should be no exceptions and you say that without having
> looked at specific evidence that would be provided by CAs asking for
> exceptions. You would rather have Relying Parties lose their internet
> services from one of the Fortune 500 companies. As a Relying Party myself,
> I would hate it if I couldn't connect to my favorite online e-shop or bank
> or webmail. So I'm still confused about which Relying Party we are trying
> to help/protect by requiring the immediate revocation of a Certificate that
> has 65 characters in the OU field.
>
> I also see your point that "if we start making exceptions..." it's too
> risky. I'm just suggesting that there should be some tolerance for extended
> revocations (to help with collecting more information) which doesn't
> necessarily mean that we are dealing with a "bad" CA. I trust the Mozilla
> module owner's judgement to balance that. If the community believes that
> this problem is already solved, I'm happy with that :)
>

The argument being made here is as odious as saying "We should have one day
where all crime is legal, including murder" or "Those who knowingly buy
stolen goods should be able to keep them, because they're using them".

I disagree that CAs are "doing their best" to comply with the rules. The
post-mortems continually show a lack of applied best practice. DigiCert's
example is, I think, a good one - because I do not believe it's reasonable
for DigiCert to have argued that there was ambiguity, given that prior to
the ballot, it was agreed they were forbidden, a ballot to explicitly
permit them failed, and the discussion of that ballot explicitly cited why
they weren't valid. From that, several non-DigiCert CAs took steps to
migrate their customers and cease issuance. As such, you cannot reasonably
argue DigiCert was doing "their best", unless you're willing to accept that
DigiCert's best is, in fact, far lower than the industry norm.

The framing about "Think about harm to the Subscriber" is, again, one that
is actively harmful, and, as coming from a CA, somewhat offensive, because
it shows a difference in perspective that further emphasizes why CA's
judgement cannot be trusted. In this regard, we're in agreement that the
certificates we're discussing are clearly misissued - the CA was never
authorized to have issued that 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-30 Thread Nick Lamb via dev-security-policy
On Wed, 28 Nov 2018 22:41:37 +0100
Jakob Bohm via dev-security-policy
 wrote:

> I blame those standards for forcing every site to choose between two 
> unfortunate risks, in this case either the risks prevented by those 
> "pinning" mechanisms and the risks associated with having only one 
> certificate.

HTTP Public Key Pinning (HPKP) is deprecated by Google and is widely
considered a failure because it acts as a foot-gun and (more seriously
but less likely in practice) enables sites to be held to ransom by bad
guys.

Mostly though, what I want to focus on is a big hole in your knowledge
of what's available today, which I'd argue is likely significant in
that probably most certificate Subscribers don't know about it, and
that's something the certificate vendors could help to educate them
about and/or deliver products to help them use.

> Automating certificate deployment (as you often suggest) lowers 
> operational security, as it necessarily grants read/write access to 
> the certificate data (including private key) to an automated, online, 
> unsupervised system.

No!

This system does not need access to private keys. Let us take ACME as
our example throughout, though nothing about what I'm describing needs
ACME per se, it's simply a properly documented protocol for automation
that complies with CA/B rules.

The ACME CA expects a CSR, signed with the associated private key, but
it does not require that this CSR be created fresh during validation +
issuance. A Subscriber can as they wish generate the CSR manually,
offline and with full supervision. The CSR is a public document
(revealing it does not violate any cryptographic assumptions). It is
entirely reasonable to create one CSR when the key pair is minted and
replace it only in a scheduled, predictable fashion along with the keys
unless a grave security problem occurs with your systems.
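A minimal sketch of that "generate once, offline" ceremony, using the
third-party Python `cryptography` package; the hostname and key size are
illustrative placeholders, not anything from this thread.

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Mint the TLS key pair once, on a supervised, offline machine.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Build a single CSR for that key. The CSR is a public document; it can be
# handed to the online ACME client and reused for every renewal until the
# key itself is rotated on schedule.
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(
        x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "example.com")])
    )
    .add_extension(
        x509.SubjectAlternativeName([x509.DNSName("example.com")]),
        critical=False,
    )
    .sign(key, hashes.SHA256())
)

# Only this PEM blob ever needs to reach the online renewal system.
csr_pem = csr.public_bytes(serialization.Encoding.PEM)
```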

ACME involves a different private key, possessed by the subscriber/
their agent only for interacting securely with ACME, the ACME client
needs this key when renewing, but it doesn't put the TLS certificate key
at risk.

Certificates are public information by definition. No new risk there.


> Allowing multiple persons to replace the certificates also lowers 
> operational security, as it (by definition) grants multiple persons 
> read/write access to the certificate data.

Again, certificates themselves are public information and this does not
require access to the private keys.

> Under the current and past CA model, certificate and private key 
> replacement is a rare (once/2 years) operation that can be done 
> manually and scheduled weeks in advance, except for unexpected 
> failures (such as a CA messing up).
 
This approach, which has been used at some of my past employers,
inevitably results in systems where the certificates expire "by
mistake". Recriminations and insistence that lessons will be learned
follow, and then of course nothing is followed up and the problem
recurs.

It's a bad idea, a popular one, but still a bad idea.

> For example, every BR permitted automated domain validation method 
> involves a challenge-response interaction with the site owner, who
> must not (to prevent rogue issuance) respond to that interaction
> except during planned issuance.

It is entirely possible and theoretically safe to configure ACME
responders entirely passively. You can see this design in several
popular third party ACME clients.

The reason it's theoretically safe is that ACME's design ensures the
validation server (for example Let's Encrypt's Boulder) unavoidably
verifies that the validation response is from the correct ACME account
holder.

So if bad guys request issuance, the auto-responder will present a
validation response for the good guy account, which does not match and
issuance will not occur. The bad guys will be told their validation
failed and they've got the keys wrong. Which of course they can't fix
since they've no idea what the right ACME account private key is.

For http-01 at least, you can even configure this without the
auto-responder having any private knowledge at all. Since this part is
just playing back a signature, our basic cryptographic assumptions mean
that we can generate the signature offline and then paste it into the
auto-responder. At least one popular ACME client offers this behaviour.
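The moving parts here are small enough to sketch. The following Python is an
illustrative stand-in (the account key modulus and challenge token below are
made-up placeholders, not real values): the http-01 response body is just the
challenge token joined to the account key's JWK Thumbprint (RFC 7638), and
neither half is secret.

```python
import base64
import hashlib
import json

def b64url(data: bytes) -> str:
    # Unpadded base64url, as used throughout ACME (RFC 8555).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def jwk_thumbprint(jwk: dict) -> str:
    # RFC 7638: keep only the required members, serialize with keys in
    # lexicographic order and no whitespace, then hash with SHA-256.
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}
    canonical = json.dumps(
        {k: jwk[k] for k in required[jwk["kty"]]},
        separators=(",", ":"),
        sort_keys=True,
    )
    return b64url(hashlib.sha256(canonical.encode()).digest())

# A stand-in ACME account *public* key -- any real RSA account key has the
# same shape, and nothing here is secret.
account_jwk = {"kty": "RSA", "e": "AQAB", "n": "placeholder-modulus-value"}

# A made-up challenge token; the real one arrives in the ACME challenge.
token = "evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA"

# The body served at /.well-known/acme-challenge/<token>. Bad guys using
# their own account key produce a different thumbprint, so their
# validation fails even against a fully passive auto-responder.
key_authorization = token + "." + jwk_thumbprint(account_jwk)
print(key_authorization)
```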

For a huge outfit like Google or Facebook that can doubtless afford to
have an actual "certificate team" this would not be an appropriate
measure, but at a smaller business it seems entirely reasonable.


> Thus any unscheduled revalidation of domain ownership would, by 
> necessity, involve contacting the site owner and convincing them this
> is not a phishing attempt.

See above, this works today for lots of ACME validated domains.

> Some ACME protocols may contain specific authenticated ways for the
> CA to revalidate out-of-schedule, but this would be outside the norm.

Just revalidating, though it seems to be a popular trick for CAs, is
not 

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-11-30 Thread Dimitris Zacharopoulos via dev-security-policy



On 30/11/2018 1:49 π.μ., Ryan Sleevi wrote:



On Thu, Nov 29, 2018 at 4:03 PM Dimitris Zacharopoulos via 
dev-security-policy > wrote:


I didn't want to hijack the thread so here's a new one.


Times and circumstances change.


You have to demonstrate that.


It's self-proved :-)



When I brought this up at the Server
Certificate Working Group of the CA/B Forum
(https://cabforum.org/pipermail/servercert-wg/2018-September/000165.html),

there was no open disagreement from CAs. 



Look at the discussion during Wayne’s ballot. Look at the discussion 
back when it was Jeremy’s ballot. The proposal was as simplified as 
could be - modeled after 9.16.3 of the BRs. It would have allowed for 
a longer period - NOT an unbounded period, which is grossly negligent 
for publicly trusted CAs.


Agreed.



However, think about CAs that
decide to extend the 5-days (at their own risk) because of
extenuating
circumstances. Doesn't this community want to know what these
circumstances are and evaluate the gravity (or not) of the situation?
The only way this could happen in a consistent way among CAs would
be to
require it in some kind of policy.


This already happens. This is a matter of the CA violating any 
contracts or policies of the root store it is in, and is already being 
handled by those root stores - e.g. misissuance reports. What you’re 
describing as a problem is already solved, as are the expectations for 
CAs - that violating requirements is a path to distrust.


The only “problem” you’re solving is giving CAs more time, and there 
is zero demonstrable evidence, to date, about that being necessary or 
good - and rich and ample evidence of it being bad.


I already mentioned that this is separate from the incident report (of 
the actual mis-issuance). We have repeatedly seen post-mortems that say 
that for some specific cases (not all of them), the revocation of 
certificates will require more time. Even the underscore revocation 
deadline creates problems for some large organizations as Jeremy pointed 
out. I understand the compatibility argument and CAs are doing their 
best to comply with the rules but you are advocating there should be no 
exceptions and you say that without having looked at specific evidence 
that would be provided by CAs asking for exceptions. You would rather 
have Relying Parties lose their internet services from one of the 
Fortune 500 companies. As a Relying Party myself, I would hate it if I 
couldn't connect to my favorite online e-shop or bank or webmail. So I'm 
still confused about which Relying Party we are trying to help/protect 
by requiring the immediate revocation of a Certificate that has 65 
characters in the OU field.


I also see your point that "if we start making exceptions..." it's too 
risky. I'm just suggesting that there should be some tolerance for 
extended revocations (to help with collecting more information) which 
doesn't necessarily mean that we are dealing with a "bad" CA. I trust 
the Mozilla module owner's judgement to balance that. If the community 
believes that this problem is already solved, I'm happy with that :)




> Phrased differently: You don't think large organizations are
currently
> capable, and believe the rest of the industry should accommodate
that.

"Tolerate" would probably be the word I'd use instead of
"accommodate".


I chose accommodate, because you’d like the entire world to take on 
systemic risk - and it is indeed systemic risk, to users especially - 
to benefit some large companies.


Why stop with revocation, though? Why not just let CAs define their 
own validation methods if they think they’re equivalent? After all, if 
we can trust CAs to make good judgements on revocation, why can’t we 
also trust them with validation? Some large companies struggle with 
our existing validation methods, why can’t we accommodate them?


That’s exactly what one of the arguments against restricting 
validation methods was.


As I said, I think this discussion will not accomplish anything 
productive without a structured analysis of the data. Not anecdata 
from one or two incidents, but holistic - because for every 1 real 
need, there may have been 9,999 unnecessary delays in revocation with 
real risk.


How do CAs provide this? For *all* revocations, provide meaningful 
data. I do not see there being any value to discussing further 
extensions until we have systemic transparency in place, and I do not 
see any good coming from trying to change at the same time as placing 
that systemic transparency in place, because there’s no way to measure 
the (negative) impact such change would have.


I don't see how data and evidence for "all revocations" somehow makes 
things better, unless I misunderstood your proposal. It's not a balanced 
request. It would be a huge effort for CAs to write risk assessment 
reports for 

Re: CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-11-29 Thread Ryan Sleevi via dev-security-policy
On Thu, Nov 29, 2018 at 4:03 PM Dimitris Zacharopoulos via
dev-security-policy  wrote:

> I didn't want to hijack the thread so here's a new one.
>
>
> Times and circumstances change.


You have to demonstrate that.

When I brought this up at the Server
> Certificate Working Group of the CA/B Forum
> (https://cabforum.org/pipermail/servercert-wg/2018-September/000165.html),
>
> there was no open disagreement from CAs.


Look at the discussion during Wayne’s ballot. Look at the discussion back
when it was Jeremy’s ballot. The proposal was as simplified as could be -
modeled after 9.16.3 of the BRs. It would have allowed for a longer period
- NOT an unbounded period, which is grossly negligent for publicly trusted
CAs.

However, think about CAs that
> decide to extend the 5-days (at their own risk) because of extenuating
> circumstances. Doesn't this community want to know what these
> circumstances are and evaluate the gravity (or not) of the situation?
> The only way this could happen in a consistent way among CAs would be to
> require it in some kind of policy.


This already happens. This is a matter of the CA violating any contracts or
policies of the root store it is in, and is already being handled by those
root stores - e.g. misissuance reports. What you’re describing as a problem
is already solved, as are the expectations for CAs - that violating
requirements is a path to distrust.

The only “problem” you’re solving is giving CAs more time, and there is
zero demonstrable evidence, to date, about that being necessary or good -
and rich and ample evidence of it being bad.

> Phrased differently: You don't think large organizations are currently
> > capable, and believe the rest of the industry should accommodate that.
>
> "Tolerate" would probably be the word I'd use instead of "accommodate".


I chose accommodate, because you’d like the entire world to take on
systemic risk - and it is indeed systemic risk, to users especially - to
benefit some large companies.

Why stop with revocation, though? Why not just let CAs define their own
validation methods if they think they’re equivalent? After all, if we can
trust CAs to make good judgements on revocation, why can’t we also trust
them with validation? Some large companies struggle with our existing
validation methods, why can’t we accommodate them?

That’s exactly what one of the arguments against restricting validation
methods was.

As I said, I think this discussion will not accomplish anything productive
without a structured analysis of the data. Not anecdata from one or two
incidents, but holistic - because for every 1 real need, there may have
been 9,999 unnecessary delays in revocation with real risk.

How do CAs provide this? For *all* revocations, provide meaningful data. I
do not see there being any value to discussing further extensions until we
have systemic transparency in place, and I do not see any good coming from
trying to change at the same time as placing that systemic transparency in
place, because there’s no way to measure the (negative) impact such change
would have.

>
> > Do you believe these organizations could respond within 5 days if
> > their internet connectivity was lost?
>
> I think there is different impact. Losing network connectivity would
> have "real" and large (i.e. all RPs) impact compared to installing a

> certificate with -say- 65 characters in the OU field which may cause
> very few problems to some RPs that want to use a certain web site.


So you do believe organizations are capable of making timely changes when
necessary, and thus we aren’t discussing capabilities, but perceived
necessity. And because some organizations have been misled as to the role
of CAs, and thus don't feel it's necessary, don't feel they should have to
use that capability.

I’m not terribly sympathetic to that at all. As you mention, they can
respond when all RPs are affected, so they can respond when their
certificate is misissued and thus revoked.

You describe it as a black/white issue. I understand your argument that
> other control areas will likely have issues but it always comes down to
> what impact and what damage these failed controls can produce. Layered
> controls and compensating controls in critical areas usually lower the
> risk of severe impact. The Internet is probably safe and will not break
> if for example a certificate with 65-character OU is used on a public
> web site. It's not the same as a CA issuing SHA1 Certificates with
> collision risk.


It absolutely is, and we have seen this time and time again. The CAs most
likely to argue the position you’re taking are the CAs that have had the
most issues.

Do we agree, at least, that any CA violating the BRs or Root Policies puts
the Internet ecosystem at risk?

It seems the core of your argument is how much risk should be acceptable,
and the answer is none. Zero. The point of postmortems is to get us to a
point where, as an industry, we’ve taken every available step 

CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

2018-11-29 Thread Dimitris Zacharopoulos via dev-security-policy

I didn't want to hijack the thread so here's a new one.

On 29/11/2018 6:39 μ.μ., Ryan Sleevi wrote:



On Thu, Nov 29, 2018 at 2:16 AM Dimitris Zacharopoulos <ji...@it.auth.gr> wrote:


Mandating that CAs disclose revocation situations that exceed the
5-day
requirement with some risk analysis information, might be a good
place
to start. 



This was proposed several times by Google in the Forum, and 
consistently rejected, unfortunately.


Times and circumstances change. When I brought this up at the Server 
Certificate Working Group of the CA/B Forum 
(https://cabforum.org/pipermail/servercert-wg/2018-September/000165.html), 
there was no open disagreement from CAs. However, think about CAs that 
decide to extend the 5-days (at their own risk) because of extenuating 
circumstances. Doesn't this community want to know what these 
circumstances are and evaluate the gravity (or not) of the situation? 
The only way this could happen in a consistent way among CAs would be to 
require it in some kind of policy.


This list has seen disclosures of revocation cases from CAs, mainly as 
part of incident reports. What I understand as disclosure is the fact 
that CAs shared that certain Subscribers (we know these subscribers 
because their Certificates were disclosed as part of the incident 
report) would be damaged if the mis-issued certificates were revoked 
within 24 hours. Now, depending on the circumstances this might be 
extended to 5 days.



I don't consider 5 days (they are not even working days) to be
adequate
warning period to a large organization with slow reflexes and long
procedures. 



Phrased differently: You don't think large organizations are currently 
capable, and believe the rest of the industry should accommodate that.


"Tolerate" would probably be the word I'd use instead of "accommodate".



Do you believe these organizations could respond within 5 days if 
their internet connectivity was lost?


I think there is different impact. Losing network connectivity would 
have "real" and large (i.e. all RPs) impact compared to installing a 
certificate with -say- 65 characters in the OU field which may cause 
very few problems to some RPs that want to use a certain web site.




For example, if many CAs violate the 5-day rule for revocations
related
to improper subject information encoding, out of range, wrong
syntax and
that sort, Mozilla or the BRs might decide to have a separate
category
with a different time frame and/or different actions.


Given the security risks in this, I think this is extremely harmful to 
the ecosystem and to users.


It is not the first time we talk about this and it might be worth
exploring further.


I don't think any of the facts have changed. We've discussed for 
several years that CAs have the opportunity to provide this 
information, and haven't, so I don't think it's at all proper to 
suggest starting a conversation without structured data. CAs that are 
passionate about this could have supported such efforts in the Forum 
to provide this information, or could have demonstrated doing so on 
their own. I don't think it would at all be productive to discuss 
these situations in abstract hypotheticals, as some of the discussions 
here try to do - without data, that would be an extremely unproductive 
use of time.


There were voices during the SC6 ballot discussion that wanted to extend 
the 5 days to something more. We continuously see CAs that either detect 
or learn about having mis-issued Certificates, that fail to revoke 
within 24 hours or even 5 days because their Subscribers have problems 
and the RPs would be left with no service until the certificates were 
replaced. I don't think we are having a hypothetical discussion; we have 
seen real cases being disclosed in m.d.s.p. but it would be important to 
have a policy in place to require disclosure of more information. 
Perhaps that would work as a deterrent for CAs to revoke past the 5 days 
if they don't have strong arguments to support their decisions in public.



As a general comment, IMHO when we talk about RP risk when a CA
issues a
Certificate with -say- longer than 64 characters in an OU field, that
would only pose risk to Relying Parties *that want to interact
with that
particular Subscriber*, not the entire Internet. 



No. This is demonstrably and factually wrong.

First, we already know that technical errors are a strong sign that 
the policies and practices themselves are not being followed - both 
the validation activities and the issuance activities result from the 
CA following its practices and procedures. If a CA is not following 
its practices and procedures, that's a security risk to the Internet, 
full stop.


You describe it as a black/white issue. I understand your argument that 
other control areas will likely have issues but it always comes down to 
what impact and what 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-29 Thread Ryan Sleevi via dev-security-policy
On Thu, Nov 29, 2018 at 2:16 AM Dimitris Zacharopoulos 
wrote:

> Mandating that CAs disclose revocation situations that exceed the 5-day
> requirement with some risk analysis information, might be a good place
> to start.


This was proposed several times by Google in the Forum, and consistently
rejected, unfortunately.


> I don't consider 5 days (they are not even working days) to be adequate
> warning period to a large organization with slow reflexes and long
> procedures.


Phrased differently: You don't think large organizations are currently
capable, and believe the rest of the industry should accommodate that.

Do you believe these organizations could respond within 5 days if their
internet connectivity was lost?


> For example, if many CAs violate the 5-day rule for revocations related
> to improper subject information encoding, out of range, wrong syntax and
> that sort, Mozilla or the BRs might decide to have a separate category
> with a different time frame and/or different actions.
>

Given the security risks in this, I think this is extremely harmful to the
ecosystem and to users.

It is not the first time we talk about this and it might be worth
> exploring further.
>

I don't think any of the facts have changed. We've discussed for several
years that CAs have the opportunity to provide this information, and
haven't, so I don't think it's at all proper to suggest starting a
conversation without structured data. CAs that are passionate about this
could have supported such efforts in the Forum to provide this information,
or could have demonstrated doing so on their own. I don't think it would at
all be productive to discuss these situations in abstract hypotheticals, as
some of the discussions here try to do - without data, that would be an
extremely unproductive use of time.


> As a general comment, IMHO when we talk about RP risk when a CA issues a
> Certificate with -say- longer than 64 characters in an OU field, that
> would only pose risk to Relying Parties *that want to interact with that
> particular Subscriber*, not the entire Internet.


No. This is demonstrably and factually wrong.

First, we already know that technical errors are a strong sign that the
policies and practices themselves are not being followed - both the
validation activities and the issuance activities result from the CA
following its practices and procedures. If a CA is not following its
practices and procedures, that's a security risk to the Internet, full stop.

Second, it presumes (incorrectly) that interoperability is not something
valuable. That is, if say the three existing, most popular implementations
all do not check whether or not it's longer than 64 characters (for
example), and a fourth implementation would like to come along, they cannot
read the relevant standards and implement something interoperable. This is
because 'interoperability' is being redefined as 'ignoring' the standard -
which defeats the purposes of standards to begin with. These choices - to
permit deviations - creates risks for the entire ecosystem, because there's
no longer interoperability. This is equally captured in
https://tools.ietf.org/html/draft-iab-protocol-maintenance-01
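As an illustration of the interoperability point, a strict fourth implementation reading the standards would be entitled to enforce the X.520 upper bound on OU (ub-organizational-unit-name = 64). The sketch below is mine, not code from any real validator, and assumes the bound is counted in characters of the DirectoryString:

```python
# X.520 upper bound for OrganizationalUnitName, as referenced by RFC 5280.
UB_ORGANIZATIONAL_UNIT_NAME = 64

def ou_is_conformant(ou: str) -> bool:
    """True iff the OU value respects the X.520 upper bound.

    A strict implementation may reject certificates that exceed it,
    which is why an over-long OU is not "harmless to everyone".
    """
    return len(ou) <= UB_ORGANIZATIONAL_UNIT_NAME

print(ou_is_conformant("A" * 64))  # True
print(ou_is_conformant("A" * 65))  # False: the kind of mis-issuance at hand
```

A lenient client never notices the violation; the strict client rejects the chain, and the two can no longer be said to interoperate on the same standard.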

The premise to all of this is that "CAs shouldn't have to follow rules,
browsers should just enforce them," which is shocking and unfortunate. It's
like saying "It's OK to lie about whatever you want, as long as you don't
get caught" - no, that line of thinking is just as problematic for morality
as it is for technical interoperability. CAs that routinely violate the
standards create risk, because they have full trust on the Internet. If the
argument is that the CA's actions (of accidentally or deliberately
introducing risk) is the problem, but that we shouldn't worry about
correcting the individual certificate, that entirely misses the point that
without correcting the certificate, there's zero incentive to actually
follow the standards, and as a result, that creates risk for everyone.
Revocation, if you will, is the "less worse" alternative to complete
distrust - it only affects that single certificate, rather than every one
of the certificates the CA has issued. The alternative - not revoking -
simply says that it's better to look at distrust options, and that's more
risk for everyone.

Finally, CAs are terrible at assessing the risk to RPs. For example,
negative serial numbers were prolific prior to the linters, and those have
issues in as much as they are, for some systems, irrevocable. This is
because those systems implemented the standards correctly - serials are
positive INTEGERs - yet had to account for the fact that CAs are improperly
encoding them, such as by "making" them positive (adding the leading zero).
This leading zero then doesn't get stripped off when looking up by Issuer &
Serial Number, because they're using the "spec-correct" serial rather than
the "issuer-broken" serial. That's an example where the certificate
"works", no report 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Dimitris Zacharopoulos via dev-security-policy



On 29/11/2018 12:14 π.μ., Wayne Thayer via dev-security-policy wrote:

The way that we currently handle these types of issues is about as good as
we're going to get. We have a [recently relaxed but still] fairly stringent
set of rules around revocation in the BRs. This is necessary and proper
because slow/delayed revocation can clearly harm our users. It was
difficult to gain consensus within the CAB Forum on allowing even 5 days in
some circumstances - I'm confident that something like 28 days would be a
non-starter. I'm also confident that CAs will always take the entire time
permitted to perform revocations, regardless of the risk, because it is in
their interest to do so (that is not meant to be a criticism of CAs so much
as a statement that CAs exist to serve their customers, not our users). I'm
also confident that any attempt to define "low risk" misissuance would just
incentivize CAs to stop treating misissuance as a serious offense and we'd
be back to where we were prior to the existence of linters.

CAs obviously do choose to violate the revocation time requirements. I do
not believe this is generally based on a thorough risk analysis, but in
practice it is clear that they do have some discretion. I am not aware of a
case (yet) in which Mozilla has punished a CA solely for violating a
revocation deadline. When that happens, the violation is documented in a
bug and should appear on the CA's next audit report/attestation statement.
From there, the circumstances (how many certs?, what was the issue?, was it
previously documented?, is this a pattern of behavior?) have to be
considered on a case-by-case basis to decide a course of action. I realize
that this is not a very satisfying answer to the questions that are being
raised, but I do think it's the best answer.

- Wayne


Mandating that CAs disclose revocation situations that exceed the 5-day 
requirement with some risk analysis information, might be a good place 
to start. Of course, this should be independent of a "mis-issuance 
incident report". By collecting this information, Mozilla would be in a 
better position to evaluate the challenges CAs face with revocations 
*initiated by the CA* without adequate warning to the Subscriber. I 
don't consider 5 days (they are not even working days) to be adequate 
warning period to a large organization with slow reflexes and long 
procedures. Once Mozilla collects more information, you might be able to 
see possible patterns in various CAs and decide what is acceptable and 
what is not, and create policy rules accordingly.


For example, if many CAs violate the 5-day rule for revocations related 
to improper subject information encoding, out of range, wrong syntax and 
that sort, Mozilla or the BRs might decide to have a separate category 
with a different time frame and/or different actions.


It is not the first time we talk about this and it might be worth 
exploring further.


As a general comment, IMHO when we talk about RP risk when a CA issues a 
Certificate with -say- longer than 64 characters in an OU field, that 
would only pose risk to Relying Parties *that want to interact with that 
particular Subscriber*, not the entire Internet. These RPs *might* 
encounter compatibility issues depending on their browser and will 
either contact the Subscriber and notify them that their web site 
doesn't work or they will do nothing. It's similar to a situation where 
a site operator forgets to send the intermediate CA Certificate in the 
chain. These particular RPs will fail to get TLS working when they visit 
the Subscriber's web site.



Dimitris.




On Wed, Nov 28, 2018 at 1:10 PM Nick Lamb via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:


On Mon, 26 Nov 2018 18:47:25 -0500
Ryan Sleevi via dev-security-policy
 wrote:

CAs have made the case - it was not accepted.

On a more fundamental and philosophical level, I think this is
well-intentioned but misguided. Let's consider that the issue is one
that the CA had the full power-and-ability to prevent - namely, they
violated the requirements and misissued. A CA is only in this
situation if they are a bad CA - a good CA will never run the risk of
"annoying" the customer.

I would sympathise with this position if we were considering, say, a
problem that had caused a CA to issue certs with the exact same mistake
for 18 months, rather than, as I understand here, a single certificate.

Individual human errors are inevitable at a "good CA". We should not
design systems, including policy making, that assume all errors will be
prevented because that contradicts the assumption that human error is
inevitable. Although it is often used specifically to mean operator
error, human error can be introduced anywhere. A requirements document
which erroneously says a particular Unicode codepoint is permitted in a
field when it should be forbidden is still human error. A department
head who feels tired and signs off on a piece of work that 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Wayne Thayer via dev-security-policy
The way that we currently handle these types of issues is about as good as
we're going to get. We have a [recently relaxed but still] fairly stringent
set of rules around revocation in the BRs. This is necessary and proper
because slow/delayed revocation can clearly harm our users. It was
difficult to gain consensus within the CAB Forum on allowing even 5 days in
some circumstances - I'm confident that something like 28 days would be a
non-starter. I'm also confident that CAs will always take the entire time
permitted to perform revocations, regardless of the risk, because it is in
their interest to do so (that is not meant to be a criticism of CAs so much
as a statement that CAs exist to serve their customers, not our users). I'm
also confident that any attempt to define "low risk" misissuance would just
incentivize CAs to stop treating misissuance as a serious offense and we'd
be back to where we were prior to the existence of linters.

CAs obviously do choose to violate the revocation time requirements. I do
not believe this is generally based on a thorough risk analysis, but in
practice it is clear that they do have some discretion. I am not aware of a
case (yet) in which Mozilla has punished a CA solely for violating a
revocation deadline. When that happens, the violation is documented in a
bug and should appear on the CA's next audit report/attestation statement.
From there, the circumstances (how many certs?, what was the issue?, was it
previously documented?, is this a pattern of behavior?) have to be
considered on a case-by-case basis to decide a course of action. I realize
that this is not a very satisfying answer to the questions that are being
raised, but I do think it's the best answer.

- Wayne

On Wed, Nov 28, 2018 at 1:10 PM Nick Lamb via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On Mon, 26 Nov 2018 18:47:25 -0500
> Ryan Sleevi via dev-security-policy
>  wrote:
> > CAs have made the case - it was not accepted.
> >
> > On a more fundamental and philosophical level, I think this is
> > well-intentioned but misguided. Let's consider that the issue is one
> > that the CA had the full power-and-ability to prevent - namely, they
> > violated the requirements and misissued. A CA is only in this
> > situation if they are a bad CA - a good CA will never run the risk of
> > "annoying" the customer.
>
> I would sympathise with this position if we were considering, say, a
> problem that had caused a CA to issue certs with the exact same mistake
> for 18 months, rather than, as I understand here, a single certificate.
>
> Individual human errors are inevitable at a "good CA". We should not
> design systems, including policy making, that assume all errors will be
> prevented because that contradicts the assumption that human error is
> inevitable. Although it is often used specifically to mean operator
> error, human error can be introduced anywhere. A requirements document
> which erroneously says a particular Unicode codepoint is permitted in a
> field when it should be forbidden is still human error. A department
> head who feels tired and signs off on a piece of work that actually
> didn't pass tests, still human error.
>
> In true failure-is-death scenarios like fly-by-wire aircraft controls
> this assumption means extraordinary methods must be used in order to
> minimise the risk of inevitable human error resulting in real world
> systems failure. Accordingly the resulting systems are exceptionally
> expensive. Though the Web PKI is important, we should not imagine for
> ourselves that it warrants this degree of care and justifies this level
> of expense even at a "good CA".
>
> What we can require in policy - and as I understand it Mozilla policy
> does require - is that the management (also humans) take steps to
> report known problems and prevent them from recurring. This happened
> here.
>
> > This presumes that the customer cannot take steps to avoid this.
> > However, as suggested by others, the customer could have minimized or
> > eliminated annoyance, such as by ensuring they have a robust system
> > to automate the issuance/replacement of certificates. That they
> > didn't is an operational failure on their fault.
>
> I agree with this part.
>
> > This presumes that there is "little or no risk to relying parties."
> > Unfortunately, they are by design not a stakeholder in those
> > conversations
>
> It does presume this, and I've seen no evidence to the contrary. Also I
> think I am in fact a stakeholder in this conversation anyway?
>
> > I agree that it's entirely worthless the increasingly implausible
> > "important" revocations. I think a real and meaningful solution is
> > what is being more consistently pursued, and that's to distrust CAs
> > that are not adhering to the set of expectations.
>
> I don't think root distrust is an appropriate response, in the current
> state, to a single incident of this nature, this sort of thing is,
> indeed, why you may 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Jakob Bohm via dev-security-policy
On 27/11/2018 00:54, Ryan Sleevi wrote:
> On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
> 
>> 1. Having a spare certificate ready (if done with proper security, e.g.
>> a separate key) from a different CA may unfortunately conflict with
>> badly thought out parts of various certificate "pinning" standards.
>>
> 
> You blame the standards, but that seems an operational risk that the site
> (knowingly) took. That doesn't make a compelling argument.
> 

I blame those standards for forcing every site to choose between two 
unfortunate risks, in this case either the risks prevented by those 
"pinning" mechanisms and the risks associated with having only one 
certificate.

The fact that sites are forced to make that choice makes it unfair to 
presume they should always choose to prevent whichever risk is discussed 
in a given context.  Groups discussing other risks could equally unfairly 
blame sites for not using one of those "pinning" mechanisms.
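The tradeoff being described can be made concrete with an HPKP-style sketch (RFC 7469 computes a pin as base64 of SHA-256 over the SubjectPublicKeyInfo DER; the byte strings below are stand-ins I made up, not real keys):

```python
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    """HPKP-style pin: base64(SHA-256(SubjectPublicKeyInfo DER))."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")

primary_spki = b"stand-in for the primary key's SPKI DER"
backup_spki = b"stand-in for the offline backup key's SPKI DER"

# The pin set a site publishes must already contain the backup key's pin;
# otherwise a spare certificate on a fresh key from a different CA is
# unusable until the pinning policy expires.
pin_set = {spki_pin(primary_spki), spki_pin(backup_spki)}

def chain_acceptable(chain_spkis, pins):
    """A pinning client accepts the chain if any presented SPKI matches."""
    return any(spki_pin(s) in pins for s in chain_spkis)

print(chain_acceptable([backup_spki], pin_set))        # True: pre-pinned spare
print(chain_acceptable([b"brand-new key"], pin_set))   # False: locked out
```

This is why "just keep a spare certificate from a second CA" interacts badly with pinning unless the spare's key was generated and pinned ahead of time.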

> 
>> 2. Being critical from a society perspective (e.g. being the contact
>> point for a service to help protect the planet), doesn't mean that the
>> people running such a service can be expected to be IT superstars
>> capable of dealing with complex IT issues such as unscheduled
>> certificate replacement due to no fault of their own.
>>
> 
> That sounds like an operational risk the site (knowingly) took. Solutions
> for automation exist, as do concepts such as "hiring multiple people"
> (having a NOC/SOC). I see nothing to argue that a single person is somehow
> the risk here.
> 

The number of people in the world who can do this is substantially 
smaller than the number of sites that might need them.  We must 
therefore, by necessity, accept that some such sites will not hire such 
people, or worse multiple such people for their own exclusive use.

Automating certificate deployment (as you often suggest) lowers 
operational security, as it necessarily grants read/write access to 
the certificate data (including private key) to an automated, online, 
unsupervised system.

Allowing multiple persons to replace the certificates also lowers 
operational security, as it (by definition) grants multiple persons 
read/write access to the certificate data.

Under the current and past CA model, certificate and private key 
replacement is a rare (once/2 years) operation that can be done 
manually and scheduled weeks in advance, except for unexpected 
failures (such as a CA messing up).


> 
>> 3. Not every site can be expected to have the 24/7 staff on hand to do
>> "top security credentials required" changes, for example a high-
>> security end site may have a rule that two senior officials need to
>> sign off on any change in cryptographic keys and certificates, while a
>> limited-staff end-site may have to schedule a visit from their outside
>> security consultant to perform the certificate replacement.
>>
> 
> This is exactly describing a known risk that the site took, accepting the
> tradeoffs. I fail to see a compelling argument that there should be no
> tradeoffs - given the harm presented to the ecosystem - and if sites want
> to make such policies, rather than promoting automation and CI/CD, then it
> seems that's a risk they should bear and make an informed choice.
> 

The trade off would have been made against the risk of the site itself 
mishandling its private key (e.g. a site breach).  Not against force 
majeure situations such as a CA recalling a certificate out of turn.

It is generally not fair to say "that we may impose a difficult 
situation is a risk that the site took".

> Thus I would be all for an official BR ballot to clarify/introduce
>> that 24 hour revocation for non-compliance doesn't apply to non-
>> dangerous technical violations.
>>
> 
> As discussed elsewhere, there is no such thing as "non-dangerous technical
> violations". It is a construct, much like "clean coal", that has an
> appealing turn of phrase, but without the evidence to support it.
> 

That is simply not true.  The case at hand is a very good example, as 
the problem is that a text field used by current software only for 
display purposes, and generally requiring either human interpretation 
or yet-to-be-defined parsing rules, was given an out-of-range value.

Unless someone can point out a real-world piece of production software 
which causes security problems when presented with the particular 
out-of-range value, or that the particular out-of-range value would 
reasonably mislead human relying parties, then the dangers are entirely 
hypothetical and/or political.

> 
>> Another category that would justify a longer CA response time would be a
>> situation where a large batch of certificates need to be revalidated due
>> to a weakness in validation procedures (such as finding out that a
>> validation method had a vulnerability, but not knowing which if any of

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Pedro Fuentes via dev-security-policy
Hi Rufus,
I got internal server error on that link, but I really appreciate your post and 
the link to code!
Pedro

El miércoles, 28 de noviembre de 2018, 8:45:42 (UTC+1), Buschart, Rufus  
escribió:
> To simplify the process of monitoring crt.sh, we at Siemens have implemented 
> a little web service which directly queries the crt.sh DB and returns the 
> errors as JSON. This way you don't have to parse HTML files and can directly 
> integrate it into your monitoring. Maybe this function is of interest for 
> some other CAs:
> 
> https://eo0kjkxapi.execute-api.eu-central-1.amazonaws.com/prod/crtsh-monitor?caID=52410=30=false
> 
> To monitor your CA, replace the caID with your CA's ID from crt.sh. In case 
> you receive an endpoint time-out message, try again; the crt.sh DB often 
> returns time-outs. For more details or feature requests, have a look at its 
> GitHub repo: https://github.com/RufusJWB/crt.sh-monitor
> 
> 
> With best regards,
> Rufus Buschart
> 
> Siemens AG
> Information Technology
> Human Resources
> PKI / Trustcenter
> GS IT HR 7 4
> Hugo-Junkers-Str. 9
> 90411 Nuernberg, Germany 
> Tel.: +49 1522 2894134
> mailto:rufus.busch...@siemens.com
> www.twitter.com/siemens
> 
> www.siemens.com/ingenuityforlife
> 
> Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim Hagemann 
> Snabe; Managing Board: Joe Kaeser, Chairman, President and Chief Executive 
> Officer; Roland Busch, Lisa Davis, Klaus Helmrich, Janina Kugel, Cedrik 
> Neike, Michael Sen, Ralf P. Thomas; Registered offices: Berlin and Munich, 
> Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 
> 6684; WEEE-Reg.-No. DE 23691322
> 
> > -Ursprüngliche Nachricht-
> > Von: dev-security-policy  Im 
> > Auftrag von Enrico Entschew via dev-security-policy
> > Gesendet: Dienstag, 27. November 2018 18:17
> > An: mozilla-dev-security-pol...@lists.mozilla.org
> > Betreff: Re: Incident report D-TRUST: syntax error in one tls certificate
> > 
> > Am Montag, 26. November 2018 18:34:38 UTC+1 schrieb Jakob Bohm:
> > 
> > > In addition to this, would you add the following:
> > >
> > > - Daily checks of crt.sh (or some other existing tool) if  additional
> > > such certificates are erroneously issued before  the automated
> > > countermeasures are in place?
> > 
> > Thank you, Jakob. This is what we intended to do. We are monitoring crt.sh 
> > at least twice daily from now on.
> > 
> > As to your other point, we do restrict the serial number element, and the 
> > error occurred precisely in defining the constraints for this field. As 
> > mentioned above, we plan to make adjustments to our systems to prevent 
> > this kind of error in the future.
> > ___
> > dev-security-policy mailing list
> > dev-security-policy@lists.mozilla.org
> > https://lists.mozilla.org/listinfo/dev-security-policy
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


AW: Incident report D-TRUST: syntax error in one tls certificate

2018-11-27 Thread Buschart, Rufus via dev-security-policy
To simplify the process of monitoring crt.sh, we at Siemens have implemented a 
little web service which directly queries the crt.sh DB and returns the errors 
as JSON. This way you don't have to parse HTML files and can directly integrate 
it into your monitoring. Maybe this function is of interest for some other CAs:

https://eo0kjkxapi.execute-api.eu-central-1.amazonaws.com/prod/crtsh-monitor?caID=52410=30=false

To monitor your CA, replace the caID with your CA's ID from crt.sh. In case you 
receive an endpoint time-out message, try again; the crt.sh DB often returns 
time-outs. For more details or feature requests, have a look at its GitHub repo: 
https://github.com/RufusJWB/crt.sh-monitor
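
Independently of Rufus's service, crt.sh itself can return search results as 
JSON via its `output=json` query parameter. A minimal sketch of building such a 
query and pulling out the certificate IDs; the function names are mine, and 
field names beyond `id` may vary between crt.sh responses, so treat them as 
assumptions:

```python
import json
import urllib.parse

CRTSH_BASE = "https://crt.sh/"

def crtsh_query_url(domain: str) -> str:
    """Build a crt.sh search URL that returns JSON instead of HTML."""
    return CRTSH_BASE + "?" + urllib.parse.urlencode(
        {"q": domain, "output": "json"})

def cert_ids(records):
    """Extract the crt.sh certificate IDs from a decoded JSON response."""
    return [r["id"] for r in records]

if __name__ == "__main__":
    print(crtsh_query_url("dehst.de"))
    # A live fetch would look like this (crt.sh often times out, so retry):
    #   with urllib.request.urlopen(crtsh_query_url("dehst.de")) as f:
    #       records = json.load(f)
    sample = json.loads('[{"id": 514472818, "name_value": "www.dehst.de"}]')
    print(cert_ids(sample))
```

A monitoring job would diff the returned IDs against those already reviewed and 
alert on anything new.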


With best regards,
Rufus Buschart

Siemens AG
Information Technology
Human Resources
PKI / Trustcenter
GS IT HR 7 4
Hugo-Junkers-Str. 9
90411 Nuernberg, Germany 
Tel.: +49 1522 2894134
mailto:rufus.busch...@siemens.com
www.twitter.com/siemens

www.siemens.com/ingenuityforlife

Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim Hagemann 
Snabe; Managing Board: Joe Kaeser, Chairman, President and Chief Executive 
Officer; Roland Busch, Lisa Davis, Klaus Helmrich, Janina Kugel, Cedrik Neike, 
Michael Sen, Ralf P. Thomas; Registered offices: Berlin and Munich, Germany; 
Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684; 
WEEE-Reg.-No. DE 23691322

> -Ursprüngliche Nachricht-
> Von: dev-security-policy  Im 
> Auftrag von Enrico Entschew via dev-security-policy
> Gesendet: Dienstag, 27. November 2018 18:17
> An: mozilla-dev-security-pol...@lists.mozilla.org
> Betreff: Re: Incident report D-TRUST: syntax error in one tls certificate
> 
> Am Montag, 26. November 2018 18:34:38 UTC+1 schrieb Jakob Bohm:
> 
> > In addition to this, would you add the following:
> >
> > - Daily checks of crt.sh (or some other existing tool) if  additional
> > such certificates are erroneously issued before  the automated
> > countermeasures are in place?
> 
> Thank you, Jakob. This is what we intended to do. We are monitoring crt.sh at 
> least twice daily from now on.
> 
> As to your other point, we do restrict the serial number element, and the 
> error occurred precisely in defining the constraints for this field. As 
> mentioned above, we plan to make adjustments to our systems to prevent this 
> kind of error in the future.
> ___
> dev-security-policy mailing list
> dev-security-policy@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-security-policy
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-27 Thread Enrico Entschew via dev-security-policy
Am Montag, 26. November 2018 18:34:38 UTC+1 schrieb Jakob Bohm:

> In addition to this, would you add the following:
> 
> - Daily checks of crt.sh (or some other existing tool) if 
>  additional such certificates are erroneously issued before 
>  the automated countermeasures are in place?

Thank you, Jakob. This is what we intended to do. We are monitoring crt.sh at 
least twice daily from now on.

As to your other point, we do restrict the serial number element, and the error 
occurred precisely in defining the constraints for this field. As mentioned 
above, we plan to make adjustments to our systems to prevent this kind of error 
in the future.
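
The constraint at issue is the PrintableString alphabet from ITU-T X.680: 
letters, digits, space, and the punctuation ' ( ) + , - . / : = ?. A check for 
it is small enough to live in both a customer-facing frontend and a 
pre-issuance linter; a minimal sketch (the function name and the example field 
values are mine, not from D-TRUST's systems):

```python
import string

# PrintableString alphabet as defined in X.680: letters, digits, space,
# and the punctuation ' ( ) + , - . / : = ?
_PRINTABLE_STRING_CHARS = frozenset(
    string.ascii_letters + string.digits + " '()+,-./:=?")

def is_printable_string(value: str) -> bool:
    """Return True if every character is legal in an ASN.1 PrintableString."""
    return all(c in _PRINTABLE_STRING_CHARS for c in value)

# A frontend or linter would reject the field before issuance:
assert is_printable_string("Bundesamt 2018/01")
assert not is_printable_string("Emissions_2018")   # '_' is not allowed
assert not is_printable_string("D-TRUST & Co")     # '&' is not allowed
```

Rejecting the field at entry time catches the error before a certificate is 
ever signed, rather than eleven days after issuance via crt.sh.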
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Ryan Sleevi via dev-security-policy
On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> 1. Having a spare certificate ready (if done with proper security, e.g.
>a separate key) from a different CA may unfortunately conflict with
>badly thought out parts of various certificate "pinning" standards.
>

You blame the standards, but that seems an operational risk that the site
(knowingly) took. That doesn't make a compelling argument.


> 2. Being critical from a society perspective (e.g. being the contact
>point for a service to help protect the planet), doesn't mean that the
>people running such a service can be expected to be IT superstars
>capable of dealing with complex IT issues such as unscheduled
>certificate replacement due to no fault of their own.
>

That sounds like an operational risk the site (knowingly) took. Solutions
for automation exist, as do concepts such as "hiring multiple people"
(having a NOC/SOC). I see nothing to argue that a single person is somehow
the risk here.


> 3. Not every site can be expected to have the 24/7 staff on hand to do
>"top security credentials required" changes, for example a high-
>security end site may have a rule that two senior officials need to
>sign off on any change in cryptographic keys and certificates, while a
>limited-staff end-site may have to schedule a visit from their outside
>security consultant to perform the certificate replacement.
>

This is exactly describing a known risk that the site took, accepting the
tradeoffs. I fail to see a compelling argument that there should be no
tradeoffs - given the harm presented to the ecosystem - and if sites want
to make such policies, rather than promoting automation and CI/CD, then it
seems that's a risk they should bear and make an informed choice.

> Thus I would be all for an official BR ballot to clarify/introduce
> that 24 hour revocation for non-compliance doesn't apply to non-
> dangerous technical violations.
>

As discussed elsewhere, there is no such thing as "non-dangerous technical
violations". It is a construct, much like "clean coal", that has an
appealing turn of phrase, but without the evidence to support it.


> Another category that would justify a longer CA response time would be a
> situation where a large batch of certificates need to be revalidated due
> to a weakness in validation procedures (such as finding out that a
> validation method had a vulnerability, but not knowing which if any of
> the validated identities were actually fake).  For example to recheck a
> typical domain-control method, a CA would have to ask each certificate
> holder to respond to a fresh challenge (lots of manual work by end
> sites), then do the actual check (automated).


Like the other examples, this is not at all compelling. Solutions exist to
mitigate this risk entirely. CAs and their Subscribers that choose not to
avail themselves of these methods - for whatever the reason - are making an
informed market choice about these. If they're not informed, that's on the
CAs. If they are making the choice, that's on the Subscribers.

There's zero reason to change, especially when such revalidation can be,
and is, being done automatically.
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Ryan Sleevi via dev-security-policy
On Mon, Nov 26, 2018 at 10:31 AM Nick Lamb via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> CA/B is the right place for CAs to make the case for a general rule about
> giving themselves more time to handle technical non-compliances whose
> correct resolution will annoy customers but impose little or no risk to
> relying parties,
>

CAs have made the case - it was not accepted.

On a more fundamental and philosophical level, I think this is
well-intentioned but misguided. Let's consider that the issue is one that
the CA had the full power-and-ability to prevent - namely, they violated
the requirements and misissued. A CA is only in this situation if they are
a bad CA - a good CA will never run the risk of "annoying" the customer.

This also presumes that "annoyance" of the subscriber is a bad thing - but
this is also wrong. If we accept that CAs are differentiated based on
security, then a CA that regularly misissues and annoys its customers is a
CA that will lose customers. This is, arguably, better than the
alternative, which is to remove trust in a CA entirely, which will annoy
all of its customers.

This presumes that the customer cannot take steps to avoid this. However,
as suggested by others, the customer could have minimized or eliminated
annoyance, such as by ensuring they have a robust system to automate the
issuance/replacement of certificates. That they didn't is an operational
failure on their part.
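
The automation Ryan refers to usually reduces to a renewal loop that replaces a 
certificate well before any expiry or revocation deadline bites. A minimal 
sketch of the decision logic only; the 30-day margin and the names are mine, 
not taken from any particular ACME client:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RENEW_BEFORE = timedelta(days=30)  # renew with a 30-day safety margin

def needs_renewal(not_after: datetime,
                  now: Optional[datetime] = None) -> bool:
    """True when the certificate is inside the renewal window (or expired)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= RENEW_BEFORE

# A daily cron/systemd-timer job would call this and trigger reissuance:
exp = datetime(2019, 1, 15, tzinfo=timezone.utc)
print(needs_renewal(exp, now=datetime(2018, 11, 26, tzinfo=timezone.utc)))  # False
print(needs_renewal(exp, now=datetime(2019, 1, 1, tzinfo=timezone.utc)))    # True
```

With such a loop in place, an out-of-turn revocation shrinks from a scheduled 
multi-party event to one extra run of the same job.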

This presumes that there is "little or no risk to relying parties."
Unfortunately, they are by design not a stakeholder in those conversations
- the stakeholders are the CA and the Subscriber, both of which are
incentivized to do nothing (it avoids annoying the customer for the CA, it
avoids having to change for the customer). This creates the tragedy of the
commons that we absolutely saw result from browsers not regularly enforcing
compliance on CAs - areas of technical non-compliance that prevented
developing interoperable solutions from the spec, which required all sorts
of hacks, which then subsequently introduced security issues. This is not a
'broken windows' argument so much as a statement of the demonstrable
reality we lived in prior to Amazon's development and publication of
linting tools that simplified compliance and enforcement, and the
subsequent improvements by ZLint.

Conceptually, this is similar to an ISP that regularly cuts its own
backbone cables or publishes bad routes. By ensuring that the system
consistently functions as designed - and that the CA follows their own
stated practices and procedures and revokes everything that doesn't - the
disruption is entirely self-inflicted and avoidable, and the market can be
left to correct for that.


> I personally at least would much rather see CAs actually formally agree
> they should all have say 28 days in such cases - even though that's surely
> far longer than it should be - than a series of increasingly implausible
> "important" but ultimately purely self-serving undocumented exceptions that
> make the rules on paper worthless.
>

I disagree that encouraging regulatory capture (and the CA/Browser Forum
doesn't work by formal agreement of CAs, nor does it alter root program
expectations) is the solution here.

I agree that the increasingly implausible "important" exceptions to
revocation are entirely worthless. I think a real and meaningful solution is what is
being more consistently pursued, and that's to distrust CAs that are not
adhering to the set of expectations. There's no reason to believe the
"impact" argument, particularly when it's one that both the Subscriber and
the CA can and should have avoided, and CAs that continue to make that
argument are increasingly showing that they're not working in the best
interests of Relying Parties (see above) or Subscribers (by "annoying" them
or lying to them), and that's worthy of distrust.
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Jakob Bohm via dev-security-policy
On 23/11/2018 16:24, Enrico Entschew wrote:
> This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1509512
> 
> syntax error in one tls certificate
> 
> 1. How your CA first became aware of the problem (e.g. via a problem report 
> submitted to your Problem Reporting Mechanism, a discussion in 
> mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the 
> time and date.
> 
> We became aware of the issue via https://crt.sh/ on 2018-11-12, 09:01 UTC.
> 
> 2. A timeline of the actions your CA took in response. A timeline is a 
> date-and-time-stamped sequence of all relevant events. This may include 
> events before the incident was reported, such as when a particular 
> requirement became applicable, or a document changed, or a bug was 
> introduced, or an audit was done.
> 
> Timeline:
> 2018-11-12, 09:01 UTC CA became aware via https://crt.sh/ of a syntax error 
> in one tls certificate issued on 2018-06-02.  The PrintableString of OBJECT 
> IDENTIFIER serialNumber (2 5 4 5) contains an invalid character. For more 
> details see https://crt.sh/?id=514472818
> 2018-11-12, 09:30 UTC CA Security Issues task force analyzed the error and 
> recommended further procedure.
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an 
> international critical trade platform for emissions. Immediate revocation of 
> the certificate would cause irreparable harm to the public.
> 2018-11-12, 13:00 UTC We performed additional dedicated coaching on this 
> specific syntax topic within the validation team to avoid this kind of 
> error in the future.
> 2018-11-16, 08:40 UTC Customer responded for the first time and asked for 
> more time to evaluate the certificate replacement process.
> 2018-11-19, 12:30 UTC CA informed the auditor TÜV-IT about the issue.
> 2018-11-20, 15:19 UTC Customer declared to replace the certificate on 
> 2018-11-22 latest.
> 2018-11-22, 15:52 UTC New certificate has been applied for and has been 
> issued.
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 
> 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.
> 
> 3. Whether your CA has stopped, or has not yet stopped, issuing certificates 
> with the problem. A statement that you have will be considered a pledge to 
> the community; a statement that you have not requires an explanation.
> 
> The CA has not stopped issuing EV certificates. We applied dedicated coaching 
> on this specific syntax topic within the validation team to avoid this kind 
> of error until software adjustments to both affected systems have been 
> completed.
> 
> 4. A summary of the problematic certificates. For each problem: number of 
> certs, and the date the first and last certs with that problem were issued.
> 
> 1 Certificate
> SHA-256 41F3AD0CBDA392F078D776FD1CDC0E35F7AF61030C56C7B26B95936F41A83B32
> Issued on 2018-06-01
> 
> 5. The complete certificate data for the problematic certificates. The 
> recommended way to provide this is to ensure each certificate is logged to CT 
> and then list the fingerprints or crt.sh IDs, either in the report or as an 
> attached spreadsheet, with one list per distinct problem.
> 
> For more details see https://crt.sh/?id=514472818
> 
> 6. Explanation about how and why the mistakes were made or bugs introduced, 
> and how they avoided detection until now.
> 
> This problem was caused within the customer-facing frontend system and the 
> lint system. Neither system checked the entry in the serialNumber (2 5 4 5) 
> field correctly. It was possible to enter characters other 
> than those allowed by the PrintableString definition.
> 
> 7. List of steps your CA is taking to resolve the situation and ensure such 
> issuance will not be repeated in the future, accompanied with a timeline of 
> when your CA expects to accomplish these things.
> 
> The CA Security Issues task force, together with the software development 
> team, analyzed the error. We applied dedicated coaching on this specific 
> syntax topic within the validation team to avoid this kind of error until 
> software adjustments to both affected systems have been completed.  The 
> changes in the systems are expected to go live in early January 2019.
> 

In addition to this, would you add the following:

- Daily checks of crt.sh (or some other existing tool) if 
 additional such certificates are erroneously issued before 
 the automated countermeasures are in place?

- Procedurally (and eventually technically) restrict the serial number 
 element to actual validated identification numbers from a fixed set of 
 databases for each jurisdiction.  For example for a Bundesamt, this 
 should be a special prefix followed by some kind of official 
 identifying number of entities within the Bundesverwaltung.  Similarly, 
 of course, for Landesämter, companies, etc.
  Also, it is unclear why a Bundesamt belongs to an identification 
 jurisdiction lower than the entire BRD.
  For comparison, Danish company entities 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Jakob Bohm via dev-security-policy

On 26/11/2018 16:31, Nick Lamb wrote:
> In common with others who've responded to this report I am very 
> skeptical about the contrast between the supposed importance of this 
> customer's systems versus their, frankly, lackadaisical technical response.
> 
> This might all seem harmless but it ends up as "the boy who cried wolf". 
> If you relay laughable claims from customers several times, when it 
> comes to an incident where maybe some extraordinary delay was 
> justifiable any good will is already used up by the prior claims.
> 
> CA/B is the right place for CAs to make the case for a general rule 
> about giving themselves more time to handle technical non-compliances 
> whose correct resolution will annoy customers but impose little or no 
> risk to relying parties, I personally at least would much rather see CAs 
> actually formally agree they should all have say 28 days in such cases - 
> even though that's surely far longer than it should be - than a series 
> of increasingly implausible "important" but ultimately purely 
> self-serving undocumented exceptions that make the rules on paper worthless.


It should be noted that the counter-measures that some posts have
expected of the end-site in question may not always be realistic
(speaking generally, as I have no data on the specifics of this end-
site):

1. Having a spare certificate ready (if done with proper security, e.g.
  a separate key) from a different CA may unfortunately conflict with
  badly thought out parts of various certificate "pinning" standards.

2. Being critical from a society perspective (e.g. being the contact
  point for a service to help protect the planet), doesn't mean that the
  people running such a service can be expected to be IT superstars
  capable of dealing with complex IT issues such as unscheduled
  certificate replacement due to no fault of their own.

3. Not every site can be expected to have the 24/7 staff on hand to do
  "top security credentials required" changes, for example a high-
  security end site may have a rule that two senior officials need to
  sign off on any change in cryptographic keys and certificates, while a
  limited-staff end-site may have to schedule a visit from their outside
  security consultant to perform the certificate replacement.

Thus I would be all for an official BR ballot to clarify/introduce
that 24 hour revocation for non-compliance doesn't apply to non-
dangerous technical violations.

Another category that would justify a longer CA response time would be a
situation where a large batch of certificates need to be revalidated due
to a weakness in validation procedures (such as finding out that a
validation method had a vulnerability, but not knowing which if any of
the validated identities were actually fake).  For example to recheck a
typical domain-control method, a CA would have to ask each certificate
holder to respond to a fresh challenge (lots of manual work by end
sites), then do the actual check (automated).
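
Mechanically, such a recheck is just a fresh nonce round-trip; the manual cost 
Jakob describes lies in getting each subscriber to publish the new token. A 
generic sketch of the CA side only (not any specific ACME or CA API; the names 
are mine):

```python
import hmac
import secrets

def new_challenge() -> str:
    """Issue a fresh, unguessable token for a domain-control recheck."""
    return secrets.token_urlsafe(32)

def check_response(expected: str, observed: str) -> bool:
    """Constant-time comparison of the value fetched from the holder's site."""
    return hmac.compare_digest(expected, observed)

token = new_challenge()
print(check_response(token, token))          # True: holder served the token
print(check_response(token, "stale-token"))  # False: revalidation fails
```

The automated half (fetching the token and comparing) is cheap; the expensive 
half is the human step of asking every certificate holder to publish it.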



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Nick Lamb via dev-security-policy
In common with others who've responded to this report I am very skeptical 
about the contrast between the supposed importance of this customer's 
systems versus their, frankly, lackadaisical technical response.

This might all seem harmless but it ends up as "the boy who cried wolf". 
If you relay laughable claims from customers several times, when it comes 
to an incident where maybe some extraordinary delay was justifiable any 
good will is already used up by the prior claims.

CA/B is the right place for CAs to make the case for a general rule about 
giving themselves more time to handle technical non-compliances whose 
correct resolution will annoy customers but impose little or no risk to 
relying parties, I personally at least would much rather see CAs actually 
formally agree they should all have say 28 days in such cases - even 
though that's surely far longer than it should be - than a series of 
increasingly implausible "important" but ultimately purely self-serving 
undocumented exceptions that make the rules on paper worthless.
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-26 Thread Gijs Kruitbosch via dev-security-policy

(for the avoidance of doubt: posting in a personal capacity)

On 23/11/2018 15:24, Enrico Entschew wrote:

> Timeline:
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an 
> international critical trade platform for emissions. Immediate revocation of 
> the certificate would cause irreparable harm to the public.

> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 35 
> a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.


Some questions I have:

1) Don't the BRs specify CAs MUST revoke within 24 hours (for some 
issues) or 5 days (for others)? This looks like just over 10 days, and 
was customer-prompted as opposed to set by the CA, it seems. Am I just 
missing the part of the BRs that says ignoring the 5 days is OK if it's 
"just" a syntax error?


2) what procedure does D-TRUST follow to ensure adequate revocation 
times, and in particular, under what circumstances does it decide that 
not revoking until the customer gives an OK is necessary (e.g. how does 
it decide what constitutes an "international[ly] critical" site)? Is 
this documented, e.g. in CPS or similar? Have auditors signed off on that?


3) can you elaborate on the system being down causing "irreparable 
harm"? What would have happened if the cert had just been revoked after 
24/120 hours? In this case, the website in question ( www.dehst.de ) has 
been broken in Firefox for the past 64 or so hours (ie since about 6pm 
UK time on Friday, when I first read your message) because the server 
doesn't actually send the full chain of certs for its new certificate. 
Given that the server (AFAICT) doesn't staple OCSP responses, I don't 
imagine that practical breakage in a web browser would have been worse 
if the original cert had been revoked immediately, given the CRL 
revocation done last week hasn't appeared in CRLSet/OneCRL either.


~ Gijs

___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-25 Thread Paul Léo Steinberg via dev-security-policy
> 2018-11-12, 09:01 UTC CA became aware via https://crt.sh/ of a syntax error 
> in one tls certificate issued on 2018-06-02.  The PrintableString of OBJECT 
> IDENTIFIER serialNumber (2 5 4 5) contains an invalid character. For more 
> details see https://crt.sh/?id=514472818
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an 
> international critical trade platform for emissions. Immediate revocation of 
> the certificate would cause irreparable harm to the public.
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 
> 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.

Going forward, if the platform is that important, have you advised the customer 
to have a second certificate from a different CA (with a different key) ready 
for emergencies? Being too big to fail seems like a really lame excuse for not 
planning ahead.

Additionally, if the platform's operation is critical, it would seem a better 
idea to apply an even stricter standard of security than the BRs mandate, 
rather than to relax it (revocation after more than 10 days instead of within 
24 hours). E.g., it also seems like a bad idea, though permitted by the BRs, to 
issue certificates with a lifetime of around 2 years to such a service, while 
MUCH shorter lifetimes would seem more appropriate.

If, on the contrary, you are arguing that availability is more important than 
security, operating the service over unencrypted HTTP would be wise.
___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Incident report D-TRUST: syntax error in one tls certificate

2018-11-23 Thread Enrico Entschew via dev-security-policy
This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1509512

syntax error in one tls certificate

1. How your CA first became aware of the problem (e.g. via a problem report 
submitted to your Problem Reporting Mechanism, a discussion in 
mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the 
time and date.

We became aware of the issue via https://crt.sh/ on 2018-11-12, 09:01 UTC.

2. A timeline of the actions your CA took in response. A timeline is a 
date-and-time-stamped sequence of all relevant events. This may include events 
before the incident was reported, such as when a particular requirement became 
applicable, or a document changed, or a bug was introduced, or an audit was 
done.

Timeline:
2018-11-12, 09:01 UTC CA became aware via https://crt.sh/ of a syntax error in 
one TLS certificate issued on 2018-06-02. The PrintableString value of the 
serialNumber attribute (OID 2.5.4.5) contains an invalid character. For more 
details see https://crt.sh/?id=514472818
2018-11-12, 09:30 UTC CA Security Issues task force analyzed the error and 
recommended further procedure.
2018-11-12, 10:30 UTC Customer was contacted for the first time. The customer 
runs a critical international emissions-trading platform. Immediate revocation 
of the certificate would cause irreparable harm to the public.
2018-11-12, 13:00 UTC We provided additional dedicated coaching on this 
specific syntax topic within the validation team to avoid this kind of error in 
the future.
2018-11-16, 08:40 UTC The customer responded for the first time and asked for 
more time to evaluate the certificate replacement process.
2018-11-19, 12:30 UTC CA informed the auditor TÜV-IT about the issue.
2018-11-20, 15:19 UTC The customer committed to replacing the certificate by 
2018-11-22 at the latest.
2018-11-22, 15:52 UTC A new certificate was applied for and issued.
2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 35 
a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by the customer.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates 
with the problem. A statement that you have will be considered a pledge to the 
community; a statement that you have not requires an explanation.

The CA has not stopped issuing EV certificates. We applied dedicated coaching 
on this specific syntax topic within the validation team to avoid this kind of 
error until software adjustments to both affected systems have been completed.

4. A summary of the problematic certificates. For each problem: number of 
certs, and the date the first and last certs with that problem were issued.

1 Certificate
SHA-256 41F3AD0CBDA392F078D776FD1CDC0E35F7AF61030C56C7B26B95936F41A83B32 
Issued on 2018-06-01

5. The complete certificate data for the problematic certificates. The 
recommended way to provide this is to ensure each certificate is logged to CT 
and then list the fingerprints or crt.sh IDs, either in the report or as an 
attached spreadsheet, with one list per distinct problem.

For more details see https://crt.sh/?id=514472818

6. Explanation about how and why the mistakes were made or bugs introduced, and 
how they avoided detection until now.

This problem was caused within the customer-facing frontend system and the 
lint system. Neither system correctly checked the entry in the serialNumber 
field (OID 2.5.4.5). It was possible to enter characters other than those 
permitted by the PrintableString definition. 
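For context, the check that was missing is straightforward: an ASN.1 
PrintableString (X.680) permits only upper- and lower-case letters, digits, 
space, and the characters ' ( ) + , - . / : = ?. The following is a minimal 
sketch in Python of such an input check, not D-TRUST's actual frontend or 
lint code, which is not described in the report:

```python
import re

# Characters permitted in an ASN.1 PrintableString (ITU-T X.680):
# A-Z, a-z, 0-9, space, and ' ( ) + , - . / : = ?
PRINTABLE_STRING_RE = re.compile(r"^[A-Za-z0-9 '()+,\-./:=?]*$")

def is_valid_printable_string(value: str) -> bool:
    """Return True if value contains only PrintableString characters."""
    return bool(PRINTABLE_STRING_RE.match(value))

# A serialNumber attribute (OID 2.5.4.5) encoded as PrintableString
# must pass this check before certificate issuance.
print(is_valid_printable_string("HRB 123456"))   # letters/digits/space -> True
print(is_valid_printable_string("DT:DE-12345"))  # ':' and '-' allowed -> True
print(is_valid_printable_string("Nr_12345"))     # '_' not allowed -> False
```

Rejecting such input at entry time (in addition to post-issuance linting) 
would have prevented the malformed certificate from being issued at all.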

7. List of steps your CA is taking to resolve the situation and ensure such 
issuance will not be repeated in the future, accompanied with a timeline of 
when your CA expects to accomplish these things.

The CA Security Issues task force, together with the software development 
team, analyzed the error. We applied dedicated coaching on this specific 
syntax topic within the validation team to avoid this kind of error until 
software adjustments to both affected systems have been completed. The changes 
to the systems are expected to go live in early January 2019.

Thank you
Enrico Entschew
D-TRUST GmbH