Re: Linux messages full of `random: get_random_u32 called from`

2018-05-21 Thread Trent Piepho
On Fri, 2018-05-18 at 19:22 -0400, Theodore Y. Ts'o wrote:
> On Fri, May 18, 2018 at 10:56:18PM +, Trent Piepho wrote:
> > 
> > Let's look at what we're doing after this fix:
> > Want non-cryptographic random data for UUID, ask kernel for it.
> > Kernel has non-cryptographic random data, won't give it to us.
> > Wait one second for cryptographic random data, which we didn't need.
> > Give up and create our own random data, which is non-cryptographic and
> > even worse than what the kernel could have given us from the start.
> > 
> > util-linux falls back to rand() seeded with the pid, uid, tv_sec, and
> > tv_usec from gettimeofday().  Pretty bad on an embedded system with no
> > RTC and worse than what the kernel in crng_init 1 state can give us.
> 
> So what util-linux's libuuid could do is fall back to using
> /dev/urandom instead.  Whether or not you retry for a second before
> you fall back to /dev/urandom really depends on how important the
> second U in UUID ("unique") is to you.  If you use lower quality
> randomness, you can potentially risk getting non-unique UUID's.

Does it really matter how long one waits?  The fact that there is a
fallback that can be used would seem to provide a guarantee of
randomness/uniqueness only as good as that fallback.

And here is the fallback,
https://github.com/karelzak/util-linux/blob/master/lib/randutils.c#L64

It doesn't seem all that great.  Can we say that the kernel, e.g.
urandom, can always provide random data at least as good as the above
without blocking?  If the kernel is always as good or better, then
what's the point of having the inferior fallback?

> If you don't worry leaking your computer's identity and the time when
> the UUID was generated, the application could also use the time-based
> UUID's.  There are privacy implications for doing so, it's not

libuuid will still ask for random data to initialize its clock file:

https://github.com/karelzak/util-linux/blob/master/libuuid/src/gen_uuid.c#L281


> > It would seem to be a fact that there will be users of non-
> > cryptographic random data in early boot.  What is the best practice for
> > that?  To fall back to each user trying "to find randomly-looking
> > things on an 1990s Unix."  That doesn't seem good to me.  But what's
> > the better way?
> 
> We could add a new flag to getrandom(2), but application authors can
> just as easily fall back to using /dev/urandom.  The real concern I

I wouldn't say just as easily.  It's a more complex code path,
documented across multiple man pages and requires certain file system
access that getrandom() doesn't.  But it's certainly readily
achievable, so maybe that's good enough.  I think a flag to getrandom
would result in fewer mistakes in userspace code.

> have is application authors that actually *really* need cryptographic
> randomness, but they're too lazy to figure out a way to defer key
> generation until the last possible moment.

Would it be safe to say that the randutils code in util-linux would be
better off falling back to /dev/urandom instead of what it does?

If authors that really need cryptographic data use random_get_bytes()
or uuid_generate(), they'll get code that automatically falls back to
gettimeofday().  And probably not even know it.

I get your concern about lazy authors using an API that isn't
appropriate for their use case.

But we have this api already, in util-linux and code copied/inspired by
it, and it seems there are use cases where it is appropriate.  If we
make it better(*), then does the risk of it being used where it
shouldn't go up?

(*) Better: use the best available random data that can be provided
without blocking.


> There are other things we can do to add support in the bootloader to
> read an entropy state file and inject it into the kernel alongside the
> initrd and boot command line.  But that doesn't completely solve the
> problem; you still have to deal with the "fresh from the factory,

This is problematic on a number of embedded platforms.

The bootloader might have no writable persistent storage to read/write 
this entropy from.  This requires drivers for the storage hardware,
ability to deal with the storage being in an inconsistent state, and
security of the storage.  Assuming hardware for writable storage even
exists.

So if I want u-boot to read/write an encrypted and authenticated flash
file system, there is a lot of code to put in the bootloader!  And now
we have to worry about that being exploited.  Maybe this means the
bootloader needs an encryption key that it didn't previously need to have
access to.

Some systems have a limit on bootloader size and RAM.  Cyclone 5 is
64kB, which pretty much requires a two stage bootloader.  Arria 10 has
256kB and boots in a single stage, but bootloader features are quite
limited.  On imx23, it's possible to boot directly into linux with no
bootloader at all.  The cpu's rom can initialize the hardware enough to
run linux just from info in the mxs boot image 

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-18 Thread Theodore Y. Ts'o
On Fri, May 18, 2018 at 10:56:18PM +, Trent Piepho wrote:
> 
> I feel like "fix" might overstate the result a bit.
> 
> This ends up taking a full second to make each UUID.  Having gone to
> great effort to make an iMX25 complete userspace startup in 250 ms, a
> full second, per UUID, in early startup is pretty appalling.
> 
> Let's look at what we're doing after this fix:
> Want non-cryptographic random data for UUID, ask kernel for it.
> Kernel has non-cryptographic random data, won't give it to us.
> Wait one second for cryptographic random data, which we didn't need.
> Give up and create our own random data, which is non-cryptographic and
> even worse than what the kernel could have given us from the start.
> 
> util-linux falls back to rand() seeded with the pid, uid, tv_sec, and
> tv_usec from gettimeofday().  Pretty bad on an embedded system with no
> RTC and worse than what the kernel in crng_init 1 state can give us.

So what util-linux's libuuid could do is fall back to using
/dev/urandom instead.  Whether or not you retry for a second before
you fall back to /dev/urandom really depends on how important the
second U in UUID ("unique") is to you.  If you use lower quality
randomness, you can potentially risk getting non-unique UUID's.

If you don't worry leaking your computer's identity and the time when
the UUID was generated, the application could also use the time-based
UUID's.  There are privacy implications for doing so, it's not
something we can do automatically (or at least I can't recommend it).
Also, if you don't have the clock sequence file and/or you don't have
a writable root, you might need some randomness anyway to protect
against non-monotonically increasing system time.

> It would seem to be a fact that there will be users of non-
> cryptographic random data in early boot.  What is the best practice for
> that?  To fall back to each user trying "to find randomly-looking
> things on an 1990s Unix."  That doesn't seem good to me.  But what's
> the better way?

We could add a new flag to getrandom(2), but application authors can
just as easily fall back to using /dev/urandom.  The real concern I
have is application authors that actually *really* need cryptographic
randomness, but they're too lazy to figure out a way to defer key
generation until the last possible moment.

There are other things we can do to add support in the bootloader to
read an entropy state file and inject it into the kernel alongside the
initrd and boot command line.  But that doesn't completely solve the
problem; you still have to deal with the "fresh from the factory,
first time out of box" experience.  And if you have trusted random
number generation hardware, and are reasonably certain you don't have
to worry about a state-sponsored agency from intercepting hardware
shipments and gimmicking your hardware, that can be a solution as
well.

So there are things we can do to improve some of the scenarios.
Unfortunately, there is no silver bullet that will address all of
them.

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-18 Thread Trent Piepho
On Thu, 2018-05-17 at 22:32 -0400, Theodore Y. Ts'o wrote:
> On Fri, May 18, 2018 at 01:27:03AM +, Trent Piepho wrote:
> > I've hit this on an embedded system.  mke2fs hangs trying to format a
> > persistent writable filesystem, which is where the random seed to
> > initialize the kernel entropy pool would be stored, because it wants 16
> > bytes of non-cryptographic random data for a filesystem UUID, and util-
> > linux libuuid calls getrandom(16, 0) - no GRND_RANDOM flag - and this
> > hangs for over four minutes.
> 
> This is fixed in util-linux 2.32.  It ships with the following commits:

I feel like "fix" might overstate the result a bit.

This ends up taking a full second to make each UUID.  Having gone to
great effort to make an iMX25 complete userspace startup in 250 ms, a
full second, per UUID, in early startup is pretty appalling.

Let's look at what we're doing after this fix:
Want non-cryptographic random data for UUID, ask kernel for it.
Kernel has non-cryptographic random data, won't give it to us.
Wait one second for cryptographic random data, which we didn't need.
Give up and create our own random data, which is non-cryptographic and
even worse than what the kernel could have given us from the start.

util-linux falls back to rand() seeded with the pid, uid, tv_sec, and
tv_usec from gettimeofday().  Pretty bad on an embedded system with no
RTC and worse than what the kernel in crng_init 1 state can give us.
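
To make the weakness concrete, here is a small illustration of seeding rand() from the four values named above. It is a sketch in the spirit of that fallback, not a copy of the util-linux source; weak_seed() and weak_random_bytes() are hypothetical names, and the exact mixing is an assumption.

```c
#include <stdlib.h>

/* Hypothetical illustration: mix pid, uid and the gettimeofday() fields
 * into a single seed.  The XOR/shift mixing is an assumption. */
unsigned int weak_seed(unsigned int pid, unsigned int uid,
                       long tv_sec, long tv_usec)
{
    return (pid << 16) ^ uid ^ (unsigned int)tv_sec ^ (unsigned int)tv_usec;
}

/* Fill buf from libc rand() after seeding it.  Everything that comes out
 * is fully determined by the seed. */
void weak_random_bytes(unsigned int seed, unsigned char *buf, size_t len)
{
    srand(seed);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(rand() & 0xff);
}
```

Two boards that boot the same image without an RTC will tend to reach this code with the same pid and uid and nearly the same clock values, so whenever the microsecond fields also coincide they produce identical "random" bytes.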

What took microseconds now takes a second.  We have lower quality
random data than we had before.

Seems like two steps backward.  Can't we do better?

How about adding a flag to getrandom() that allows the kernel to return
low-quality data if high-quality data would require blocking?

It would seem to be a fact that there will be users of non-
cryptographic random data in early boot.  What is the best practice for
that?  To fall back to each user trying "to find randomly-looking
things on an 1990s Unix."  That doesn't seem good to me.  But what's
the better way?

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-17 Thread Theodore Y. Ts'o
On Fri, May 18, 2018 at 01:27:03AM +, Trent Piepho wrote:
> 
> I've hit this on an embedded system.  mke2fs hangs trying to format a
> persistent writable filesystem, which is where the random seed to
> initialize the kernel entropy pool would be stored, because it wants 16
> bytes of non-cryptographic random data for a filesystem UUID, and util-
> linux libuuid calls getrandom(16, 0) - no GRND_RANDOM flag - and this
> hangs for over four minutes.

This is fixed in util-linux 2.32.  It ships with the following commits:

commit edc1c90cb972fdca1f66be5a8e2b0706bd2a4949
Author: Karel Zak 
Date:   Tue Mar 20 14:17:24 2018 +0100

lib/randutils: don't break on EAGAIN, use usleep()

The current code uses lose_counter to make more attempts to read
random numbers. It seems better to wait a moment between attempts to
avoid busy loop (we do the same in all-io.h).

The worst case is 1 second delay for all random_get_bytes() on systems
with uninitialized entropy pool -- for example you call sfdisk (MBR Id
or GPT UUIDs) on very first boot, etc. In this case it will use libc
rand() as a fallback solution.

Note that we do not use random numbers for security sensitive things
like keys or so. It's used for random based UUIDs etc.

Addresses: https://github.com/karelzak/util-linux/pull/603
Signed-off-by: Karel Zak 

commit a9cf659e0508c1f56813a7d74c64f67bbc962538
Author: Carlo Caione 
Date:   Mon Mar 19 10:31:07 2018 +

lib/randutils: Do not block on getrandom()

In Endless we have hit a problem when using 'sfdisk' on the really first
boot to automatically expand the rootfs partition. On this platform
'sfdisk' is blocking on getrandom() because not enough random bytes are
available. This is an ARM platform without a hwrng.

We fix this passing GRND_NONBLOCK to getrandom(). 'sfdisk' will use the
best entropy it has available and fallback only as necessary.

Signed-off-by: Carlo Caione 

Interestingly, these commits in util-linux landed *before* the patches
to address CVE-2018-1108 appeared in the kernel in April 2018.  This
was because libuuid was already blocking on a handful of embedded
systems even before we made this change in Linux's random driver.  (It
just made this problem more likely to be visible on a larger number of
systems; but it was always there.)

- Ted
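
The behaviour the two commit messages describe (GRND_NONBLOCK, a short sleep on EAGAIN, and a bounded worst case of about one second) can be sketched roughly as follows. The function name, the attempt count and the delay are assumptions chosen for illustration; the actual values are in lib/randutils.c.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values for illustration only: 8 attempts * 125 ms = 1 s. */
#define MAX_ATTEMPTS   8
#define RETRY_DELAY_US 125000

/* Hypothetical helper: returns how many bytes were obtained.  A result
 * smaller than len means the pool never became ready in time and the
 * caller has to fall back to something else for the rest. */
size_t getrandom_with_retry(unsigned char *buf, size_t len)
{
    size_t done = 0;
    int attempts_left = MAX_ATTEMPTS;

    while (done < len && attempts_left > 0) {
        ssize_t n = getrandom(buf + done, len - done, GRND_NONBLOCK);
        if (n > 0) {
            done += (size_t)n;
        } else if (n < 0 && errno == EAGAIN) {
            /* Pool not initialized yet: wait a moment, no busy loop. */
            usleep(RETRY_DELAY_US);
            attempts_left--;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            break;              /* e.g. ENOSYS on an old kernel */
        }
    }
    return done;
}
```

On a system whose pool is already initialized the first call succeeds and no delay is incurred; the full one-second wait only shows up in the early-boot case.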


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-17 Thread Trent Piepho
Since I wasn't on this thread from the start, I can only find a way to
reply to a message in mbox format on patchwork, and this seemed the best.

On Fri, 2018-04-27 at 16:10 -0400, Theodore Tso wrote:
> 
> 
> This is why ultimately, we do need to attack this problem from both
> ends, which means teaching userspace programs to only request
> cryptographic-grade randomness when it is really needed --- and most
> of the time, if the user has not logged in yet, you probably don't
> need cryptographic-grade randomness

I've hit this on an embedded system.  mke2fs hangs trying to format a
persistent writable filesystem, which is where the random seed to
initialize the kernel entropy pool would be stored, because it wants 16
bytes of non-cryptographic random data for a filesystem UUID, and util-
linux libuuid calls getrandom(16, 0) - no GRND_RANDOM flag - and this
hangs for over four minutes.

Some things I've seen here don't work in the embedded world.

The user will not log in.  No one logs in.  There are not even user
accounts with a valid password that could log in.

The storage comes pre-written with a static image from the manufacturer
or is programmed from a static image via JTAG or some other out of band
step.  It cannot be different from device to device when it first
boots.  No saved entropy.

The bootloader gets entropy from writable storage to give to the
kernel?  Can't do that.  The bootloader has no access to writable
storage.

I understand that if someone wants cryptographic-grade randomness early
in boot when that just isn't available and isn't going to be available,
then that isn't going to happen and lying to the consumer about the
randomness of the data isn't the answer.

But I just want UUIDs for a filesystem.  And the systemd machineid for
the journal file.  It seems the util-linux authors thought, apparently
incorrectly, that getrandom() without GRND_RANDOM was a good way to
get it.
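
For context on why 16 bytes are requested: a random-based (version 4) UUID is 16 random bytes with six bits overwritten to mark the version and variant, leaving 122 random bits. A minimal sketch, with hypothetical function names and no claim to match libuuid's internals:

```c
#include <stdio.h>

/* Stamp the version (4) and RFC 4122 variant bits onto 16 random bytes. */
void uuid_v4_from_random(unsigned char b[16])
{
    b[6] = (unsigned char)((b[6] & 0x0f) | 0x40);   /* version 4 */
    b[8] = (unsigned char)((b[8] & 0x3f) | 0x80);   /* variant 10xx */
}

/* Format as the usual 36-character text form plus terminating NUL. */
void uuid_to_string(const unsigned char b[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

Uniqueness then rests entirely on the quality of those 16 input bytes, which is what the rest of this message is about.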

What is the right way?  The fact that so many userspace consumers get
it wrong might be a sign that this is lacking or at least very non-
obvious.

I want random data and I want it now.  It's ok if it's low entropy. 
This seems to be a very real, and unavoidable, thing in early boot. 
And crng_init == 1 seems to be the intended way to do this.  What's the
way to get random data of crng_init==1 quality without blocking?



Re: Linux messages full of `random: get_random_u32 called from`

2018-05-03 Thread Justin Forbes
On Wed, May 2, 2018 at 5:25 PM, Theodore Y. Ts'o  wrote:
> On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
>>
>> It is a Fedora patch we're carrying
>> https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
>> so yes, it is a Fedora specific use case.
>> From talking to the libgcrypt team, this is a FIPS mode requirement
>> to run power on self test at the library constructor and the self
>> test of libgcrypt ends up requiring a fully seeded RNG. Citation
>> is in section 9.10 of
>> https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
>
> Forgive me if this is a stupid question, but does Fedora need FIPS
> compliance?  Or is this something which is only required for RHEL?
>
> ("Here's to FIPS: the cause of, and solution to, all of Life's
> problems."  :-)
>
One of the advantages of carrying such things in Fedora is we find
these problems before RHEL does and hopefully there is a solution in
place before they ever even see it.

From the rawhide end, I just brought in virtio-rng as inline vs
module, this works around the issue for lots of users, but not all.
GCE is still impacted, and a user came to complain about it already
last night.  And of course any other virt platform without virtio-rng,
or some hardware. Most hardware installs don't have dracut-fips so
they will boot, eventually.

Justin


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-03 Thread Pavel Machek
On Wed 2018-05-02 18:25:22, Theodore Y. Ts'o wrote:
> On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
> > 
> > It is a Fedora patch we're carrying
> > https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
> > so yes, it is a Fedora specific use case.
> > From talking to the libgcrypt team, this is a FIPS mode requirement
> > to run power on self test at the library constructor and the self
> > test of libgcrypt ends up requiring a fully seeded RNG. Citation
> > is in section 9.10 of
> > https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
> 
> Forgive me if this is a stupid question, but does Fedora need FIPS
> compliance?  Or is this something which is only required for RHEL?

If RHEL needs it, Fedora needs it, too -- as Fedora is a beta test for
RHEL.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: Linux messages full of `random: get_random_u32 called from`

2018-05-02 Thread Theodore Y. Ts'o
On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
> 
> It is a Fedora patch we're carrying
> https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
> so yes, it is a Fedora specific use case.
> From talking to the libgcrypt team, this is a FIPS mode requirement
> to run power on self test at the library constructor and the self
> test of libgcrypt ends up requiring a fully seeded RNG. Citation
> is in section 9.10 of
> https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf

Forgive me if this is a stupid question, but does Fedora need FIPS
compliance?  Or is this something which is only required for RHEL?

("Here's to FIPS: the cause of, and solution to, all of Life's
problems."  :-)

 - Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-02 Thread Laura Abbott

On 05/02/2018 09:26 AM, Theodore Y. Ts'o wrote:
> On Wed, May 02, 2018 at 07:09:11AM -0500, Justin Forbes wrote:
>> Yes, Fedora libgcrypt is carrying a patch which makes it particularly
>> painful for us, we have reached out to the libgcrypt maintainer to
>> follow up on that end. But as I said before, even without that code
>> path (no dracut-fips) we are seeing some instances of 4 minute boots.
>> This is not really a workable user experience.  And are you sure that
>> every cloud platform and VM platform offers, makes it possible to
>> config virtio-rng?
>
> Unfortunately, the answer is no.  Google Compute Engine, alas, does
> not currently support virtio-rng.  With my Google hat on, I can't
> comment on future product features.  With my upstream developer hat
> on, I'll give you three guesses what I have been advocating and
> pushing for internally, and the first two don't count.  :-)
>
> That being said, I just booted a Debian 9 (Stable, aka Stretch)
> standard kernel, and then installed 4.17-rc3 (which has the
> CVE-2018-1108 patches).  The crng_init=2 message doesn't appear
> immediately, and it does appear quite a bit later compared to
> the standard 4.9.0-6-amd64 Debian 9 kernel.  However, the lack of a
> fully initialized random pool doesn't prevent the standard Debian 9
> image from booting:
>
> May  2 15:33:42 localhost kernel: [0.00] Linux version
> 4.17.0-rc3-xfstests (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-16)) #169 SMP
> Wed May 2 11:28:17 EDT 2018
> May  2 15:33:42 localhost kernel: [1.456883] random: fast init done
> May  2 15:33:46 rng-testing systemd[1]: Startup finished in 3.202s (kernel) +
> 5.963s (userspace) = 9.166s.
> May  2 15:33:46 rng-testing google-accounts: INFO Starting Google Accounts
> daemon.
> May  2 15:44:39 rng-testing kernel: [  661.436664] random: crng init done
>
> So it really does appear to be something going on with Fedora's
> userspace; can you help try to track down what it is?
>
> Thanks,
>
> - Ted

It is a Fedora patch we're carrying
https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
so yes, it is a Fedora specific use case.
From talking to the libgcrypt team, this is a FIPS mode requirement
to run power on self test at the library constructor and the self
test of libgcrypt ends up requiring a fully seeded RNG. Citation
is in section 9.10 of
https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf

The response was this _could_ be fixed in libgcrypt but it needs
to be done carefully to ensure nothing actually gets broken. So in
the mean time we're stuck with userspace getting blocked whenever
some program decides to use libgcrypt too early.
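
To illustrate the blocking problem, here is a rough sketch of a library constructor that probes the seeding state with getrandom(2) and GRND_NONBLOCK instead of waiting. This is not the libgcrypt or Fedora code, and whether postponing a power-on self-test is acceptable for FIPS is not something this sketch claims; crng_is_seeded(), library_init() and selftest_deferred are hypothetical names.

```c
#include <errno.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK (glibc >= 2.25) */
#include <sys/types.h>

/* Hypothetical flag: nonzero when the self-test had to be postponed. */
int selftest_deferred;

/*
 * Returns 1 if the kernel CRNG is fully seeded, 0 if it is not seeded yet,
 * and -1 if that cannot be determined (for example ENOSYS on old kernels).
 * Never blocks.
 */
int crng_is_seeded(void)
{
    unsigned char probe;
    ssize_t n;

    do {
        n = getrandom(&probe, sizeof probe, GRND_NONBLOCK);
    } while (n < 0 && errno == EINTR);

    if (n == (ssize_t)sizeof probe)
        return 1;
    return (n < 0 && errno == EAGAIN) ? 0 : -1;
}

/*
 * Runs when the library is loaded (GCC/Clang constructor attribute).
 * A blocking getrandom(..., 0) here would stall every early-boot program
 * that links the library; this version only records that the test is due.
 */
__attribute__((constructor))
static void library_init(void)
{
    selftest_deferred = (crng_is_seeded() != 1);
}
```

The deferred test would then have to run on first real use of the RNG, which is exactly the part that needs care.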

Thanks,
Laura


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-02 Thread Theodore Y. Ts'o
On Wed, May 02, 2018 at 07:09:11AM -0500, Justin Forbes wrote:
> Yes, Fedora libgcrypt is carrying a patch which makes it particularly
> painful for us, we have reached out to the libgcrypt maintainer to
> follow up on that end. But as I said before, even without that code
> path (no dracut-fips) we are seeing some instances of 4 minute boots.
> This is not really a workable user experience.  And are you sure that
> every cloud platform and VM platform offers, makes it possible to
> config virtio-rng?

Unfortunately, the answer is no.  Google Compute Engine, alas, does
not currently support virtio-rng.  With my Google hat on, I can't
comment on future product features.  With my upstream developer hat
on, I'll give you three guesses what I have been advocating and
pushing for internally, and the first two don't count.  :-)

That being said, I just booted a Debian 9 (Stable, aka Stretch)
standard kernel, and then installed 4.17-rc3 (which has the
CVE-2018-1108 patches).  The crng_init=2 message doesn't appear
immediately, and it does appear quite a bit later compared to
the standard 4.9.0-6-amd64 Debian 9 kernel.  However, the lack of a
fully initialized random pool doesn't prevent the standard Debian 9
image from booting:

May  2 15:33:42 localhost kernel: [0.00] Linux version 
4.17.0-rc3-xfstests (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-16)) #169 SMP 
Wed May 2 11:28:17 EDT 2018
May  2 15:33:42 localhost kernel: [1.456883] random: fast init done
May  2 15:33:46 rng-testing systemd[1]: Startup finished in 3.202s (kernel) + 
5.963s (userspace) = 9.166s.
May  2 15:33:46 rng-testing google-accounts: INFO Starting Google Accounts 
daemon.
May  2 15:44:39 rng-testing kernel: [  661.436664] random: crng init done

So it really does appear to be something going on with Fedora's
userspace; can you help try to track down what it is?

Thanks,

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-02 Thread Justin Forbes
On Tue, May 1, 2018 at 7:02 PM, Theodore Y. Ts'o  wrote:
> On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
>>
>> I have not reproduced in GCE myself.  We did get some confirmation
>> that removing dracut-fips does make the problem less dire (but I
>> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
>> better than not booting at all).  Specifically systemd calls libgcrypt
>> before it even opens the log with fips there, and this is before
>> virtio-rng modules could even load. Right now though, we are looking
>> at pretty much any possible options as the majority of people are
>> calling for me to backout the patches completely from rawhide.
>
> FWIW, Debian Testing is using systemd 238, and from what I can tell
> it's calling libgcrypt and it has the same (as near as I can tell)
> totally pointless hmac nonsense, and it's not a problem that I can
> see.  Of course, Debian and Fedora may have a different set of
> patches
>
Yes, Fedora libgcrypt is carrying a patch which makes it particularly
painful for us, we have reached out to the libgcrypt maintainer to
follow up on that end. But as I said before, even without that code
path (no dracut-fips) we are seeing some instances of 4 minute boots.
This is not really a workable user experience.  And are you sure that
every cloud platform and VM platform offers, makes it possible to
config virtio-rng?

Justin


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Sultan Alsawaf
On Tue, May 01, 2018 at 08:56:04PM -0400, Theodore Y. Ts'o wrote:
> On Tue, May 01, 2018 at 05:43:17PM -0700, Sultan Alsawaf wrote:
> > 
> > I've attached what I think is a reasonable stopgap solution until this is
> > actually fixed. If you're willing to revert the CVE-2018-1108 patches
> > completely, then I don't think you'll mind using this patch in the meantime.
> 
> I would put it slightly differently; reverting the CVE-2018-1108
> patches is less dangerous than what you are proposing in your attached
> patch.
> 
> Again, I think the right answer is to fix userspace to not require
> cryptographic grade entropy during early system startup, and for
> people to *think* about what they are doing.  I've looked at the
> systemd's use of hmac in journal-authenticate, and as near as I can
> tell, there isn't any kind of explanation about why it was necessary,
> or what threat it was trying to protect against.
> 
>   - Ted

Why is /dev/urandom so much more dangerous than /dev/random? The
more I search, the more I see that many sources consider /dev/urandom
to be cryptographically secure... and since I hold down a single key on
the keyboard to make my computer boot without any kernel workarounds,
I'm sure the NSA would eventually notice my predictable behavior and get
their hands on my Richard Stallman photos.

Fixing all the "broken" userspace instances of entropy usage during early
system startup is a tall order. What about barebone machines used as
remote servers? I feel like just "fixing userspace" isn't going to cover
all of the usecases that the CVE-2018-1108 patches broke.

Sultan


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Theodore Y. Ts'o
On Tue, May 01, 2018 at 05:43:17PM -0700, Sultan Alsawaf wrote:
> 
> I've attached what I think is a reasonable stopgap solution until this is
> actually fixed. If you're willing to revert the CVE-2018-1108 patches
> completely, then I don't think you'll mind using this patch in the meantime.

I would put it slightly differently; reverting the CVE-2018-1108
patches is less dangerous than what you are proposing in your attached
patch.

Again, I think the right answer is to fix userspace to not require
cryptographic grade entropy during early system startup, and for
people to *think* about what they are doing.  I've looked at the
systemd's use of hmac in journal-authenticate, and as near as I can
tell, there isn't any kind of explanation about why it was necessary,
or what threat it was trying to protect against.

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Sultan Alsawaf
On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
> 
> I have not reproduced in GCE myself.  We did get some confirmation
> that removing dracut-fips does make the problem less dire (but I
> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
> better than not booting at all).  Specifically systemd calls libgcrypt
> before it even opens the log with fips there, and this is before
> virtio-rng modules could even load. Right now though, we are looking
> at pretty much any possible options as the majority of people are
> calling for me to backout the patches completely from rawhide.

I've attached what I think is a reasonable stopgap solution until this is
actually fixed. If you're willing to revert the CVE-2018-1108 patches
completely, then I don't think you'll mind using this patch in the meantime.

Sultan

From 5be2efdde744d3c55db3df81c0493fc67dc35620 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf 
Date: Tue, 1 May 2018 17:36:17 -0700
Subject: [PATCH] random: use urandom instead of random for now and speed up
 crng init

With the fixes for CVE-2018-1108, /dev/random now requires user-provided
entropy on quite a few machines lacking high levels of boot entropy
in order to complete its initialization. This causes issues on environments
where userspace depends on /dev/random in order to finish booting
completely (i.e., userspace will remain stuck, unable to boot, waiting for
entropy more-or-less indefinitely until the user provides it via something
like keystrokes or mouse movements).

As a temporary workaround, redirect /dev/random to /dev/urandom instead,
and speed up the initialization process by slightly relaxing the
threshold for interrupts to go towards adding one bit of entropy credit
(only until initialization is complete).

Signed-off-by: Sultan Alsawaf 
---
 drivers/char/mem.c| 3 ++-
 drivers/char/random.c | 9 ++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffeb60d3434c..cc9507f01c79 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -870,7 +870,8 @@ static const struct memdev {
 #endif
  [5] = { "zero", 0666, &zero_fops, 0 },
  [7] = { "full", 0666, &full_fops, 0 },
- [8] = { "random", 0666, &random_fops, 0 },
+ /* Redirect /dev/random to /dev/urandom until /dev/random is fixed */
+ [8] = { "random", 0666, &urandom_fops, 0 },
  [9] = { "urandom", 0666, &urandom_fops, 0 },
 #ifdef CONFIG_PRINTK
 [11] = { "kmsg", 0644, &kmsg_fops, 0 },
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d9e38523b383..bce3b43cdd3b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1200,9 +1200,12 @@ void add_interrupt_randomness(int irq)
return;
}
 
-   if ((fast_pool->count < 64) &&
-   !time_after(now, fast_pool->last + HZ))
-   return;
+   if (fast_pool->count < 64) {
+   unsigned long timeout = crng_ready() ? HZ : HZ / 4;
+
+   if (!time_after(now, fast_pool->last + timeout))
+   return;
+   }
 
    r = &input_pool;
    if (!spin_trylock(&r->lock))
-- 
2.14.1


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Theodore Y. Ts'o
On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
> 
> I have not reproduced in GCE myself.  We did get some confirmation
> that removing dracut-fips does make the problem less dire (but I
> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
> better than not booting at all).  Specifically systemd calls libgcrypt
> before it even opens the log with fips there, and this is before
> virtio-rng modules could even load. Right now though, we are looking
> at pretty much any possible options as the majority of people are
> calling for me to backout the patches completely from rawhide.

FWIW, Debian Testing is using systemd 238, and from what I can tell
it's calling libgcrypt and it has the same (as near as I can tell)
totally pointless hmac nonsense, and it's not a problem that I can
see.  Of course, Debian and Fedora may have a different set of
patches

 - Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Justin Forbes
On Tue, May 1, 2018 at 7:55 AM, Theodore Y. Ts'o  wrote:
> On Tue, May 01, 2018 at 06:52:47AM -0500, Justin Forbes wrote:
>>
>> We have also had reports that Fedora users are seeing this on Google
>> Compute Engine.
>
> Can you reproduce this yourself?  If so, could you confirm that
> removing the dracut-fips package makes the problem go away for you?
>

I have not reproduced it in GCE myself.  We did get some confirmation
that removing dracut-fips makes the problem less dire (I wouldn't call
a 4 minute boot a win, but booting in 4 minutes is better than not
booting at all).  Specifically, with fips there, systemd calls libgcrypt
before it even opens the log, and this is before the virtio-rng modules
could even load.  Right now, though, we are looking at pretty much any
possible option, as the majority of people are calling for me to back
out the patches completely from rawhide.


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Pavel Machek
On Mon 2018-04-30 12:11:43, Theodore Y. Ts'o wrote:
> On Sun, Apr 29, 2018 at 09:34:45PM -0700, Sultan Alsawaf wrote:
> > 
> > What about abusing high-resolution timers to get entropy? Since hrtimers can't
> > make guarantees down to the nanosecond, there's always a skew between the
> > requested expiry time and the actual expiry time.
> > 
> > Please see the attached patch and let me know just how horrible it is.
> 
> So think about exactly where the possible causes of the skew might be
> coming from.  Look very closely at the software implementation.  The
> important thing here is to not get hung up on the software
> abstraction, but to look at the *implementation*.  (And if it's an
> implementation in architecture specific code, we need to look at all
> architectures.)
> 
> This applies on the hardware level as well, but that gets harder
> because there are many possible hardware implementations in use out
> there.  Remember that on many systems there may be only a single clock
> crystal, and all other hardware timers may be derived from that clock
> using frequency dividers.  (At least for everything on the mainboard.)

On "many" systems? No, sorry, computers usually do not behave like
this (the CMOS RTC has a separate clock, for example). I'm pretty sure
that not a single machine on which problems were reported behaves
this way.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Theodore Y. Ts'o
On Tue, May 01, 2018 at 06:52:47AM -0500, Justin Forbes wrote:
> 
> We have also had reports that Fedora users are seeing this on Google
> Compute Engine.

Can you reproduce this yourself?  If so, could you confirm that
removing the dracut-fips package makes the problem go away for you?

Thanks,

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Justin Forbes
On Mon, Apr 30, 2018 at 4:12 PM, Jeremy Cline  wrote:
> On 04/29/2018 06:05 PM, Theodore Y. Ts'o wrote:
>> On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
>>> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
>>>> Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>>>
>>> Okay, but /dev/urandom isn't a solution to this problem because it isn't
>>> usable until crng init is complete, so it suffers from the same init lag as
>>> /dev/random.
>>
>> It's more accurate to say that using /dev/urandom is no worse than
>> before (from a few years ago).  There are, alas, plenty of
>> distributions and user space application programmers that basically
>> got lazy using /dev/urandom, and assumed that there would be plenty of
>> entropy during early system startup.
>>
>> When they switched over to getrandom(2), the most egregious examples
>> of this caused pain (and they got fixed), but due to a bug in
>> drivers/char/random.c, if getrandom(2) was called after the entropy
>> pool was "half initialized", it would not block, but proceed.
>>
>> Is that exploitable?  Well, Jann and I didn't find an _obvious_ way to
>> exploit the shortcoming, which is why this wasn't treated like an
>> emergency situation a la the embarrassing situation we had five years
>> ago[1].
>>
>> [1] https://factorable.net/paper.html
>>
>> However, it was enough to make us uncomfortable, which is why I
>> pushed the changes that I did.  At least on the devices we had at
>> hand, using the distributions that we typically use, the impact seemed
>> minimal.  Unfortunately, there is no way to know for sure without
>> rolling out the change and seeing who screams.  In the ideal world,
>> software would not require cryptographic randomness immediately after
>> boot, before the user logs in.  And ***really***, as in [1], software
>> should not be generating long-term public keys that are essential to
>> the security of the box a few seconds immediately after the device is
>> first unboxed and plugged in.
>>
>> What would be useful is if people gave reports that listed exactly
>> what laptop and distributions they are using.  Just "a high spec x86
>> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
>> running Debian testing is working just fine.  The year, model, make,
>> and CPU type plus what distribution (and distro version number) you
>> are running is useful, so I can assess how widespread the unhappiness
>> is going to be, and what mitigation steps make sense.
>
> Fedora has started seeing some bug reports on this for Fedora 27[0] and
> I've asked reporters to include their hardware details.
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1572944
>

We have also had reports that Fedora users are seeing this on Google
Compute Engine.

Justin
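
For readers following the getrandom(2) discussion quoted above, here is a
minimal user-space sketch of probing whether the kernel CRNG is ready without
stalling boot. The helper name try_getrandom_nonblock is made up for
illustration; getrandom() and GRND_NONBLOCK are the real interface (glibc 2.25
or later is assumed for the <sys/random.h> wrapper).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>	/* getrandom(2) wrapper, assumes glibc >= 2.25 */
#include <sys/types.h>

/*
 * Hypothetical helper: try to fill buf with len random bytes without
 * blocking.  Returns 1 on success, 0 if the kernel CRNG is not yet
 * initialized (EAGAIN), and -1 on any other error.
 */
int try_getrandom_nonblock(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = getrandom(p, len, GRND_NONBLOCK);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			if (errno == EAGAIN)
				return 0;	/* pool not ready yet */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 1;
}
```

A caller that gets 0 back can defer the work that needs the bytes (key
generation, for example) instead of blocking the boot or silently falling
back to weaker randomness.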


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-30 Thread Jeremy Cline
On 04/29/2018 06:05 PM, Theodore Y. Ts'o wrote:
> On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
>> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
>>> Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>>
>> Okay, but /dev/urandom isn't a solution to this problem because it isn't
>> usable until crng init is complete, so it suffers from the same init lag as
>> /dev/random.
> 
> It's more accurate to say that using /dev/urandom is no worse than
> before (from a few years ago).  There are, alas, plenty of
> distributions and user space application programmers that basically
> got lazy using /dev/urandom, and assumed that there would be plenty of
> entropy during early system startup.
> 
> When they switched over to getrandom(2), the most egregious examples
> of this caused pain (and they got fixed), but due to a bug in
> drivers/char/random.c, if getrandom(2) was called after the entropy
> pool was "half initialized", it would not block, but proceed.
> 
> Is that exploitable?  Well, Jann and I didn't find an _obvious_ way to
> exploit the shortcoming, which is why this wasn't treated like an
> emergency situation a la the embarrassing situation we had five years
> ago[1].
> 
> [1] https://factorable.net/paper.html
> 
> However, it was enough to make us uncomfortable, which is why I
> pushed the changes that I did.  At least on the devices we had at
> hand, using the distributions that we typically use, the impact seemed
> minimal.  Unfortunately, there is no way to know for sure without
> rolling out the change and seeing who screams.  In the ideal world,
> software would not require cryptographic randomness immediately after
> boot, before the user logs in.  And ***really***, as in [1], software
> should not be generating long-term public keys that are essential to
> the security of the box a few seconds immediately after the device is
> first unboxed and plugged in.
> 
> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using.  Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine.  The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how widespread the unhappiness
> is going to be, and what mitigation steps make sense.

Fedora has started seeing some bug reports on this for Fedora 27[0] and
I've asked reporters to include their hardware details.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1572944


Regards,
Jeremy


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-30 Thread Theodore Y. Ts'o
On Sun, Apr 29, 2018 at 09:34:45PM -0700, Sultan Alsawaf wrote:
> 
> What about abusing high-resolution timers to get entropy? Since hrtimers can't
> make guarantees down to the nanosecond, there's always a skew between the
> requested expiry time and the actual expiry time.
> 
> Please see the attached patch and let me know just how horrible it is.

So think about exactly where the possible causes of the skew might be
coming from.  Look very closely at the software implementation.  The
important thing here is to not get hung up on the software
abstraction, but to look at the *implementation*.  (And if it's an
implementation in architecture specific code, we need to look at all
architectures.)

This applies on the hardware level as well, but that gets harder
because there are many possible hardware implementations in use out
there.  Remember that on many systems there may be only a single clock
crystal, and all other hardware timers may be derived from that clock
using frequency dividers.  (At least for everything on the mainboard.)

- Ted
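
The single-crystal concern can be illustrated with a toy model. This is a
sketch under stated assumptions, not a description of any real board: both
timers are assumed to count ticks of one shared crystal through integer
dividers, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not real hardware: a timer that counts ticks of one shared
 * crystal through an integer frequency divider.
 */
struct derived_timer {
	uint64_t divider;	/* crystal ticks per timer tick */
};

/* Value the derived timer shows after `crystal_ticks` crystal ticks. */
uint64_t timer_read(const struct derived_timer *t, uint64_t crystal_ticks)
{
	assert(t->divider != 0);
	return crystal_ticks / t->divider;
}

/*
 * "Skew" seen when timer `b` expires every `period_b` of its own ticks and
 * the n-th expiry is timestamped with timer `a`: the part of the expiry
 * instant that timer `a` cannot resolve, in crystal ticks.
 */
uint64_t expiry_skew(const struct derived_timer *a,
		     const struct derived_timer *b,
		     uint64_t period_b, uint64_t n)
{
	uint64_t expiry = n * period_b * b->divider;	/* in crystal ticks */

	return expiry - timer_read(a, expiry) * a->divider;
}
```

With dividers 4 and 3 and a period of 2, the skew is simply (6 * n) mod 4, so
it alternates 0, 2, 0, 2: a pure function of n that an observer can predict
exactly. In this model the skew carries no uncertainty at all, however noisy
it may look at first glance.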


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Sultan Alsawaf
On Sun, Apr 29, 2018 at 08:11:07PM -0400, Theodore Y. Ts'o wrote:
>
> What your patch does is assume that there is a full bit of uncertainty
> that can be obtained from the information gathered from each
> interrupt.  I *might* be willing to assume that to be valid on x86
> systems that have a high resolution cycle counter.  But on ARM
> platforms, especially during system bootup when the user isn't typing
> anything and SSD's and flash storage tend to have very predictable
> timing patterns?  Not a bet I'd be willing to take.  Even with a cycle
> counter, there's a reason why we assumed that we need to mix in timing
> results from 64 interrupts or one second's worth before we would give
> a single bit's worth of entropy credit.
> 
>   - Ted

What about abusing high-resolution timers to get entropy? Since hrtimers can't
make guarantees down to the nanosecond, there's always a skew between the
requested expiry time and the actual expiry time.

Please see the attached patch and let me know just how horrible it is.

Sultan

From b0d21c38558c661531d4cb46816fbb36b874a169 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf 
Date: Sun, 29 Apr 2018 21:28:08 -0700
Subject: [PATCH] random: use high-res timers to generate entropy until crng
 init is done

---
 drivers/char/random.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d9e38523b383..af2d60bbcec3 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -286,6 +286,7 @@
 #define OUTPUT_POOL_WORDS  (1 << (OUTPUT_POOL_SHIFT-5))
 #define SEC_XFER_SIZE  512
 #define EXTRACT_SIZE   10
+#define ENTROPY_GEN_INTVL_NS   (1 * NSEC_PER_MSEC)
 
 
 #define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
@@ -408,6 +409,8 @@ static struct fasync_struct *fasync;
 static DEFINE_SPINLOCK(random_ready_list_lock);
 static LIST_HEAD(random_ready_list);
 
+static struct hrtimer entropy_gen_hrtimer;
+
 struct crng_state {
 	__u32		state[16];
 	unsigned long	init_time;
@@ -2287,3 +2290,47 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
 	credit_entropy_bits(poolp, entropy);
 }
 EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
+
+/*
+ * Generate entropy on init using high-res timers. Although high-res timers
+ * provide nanosecond precision, they don't actually honor requests to the
+ * nanosecond. The skew between the expected time difference in nanoseconds and
+ * the actual time difference can be used as a way to generate entropy on boot
+ * for machines that lack sufficient boot-time entropy.
+ */
+static enum hrtimer_restart entropy_timer_cb(struct hrtimer *timer)
+{
+	static u64 prev_ns;
+	u64 curr_ns, delta;
+
+	if (crng_ready())
+		return HRTIMER_NORESTART;
+
+	curr_ns = ktime_get_mono_fast_ns();
+	delta = curr_ns - prev_ns;
+
+	add_interrupt_randomness(delta);
+
+	/* Use the hrtimer skew to make the next interval more unpredictable */
+	if (likely(prev_ns))
+		hrtimer_add_expires_ns(timer, delta);
+	else
+		hrtimer_add_expires_ns(timer, ENTROPY_GEN_INTVL_NS);
+
+	prev_ns = curr_ns;
+	return HRTIMER_RESTART;
+}
+
+static int entropy_gen_hrtimer_init(void)
+{
+	if (!IS_ENABLED(CONFIG_HIGH_RES_TIMERS))
+		return 0;
+
+	hrtimer_init(&entropy_gen_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+	entropy_gen_hrtimer.function = entropy_timer_cb;
+	hrtimer_start(&entropy_gen_hrtimer, ns_to_ktime(ENTROPY_GEN_INTVL_NS),
+		      HRTIMER_MODE_REL);
+	return 0;
+}
+core_initcall(entropy_gen_hrtimer_init);
-- 
2.14.1
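
The skew the patch relies on can be observed from user space with a rough
analogue. This is only a sketch: it measures wakeup latency of a sleeping
process, which is not the same quantity the in-kernel hrtimer callback sees,
and the function names now_ns and sleep_skew_ns are invented here.
clock_gettime() and clock_nanosleep() are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Sleep until an absolute deadline `req_ns` nanoseconds from now and
 * return how late the wakeup was (actual minus requested), in ns.
 */
int64_t sleep_skew_ns(int64_t req_ns)
{
	int64_t deadline;
	struct timespec ts;
	int rc;

	assert(req_ns >= 0);

	deadline = now_ns() + req_ns;
	ts.tv_sec = deadline / 1000000000LL;
	ts.tv_nsec = deadline % 1000000000LL;

	/* clock_nanosleep() returns the error number directly, not via errno. */
	do {
		rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
	} while (rc == EINTR);

	return now_ns() - deadline;
}
```

Calling sleep_skew_ns(1000000) repeatedly and printing the results shows that
the skew is never zero. Whether those values are unpredictable to an attacker
is a separate question that the measurement alone does not answer.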


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Laura Abbott

On 04/29/2018 03:05 PM, Theodore Y. Ts'o wrote:

> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using.  Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine.  The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how widespread the unhappiness
> is going to be, and what mitigation steps make sense.


I'm pretty sure Fedora is hitting this in our VMs. I just spent some
time debugging an issue of a boot delay with someone from the
infrastructure team where it would take upwards of 2 minutes to boot.
If someone holds down a key, it boots in 4 seconds. There's a qemu
reproducer at https://bugzilla.redhat.com/show_bug.cgi?id=1572916#c3
I suggested a cat on the keyboard as a workaround.

Independently, we also got a report of a boot hang in GCE with 4.16.4,
whereas 4.16.3 works, which corresponds to the previous report of a
stable regression. This was just via IRC, so I didn't have time to
dig into this.

Thanks,
Laura


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Theodore Y. Ts'o
On Sun, Apr 29, 2018 at 07:07:29PM -0400, Dave Jones wrote:
>  > Why do we continue to print this stuff out when crng_init=1 though ?
> 
> answering my own question, I think.. This is a tristate, and we need it
> to be >1 to be quiet, which doesn't happen until..
> 
>  > [  165.806247] random: crng init done
> 
> this point.

Right.  What happens is that we divert the first 64 bits of entropy
credits directly into the crng state, without initializing the
input_pool.  So when we hit crng_init=1, the crng has only 64 bits of
entropy (conservatively speaking); furthermore, since we aren't doing
catastrophic reseeding, if something is continuously reading from
/dev/urandom or get_random_bytes() during that time, then the attacker
could be able to determine which one of the 32 states the entropy pool
was in when the entropy count was 5, and then 5 bits later, poll the
output of the pool again, and guess which of the 32 states the pool
was in, etc., and effectively keep up with the entropy as it trickles
in.

This is the reasoning behind catastrophic reseeding; we wait until we
have 128 bits of entropy in the input pool, and then we reseed the
pool all at once.
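
The tracking attack described above can be sketched as a toy simulation. The
assumptions are loud ones: the 64-bit state, the mixing function (a
splitmix64-style finalizer, which is not cryptographic) and all function names
are stand-ins chosen for illustration, not anything from
drivers/char/random.c. Entropy arrives 5 bits at a time, and the attacker sees
one output after every step.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bijective mixer (splitmix64-style finalizer); NOT cryptographic. */
uint64_t toy_mix(uint64_t z)
{
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

/* Fold 5 fresh secret bits into the state. */
uint64_t toy_add_entropy(uint64_t state, unsigned int bits5)
{
	assert(bits5 < 32);
	return toy_mix(state ^ bits5);
}

/* What a reader of the generator observes for a given state. */
uint64_t toy_output(uint64_t state)
{
	return toy_mix(state + 0x9e3779b97f4a7c15ULL);
}

/*
 * Attacker who knew the state before the last 5 bits arrived and now sees
 * `observed`.  Tries all 32 candidates; returns the number of guesses used
 * and stores the recovered state, or returns 0 if nothing matched.
 */
unsigned int toy_track(uint64_t prev_state, uint64_t observed,
		       uint64_t *recovered)
{
	unsigned int g;

	for (g = 0; g < 32; g++) {
		uint64_t cand = toy_add_entropy(prev_state, g);

		if (toy_output(cand) == observed) {
			*recovered = cand;
			return g + 1;
		}
	}
	return 0;
}
```

Over 26 steps (130 bits in total) the attacker in this model needs at most
26 * 32 = 832 guesses to stay in sync, whereas the same 130 bits delivered in
one reseed would leave 2^130 possibilities. That gap is the argument for
accumulating entropy in the input pool and reseeding all at once.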

Why do we have the crng_init=1 state?  Because it provides some basic
protection for super-early users of the entropy pool.  It's
essentially a bandaid, and we could improve the time to get to fully
initialized by about 33% if we left the pool totally uninitialized and
only focused on filling the input pool.  But given that on many
distributions, ssh still insists on initializing long-term public keys
at first boot from /dev/urandom, instead of *waiting* until the first
time someone attempts to ssh into the box, or waiting until getrandom(2)
doesn't block --- without hanging the boot --- we have the crng_init=1
hack essentially as a palliative.
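
The three states discussed in this message can be summarized in a small
sketch. The enum and function names are illustrative only; the thread itself
just refers to crng_init being 0, 1, or greater than 1, and the comments
paraphrase the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; the thread only speaks of the values 0, 1 and >1. */
enum crng_init_state {
	CRNG_EMPTY = 0,	/* nothing credited yet */
	CRNG_EARLY = 1,	/* first 64 bits of credit diverted into the crng */
	CRNG_READY = 2,	/* reseeded from 128 bits in the input pool */
};

/* Fully initialized only once the state is greater than 1. */
bool crng_is_ready(int crng_init)
{
	assert(crng_init >= CRNG_EMPTY && crng_init <= CRNG_READY);
	return crng_init > 1;
}

/* The "called from ..." warnings keep appearing until the crng is ready. */
bool warns_on_early_use(int crng_init)
{
	return !crng_is_ready(crng_init);
}
```

This is why the log stays noisy at crng_init=1: that state offers some
protection to very early callers but is not treated as initialized.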

I view this as working around broken user space.  But userspace has
been broken for a long time, and users tend to blame the kernel, not
userspace.

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Theodore Y. Ts'o
On Sun, Apr 29, 2018 at 03:49:28PM -0700, Sultan Alsawaf wrote:
> On Mon, Apr 30, 2018 at 12:43:48AM +0200, Jason A. Donenfeld wrote:
> > > -   if ((fast_pool->count < 64) &&
> > > -   !time_after(now, fast_pool->last + HZ))
> > > -   return;
> > > -
> > 
> > I suspect you still want the rate-limiting in place. But if you _do_
> > want to cheat like this, you could instead just modify the condition
> > to only relax the rate limiting when !crng_init().
> 
> Good idea. Attached a new patch that's less intrusive. It still fixes my
> issue, of course.

What your patch does is assume that there is a full bit of uncertainty
that can be obtained from the information gathered from each
interrupt.  I *might* be willing to assume that to be valid on x86
systems that have a high resolution cycle counter.  But on ARM
platforms, especially during system bootup when the user isn't typing
anything and SSD's and flash storage tend to have very predictable
timing patterns?  Not a bet I'd be willing to take.  Even with a cycle
counter, there's a reason why we assumed that we need to mix in timing
results from 64 interrupts or one second's worth before we would give
a single bit's worth of entropy credit.
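
The rule described above can be modelled in a few lines of userspace C. This is a sketch, not the kernel source: tick_after() imitates the kernel's wraparound-safe time_after() comparison, and fast_pool_should_flush() is a hypothetical name for the check that guards mixing the fast pool into the input pool.

```c
#include <stdbool.h>

/* Wraparound-safe "a is later than b" for a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Userspace model: the fast pool earns its entropy credit only once it
 * has collected 64 interrupts, or once more than one second (hz ticks)
 * has passed since the last flush.
 */
bool fast_pool_should_flush(unsigned int count, unsigned long now,
			    unsigned long last, unsigned long hz)
{
	return count >= 64 || tick_after(now, last + hz);
}
```

Dropping the check altogether is equivalent to returning true unconditionally, which is where the "full bit per interrupt" assumption comes from.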

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Dave Jones
On Sun, Apr 29, 2018 at 07:02:02PM -0400, Dave Jones wrote:
 > On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:
 > 
 >  > Can you tell me a bit about your system?  What distribution, what
 >  > hardware is present in your system (what architecture, what
 >  > peripherals are attached, etc.)?
 >  > 
 >  > There's a reason why we made this --- we were declaring the random
 >  > number pool to be fully initialized before it really was, and that was
 >  > a potential security concern.  It's not as bad as the weakness
 >  > discovered by Nadia Heninger in 2012.  (See https://factorable.net for
 >  > more details.)  However, this is not one of those things where we like
 >  > to fool around.
 >  > 
 >  > So I want to understand if this is an issue with a particular hardware
 >  > configuration, or whether it's just a badly designed Linux init system
 >  > or embedded setup, or something else.  After all, you wouldn't want
 >  > the NSA spying on all of your network traffic, would you?  :-)
 > 
 > Why do we continue to print this stuff out when crng_init=1 though ?

answering my own question, I think.. This is a tristate, and we need it
to be >1 to be quiet, which doesn't happen until..

 > [  165.806247] random: crng init done

this point.
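
The tristate can be written down as a small model. The enum and function names below are illustrative, not the kernel's; the only assumption carried over is the one stated above, that the warning stays on until crng_init is greater than 1.

```c
#include <stdbool.h>

/* Illustrative names for the three crng_init values seen in the log. */
enum crng_state {
	CRNG_EMPTY = 0,	/* nothing loaded yet */
	CRNG_FAST  = 1,	/* after "random: fast init done" */
	CRNG_READY = 2,	/* after "random: crng init done" */
};

/* Model of the readiness test: only the fully seeded state counts. */
bool crng_ready_model(int crng_init)
{
	return crng_init > 1;
}

/* The unseeded-randomness warning is printed whenever we are not ready. */
bool warns_unseeded(int crng_init)
{
	return !crng_ready_model(crng_init);
}
```

So callers that arrive with crng_init=1 are still reported, which matches the messages that keep appearing after "fast init done".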

Dave



Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Dave Jones
On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:

 > Can you tell me a bit about your system?  What distribution, what
 > hardware is present in your system (what architecture, what
 > peripherals are attached, etc.)?
 > 
 > There's a reason why we made this --- we were declaring the random
 > number pool to be fully initialized before it really was, and that was
 > a potential security concern.  It's not as bad as the weakness
 > discovered by Nadia Heninger in 2012.  (See https://factorable.net for
 > more details.)  However, this is not one of those things where we like
 > to fool around.
 > 
 > So I want to understand if this is an issue with a particular hardware
 > configuration, or whether it's just a badly designed Linux init system
 > or embedded setup, or something else.  After all, you wouldn't want
 > the NSA spying on all of your network traffic, would you?  :-)

Why do we continue to print this stuff out when crng_init=1 though ?

(This from debian stable, on a pretty basic atom box, but similar
dmesg's on everything else I've put 4.17-rc on so far)

[    0.000000] random: get_random_bytes called from start_kernel+0x96/0x519 with crng_init=0
[    0.000000] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[    0.000000] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[    0.151401] calling  initialize_ptr_random+0x0/0x36 @ 1
[    0.151527] initcall initialize_ptr_random+0x0/0x36 returned 0 after 0 usecs
[    0.294661] calling  prandom_init+0x0/0xbd @ 1
[    0.294763] initcall prandom_init+0x0/0xbd returned 0 after 0 usecs
[    1.430529] _warn_unseeded_randomness: 165 callbacks suppressed
[    1.430540] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[    1.430860] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[    1.452240] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0
[    2.954901] _warn_unseeded_randomness: 54 callbacks suppressed
[    2.954910] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[    2.955185] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[    2.957701] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[    6.017364] _warn_unseeded_randomness: 88 callbacks suppressed
[    6.017373] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[    6.042652] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0
[    6.060333] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0
[    6.951978] calling  prandom_reseed+0x0/0x2a @ 1
[    6.960627] initcall prandom_reseed+0x0/0x2a returned 0 after 105 usecs
[    7.371745] _warn_unseeded_randomness: 37 callbacks suppressed
[    7.371759] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0
[    7.395926] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0
[    7.411549] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=0
[    7.553379] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    7.563210] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    7.571498] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    8.449679] _warn_unseeded_randomness: 154 callbacks suppressed
[    8.449691] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0
[    8.483097] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0
[    8.497999] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0
[    9.353904] random: fast init done
[    9.770384] _warn_unseeded_randomness: 187 callbacks suppressed
[    9.770398] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1
[    9.791514] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[    9.834909] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[   10.802200] _warn_unseeded_randomness: 168 callbacks suppressed
[   10.802214] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1
[   10.802276] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1
[   10.802289] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1
[   11.821109] _warn_unseeded_randomness: 160 callbacks suppressed
[   11.821122] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1
[   11.863770] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1
[   11.869384] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1
[   12.843237] 

Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Sultan Alsawaf
On Mon, Apr 30, 2018 at 12:43:48AM +0200, Jason A. Donenfeld wrote:
> > -   if ((fast_pool->count < 64) &&
> > -   !time_after(now, fast_pool->last + HZ))
> > -   return;
> > -
> 
> I suspect you still want the rate-limiting in place. But if you _do_
> want to cheat like this, you could instead just modify the condition
> to only relax the rate limiting when !crng_init().

Good idea. Attached a new patch that's less intrusive. It still fixes my issue,
of course.

Sultan

From 6870b0383b88438d842599aa8608a260e6fb0ed2 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf
Date: Sun, 29 Apr 2018 15:44:27 -0700
Subject: [PATCH] random: don't ratelimit add_interrupt_randomness() until crng
 is ready

---
 drivers/char/random.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 38729baed6ee..8c00c008e797 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1201,7 +1201,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	}
 
 	if ((fast_pool->count < 64) &&
-	    !time_after(now, fast_pool->last + HZ))
+	    !time_after(now, fast_pool->last + HZ) && crng_ready())
 		return;
 
 	r = &input_pool;
-- 
2.14.1
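
To see what the changed condition does to the number of entropy credits, here is a small userspace simulation. It is a sketch under stated assumptions, not kernel code: count_flushes() and its parameters are hypothetical, it models only the phase after the fast-init path has finished, and it treats every flush as one credit.

```c
#include <stdbool.h>

/* Wraparound-safe "a is later than b" for a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Simulate n_irqs interrupts spaced ticks_per_irq apart and count how
 * often the fast pool is flushed. With patched == true the rate limit
 * applies only once crng_ready is true, as in the patch above.
 */
unsigned long count_flushes(unsigned long n_irqs, unsigned long ticks_per_irq,
			    unsigned long hz, bool patched, bool crng_ready)
{
	unsigned long now = 0, last = 0, flushes = 0, i;
	unsigned int count = 0;

	for (i = 0; i < n_irqs; i++) {
		bool limited;

		now += ticks_per_irq;
		count++;

		limited = count < 64 && !tick_after(now, last + hz);
		if (patched)
			limited = limited && crng_ready;
		if (limited)
			continue;	/* keep accumulating in the fast pool */

		count = 0;
		last = now;
		flushes++;
	}
	return flushes;
}
```

For a burst of 640 interrupts within one tick, the original condition yields 10 flushes, while the patched condition with the crng not yet ready yields 640.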



Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Jason A. Donenfeld
> -   if ((fast_pool->count < 64) &&
> -   !time_after(now, fast_pool->last + HZ))
> -   return;
> -

I suspect you still want the rate-limiting in place. But if you _do_
want to cheat like this, you could instead just modify the condition
to only relax the rate limiting when !crng_init().


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Pavel Machek
Hi!

> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using.  Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine.  The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how wide spread the unhappiness
> is going to be, and what mitigation steps make sense.

Thinkpad X60,
model name  : Genuine Intel(R) CPU   T2400  @ 1.83GHz
pavel@amd:~$ cat /etc/debian_version
8.10

I already posted some dmesg snippets, but system boots. On _this_
boot, it was ok, and I do not see anything:

pavel@amd:/data/l/linux-next-32$ dmesg | grep urandom
pavel@amd:/data/l/linux-next-32$

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Sultan Alsawaf
On Sun, Apr 29, 2018 at 06:05:19PM -0400, Theodore Y. Ts'o wrote:
> It's more accurate to say that using /dev/urandom is no worse than
> before (from a few years ago).  There are, alas, plenty of
> distributions and user space application programmers that basically
> got lazy using /dev/urandom, and assumed that there would be plenty of
> entropy during early system startup.
> 
> When they switched over to getrandom(2), the most egregious examples
> of this caused pain (and they got fixed), but due to a bug in
> drivers/char/random.c, if getrandom(2) was called after the entropy
> pool was "half initialized", it would not block, but proceed.
> 
> Is that exploitable?  Well, Jann and I didn't find an _obvious_ way to
> exploit the shortcoming, which is why this wasn't treated like an
> emergency situation ala the embarrassing situation we had five years
> ago[1].
> 
> [1] https://factorable.net/paper.html
> 
> However, it was enough to make us be uncomfortable, which is why I
> pushed the changes that I did.  At least on the devices we had at
> hand, using the distributions that we typically use, the impact seemed
> minimal.  Unfortunately, there is no way to know for sure without
> rolling out the change and seeing who screams.  In the ideal world,
> software would not require cryptographic randomness immediately after
> boot, before the user logs in.  And ***really***, as in [1], software
> should not be generating long-term public keys that are essential to
> the security of the box a few seconds immediately after the device is
> first unboxed and plugged in.
> 
> What would be useful is if people gave reports that listed exactly
> what laptop and distributions they are using.  Just "a high spec x86
> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
> running Debian testing is working just fine.  The year, model, make,
> and CPU type plus what distribution (and distro version number) you
> are running is useful, so I can assess how widespread the unhappiness
> is going to be, and what mitigation steps make sense.
> 
> 
> What mitigation steps can be taken?
> 
> If you believe in security-through-complexity (the cache architecture
> of x86 is *so* complicated no one can understand it, so
> Jitterentropy / Haveged *must* be secure), or security-through-secrecy
> (the cache architecture of x86 is only available to internal architects
> inside Intel, so Jitterentropy / Haveged *must* be secure, never mind
> that the Intel CPU architects who were asked about it were "nervous"),
> then wiring up CONFIG_JITTERENTROPY or using haveged might be one
> approach.
> 
> If you believe that Intel hasn't backdoored RDRAND, then installing
> rng-tools and running rngd with --enable-drng will enable RDRAND.
> That seems to be popular with various defense contractors, perhaps on
> the assumption that if it _was_ backdoored (no one knows for sure), it
> was probably with the connivance or request of the US government, who
> doesn't need to worry about spying on itself.
> 
> Or you can use some kind of open hardware design RNG, such as
> ChaosKey[2] from Altus Metrum.  But that requires using specially
> ordered hardware plugged into a USB slot, and it's probably not a mass
> solution.
> 
> [2] https://altusmetrum.org/ChaosKey/
> 
> 
> Personally, I prefer fixing the software to simply not require
> cryptographic grade entropy before the user has logged in.  Because
> it's better than the alternatives.
> 
>   - Ted
> 

The attached patch fixes my crng init woes. With it, crng init completes 0.86
seconds into boot, but I can't help but feel like a solution this obvious would
just expose my Richard Stallman photo collection to prying eyes at the NSA.

Thoughts on the patch?

Sultan

From 597b0f2b3c986f853bf1d30a7fb9d76869e47fe8 Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf
Date: Sun, 29 Apr 2018 15:22:59 -0700
Subject: [PATCH] random: remove ratelimiting from add_interrupt_randomness()

---
 drivers/char/random.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 38729baed6ee..5b38277b104a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -574,7 +574,6 @@ static void mix_pool_bytes(struct entropy_store *r, const void *in,
 
 struct fast_pool {
 	__u32		pool[4];
-	unsigned long	last;
 	unsigned short	reg_idx;
 	unsigned char	count;
 };
@@ -1195,20 +1194,14 @@ void add_interrupt_randomness(int irq, int irq_flags)
 		    crng_fast_load((char *) fast_pool->pool,
 				   sizeof(fast_pool->pool))) {
 			fast_pool->count = 0;
-			fast_pool->last = now;
 		}
 		return;
 	}
 
-	if ((fast_pool->count < 64) &&
-	    !time_after(now, fast_pool->last + HZ))
-		return;
-
 	r = &input_pool;
 	if (!spin_trylock(&r->lock))

Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Theodore Y. Ts'o
On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
> > Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
> 
> Okay, but /dev/urandom isn't a solution to this problem because it isn't
> usable until crng init is complete, so it suffers from the same init lag as
> /dev/random.

It's more accurate to say that using /dev/urandom is no worse than
before (from a few years ago).  There are, alas, plenty of
distributions and user space application programmers that basically
got lazy using /dev/urandom, and assumed that there would be plenty of
entropy during early system startup.

When they switched over to getrandom(2), the most egregious examples
of this caused pain (and they got fixed), but due to a bug in
drivers/char/random.c, if getrandom(2) was called after the entropy
pool was "half initialized", it would not block, but proceed.

Is that exploitable?  Well, Jann and I didn't find an _obvious_ way to
exploit the shortcoming, which is why this wasn't treated like an
emergency situation a la the embarrassing situation we had five years
ago[1].

[1] https://factorable.net/paper.html

However, it was enough to make us uncomfortable, which is why I
pushed the changes that I did.  At least on the devices we had at
hand, using the distributions that we typically use, the impact seemed
minimal.  Unfortunately, there is no way to know for sure without
rolling out the change and seeing who screams.  In the ideal world,
software would not require cryptographic randomness immediately after
boot, before the user logs in.  And ***really***, as in [1], software
should not be generating long-term public keys that are essential to
the security of the box a few seconds immediately after the device is
first unboxed and plugged in.

What would be useful is if people gave reports that listed exactly
what laptop and distributions they are using.  Just "a high spec x86
laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
running Debian testing is working just fine.  The year, model, make,
and CPU type plus what distribution (and distro version number) you
are running is useful, so I can assess how widespread the unhappiness
is going to be, and what mitigation steps make sense.


What mitigation steps can be taken?

If you believe in security-through-complexity (the cache architecture
of x86 is *so* complicated no one can understand it, so
Jitterentropy / Haveged *must* be secure), or security-through-secrecy
(the cache architecture of x86 is only available to internal architects
inside Intel, so Jitterentropy / Haveged *must* be secure, never mind
that the Intel CPU architects who were asked about it were "nervous"),
then wiring up CONFIG_JITTERENTROPY or using haveged might be one
approach.

If you believe that Intel hasn't backdoored RDRAND, then installing
rng-tools and running rngd with --enable-drng will enable RDRAND.
That seems to be popular with various defense contractors, perhaps on
the assumption that if it _was_ backdoored (no one knows for sure), it
was probably with the connivance or request of the US government, who
doesn't need to worry about spying on itself.

Or you can use some kind of open hardware design RNG, such as
ChaosKey[2] from Altus Metrum.  But that requires using specially
ordered hardware plugged into a USB slot, and it's probably not a mass
solution.

[2] https://altusmetrum.org/ChaosKey/


Personally, I prefer fixing the software to simply not require
cryptographic grade entropy before the user has logged in.  Because
it's better than the alternatives.

- Ted



Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Sultan Alsawaf
On Sun, Apr 29, 2018 at 11:18:55PM +0200, Pavel Machek wrote:
> So -- I'm pretty sure systemd and friends should be using
> /dev/urandom. Maybe gpg wants to use /dev/random. _Maybe_.
> 
> [2.948192] random: systemd: uninitialized urandom read (16 bytes
> read)
> [2.953526] systemd[1]: systemd 215 running in system mode. (+PAM
> +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ
> -SECCOMP -APPARMOR)
> [2.980278] systemd[1]: Detected architecture 'x86'.
> [3.115072] usb 5-2: New USB device found, idVendor=0483,
> idProduct=2016, bcdDevice= 0.01
> [3.119633] usb 5-2: New USB device strings: Mfr=1, Product=2,
> SerialNumber=0
> [3.124147] usb 5-2: Product: Biometric Coprocessor
> [3.128621] usb 5-2: Manufacturer: STMicroelectronics
> [3.163839] systemd[1]: Failed to insert module 'ipv6'
> [3.181266] systemd[1]: Set hostname to .
> [3.267243] random: systemd-sysv-ge: uninitialized urandom read (16
> bytes read)
> [3.669590] random: systemd-sysv-ge: uninitialized urandom read (16
> bytes read)
> [3.696242] random: systemd: uninitialized urandom read (16 bytes
> read)
> [3.700066] random: systemd: uninitialized urandom read (16 bytes
> read)
> [3.703716] random: systemd: uninitialized urandom read (16 bytes
> read)
> 
> Anyway, urandom should need to be seeded once, and then provide random
> data forever... which is not the impression I get from the dmesg output
> above. Boot clearly proceeds... somehow. So now I'm confused.

Hmm... Well, the attached patch (which redirects /dev/random to /dev/urandom)
didn't fix my boot issue, so I'm at a loss as well.

Sultan

From 15f54e2756866956d8713fdec92b54c6c69eb1bb Mon Sep 17 00:00:00 2001
From: Sultan Alsawaf 
Date: Sun, 29 Apr 2018 12:53:44 -0700
Subject: [PATCH] char: mem: Link /dev/random to /dev/urandom

---
 drivers/char/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffeb60d3434c..0cd22e6100ad 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -870,7 +870,7 @@ static const struct memdev {
 #endif
 [5] = { "zero", 0666, &zero_fops, 0 },
 [7] = { "full", 0666, &full_fops, 0 },
-[8] = { "random", 0666, &random_fops, 0 },
+[8] = { "random", 0666, &urandom_fops, 0 },
 [9] = { "urandom", 0666, &urandom_fops, 0 },
 #ifdef CONFIG_PRINTK
[11] = { "kmsg", 0644, &kmsg_fops, 0 },
-- 
2.14.1


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Pavel Machek
On Sun 2018-04-29 13:20:33, Sultan Alsawaf wrote:
> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
> > Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
> 
> Okay, but /dev/urandom isn't a solution to this problem because it isn't
> usable until crng init is complete, so it suffers from the same init lag as
> /dev/random.

So -- I'm pretty sure systemd and friends should be using
/dev/urandom. Maybe gpg wants to use /dev/random. _Maybe_.

[2.948192] random: systemd: uninitialized urandom read (16 bytes
read)
[2.953526] systemd[1]: systemd 215 running in system mode. (+PAM
+AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ
-SECCOMP -APPARMOR)
[2.980278] systemd[1]: Detected architecture 'x86'.
[3.115072] usb 5-2: New USB device found, idVendor=0483,
idProduct=2016, bcdDevice= 0.01
[3.119633] usb 5-2: New USB device strings: Mfr=1, Product=2,
SerialNumber=0
[3.124147] usb 5-2: Product: Biometric Coprocessor
[3.128621] usb 5-2: Manufacturer: STMicroelectronics
[3.163839] systemd[1]: Failed to insert module 'ipv6'
[3.181266] systemd[1]: Set hostname to .
[3.267243] random: systemd-sysv-ge: uninitialized urandom read (16
bytes read)
[3.669590] random: systemd-sysv-ge: uninitialized urandom read (16
bytes read)
[3.696242] random: systemd: uninitialized urandom read (16 bytes
read)
[3.700066] random: systemd: uninitialized urandom read (16 bytes
read)
[3.703716] random: systemd: uninitialized urandom read (16 bytes
read)

Anyway, urandom should need to be seeded once, and then provide random
data forever... which is not the impression I get from the dmesg output
above. Boot clearly proceeds... somehow. So now I'm confused.

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Sultan Alsawaf
On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
> Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE

Okay, but /dev/urandom isn't a solution to this problem because it isn't usable
until crng init is complete, so it suffers from the same init lag as
/dev/random.

Sultan


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Theodore Y. Ts'o
On Sun, Apr 29, 2018 at 11:30:57AM -0700, Sultan Alsawaf wrote:
> 
> Mind you, this laptop has a 45W CPU, so power savings were definitely not
> considered in its design. Do you have any machines that can provide enough
> boot entropy to satisfy crng init without requiring user-provided entropy?

My 2018 Dell XPS 13 laptop, running "egrep '(random|EXT4)' /var/log/kern.log":

Apr 24 17:05:01 cwcc kernel: [0.00] random: get_random_bytes called 
from start_kernel+0x83/0x500 with crng_init=0
Apr 24 17:05:01 cwcc kernel: [1.363383] random: fast init done
Apr 24 17:05:01 cwcc kernel: [3.567432] random: lvm: uninitialized urandom 
read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [3.593132] random: lvm: uninitialized urandom 
read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [7.584838] random: cryptsetup: uninitialized 
urandom read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [7.600685] random: cryptsetup: uninitialized 
urandom read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [7.803194] random: cryptsetup: uninitialized 
urandom read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [7.831050] random: lvm: uninitialized urandom 
read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [7.851884] random: lvm: uninitialized urandom 
read (4 bytes read)
Apr 24 17:05:01 cwcc kernel: [7.875382] random: lvm: uninitialized urandom 
read (2 bytes read)
Apr 24 17:05:01 cwcc kernel: [8.162552] EXT4-fs (dm-1): mounted filesystem 
with ordered data mode. Opts: (null)
Apr 24 17:05:01 cwcc kernel: [8.646497] random: crng init done
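
The number to compare across machines is the timestamp on the "crng init done" line (8.646497 seconds in the log above). For anyone collecting reports, this is a small sketch of pulling that value out of a kernel log line. The function name crng_init_done_time is made up for illustration, and it assumes the usual "[seconds.micros]" dmesg prefix, with or without padding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the seconds-since-boot timestamp if `line` reports
 * "random: crng init done", or -1.0 if it is some other line
 * or the timestamp cannot be parsed.
 */
double crng_init_done_time(const char *line)
{
    assert(line != NULL);

    if (strstr(line, "random: crng init done") == NULL)
        return -1.0;

    /* The first '[' on a matching line opens the dmesg timestamp. */
    const char *bracket = strchr(line, '[');
    double seconds;
    if (bracket == NULL || sscanf(bracket, "[ %lf", &seconds) != 1)
        return -1.0;
    return seconds;
}
```

Feeding it each line of kern.log or dmesg output and keeping the first non-negative result gives the CRNG init time for that boot.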

 - Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Pavel Machek
On Sun 2018-04-29 10:05:41, Sultan Alsawaf wrote:
> On Sun, Apr 29, 2018 at 04:32:05PM +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > This is why ultimately, we do need to attack this problem from both
> > > ends, which means teaching userspace programs to only request
> > > cryptographic-grade randomness when it is really needed --- and most
> > > of the time, if the user has not logged in yet, you probably don't
> > > need cryptographic-grade randomness
> > 
> > IOW moving them from /dev/random to /dev/urandom?
> 
> /dev/urandom isn't cryptographically secure, so that's not an
> option.

Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE


Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Sultan Alsawaf
I'd also like to add that my high-spec x86 laptop exhibits the same issue as
my Edgar Chromebook.

Here's my dmesg: https://hastebin.com/dofejolobi.go

The most interesting line:
[   90.811633] random: crng init done

I waited 90 seconds after boot to provide entropy myself, at which point crng
init completed. In other words, crng init only completed because I provided
the entropy by smashing the keyboard. I could've waited longer and crng init
wouldn't have completed without my input.

Mind you, this laptop has a 45W CPU, so power savings were definitely not
considered in its design. Do you have any machines that can provide enough
boot entropy to satisfy crng init without requiring user-provided entropy?
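
One way to watch this from a shell on the affected machine is to poll the kernel's entropy estimate, which is exposed in /proc/sys/kernel/random/entropy_avail. The sketch below reads it once. The function name read_entropy_avail is made up for illustration, and the estimate describes the input pool, so it is only an indirect view of crng init progress.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns the kernel's current entropy estimate in bits as read from
 * `path` (normally /proc/sys/kernel/random/entropy_avail), or -1 if the
 * file cannot be opened or parsed.
 */
int read_entropy_avail(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int bits = -1;
    if (fscanf(f, "%d", &bits) != 1)
        bits = -1;
    fclose(f);
    return bits;
}
```

Calling read_entropy_avail("/proc/sys/kernel/random/entropy_avail") in a loop before and after typing on the keyboard should show whether the estimate only moves with user input.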

Sultan


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Sultan Alsawaf
On Sun, Apr 29, 2018 at 04:32:05PM +0200, Pavel Machek wrote:
> Hi!
> 
> > This is why ultimately, we do need to attack this problem from both
> > ends, which means teaching userspace programs to only request
> > cryptographic-grade randomness when it is really needed --- and most
> > of the time, if the user has not logged in yet, you probably don't
> > need cryptographic-grade randomness
> 
> IOW moving them from /dev/random to /dev/urandom?
>   Pavel
> 
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

/dev/urandom isn't cryptographically secure, so that's not an option.


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Pavel Machek
Hi!

> This is why ultimately, we do need to attack this problem from both
> ends, which means teaching userspace programs to only request
> cryptographic-grade randomness when it is really needed --- and most
> of the time, if the user has not logged in yet, you probably don't
> need cryptographic-grade randomness

IOW moving them from /dev/random to /dev/urandom?
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Pavel Machek
Hi!
On Thu 2018-04-26 19:56:30, Theodore Y. Ts'o wrote:
> On Thu, Apr 26, 2018 at 01:22:02PM -0700, Sultan Alsawaf wrote:
> > 
> > Also, regardless of what's hanging on CRNG init, CRNG should be able
> > to init on its own in a timely manner without the need for
> > user-provided entropy. Userspace was working fine before the recent
> > CRNG kernel changes, so I don't think this is a userspace bug.
> 
> The CRNG changes were needed because we were erroneously saying that the
> entropy pool was securely initialized before it really was.  Saying
> that CRNG should be able to init on its own is much like saying, "Ted
> should be able to fly wherever he wants in his own personal Gulfstream
> V."  It would certainly be _nice_ if I could afford my personal jet.
> I certainly wish I were that rich.  But the problem is that dollars
> (or Euro's) are like entropy, they don't just magically drop out of
> the sky.
> 
> If there isn't user-provided entropy, and the hardware isn't providing
> sufficient entropy, where did you think the kernel is supposed to get
> the entropy from?  Should it dial 1-800-TRUST-NSA?

Yes, we could dial 1-800-TRUST-NSA. Then nicely ask them to provide us
some unbackdoored randomness. Then we'd ignore whatever they say, but
would collect randomness from timing and noise on the telephone line.

> The other approach would be to compile the kernel with
> CONFIG_HW_RANDOM_TPM and to modify drivers/char/tpm/tpm-chip.c to
> initialize chip->hwrng.quality = 500.  We've historically made this
> something that the system administrator must set via sysfs.  This is
> because we wanted system administrators to explicitly say that they
> trust the hardware manufacturer: that (a) they haven't been paid by
> your choice of the Chinese MSS or the US NSA to introduce a backdoor,
> and (b) they are competent to actually implement a _secure_ hardware
> random number generator.  Sadly, this has not always been the case.

Well, we could actively start accessing a suitable device (SD card? HDD?
CMOS RTC?) when we detect entropy is low. Yes, that would eat power,
but that would be better than a machine that hangs at boot.

We could also access the hwrng, then collect entropy from the
timing. The TPM is a slow chip...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-29 Thread Pavel Machek
Hi!

> On 25.04.2018 at 09:41, Theodore Y. Ts'o wrote:
> >Does this help on your system?
> 
> Thank you, after figuring out how to apply the paste, yes it helped on my
> Lenovo X60.
> 
> >commit 4e00b339e264802851aff8e73cde7d24b57b18ce
> >Author: Theodore Ts'o 
> >Date:   Wed Apr 25 01:12:32 2018 -0400
> >
> > random: rate limit unseeded randomness warnings
> > On systems without sufficient boot randomness, no point spamming dmesg.
> 
> I guess this is a problem with old hardware?

Ok, I see it too, thinkpad x60.

But... this machine has a spinning hard drive and an independent RTC; there
really should be enough randomness...

Could we exploit either of them as randomness source when we run out
of entropy?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-27 Thread Sultan Alsawaf
> On Thu, Apr 26, 2018 at 10:20:44PM -0700, Sultan Alsawaf wrote:
>> I noted at least 20,000 mmc interrupts before I intervened in the boot
>> process to provide entropy myself. That's just for mmc, so I'm sure
>> there were even more interrupts elsewhere. Is 20k+ interrupts really
>> not sufficient?
>
> How did you determine that there were 20,000 mmc interrupts before you
> had logged in?  Did you have access to the machine w/o having access
> to the login prompt?
>
> I can send a patch (see attached) that will spew large amounts of logs
> as each interrupt comes in and the entropy pool is getting initialized.
> That's how I test things on QEMU, and Jann did something similar on a
> (physical) test machine, so I'm pretty confident that if you were
> getting interrupts, it would result in them contributing into the
> pool.
>
> You will need a serial console, or build a kernel with a much larger
> dmesg buffer, since if you really are getting that many interrupts it
> will cause a lot of log spew.
>
>> There are lots of other sources of entropy available as well, like
>> the ever-changing CPU frequencies reported by any recent Intel chip
>> (i.e., they report precision down to 1 kHz).
>
> That's something we could look at, but the problem is if there is some
> systemd unit during early boot that blocks waiting for the entropy
> pool to be initialized, the system will come to a dead halt, and even
> the CPU frequency shifts will probably not move much --- just as there
> weren't any interrupts while some system startup on the boot path
> wedges the system startup waiting for entropy.
>
> This is why ultimately, we do need to attack this problem from both
> ends, which means teaching userspace programs to only request
> cryptographic-grade randomness when it is really needed --- and most
> of the time, if the user has not logged in yet, you probably don't
> need cryptographic-grade randomness.
>
>   - Ted
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index cd888d4ee605..69bd29f039e7 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -916,6 +916,10 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
>                  __u32   key[8];
>          } buf;
>  
> +        if (crng == &primary_crng)
> +                pr_notice("random: crng_reseed primary from %px\n", r);
> +        else
> +                pr_notice("random: crng_reseed crng %px from %px\n", crng, r);
>          if (r) {
>                  num = extract_entropy(r, &buf, 32, 16, 0);
>                  if (num == 0)
> @@ -1241,6 +1245,10 @@ void add_interrupt_randomness(int irq, int irq_flags)
>          fast_pool->pool[2] ^= ip;
>          fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
>                  get_reg(fast_pool, regs);
> +        if (crng_init < 2)
> +                pr_notice("random: add_interrupt(cycles=0x%08llx, now=%ld, "
> +                          "irq=%d, ip=0x%08lx)\n",
> +                          cycles, now, irq, _RET_IP_);
>  
>          fast_mix(fast_pool);
>          add_interrupt_bench(cycles);
> @@ -1282,6 +1290,9 @@ void add_interrupt_randomness(int irq, int irq_flags)
>  
>          /* award one bit for the contents of the fast pool */
>          credit_entropy_bits(r, credit + 1);
> +        if (crng_init < 2)
> +                pr_notice("random: batched into pool in stage %d, bits now %d",
> +                          crng_init, ENTROPY_BITS(r));
>  }
>  EXPORT_SYMBOL_GPL(add_interrupt_randomness);

I dumped the contents of /proc/interrupts to dmesg using the attached patch I
threw together, and then waited a sufficient amount of time before introducing
entropy myself in order to ensure that the interrupt readings were not
contaminated by user-provided interrupts.

Here is the interesting snippet from my dmesg:
[   30.689076] /proc/interrupts dump:
           CPU0       CPU1       CPU2       CPU3
  0:          6          0          0          0   IO-APIC    2-edge      timer
  8:          0          0          1          0   IO-APIC    8-edge      rtc0
  9:          0        533          0          0   IO-APIC    9-fasteoi   acpi
 10:          0          0          0          0   IO-APIC   10-edge      tpm0
 29:          0          0          0          0   IO-APIC   29-fasteoi   intel_sst_driver
 36:        203          0          0          0   IO-APIC   36-fasteoi   808622C1:04
 37:          0        264          0          0   IO-APIC   37-fasteoi   808622C1:05
 42:          0          0          0          0   IO-APIC   42-fasteoi   dw:dmac-1
 43:          0          0          0          0   IO-APIC   43-fasteoi   dw:dmac-1
 45:          0          0          0      11402   IO-APIC   45-fasteoi   mmc0
168:  0  0  1  0  chv-gpio   
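
For anyone who wants to compare interrupt counts at two points in boot without patching the kernel, the per-line totals can be computed in userspace. What follows is only a minimal C sketch: the helper name irq_line_total is made up for illustration, and it assumes the usual "label: count count ... chip name" layout of /proc/interrupts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch discussed in this thread):
 * sum the per-CPU counters on one line of /proc/interrupts, e.g.
 *   " 45:   0   0   0   11402   IO-APIC   45-fasteoi   mmc0"
 * Returns the total, or -1 if the line has no "label:" prefix
 * (such as the CPU header line).
 */
long irq_line_total(const char *line)
{
        const char *p;
        long total = 0;

        assert(line != NULL);
        p = strchr(line, ':');
        if (p == NULL)
                return -1;
        p++;    /* skip the colon that ends the IRQ label */

        for (;;) {
                char *end;
                long n;

                while (*p == ' ' || *p == '\t')
                        p++;
                if (!isdigit((unsigned char)*p))
                        break;  /* reached the chip name, e.g. "IO-APIC" */
                n = strtol(p, &end, 10);
                if (*end != '\0' && !isspace((unsigned char)*end))
                        break;  /* a token like "45-fasteoi" is not a counter */
                total += n;
                p = end;
        }
        return total;
}
```

Reading /proc/interrupts line by line with fgets() and passing each line to this function, once early in boot and once later, would give per-IRQ deltas that can be compared against the numbers in a dump like the one above.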

Re: Linux messages full of `random: get_random_u32 called from`

2018-04-27 Thread Theodore Y. Ts'o
On Thu, Apr 26, 2018 at 10:20:44PM -0700, Sultan Alsawaf wrote:
> 
> I noted at least 20,000 mmc interrupts before I intervened in the boot
> process to provide entropy myself. That's just for mmc, so I'm sure
> there were even more interrupts elsewhere. Is 20k+ interrupts really
> not sufficient?

How did you determine that there were 20,000 mmc interrupts before you
had logged in?  Did you have access to the machine w/o having access
to the login prompt?

I can send a patch (see attached) that will spew large amounts of logs
as each interrupt comes in and the entropy pool is getting initialized.
That's how I test things on QEMU, and Jann did something similar on a
(physical) test machine, so I'm pretty confident that if you were
getting interrupts, it would result in them contributing into the
pool.

You will need a serial console, or build a kernel with a much larger
dmesg buffer, since if you really are getting that many interrupts it
will cause a lot of log spew.

> There are lots of other sources of entropy available as well, like
> the ever-changing CPU frequencies reported by any recent Intel chip
> (i.e., they report precision down to 1 kHz).

That's something we could look at, but the problem is if there is some
systemd unit during early boot that blocks waiting for the entropy
pool to be initialized, the system will come to a dead halt, and even
the CPU frequency shifts will probably not move much --- just as there
weren't any interrupts while some system startup on the boot path
wedges the system startup waiting for entropy.

This is why ultimately, we do need to attack this problem from both
ends, which means teaching userspace programs to only request
cryptographic-grade randomness when it is really needed --- and most
of the time, if the user has not logged in yet, you probably don't
need cryptographic-grade randomness.
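
To make the userspace side concrete, here is a minimal C sketch of a caller that never blocks on CRNG initialization: it asks getrandom() with GRND_NONBLOCK and, if the pool is not ready, falls back to reading /dev/urandom. It assumes glibc 2.25 or later for the getrandom() wrapper in <sys/random.h>. The function name fill_random_noblock and its return convention are invented for this example; this is not what util-linux or systemd actually do.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len random bytes without ever
 * waiting for the CRNG to be initialized.
 * Returns 1 if every byte came from getrandom() (CRNG was ready),
 *         0 if the /dev/urandom fallback had to be used,
 *        -1 on error.
 */
int fill_random_noblock(void *buf, size_t len)
{
        unsigned char *p = buf;
        size_t done = 0;
        int fd;

        assert(buf != NULL || len == 0);

        while (done < len) {
                ssize_t n = getrandom(p + done, len - done, GRND_NONBLOCK);

                if (n > 0) {
                        done += (size_t)n;
                        continue;
                }
                if (n < 0 && errno == EINTR)
                        continue;
                break;  /* EAGAIN (pool not ready), ENOSYS, ... */
        }
        if (done == len)
                return 1;

        /* Fallback: /dev/urandom does not wait for CRNG initialization. */
        fd = open("/dev/urandom", O_RDONLY);
        if (fd < 0)
                return -1;
        while (done < len) {
                ssize_t n = read(fd, p + done, len - done);

                if (n > 0) {
                        done += (size_t)n;
                        continue;
                }
                if (n < 0 && errno == EINTR)
                        continue;
                close(fd);
                return -1;
        }
        close(fd);
        return 0;
}
```

A caller that only needs, say, a hash table seed or a random UUID before login can accept either return value, while code generating long-lived keys would treat a return of 0 as "try again later" rather than proceeding.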

- Ted

diff --git a/drivers/char/random.c b/drivers/char/random.c
index cd888d4ee605..69bd29f039e7 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -916,6 +916,10 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
                 __u32   key[8];
         } buf;
 
+        if (crng == &primary_crng)
+                pr_notice("random: crng_reseed primary from %px\n", r);
+        else
+                pr_notice("random: crng_reseed crng %px from %px\n", crng, r);
         if (r) {
                 num = extract_entropy(r, &buf, 32, 16, 0);
                 if (num == 0)
@@ -1241,6 +1245,10 @@ void add_interrupt_randomness(int irq, int irq_flags)
         fast_pool->pool[2] ^= ip;
         fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
                 get_reg(fast_pool, regs);
+        if (crng_init < 2)
+                pr_notice("random: add_interrupt(cycles=0x%08llx, now=%ld, "
+                          "irq=%d, ip=0x%08lx)\n",
+                          cycles, now, irq, _RET_IP_);
 
         fast_mix(fast_pool);
         add_interrupt_bench(cycles);
@@ -1282,6 +1290,9 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
         /* award one bit for the contents of the fast pool */
         credit_entropy_bits(r, credit + 1);
+        if (crng_init < 2)
+                pr_notice("random: batched into pool in stage %d, bits now %d",
+                          crng_init, ENTROPY_BITS(r));
 }
 EXPORT_SYMBOL_GPL(add_interrupt_randomness);
 


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-27 Thread Theodore Y. Ts'o
On Fri, Apr 27, 2018 at 05:38:52PM +0200, Jason A. Donenfeld wrote:
> 
> Please correct me if I'm wrong, but my present understanding of this
> is that crng readiness used to be broken, meaning people would have a
> seeded rng without it actually being seeded. You fixed this bug, and
> now people are discovering that they don't have crng readiness during
> a late stage of their init, which is breaking all sorts of entirely
> reasonable and widely deployed userspaces.

I'd say the problem is a combination of some classes of x86 hardware
devices (so far I've mainly seen repurposed chromebooks and VM's that
don't have virtio-rng enabled) combined with some distributions that
could make themselves more amenable to platforms with minimal amounts
of entropy available to them during system startup.

> Sultan mentioned that his machine actually does trigger large
> quantities of interrupts. Is it possible that the entropy gathering
> algorithm has some issues, and Sultan's report points to a real bug
> here? Considering the crng readiness state hasn't been working until
> your recent fix, I suspect the actual entropy gathering code probably
> hasn't prompted too many bug reports, until now that is.

It's not clear when his machine is triggering the "large quantity of
interrupts".  Is it during the system startup, or after he's logged
into the machine?  I suspect what is going on is the Chromebook has
been engineered so that when it's idle, it doesn't issue any
interrupts at all --- which is a good thing from a power management
perspective.  So if nothing is actually _querying_ the SD Card reader,
it's not generating any interrupts.

This is a feature, and not a bug.  That being said, a laptop which
sends some number of interrupts as it receives, say, WiFi packets, and
a system which automatically starts looking for suitable access points
as soon as the machine is started give us timing events which are not
easily available to an analyst sitting in Fort Meade, Maryland.  In
practice, that seems to be much more the rule than the
exception.  However, as laptops try to become much more sparing with
interrupts to save power, we either have to (a) be willing to
trust hardware random number generators available to the laptop,
and/or (b) change userspace to *wait* until after the user has logged
in to try to obtain cryptographic-grade randomness.

If you think there is an alternative besides those two, I'm all ears...

  - Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-27 Thread Jason A. Donenfeld
Hi Ted,

Please correct me if I'm wrong, but my present understanding of this
is that crng readiness used to be broken, meaning people would have a
seeded rng without it actually being seeded. You fixed this bug, and
now people are discovering that they don't have crng readiness during
a late stage of their init, which is breaking all sorts of entirely
reasonable and widely deployed userspaces.

You could argue that those userspaces were "only designed for machines
that have enough [by what measure?] boot time entropy", but obviously
they didn't have that in mind. And now here we have an example of an
ordinary x86 machine -- not some weird embedded device -- hitting
these issues. I'd suspect that the problem here isn't one that we can
exclusively punt onto userspace.

Sultan mentioned that his machine actually does trigger large
quantities of interrupts. Is it possible that the entropy gathering
algorithm has some issues, and Sultan's report points to a real bug
here? Considering the crng readiness state hasn't been working until
your recent fix, I suspect the actual entropy gathering code probably
hasn't prompted too many bug reports, until now that is.

Jason


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-26 Thread Sultan Alsawaf
> The CRNG changes were needed because we were erroneously saying that the
> entropy pool was securely initialized before it really was.  Saying
> that CRNG should be able to init on its own is much like saying, "Ted
> should be able to fly wherever he wants in his own personal Gulfstream
> V."  It would certainly be _nice_ if I could afford my personal jet.
> I certainly wish I were that rich.  But the problem is that dollars
> (or Euro's) are like entropy, they don't just magically drop out of
> the sky.
>
> If there isn't user-provided entropy, and the hardware isn't providing
> sufficient entropy, where did you think the kernel is supposed to get
> the entropy from?  Should it dial 1-800-TRUST-NSA?
>
> From the dmesg log, you have a Chromebook Acer 14.  I'm guessing the
> problem is that Chromebooks have hardware that tries *very* hard not to
> issue interrupts, since that helps with power savings.  The following
> from your dmesg is very interesting:
>
> [0.526786] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
>
> I suspect this isn't a firmware bug; it's the hardware working as
> intended / working as designed, for power savings reasons.
>
> So there are two ways to fix this that I can see.  One is to try to
> adjust userspace so that it allows the boot to proceed.  As there is
> more activity, the disk completion interrupts, the user typing their
> username/password into the login screen, etc., there will be timing
> events which can be used to harvest entropy.
>
> The other approach would be to compile the kernel with
> CONFIG_HW_RANDOM_TPM and to modify drivers/char/tpm/tpm-chip.c to
> initialize chip->hwrng.quality = 500.  We've historically made this
> something that the system administrator must set via sysfs.  This is
> because we wanted system administrators to explicitly say that they
> trust the hardware manufacturer: that (a) they haven't been paid by
> your choice of the Chinese MSS or the US NSA to introduce a backdoor,
> and (b) they are competent to actually implement a _secure_ hardware
> random number generator.  Sadly, this has not always been the case.
> Please see:
>
>   https://www.chromium.org/chromium-os/tpm_firmware_update
>
> And note that your Edgar Chromebook is on the list of devices that
> have a TPM with the buggy firmware.  Fortunately this particular TPM
> bug only affects RSA prime generation, so as far as I know there is no
> _known_ vulnerability in your TPM's hardware random number generator.
> But we want it to be _your_ responsibility to decide you are willing
> to trust it.  I certainly don't want to be legally liable --- or even
> have the moral responsibility --- of guaranteeing that every single
> TPM out there is bug-free(tm).
>
>   - Ted

Why don't we tell users that they need to smash their keyboards to make their
computers boot then? And if they question it, we can tell them that it
certainly would be _nice_ to not have to smash their keyboards to make their
computers boot, but alas, a part of me has a feeling that users would not take
kindly to that :)

I noted at least 20,000 mmc interrupts before I intervened in the boot process
to provide entropy myself. That's just for mmc, so I'm sure there were even
more interrupts elsewhere. Is 20k+ interrupts really not sufficient?

There are lots of other sources of entropy available as well, like the
ever-changing CPU frequencies reported by any recent Intel chip (i.e., they
report precision down to 1 kHz). Why are we so limited to h/w interrupts?

Sultan



Re: Linux messages full of `random: get_random_u32 called from`

2018-04-26 Thread Theodore Y. Ts'o
On Thu, Apr 26, 2018 at 10:47:49PM +0200, Christian Brauner wrote:
> 
> We have observed a similar problem with libvirt. As soon as entropy is
> provided the boot finishes; otherwise it hangs for a long time.
> This is not happening with v4.17-rc1 afaict.

For libvirt there is at least an easy workaround.  Make sure the
guest kernel has CONFIG_HW_RANDOM_VIRTIO enabled, and then make sure
qemu is started with the options:

-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0
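
To confirm from inside the guest that the virtio RNG was actually picked up, one can look at the hw_random sysfs attributes. The following is a small C sketch under two assumptions that are worth verifying on the kernel in question: that the hwrng core exposes the active device name in /sys/class/misc/hw_random/rng_current, and that the virtio driver registers under a name beginning with "virtio_rng". Both helper names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed sysfs location of the currently selected hwrng device name. */
#define RNG_CURRENT_PATH "/sys/class/misc/hw_random/rng_current"

/* Hypothetical helper: 1 if the name looks like a virtio RNG, else 0. */
int hwrng_name_is_virtio(const char *name)
{
        assert(name != NULL);
        return strncmp(name, "virtio_rng", strlen("virtio_rng")) == 0;
}

/*
 * Hypothetical helper: copy the active hwrng name into out (newline
 * stripped). Returns 0 on success, -1 if the attribute cannot be read,
 * for example because no hwrng support is built into the kernel.
 */
int hwrng_read_current(char *out, size_t outlen)
{
        FILE *fp;

        assert(out != NULL && outlen > 0);
        fp = fopen(RNG_CURRENT_PATH, "r");
        if (fp == NULL)
                return -1;
        if (fgets(out, (int)outlen, fp) == NULL) {
                fclose(fp);
                return -1;
        }
        fclose(fp);
        out[strcspn(out, "\n")] = '\0';
        return 0;
}
```

If hwrng_read_current() succeeds and hwrng_name_is_virtio() returns 1 for the result, the guest is at least seeing the device that the qemu options above are meant to provide.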

Cheers,

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-26 Thread Theodore Y. Ts'o
On Thu, Apr 26, 2018 at 01:22:02PM -0700, Sultan Alsawaf wrote:
> 
> Also, regardless of what's hanging on CRNG init, CRNG should be able to
> init on its own in a timely manner without the need for user-provided
> entropy. Userspace was working fine before the recent CRNG kernel changes,
> so I don't think this is a userspace bug.

The CRNG changes were needed because we were erroneously saying that the
entropy pool was securely initialized before it really was.  Saying
that CRNG should be able to init on its own is much like saying, "Ted
should be able to fly wherever he wants in his own personal Gulfstream
V."  It would certainly be _nice_ if I could afford my personal jet.
I certainly wish I were that rich.  But the problem is that dollars
(or euros) are like entropy, they don't just magically drop out of
the sky.

If there isn't user-provided entropy, and the hardware isn't providing
sufficient entropy, where did you think the kernel is supposed to get
the entropy from?  Should it dial 1-800-TRUST-NSA?

From the dmesg log, you have a Chromebook Acer 14.  I'm guessing the
problem is that Chromebooks have hardware that tries *very* hard not to
issue interrupts, since that helps with power savings.  The following
from your dmesg is very interesting:

[0.526786] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead

I suspect this isn't a firmware bug; it's the hardware working as
intended / working as designed, for power savings reasons.

So there are two ways to fix this that I can see.  One is to try to
adjust userspace so that it allows the boot to proceed.  As there is
more activity, the disk completion interrupts, the user typing their
username/password into the login screen, etc., there will be timing
events which can be used to harvest entropy.
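
As a rough illustration of the userspace side, a boot-time service could
probe the CRNG without blocking and then decide for itself what to do.
The sketch below uses getrandom() with GRND_NONBLOCK through Python's
os.getrandom (Linux only). The function names, the timeout, and the
polling interval are illustrative assumptions, and what the caller does
when the CRNG is not yet ready (defer the work, or continue with
non-cryptographic data) is a policy decision the sketch does not make.

```python
import os
import time

# GRND_NONBLOCK is 0x0001 in the Linux UAPI; fall back to that value if
# this Python build does not export the constant.
GRND_NONBLOCK = getattr(os, "GRND_NONBLOCK", 0x0001)


def crng_ready(getrandom=None):
    """True if getrandom() can return data right now without blocking."""
    if getrandom is None:
        getrandom = os.getrandom  # Linux-only, Python 3.6+
    try:
        getrandom(1, GRND_NONBLOCK)
    except BlockingIOError:  # EAGAIN: the CRNG is not initialized yet
        return False
    return True


def wait_for_crng(timeout=1.0, interval=0.1, getrandom=None):
    """Poll until the CRNG is ready or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if crng_ready(getrandom):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would use wait_for_crng() where it currently issues a blocking
read, and treat a False return as "not yet" rather than as an error.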

The other approach would be to compile the kernel with
CONFIG_HW_RANDOM_TPM and to modify drivers/char/tpm/tpm-chip.c to
initialize chip->hwrng.quality = 500.  We've historically made this
something that the system administrator must set via sysfs.  This is
because we wanted system administrators to explicitly say that they
trust their hardware manufacturer: that (a) they haven't been paid by
your choice of the Chinese MSS or the US NSA to introduce a backdoor,
and (b) they are competent to actually implement a _secure_ hardware
random number generator.  Sadly, this has not always been the case.
Please see:

https://www.chromium.org/chromium-os/tpm_firmware_update

And note that your Edgar Chromebook is on the list of devices that
have a TPM with the buggy firmware.  Fortunately this particular TPM
bug only affects RSA prime generation, so as far as I know there is no
_known_ vulnerability in your TPM's hardware random number generator.
But we want it to be _your_ responsibility to decide you are willing
to trust it.  I certainly don't want to be legally liable --- or even
have the moral responsibility --- of guaranteeing that every single
TPM out there is bug-free(tm).
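
To give a feel for what a quality value such as 500 means, here is a
back-of-the-envelope sketch. It assumes the hwrng quality field is a
per-mille estimate of entropy per bit of output, so 500 credits roughly
half a bit per bit read. The 128-bit target in the usage below is only
an illustrative figure, the function names are invented, and the
kernel's own arithmetic may round differently.

```python
def credited_bits(nbytes, quality):
    """Entropy bits credited for nbytes of hwrng output at a per-mille quality."""
    return nbytes * 8 * quality // 1000


def bytes_needed(target_bits, quality):
    """Smallest number of hwrng bytes whose credit reaches target_bits."""
    if quality <= 0:
        raise ValueError("quality must be positive for any credit to accrue")
    # Ceiling division without floating point.
    return -(-target_bits * 1000 // (8 * quality))


if __name__ == "__main__":
    for q in (1000, 500, 100):
        print(f"quality={q}: {bytes_needed(128, q)} bytes for 128 bits")
```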

- Ted


Re: Linux messages full of `random: get_random_u32 called from`

2018-04-26 Thread Christian Brauner
On Thu, Apr 26, 2018 at 01:22:02PM -0700, Sultan Alsawaf wrote:
> > Hmm, it looks like the multiuser startup is getting blocked on snapd:
> >
> >  29.060s snapd.service
> >
> > graphical.target @1min 32.145s
> > └─multi-user.target @1min 32.145s
> >   └─hddtemp.service @6.512s +28ms
> >     └─network-online.target @6.508s
> >       └─NetworkManager-wait-online.service @2.428s +4.079s
> >         └─NetworkManager.service @2.016s +404ms
> >           └─dbus.service @1.869s
> >             └─basic.target @1.824s
> >               └─sockets.target @1.824s
> >                 └─snapd.socket @1.821s +1ms
> >                   └─sysinit.target @1.812s
> >                     └─apparmor.service @587ms +1.224s
> >                       └─local-fs.target @585ms
> >                         └─local-fs-pre.target @585ms
> >                           └─keyboard-setup.service @235ms +346ms
> >                             └─systemd-journald.socket @226ms
> >                               └─system.slice @225ms
> >                                 └─-.slice @220ms
> >
> > This appears to be some kind of new package management system for
> > Ubuntu:
> >
> > Description-en: Tool to interact with Ubuntu Core Snappy.
> >  Install, configure, refresh and remove snap packages. Snaps are
> >  'universal' packages that work across many different Linux systems,
> >  enabling secure distribution of the latest apps and utilities for
> >  cloud, servers, desktops and the internet of things.
> >
> > Why the Ubuntu package believes it needs to be fully started before
> > the login screen can display is unclear to me.  It might be worth
> > using systemctl to disable snapd.service and see if that makes things
> > work better for you.
> >
> > - Ted
> 
> I removed snapd completely which did nothing.
> 
> Here are new logs:
> systemd-analyze blame: https://hastebin.com/edehikuyeb.css
> systemd-analyze critical-chain: https://hastebin.com/vedufafema.pl
> dmesg: https://hastebin.com/zuwuwoxadu.vbs
> 
> I should also note that leaving the system untouched does not result in it
> booting: I must provide a source of entropy, otherwise it just stays stuck.
> In both of the dmesgs I've given, I

We have observed a similar problem with libvirt. As soon as entropy is
provided the boot finishes; otherwise it hangs for a long time.
This is not happening with v4.17-rc1 afaict.

Christian

> manually provided entropy to the system after about 5 minutes of waiting.
> 
> Also, regardless of what's hanging on CRNG init, CRNG should be able to
> init on its own in a timely manner without the need for user-provided
> entropy. Userspace was working fine before the recent CRNG kernel changes,
> so I don't think this is a userspace bug.
> 
> -Sultan
> 

