from:"Justin Forbes"

Re: [PATCH] tpm: Fix typo in tpmrm class definition

2023-09-12 Thread Justin Forbes

On Tue, Sep 12, 2023 at 4:41 AM Jarkko Sakkinen  wrote:
>
> On Tue Sep 12, 2023 at 1:32 AM EEST, Justin M. Forbes wrote:
> > Commit d2e8071bed0be ("tpm: make all 'class' structures const")
> > unfortunately had a typo for the name on tpmrm.
> >
> > Fixes: d2e8071bed0b ("tpm: make all 'class' structures const")
> > Signed-off-by: Justin M. Forbes 
> > ---
> >  drivers/char/tpm/tpm-chip.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> > index 23f6f2eda84c..42b1062e33cd 100644
> > --- a/drivers/char/tpm/tpm-chip.c
> > +++ b/drivers/char/tpm/tpm-chip.c
> > @@ -33,7 +33,7 @@ const struct class tpm_class = {
> >   .shutdown_pre = tpm_class_shutdown,
> >  };
> >  const struct class tpmrm_class = {
> > - .name = "tmprm",
> > + .name = "tpmrm",
> >  };
> >  dev_t tpm_devt;
> >
> > --
> > 2.41.0
>
> Unfortunately your patch does not apply:

Fixed with the V2 I just sent out. Seems I had suppress-blank-empty =
true in a config file somewhere. Apologies for wasting your time.

Justin

> $ git-tip
> 0bb80ecc33a8 (HEAD -> next, tag: v6.6-rc1, upstream/master, origin/next) 
> Linux 6.6-rc1
>
> $ b4 am 20230911223238.3495955-1-jfor...@fedoraproject.org
> Analyzing 1 messages in the thread
> Checking attestation on all messages, may take a moment...
> ---
>   ✓ [PATCH] tpm: Fix typo in tpmrm class definition
>   ---
>   ✓ Signed: DKIM/linuxtx.org (From: jfor...@fedoraproject.org)
> ---
> Total patches: 1
> ---
>  Link: 
> https://lore.kernel.org/r/20230911223238.3495955-1-jfor...@fedoraproject.org
>  Base: applies clean to current tree
>git checkout -b 20230911_jforbes_fedoraproject_org HEAD
>git am ./20230911_jforbes_tpm_fix_typo_in_tpmrm_class_definition.mbx
>
> $ git am -3 20230911_jforbes_tpm_fix_typo_in_tpmrm_class_definition.mbx
> Applying: tpm: Fix typo in tpmrm class definition
> error: corrupt patch at line 18
> error: could not build fake ancestor
> Patch failed at 0001 tpm: Fix typo in tpmrm class definition
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
>
> BR, Jarkko

Re: [PATCH] Fix typo in tpmrm class definition

2023-09-11 Thread Justin Forbes

On Mon, Sep 11, 2023 at 5:09 PM Jarkko Sakkinen  wrote:
>
> On Fri Sep 8, 2023 at 5:06 PM EEST, Justin M. Forbes wrote:
> > Commit d2e8071bed0be ("tpm: make all 'class' structures const")
> > unfortunately had a typo for the name on tpmrm.
> >
> > Fixes: d2e8071bed0b ("tpm: make all 'class' structures const")
> > Signed-off-by: Justin M. Forbes 
> > ---
> >  drivers/char/tpm/tpm-chip.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> > index 23f6f2eda84c..42b1062e33cd 100644
> > --- a/drivers/char/tpm/tpm-chip.c
> > +++ b/drivers/char/tpm/tpm-chip.c
> > @@ -33,7 +33,7 @@ const struct class tpm_class = {
> >   .shutdown_pre = tpm_class_shutdown,
> >  };
> >  const struct class tpmrm_class = {
> > - .name = "tmprm",
> > + .name = "tpmrm",
> >  };
> >  dev_t tpm_devt;
> >
> > --
> > 2.41.0
>
> I have issues applying the patch:

Sorry, not sure what the issue is, but I did a git am of it myself to
a fresh checkout of linus' tree and just recreated and sent it. So,
new thread, but hopefully the patch will apply

Justin

>
> $ git am -3 20230908_jforbes_fix_typo_in_tpmrm_class_definition.mbx
> Applying: Fix typo in tpmrm class definition
> error: corrupt patch at line 18
> error: could not build fake ancestor
> Patch failed at 0001 Fix typo in tpmrm class definition
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
>
> $ git log -2
> commit ba46245183940de39e42c8456b85ceaf3519b764 (HEAD -> master, 
> origin/master, origin/HEAD)
> Author: Sumit Garg 
> Date:   Tue Aug 22 16:59:33 2023 +0530
>
> KEYS: trusted: tee: Refactor register SHM usage
>
> The OP-TEE driver using the old SMC based ABI permits overlapping shared
> buffers, but with the new FF-A based ABI each physical page may only
> be registered once.
>
> As the key and blob buffer are allocated adjancently, there is no need
> for redundant register shared memory invocation. Also, it is incompatibile
> with FF-A based ABI limitation. So refactor register shared memory
> implementation to use only single invocation to register both key and blob
> buffers.
>
> [jarkko: Added cc to stable.]
> Cc: sta...@vger.kernel.org # v5.16+
> Fixes: 4615e5a34b95 ("optee: add FF-A support")
> Reported-by: Jens Wiklander 
> Signed-off-by: Sumit Garg 
> Tested-by: Jens Wiklander 
> Reviewed-by: Jens Wiklander 
> Signed-off-by: Jarkko Sakkinen 
>
> commit 0bb80ecc33a8fb5a682236443c1e740d5c917d1d (tag: v6.6-rc1, 
> upstream/master, origin/next, next)
> Author: Linus Torvalds 
> Date:   Sun Sep 10 16:28:41 2023 -0700
>
> Linux 6.6-rc1
>
> BR, Jarkko
>

Re: [PATCH RFC] gcc-plugins: Handle GCC version mismatch for OOT modules

2021-01-26 Thread Justin Forbes

On Tue, Jan 26, 2021 at 11:07 AM Greg KH  wrote:
>
> On Tue, Jan 26, 2021 at 10:19:34AM -0600, Josh Poimboeuf wrote:
> > On Tue, Jan 26, 2021 at 10:15:52AM -0600, Justin Forbes wrote:
> > > On Tue, Jan 26, 2021 at 10:05 AM Peter Zijlstra  
> > > wrote:
> > > >
> > > > On Tue, Jan 26, 2021 at 09:46:51AM -0600, Josh Poimboeuf wrote:
> > > > > On Tue, Jan 26, 2021 at 04:15:37PM +0100, Peter Zijlstra wrote:
> > > > > > On Tue, Jan 26, 2021 at 08:51:55AM -0600, Josh Poimboeuf wrote:
> > > > > > > User space mixes compiler versions all the time.  The C ABI is 
> > > > > > > stable.
> > > > > > >
> > > > > > > What specifically is the harder issue you're referring to?
> > > > > >
> > > > > > I don't think the C ABI captures nearly enough. Imagine trying to 
> > > > > > mix a
> > > > > > compiler with and without asm-goto support (ok, we fail to build 
> > > > > > without
> > > > > > by now, but just imagine).
> > > > > >
> > > > > > No C ABI violated, but having that GCC extention vs not having it
> > > > > > radically changes the kernel ABI.
> > > > > >
> > > > > > I think I'm with Greg here, just don't do it.
> > > > >
> > > > > Ok, thank you for an actual example.  asm goto is a good one.
> > > > >
> > > > > But it's not a cut-and-dry issue.  Otherwise how could modversions
> > > > > possibly work?
> > > > >
> > > > > So yes, we should enforce GCC versions, but I still haven't seen a
> > > > > reason it should be more than just "same compiler and *major* 
> > > > > version".
> > > >
> > > > Why bother? rebuilding the kernel and all modules is a matter of 10
> > > > minutes at most on a decently beefy build box.
> > > >
> > > > What actual problem are we trying to solve here?
> > >
> > > This is true for those of us used to working with source and building
> > > by hand. For users who want everything packaged, rebuilding a kernel
> > > package for install is considerably longer, and for distros providing
> > > builds for multiple arches, we are looking at a couple of hours at
> > > best.  From a Fedora standpoint, I am perfectly fine with it failing
> > > if someone tries to build a module on gcc10 when the kernel was built
> > > with gcc11.  It's tedious when the kernel was built with gcc11
> > > yesterday, and a new gcc11 build today means that kernel needs to be
> > > rebuilt.
> >
> > Right.  It's a problem for distro users.  The compiler and kernel are in
> > separate packages, with separate release cadences.  The latest compiler
> > version may not exactly match what was used to build the latest kernel.
>
> Given that distros _should_ be updating their kernel faster than the
> compiler updates, what's the real issue here?  You build a kernel, and
> all external modules, at the same time.  If you want to build them at
> different times, you make your build system ensure they were the same
> compiler so that you are "bug compatible".
>
> And yes, it might be a pain if gcc11 gets updated every other day, but
> as someone living with a "rolling-distro" you get used to it, otherwise
> you just "pin" the build tools and keep that from happening.
>
> This isn't a new thing, we've been doing this for decades, why is this
> surprising?

We definitely build considerably more kernels than toolchains. From a
rawhide standpoint though, a number of testers are willing to test RC
releases, but are not willing to run debug kernels, so they installed
rc4 yesterday, but will not install today's snapshot.  I will build
3-5 new kernels before they update to rc5.  We have been doing things
this way for over a decade. It has never been a problem until we
turned on CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL.  Suddenly I am
getting complaints.

Re: [PATCH RFC] gcc-plugins: Handle GCC version mismatch for OOT modules

2021-01-26 Thread Justin Forbes

On Tue, Jan 26, 2021 at 10:05 AM Peter Zijlstra  wrote:
>
> On Tue, Jan 26, 2021 at 09:46:51AM -0600, Josh Poimboeuf wrote:
> > On Tue, Jan 26, 2021 at 04:15:37PM +0100, Peter Zijlstra wrote:
> > > On Tue, Jan 26, 2021 at 08:51:55AM -0600, Josh Poimboeuf wrote:
> > > > User space mixes compiler versions all the time.  The C ABI is stable.
> > > >
> > > > What specifically is the harder issue you're referring to?
> > >
> > > I don't think the C ABI captures nearly enough. Imagine trying to mix a
> > > compiler with and without asm-goto support (ok, we fail to build without
> > > by now, but just imagine).
> > >
> > > No C ABI violated, but having that GCC extention vs not having it
> > > radically changes the kernel ABI.
> > >
> > > I think I'm with Greg here, just don't do it.
> >
> > Ok, thank you for an actual example.  asm goto is a good one.
> >
> > But it's not a cut-and-dry issue.  Otherwise how could modversions
> > possibly work?
> >
> > So yes, we should enforce GCC versions, but I still haven't seen a
> > reason it should be more than just "same compiler and *major* version".
>
> Why bother? rebuilding the kernel and all modules is a matter of 10
> minutes at most on a decently beefy build box.
>
> What actual problem are we trying to solve here?

This is true for those of us used to working with source and building
by hand. For users who want everything packaged, rebuilding a kernel
package for install is considerably longer, and for distros providing
builds for multiple arches, we are looking at a couple of hours at
best.  From a Fedora standpoint, I am perfectly fine with it failing
if someone tries to build a module on gcc10 when the kernel was built
with gcc11.  It's tedious when the kernel was built with gcc11
yesterday, and a new gcc11 build today means that kernel needs to be
rebuilt.

Re: [PATCH RFC] gcc-plugins: Handle GCC version mismatch for OOT modules

2021-01-26 Thread Justin Forbes

On Tue, Jan 26, 2021 at 2:21 AM Greg KH  wrote:
>
> On Mon, Jan 25, 2021 at 04:07:57PM -0600, Josh Poimboeuf wrote:
> > On Tue, Jan 26, 2021 at 06:44:35AM +0900, Masahiro Yamada wrote:
> > > > > If people use a different compiler, they must be
> > > > > prepared for any possible problem.
> > > > >
> > > > > Using different compiler flags for in-tree and out-of-tree
> > > > > is even more dangerous.
> > > > >
> > > > > For example, CONFIG_GCC_PLUGIN_RANDSTRUCT is enabled
> > > > > for in-tree build, and then disabled for out-of-tree modules,
> > > > > the struct layout will mismatch, won't it?
> > > >
> > > > If you read the patch you'll notice that it handles that case, when it's
> > > > caused by GCC mismatch.
> > > >
> > > > However, as alluded to in the [1] footnote, it doesn't handle the case
> > > > where the OOT build system doesn't have gcc-plugin-devel installed.
> > > > Then CONFIG_GCC_PLUGIN_RANDSTRUCT gets silently disabled and the build
> > > > succeeds!  That happens even without a GCC mismatch.
> > >
> > >
> > > Ah, sorry.
> > >
> > > I responded too early before reading the patch fully.
> > >
> > > But, I do not like to make RANDSTRUCT a special case.
> > >
> > > I'd rather want to stop building for any plugin.
> >
> > Other than RANDSTRUCT there doesn't seem to be any problem with
> > disabling them (and printing a warning) in the OOT build.  Why not give
> > users that option?  It's harmless, and will make distro's (and their
> > users') lives easier.
> >
> > Either GCC mismatch is ok, or it's not.  Let's not half-enforce it.
>
> As I said earlier, it's not ok, we can not support it at all.
>

Support and enforce are 2 completely different things.  To shed a bit
more light on this, the real issue that prompted this was breaking CI
systems.  As we enabled gcc plugins in Fedora, and the toolchain folks
went through 3 different snapshots of gcc 11 in a week. Any CI process
that built an out of tree module failed. I don't think this is nearly
as much of a concern for stable distros, as it is for CI in
development cycles.

Justin

Re: [PATCH] mm/filemap: add static for function __add_to_page_cache_locked

2020-12-07 Thread Justin Forbes

On Mon, Dec 7, 2020 at 2:16 AM Michal Kubecek  wrote:
>
> On Thu, Nov 12, 2020 at 08:18:57AM +0800, Alex Shi wrote:
> >
> >
> > 在 2020/11/11 上午3:50, Andrew Morton 写道:
> > > On Tue, 10 Nov 2020 08:39:24 +0530 Souptick Joarder 
> > >  wrote:
> > >
> > >> On Fri, Nov 6, 2020 at 4:55 PM Alex Shi  
> > >> wrote:
> > >>>
> > >>> Otherwise it cause gcc warning:
> > >>>   ^~~
> > >>> ../mm/filemap.c:830:14: warning: no previous prototype for
> > >>> ‘__add_to_page_cache_locked’ [-Wmissing-prototypes]
> > >>>  noinline int __add_to_page_cache_locked(struct page *page,
> > >>>   ^~
> > >>
> > >> Is CONFIG_DEBUG_INFO_BTF enabled in your .config ?
> > >
> > > hm, yes.
> >
> > When the config enabled, compiling looks good untill pahole tool
> > used to get BTF info, but I still failed on a right version pahole
> > > 1.16. Sorry.
> >
> > >
> > >>>
> > >>> Signed-off-by: Alex Shi 
> > >>> Cc: Andrew Morton 
> > >>> Cc: linux...@kvack.org
> > >>> Cc: linux-kernel@vger.kernel.org
> > >>> ---
> > >>>  mm/filemap.c | 2 +-
> > >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/mm/filemap.c b/mm/filemap.c
> > >>> index d90614f501da..249cf489f5df 100644
> > >>> --- a/mm/filemap.c
> > >>> +++ b/mm/filemap.c
> > >>> @@ -827,7 +827,7 @@ int replace_page_cache_page(struct page *old, 
> > >>> struct page *new, gfp_t gfp_mask)
> > >>>  }
> > >>>  EXPORT_SYMBOL_GPL(replace_page_cache_page);
> > >>>
> > >>> -noinline int __add_to_page_cache_locked(struct page *page,
> > >>> +static noinline int __add_to_page_cache_locked(struct page *page,
> > >>> struct address_space *mapping,
> > >>> pgoff_t offset, gfp_t gfp,
> > >>> void **shadowp)
> > >
> > > It's unclear to me whether BTF_ID() requires that the target symbol be
> > > non-static.  It doesn't actually reference the symbol:
> > >
> > > #define BTF_ID(prefix, name) \
> > > __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__))
> > >
> >
> > The above usage make me thought BTF don't require the symbol, though
> > the symbol still exist in vmlinux with 'static'.
> >
> > So any comments of this, Alexei?
>
> It's probably more complicated: our v5.10-rc7 builds with
> CONFIG_DEBUG_INFO_BTF=y fail on ppc64 and ppc64le with
>
>  BTFIDS  vmlinux
>FAILED unresolved symbol __add_to_page_cache_locked
>
>
> but succeed on x86_64, i586, aarch64 and s390x. So far I don't see why
> this should depend on architecture.
>
Fedora is failing with rc7 on the same issue on PPC only.

Justin

Re: [PATCH 5.8 35/99] tools/libbpf: Avoid counting local symbols in ABI check

2020-09-30 Thread Justin Forbes

On Wed, Sep 30, 2020 at 12:02 AM Tony Ambardar  wrote:
>
> [adding Michael Ellerman, linux-ppc maintainer]
>
> Hello Justin,
>
> On Tue, 29 Sep 2020 at 14:54, Justin Forbes  wrote:
> >
> > On Tue, Sep 29, 2020 at 6:53 AM Greg Kroah-Hartman
> >  wrote:
> > >
> > > From: Tony Ambardar 
> > >
> > > [ Upstream commit 746f534a4809e07f427f7d13d10f3a6a9641e5c3 ]
> > >
> > > Encountered the following failure building libbpf from kernel 5.8.5 
> > > sources
> > > with GCC 8.4.0 and binutils 2.34: (long paths shortened)
> > >
> > >   Warning: Num of global symbols in sharedobjs/libbpf-in.o (234) does NOT
> > >   match with num of versioned symbols in libbpf.so (236). Please make sure
> > >   all LIBBPF_API symbols are versioned in libbpf.map.
> > > #  --- libbpf_global_syms.tmp2020-09-02 07:30:58.920084380 +
> > > #  +++ libbpf_versioned_syms.tmp 2020-09-02 07:30:58.924084388 +
> > >   @@ -1,3 +1,5 @@
> > >   +_fini
> > >   +_init
> > >bpf_btf_get_fd_by_id
> > >bpf_btf_get_next_id
> > >bpf_create_map
> > >   make[4]: *** [Makefile:210: check_abi] Error 1
> > >
> > > Investigation shows _fini and _init are actually local symbols counted
> > > amongst global ones:
> > >
> > >   $ readelf --dyn-syms --wide libbpf.so|head -10
> > >
> > >   Symbol table '.dynsym' contains 343 entries:
> > >  Num:Value  Size TypeBind   Vis  Ndx Name
> > >0:  0 NOTYPE  LOCAL  DEFAULT  UND
> > >1: 4098 0 SECTION LOCAL  DEFAULT   11
> > >2: 4098 8 FUNCLOCAL  DEFAULT   11 _init@@LIBBPF_0.0.1
> > >3: 00023040 8 FUNCLOCAL  DEFAULT   14 _fini@@LIBBPF_0.0.1
> > >4:  0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.4
> > >5:  0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.1
> > >6: ffa4 8 FUNCGLOBAL DEFAULT   12 
> > > bpf_object__find_map_by_offset@@LIBBPF_0.0.1
> > >
> > > A previous commit filtered global symbols in sharedobjs/libbpf-in.o. Do 
> > > the
> > > same with the libbpf.so DSO for consistent comparison.
> > >
> > > Fixes: 306b267cb3c4 ("libbpf: Verify versioned symbols")
> > > Signed-off-by: Tony Ambardar 
> > > Signed-off-by: Alexei Starovoitov 
> > > Acked-by: Andrii Nakryiko 
> > > Link: 
> > > https://lore.kernel.org/bpf/20200905214831.1565465-1-tony.ambar...@gmail.com
> > > Signed-off-by: Sasha Levin 
> >
> > This seems to work everywhere else, but breaks PPC64LE.
> >
>
> I also ran into a PPC build error while working on some bpf problems,
> but it seemed
> like a pre-existing PPC issue. I did submit an upstream fix, which is
> marked for stable
> and being reviewed by Michael. See here for discussion and the patch:
> https://lkml.org/lkml/2020/9/17/668.
>
> Is that the same problem you encountered? Does that patch address your issue?

It is not, the issue I see is:
Warning: Num of global symbols in sharedobjs/libbpf-in.o (259) does
NOT match with num of versioned symbols in libbpf.so (50). Please make
sure all LIBBPF_API symbols are versioned in libbpf.map.

I only see it on ppc64le with this patch, all other arch that Fedora
builds are fine (x86_64, i686, aarch64, armv7, s390).  If I revert
this patch, all builds succeed.  We are using gcc 10.2.1 though.

Justin

>
> Thanks,
> Tony
>
> > Justin
> >
> > > ---
> > >  tools/lib/bpf/Makefile |2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > --- a/tools/lib/bpf/Makefile
> > > +++ b/tools/lib/bpf/Makefile
> > > @@ -152,6 +152,7 @@ GLOBAL_SYM_COUNT = $(shell readelf -s --
> > >awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print 
> > > $$NF}' | \
> > >sort -u | wc -l)
> > >  VERSIONED_SYM_COUNT = $(shell readelf --dyn-syms --wide 
> > > $(OUTPUT)libbpf.so | \
> > > + awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print 
> > > $$NF}' | \
> > >   grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | 
> > > sort -u | wc -l)
> > >
> > >  CMD_TARGETS = $(LIB_TARGET) $(PC_FILE)
> > > @@ -219,6 +220,7 @@ check_abi: $(OUTPUT)libbpf.so
> > > awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'|  \
> > > sort -u > $(OUTPUT)libbpf_global_syms.tmp;   \
> > > readelf --dyn-syms --wide $(OUTPUT)libbpf.so |   \
> > > +   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'|  \
> > > grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | \
> > > sort -u > $(OUTPUT)libbpf_versioned_syms.tmp;\
> > > diff -u $(OUTPUT)libbpf_global_syms.tmp  \
> > >
> > >

Re: [PATCH 5.8 35/99] tools/libbpf: Avoid counting local symbols in ABI check

2020-09-29 Thread Justin Forbes

On Tue, Sep 29, 2020 at 6:53 AM Greg Kroah-Hartman
 wrote:
>
> From: Tony Ambardar 
>
> [ Upstream commit 746f534a4809e07f427f7d13d10f3a6a9641e5c3 ]
>
> Encountered the following failure building libbpf from kernel 5.8.5 sources
> with GCC 8.4.0 and binutils 2.34: (long paths shortened)
>
>   Warning: Num of global symbols in sharedobjs/libbpf-in.o (234) does NOT
>   match with num of versioned symbols in libbpf.so (236). Please make sure
>   all LIBBPF_API symbols are versioned in libbpf.map.
> #  --- libbpf_global_syms.tmp2020-09-02 07:30:58.920084380 +
> #  +++ libbpf_versioned_syms.tmp 2020-09-02 07:30:58.924084388 +
>   @@ -1,3 +1,5 @@
>   +_fini
>   +_init
>bpf_btf_get_fd_by_id
>bpf_btf_get_next_id
>bpf_create_map
>   make[4]: *** [Makefile:210: check_abi] Error 1
>
> Investigation shows _fini and _init are actually local symbols counted
> amongst global ones:
>
>   $ readelf --dyn-syms --wide libbpf.so|head -10
>
>   Symbol table '.dynsym' contains 343 entries:
>  Num:Value  Size TypeBind   Vis  Ndx Name
>0:  0 NOTYPE  LOCAL  DEFAULT  UND
>1: 4098 0 SECTION LOCAL  DEFAULT   11
>2: 4098 8 FUNCLOCAL  DEFAULT   11 _init@@LIBBPF_0.0.1
>3: 00023040 8 FUNCLOCAL  DEFAULT   14 _fini@@LIBBPF_0.0.1
>4:  0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.4
>5:  0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.1
>6: ffa4 8 FUNCGLOBAL DEFAULT   12 
> bpf_object__find_map_by_offset@@LIBBPF_0.0.1
>
> A previous commit filtered global symbols in sharedobjs/libbpf-in.o. Do the
> same with the libbpf.so DSO for consistent comparison.
>
> Fixes: 306b267cb3c4 ("libbpf: Verify versioned symbols")
> Signed-off-by: Tony Ambardar 
> Signed-off-by: Alexei Starovoitov 
> Acked-by: Andrii Nakryiko 
> Link: 
> https://lore.kernel.org/bpf/20200905214831.1565465-1-tony.ambar...@gmail.com
> Signed-off-by: Sasha Levin 

This seems to work everywhere else, but breaks PPC64LE.

Justin

> ---
>  tools/lib/bpf/Makefile |2 ++
>  1 file changed, 2 insertions(+)
>
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -152,6 +152,7 @@ GLOBAL_SYM_COUNT = $(shell readelf -s --
>awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' 
> | \
>sort -u | wc -l)
>  VERSIONED_SYM_COUNT = $(shell readelf --dyn-syms --wide $(OUTPUT)libbpf.so | 
> \
> + awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print 
> $$NF}' | \
>   grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort 
> -u | wc -l)
>
>  CMD_TARGETS = $(LIB_TARGET) $(PC_FILE)
> @@ -219,6 +220,7 @@ check_abi: $(OUTPUT)libbpf.so
> awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'|  \
> sort -u > $(OUTPUT)libbpf_global_syms.tmp;   \
> readelf --dyn-syms --wide $(OUTPUT)libbpf.so |   \
> +   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'|  \
> grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | \
> sort -u > $(OUTPUT)libbpf_versioned_syms.tmp;\
> diff -u $(OUTPUT)libbpf_global_syms.tmp  \
>
>

Re: crypto: aegis128: error: incompatible types when initializing type 'unsigned char' using type 'uint8x16_t'

2020-07-30 Thread Justin Forbes

On Mon, Jul 27, 2020 at 8:05 AM Andrea Righi  wrote:
>
> I'm experiencing this build error on arm64 after updating to gcc 10:
>
> crypto/aegis128-neon-inner.c: In function 'crypto_aegis128_init_neon':
> crypto/aegis128-neon-inner.c:151:3: error: incompatible types when 
> initializing type 'unsigned char' using type 'uint8x16_t'
>   151 |   k ^ vld1q_u8(const0),
>   |   ^
> crypto/aegis128-neon-inner.c:152:3: error: incompatible types when 
> initializing type 'unsigned char' using type 'uint8x16_t'
>   152 |   k ^ vld1q_u8(const1),
>   |   ^
>
> Anybody knows if there's a fix for this already? Otherwise I'll take a look 
> at it.


I hit it and have been working with Jakub on the issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96377

Justin

Re: [PATCH 5.2 123/413] PCI: Add missing link delays required by the PCIe spec

2019-08-05 Thread Justin Forbes

On Sat, Aug 3, 2019 at 1:50 AM Greg Kroah-Hartman
 wrote:
>
> On Fri, Aug 02, 2019 at 12:06:39PM -0500, Justin Forbes wrote:
> > On Wed, Jul 24, 2019 at 3:31 PM Greg Kroah-Hartman
> >  wrote:
> > >
> > > [ Upstream commit c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5 ]
> > >
> > > Currently Linux does not follow PCIe spec regarding the required delays
> > > after reset. A concrete example is a Thunderbolt add-in-card that
> > > consists of a PCIe switch and two PCIe endpoints:
> > >
> > >   +-1b.0-[01-6b]00.0-[02-6b]--+-00.0-[03]00.0 TBT controller
> > >   +-01.0-[04-36]-- DS hotplug port
> > >   +-02.0-[37]00.0 xHCI controller
> > >   \-04.0-[38-6b]-- DS hotplug port
> > >
> > > The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> > > gen3 so they support 8GT/s link speeds.
> > >
> > > We wait for the PCIe hierarchy to enter D3cold (runtime):
> > >
> > >   pcieport :00:1b.0: power state changed by ACPI to D3cold
> > >
> > > When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> > > PCIe switch is put to reset and its power is re-applied. This means that
> > > we must follow the rules in PCIe 4.0 section 6.6.1.
> > >
> > > For the PCIe gen3 ports we are dealing with here, the following applies:
> > >
> > >   With a Downstream Port that supports Link speeds greater than 5.0
> > >   GT/s, software must wait a minimum of 100 ms after Link training
> > >   completes before sending a Configuration Request to the device
> > >   immediately below that Port. Software can determine when Link training
> > >   completes by polling the Data Link Layer Link Active bit or by setting
> > >   up an associated interrupt (see Section 6.7.3.3).
> > >
> > > Translating this into the above topology we would need to do this (DLLLA
> > > stands for Data Link Layer Link Active):
> > >
> > >   pcieport :00:1b.0: wait for 100ms after DLLLA is set before access 
> > > to :01:00.0
> > >   pcieport :02:00.0: wait for 100ms after DLLLA is set before access 
> > > to :03:00.0
> > >   pcieport :02:02.0: wait for 100ms after DLLLA is set before access 
> > > to :37:00.0
> > >
> > > I've instrumented the kernel with additional logging so we can see the
> > > actual delays the kernel performs:
> > >
> > >   pcieport :00:1b.0: power state changed by ACPI to D0
> > >   pcieport :00:1b.0: waiting for D3cold delay of 100 ms
> > >   pcieport :00:1b.0: waking up bus
> > >   pcieport :00:1b.0: waiting for D3hot delay of 10 ms
> > >   pcieport :00:1b.0: restoring config space at offset 0x2c (was 0x60, 
> > > writing 0x60)
> > >   ...
> > >   pcieport :00:1b.0: PME# disabled
> > >   pcieport :01:00.0: restoring config space at offset 0x3c (was 
> > > 0x1ff, writing 0x201ff)
> > >   ...
> > >   pcieport :01:00.0: PME# disabled
> > >   pcieport :02:00.0: restoring config space at offset 0x3c (was 
> > > 0x1ff, writing 0x201ff)
> > >   ...
> > >   pcieport :02:00.0: PME# disabled
> > >   pcieport :02:01.0: restoring config space at offset 0x3c (was 
> > > 0x1ff, writing 0x201ff)
> > >   ...
> > >   pcieport :02:01.0: restoring config space at offset 0x4 (was 
> > > 0x10, writing 0x100407)
> > >   pcieport :02:01.0: PME# disabled
> > >   pcieport :02:02.0: restoring config space at offset 0x3c (was 
> > > 0x1ff, writing 0x201ff)
> > >   ...
> > >   pcieport :02:02.0: PME# disabled
> > >   pcieport :02:04.0: restoring config space at offset 0x3c (was 
> > > 0x1ff, writing 0x201ff)
> > >   ...
> > >   pcieport :02:04.0: PME# disabled
> > >   pcieport :02:01.0: PME# enabled
> > >   pcieport :02:01.0: waiting for D3hot delay of 10 ms
> > >   pcieport :02:04.0: PME# enabled
> > >   pcieport :02:04.0: waiting for D3hot delay of 10 ms
> > >   thunderbolt :03:00.0: restoring config space at offset 0x14 (was 
> > > 0x0, writing 0x8a04)
> > >   ...
> > >   thunderbolt :03:00.0: PME# disabled
> > >   xhci_hcd :37:00.0: restoring config space at offset 0x10 (was 0x0, 
> > > writing 0x73f0)
> > >   ...
> &g

Re: [PATCH 5.2 123/413] PCI: Add missing link delays required by the PCIe spec

2019-08-02 Thread Justin Forbes

On Wed, Jul 24, 2019 at 3:31 PM Greg Kroah-Hartman
 wrote:
>
> [ Upstream commit c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5 ]
>
> Currently Linux does not follow PCIe spec regarding the required delays
> after reset. A concrete example is a Thunderbolt add-in-card that
> consists of a PCIe switch and two PCIe endpoints:
>
>   +-1b.0-[01-6b]00.0-[02-6b]--+-00.0-[03]00.0 TBT controller
>   +-01.0-[04-36]-- DS hotplug port
>   +-02.0-[37]00.0 xHCI controller
>   \-04.0-[38-6b]-- DS hotplug port
>
> The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> gen3 so they support 8GT/s link speeds.
>
> We wait for the PCIe hierarchy to enter D3cold (runtime):
>
>   pcieport :00:1b.0: power state changed by ACPI to D3cold
>
> When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> PCIe switch is put to reset and its power is re-applied. This means that
> we must follow the rules in PCIe 4.0 section 6.6.1.
>
> For the PCIe gen3 ports we are dealing with here, the following applies:
>
>   With a Downstream Port that supports Link speeds greater than 5.0
>   GT/s, software must wait a minimum of 100 ms after Link training
>   completes before sending a Configuration Request to the device
>   immediately below that Port. Software can determine when Link training
>   completes by polling the Data Link Layer Link Active bit or by setting
>   up an associated interrupt (see Section 6.7.3.3).
>
> Translating this into the above topology we would need to do this (DLLLA
> stands for Data Link Layer Link Active):
>
>   pcieport :00:1b.0: wait for 100ms after DLLLA is set before access to 
> :01:00.0
>   pcieport :02:00.0: wait for 100ms after DLLLA is set before access to 
> :03:00.0
>   pcieport :02:02.0: wait for 100ms after DLLLA is set before access to 
> :37:00.0
>
> I've instrumented the kernel with additional logging so we can see the
> actual delays the kernel performs:
>
>   pcieport :00:1b.0: power state changed by ACPI to D0
>   pcieport :00:1b.0: waiting for D3cold delay of 100 ms
>   pcieport :00:1b.0: waking up bus
>   pcieport :00:1b.0: waiting for D3hot delay of 10 ms
>   pcieport :00:1b.0: restoring config space at offset 0x2c (was 0x60, 
> writing 0x60)
>   ...
>   pcieport :00:1b.0: PME# disabled
>   pcieport :01:00.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :01:00.0: PME# disabled
>   pcieport :02:00.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:00.0: PME# disabled
>   pcieport :02:01.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:01.0: restoring config space at offset 0x4 (was 0x10, 
> writing 0x100407)
>   pcieport :02:01.0: PME# disabled
>   pcieport :02:02.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:02.0: PME# disabled
>   pcieport :02:04.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:04.0: PME# disabled
>   pcieport :02:01.0: PME# enabled
>   pcieport :02:01.0: waiting for D3hot delay of 10 ms
>   pcieport :02:04.0: PME# enabled
>   pcieport :02:04.0: waiting for D3hot delay of 10 ms
>   thunderbolt :03:00.0: restoring config space at offset 0x14 (was 0x0, 
> writing 0x8a04)
>   ...
>   thunderbolt :03:00.0: PME# disabled
>   xhci_hcd :37:00.0: restoring config space at offset 0x10 (was 0x0, 
> writing 0x73f0)
>   ...
>   xhci_hcd :37:00.0: PME# disabled
>
> For the switch upstream port (01:00.0) we wait for 100ms but not taking
> into account the DLLLA requirement. We then wait 10ms for D3hot -> D0
> transition of the root port and the two downstream hotplug ports. This
> means that we deviate from what the spec requires.
>
> Performing the same check for system sleep (s2idle) transitions we can
> see following when resuming from s2idle:
>
>   pcieport :00:1b.0: power state changed by ACPI to D0
>   pcieport :00:1b.0: restoring config space at offset 0x2c (was 0x60, 
> writing 0x60)
>   ...
>   pcieport :01:00.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:02.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   pcieport :02:02.0: restoring config space at offset 0x2c (was 0x0, 
> writing 0x0)
>   pcieport :02:01.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   pcieport :02:04.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   pcieport :02:02.0: restoring config space at offset 0x28 (was 0x0, 
> writing 0x0)
>   pcieport :02:00.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>

Re: [PATCH 5.0 119/123] s390/mm: convert to the generic get_user_pages_fast code

2019-05-22 Thread Justin Forbes

On Mon, May 20, 2019 at 7:30 AM Greg Kroah-Hartman
 wrote:
>
> From: Martin Schwidefsky 
>
> commit 1a42010cdc26bb7e5912984f3c91b8c6d55f089a upstream.
>
> Define the gup_fast_permitted to check against the asce_limit of the
> mm attached to the current task, then replace the s390 specific gup
> code with the generic implementation in mm/gup.c.
>
> Signed-off-by: Martin Schwidefsky 
> Signed-off-by: Greg Kroah-Hartman 

While this code seems to work fine upstream, when backported to 5.0 it
fails to build:

BUILDSTDERR: In file included from ./include/linux/mm.h:98,
BUILDSTDERR:  from mm/gup.c:6:
BUILDSTDERR: mm/gup.c: In function '__get_user_pages_fast':
BUILDSTDERR: ./arch/s390/include/asm/pgtable.h:1277:28: error: too
many arguments to function 'gup_fast_permitted'
BUILDSTDERR:  #define gup_fast_permitted gup_fast_permitted
BUILDSTDERR: ^~
BUILDSTDERR: mm/gup.c:1856:6: note: in expansion of macro 'gup_fast_permitted'
BUILDSTDERR:   if (gup_fast_permitted(start, nr_pages, write)) {

It is missing upstream commit ad8cfb9c42ef83ecf4079bc7d77e6557648e952b
mm/gup: Remove the 'write' parameter from gup_fast_permitted()

Justin
>
> ---
>  arch/s390/Kconfig   |1
>  arch/s390/include/asm/pgtable.h |   12 +
>  arch/s390/mm/Makefile   |2
>  arch/s390/mm/gup.c  |  291 
> 
>  4 files changed, 14 insertions(+), 292 deletions(-)
>
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -148,6 +148,7 @@ config S390
> select HAVE_FUNCTION_TRACER
> select HAVE_FUTEX_CMPXCHG if FUTEX
> select HAVE_GCC_PLUGINS
> +   select HAVE_GENERIC_GUP
> select HAVE_KERNEL_BZIP2
> select HAVE_KERNEL_GZIP
> select HAVE_KERNEL_LZ4
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1264,6 +1264,18 @@ static inline pte_t *pte_offset(pmd_t *p
>  #define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address)
>  #define pte_unmap(pte) do { } while (0)
>
> +static inline bool gup_fast_permitted(unsigned long start, int nr_pages)
> +{
> +   unsigned long len, end;
> +
> +   len = (unsigned long) nr_pages << PAGE_SHIFT;
> +   end = start + len;
> +   if (end < start)
> +   return false;
> +   return end <= current->mm->context.asce_limit;
> +}
> +#define gup_fast_permitted gup_fast_permitted
> +
>  #define pfn_pte(pfn,pgprot) mk_pte_phys(__pa((pfn) << PAGE_SHIFT),(pgprot))
>  #define pte_pfn(x) (pte_val(x) >> PAGE_SHIFT)
>  #define pte_page(x) pfn_to_page(pte_pfn(x))
> --- a/arch/s390/mm/Makefile
> +++ b/arch/s390/mm/Makefile
> @@ -4,7 +4,7 @@
>  #
>
>  obj-y  := init.o fault.o extmem.o mmap.o vmem.o maccess.o
> -obj-y  += page-states.o gup.o pageattr.o pgtable.o pgalloc.o
> +obj-y  += page-states.o pageattr.o pgtable.o pgalloc.o
>
>  obj-$(CONFIG_CMM)  += cmm.o
>  obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
> --- a/arch/s390/mm/gup.c
> +++ /dev/null
> @@ -1,291 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - *  Lockless get_user_pages_fast for s390
> - *
> - *  Copyright IBM Corp. 2010
> - *  Author(s): Martin Schwidefsky 
> - */
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -/*
> - * The performance critical leaf functions are made noinline otherwise gcc
> - * inlines everything into a single function which results in too much
> - * register pressure.
> - */
> -static inline int gup_pte_range(pmd_t pmd, unsigned long addr,
> -   unsigned long end, int write, struct page **pages, int *nr)
> -{
> -   struct page *head, *page;
> -   unsigned long mask;
> -   pte_t *ptep, pte;
> -
> -   mask = (write ? _PAGE_PROTECT : 0) | _PAGE_INVALID | _PAGE_SPECIAL;
> -
> -   ptep = pte_offset_map(, addr);
> -   do {
> -   pte = *ptep;
> -   barrier();
> -   /* Similar to the PMD case, NUMA hinting must take slow path 
> */
> -   if (pte_protnone(pte))
> -   return 0;
> -   if ((pte_val(pte) & mask) != 0)
> -   return 0;
> -   VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> -   page = pte_page(pte);
> -   head = compound_head(page);
> -   if (!page_cache_get_speculative(head))
> -   return 0;
> -   if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> -   put_page(head);
> -   return 0;
> -   }
> -   VM_BUG_ON_PAGE(compound_head(page) != head, page);
> -   pages[*nr] = page;
> -   (*nr)++;
> -
> -   } while (ptep++, addr += PAGE_SIZE, addr != end);
> -
> -   return 1;
> -}
> -
> -static inline int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr,
> -   unsigned long end, int write, struct

Re: [PATCH] s390: mark __cpacf_check_opcode() and cpacf_query_func() as __always_inline

2019-05-20 Thread Justin Forbes

On Fri, May 17, 2019 at 1:55 AM Masahiro Yamada
 wrote:
>
> Commit e60fb8bf68d4 ("s390/cpacf: mark scpacf_query() as __always_inline")
> was not enough to make sure to meet the 'i' (immediate) constraint for the
> asm operands.
>
> With CONFIG_OPTIMIZE_INLINING enabled, Laura Abbott reported error
> with gcc 9.1.1:
>
>   In file included from arch/s390/crypto/prng.c:29:
>   ./arch/s390/include/asm/cpacf.h: In function 'cpacf_query_func':
>   ./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3 probably 
> doesn't match constraints
> 170 |  asm volatile(
> |  ^~~
>   ./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm'
>
> Add more __always_inline to force inlining.
>
> Fixes: 9012d011660e ("compiler: allow all arches to enable 
> CONFIG_OPTIMIZE_INLINING")
> Reported-by: Laura Abbott 
> Signed-off-by: Masahiro Yamada 
> ---
>
Thanks for the fix, this does indeed fix the build issues for us.

Justin

Tested-by: Justin Forbes 

>  arch/s390/include/asm/cpacf.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/s390/include/asm/cpacf.h b/arch/s390/include/asm/cpacf.h
> index f316de40e51b..19459dfb4295 100644
> --- a/arch/s390/include/asm/cpacf.h
> +++ b/arch/s390/include/asm/cpacf.h
> @@ -177,7 +177,7 @@ static inline void __cpacf_query(unsigned int opcode, 
> cpacf_mask_t *mask)
> : "cc");
>  }
>
> -static inline int __cpacf_check_opcode(unsigned int opcode)
> +static __always_inline int __cpacf_check_opcode(unsigned int opcode)
>  {
> switch (opcode) {
> case CPACF_KMAC:
> @@ -217,7 +217,7 @@ static inline int cpacf_test_func(cpacf_mask_t *mask, 
> unsigned int func)
> return (mask->bytes[func >> 3] & (0x80 >> (func & 7))) != 0;
>  }
>
> -static inline int cpacf_query_func(unsigned int opcode, unsigned int func)
> +static __always_inline int cpacf_query_func(unsigned int opcode, unsigned 
> int func)
>  {
> cpacf_mask_t mask;
>
> --
> 2.17.1
>

Re: [PATCH] Fix kexec forbidding kernels signed with custom platform keys to boot

2018-08-17 Thread Justin Forbes

On Fri, Aug 17, 2018 at 9:58 AM, James Bottomley
 wrote:
> On Fri, 2018-08-17 at 09:24 +0100, David Howells wrote:
>> James Bottomley  wrote:
>>
>> > > > As a step by step process, I agree.  However, I think we can
>> > > > automate it to the point where you install a package and it
>> > > > says "insert your yubikey" every time you upgrade the kernel
>> > >
>> > > That's a very bad idea.  You train people to unlock their private
>> > > key on request.  It can be abused like one of those emails that
>> > > tells you that your account has been suspended - just follow the
>> > > link and put in your password please.
>> >
>> > It's exactly the same process those of us who use yubikeys for gpg
>> > or ssh keys follow.  You insert your key, you activate the process
>> > that needs the key, it asks for you to confirm your key, you press
>> > the button and the operation gets performed.  Since it's what we as
>> > kernel developers do, I don't see why it's a bad idea for others.
>>
>> You've completely missed the point.
>>
>> You need to think from the PoV of an ordinary user.  Imagine the
>> system does an automatic upgrade and wants to upgrade the kernel.  It
>> pops up a dialogue box saying "please put in your yubikey and enter
>> your password here"[**].  It might do this on a regular basis - and
>> you can be sure that some users at least will become accustomed to
>> just doing this when their computer tells them too.  *That* is the
>> problem.
>
> I was assuming the kernel would get pinned, so when the system
> automatically updates it installs everything else but tells you you
> have to do the kernel manually.  Presumably installing the add your own
> key package would do this.
>
> The point I'm making isn't that everything will just magically work,
> it's that we can design a process for a user to update a distro kernel
> while installing their own key.  I'm sure you can imagine hundreds of
> bad processes that encourage wrong behaviour, but the realistic answer
> is we just wouldn't use them.
>
>
>> Now they follow a link to a dodgy website that causes some code to be
>> downloaded and run.  *It* now pops up a dialogue box that looks
>> exactly like the kernel installer's dialogue that says "please put in
>> your yubikey and enter your password here".  But now we've trained
>> those users to do this on demand...
>>
>> PEBKAC[*].
>>
>> [*] Note that I'm not trying to slight ordinary users here, it's more
>> a fact
>> of psychology.  As a distribution, it's our responsibility to try
>> and
>> protect them as best we can - and training them to unthinkingly
>> bypass the
>> security mechanisms isn't in anyone's best interests.
>>
>> [**] Note also that I've never actually used a yubikey[***], so I'm
>> not sure
>>  whether it takes a password or has some other mechanism to
>> unlock the
>>  key.
>>
>> [***] We also don't want to require that someone buys and keeps track
>> of a
>>   yubikey to be able to use, say, the NVidia driver with
>> Fedora/RHEL.
>>   Using the TPM if installed would be preferable because it's
>> harder to
>>   lose.
>
> I'm perfectly happy to use the TPM as well, and to help design
> processes around it (although I think we'll need both yubikey and TPM).
>  I also have to confess whenever I say yubikey in the context of kernel
> processes I'm making the caveat that everyone else uses a yubikey but I
> use my TPM based keys.
>
>> We also don't necessarily want to encourage ordinary users to fiddle
>> with the system key databases unless they really know what they are
>> doing.  There've been cases where doing this has bricked a machine
>> because the BIOS is buggy. Now I will grant, since you'll probably
>> raise it if I don't;-), that this might be a good reason *for* having
>> our own third party signing key as we could then build the key into
>> our kernels.
>>
>> But if they use a yubikey, they have to get the public key from there
>> into the system key list or possibly the yubikey has to be accessed
>> by the bootloader. The same for the TPM.
>
> For security reasons, a Yubikey should only be connected when you need
> it to sign something.  The TPM you can assume is always available.
>
This is absolutely correct, the important word here being *should*.
The reality is, with weekly kernel updates and various uses of the
Yubikey, the average user is just going to leave it connected. We can
lay out best practices all we want, but it seems pretty obvious that
most users go with convenient or default.  I really wish we could
change that, but it is unlikely.  As a result, we have to make the
convenient or default path also fairly secure.

>> > > Further, you expose the unlocked key on a machine that might be
>> > > compromised.
>> >
>> > No it doesn't; the point about using a yubikey (or any other HSM
>> > type thing) is that the key is shielded inside the module so you
>> > get a signature back and the key can't be compromised even if the
>> > machine

Re: [PATCH] Fix kexec forbidding kernels signed with custom platform keys to boot

2018-08-17 Thread Justin Forbes

On Fri, Aug 17, 2018 at 9:58 AM, James Bottomley
 wrote:
> On Fri, 2018-08-17 at 09:24 +0100, David Howells wrote:
>> James Bottomley  wrote:
>>
>> > > > As a step by step process, I agree.  However, I think we can
>> > > > automate it to the point where you install a package and it
>> > > > says "insert your yubikey" every time you upgrade the kernel
>> > >
>> > > That's a very bad idea.  You train people to unlock their private
>> > > key on request.  It can be abused like one of those emails that
>> > > tells you that your account has been suspended - just follow the
>> > > link and put in your password please.
>> >
>> > It's exactly the same process those of us who use yubikeys for gpg
>> > or ssh keys follow.  You insert your key, you activate the process
>> > that needs the key, it asks for you to confirm your key, you press
>> > the button and the operation gets performed.  Since it's what we as
>> > kernel developers do, I don't see why it's a bad idea for others.
>>
>> You've completely missed the point.
>>
>> You need to think from the PoV of an ordinary user.  Imagine the
>> system does an automatic upgrade and wants to upgrade the kernel.  It
>> pops up a dialogue box saying "please put in your yubikey and enter
>> your password here"[**].  It might do this on a regular basis - and
>> you can be sure that some users at least will become accustomed to
>> just doing this when their computer tells them too.  *That* is the
>> problem.
>
> I was assuming the kernel would get pinned, so when the system
> automatically updates it installs everything else but tells you you
> have to do the kernel manually.  Presumably installing the add your own
> key package would do this.
>
> The point I'm making isn't that everything will just magically work,
> it's that we can design a process for a user to update a distro kernel
> while installing their own key.  I'm sure you can imagine hundreds of
> bad processes that encourage wrong behaviour, but the realistic answer
> is we just wouldn't use them.
>
>
>> Now they follow a link to a dodgy website that causes some code to be
>> downloaded and run.  *It* now pops up a dialogue box that looks
>> exactly like the kernel installer's dialogue that says "please put in
>> your yubikey and enter your password here".  But now we've trained
>> those users to do this on demand...
>>
>> PEBKAC[*].
>>
>> [*] Note that I'm not trying to slight ordinary users here, it's more
>> a fact
>> of psychology.  As a distribution, it's our responsibility to try
>> and
>> protect them as best we can - and training them to unthinkingly
>> bypass the
>> security mechanisms isn't in anyone's best interests.
>>
>> [**] Note also that I've never actually used a yubikey[***], so I'm
>> not sure
>>  whether it takes a password or has some other mechanism to
>> unlock the
>>  key.
>>
>> [***] We also don't want to require that someone buys and keeps track
>> of a
>>   yubikey to be able to use, say, the NVidia driver with
>> Fedora/RHEL.
>>   Using the TPM if installed would be preferable because it's
>> harder to
>>   lose.
>
> I'm perfectly happy to use the TPM as well, and to help design
> processes around it (although I think we'll need both yubikey and TPM).
>  I also have to confess whenever I say yubikey in the context of kernel
> processes I'm making the caveat that everyone else uses a yubikey but I
> use my TPM based keys.
>
>> We also don't necessarily want to encourage ordinary users to fiddle
>> with the system key databases unless they really know what they are
>> doing.  There've been cases where doing this has bricked a machine
>> because the BIOS is buggy. Now I will grant, since you'll probably
>> raise it if I don't;-), that this might be a good reason *for* having
>> our own third party signing key as we could then build the key into
>> our kernels.
>>
>> But if they use a yubikey, they have to get the public key from there
>> into the system key list or possibly the yubikey has to be accessed
>> by the bootloader. The same for the TPM.
>
> For security reasons, a Yubikey should only be connected when you need
> it to sign something.  The TPM you can assume is always available.
>
This is absolutely correct, the important word here being *should*.
The reality is, with weekly kernel updates and various uses of the
Yubikey, the average user is just going to leave it connected. We can
lay out best practices all we want, but it seems pretty obvious that
most users go with convenient or default.  I really wish we could
change that, but it is unlikely.  As a result, we have to make the
convenient or default path also fairly secure.

>> > > Further, you expose the unlocked key on a machine that might be
>> > > compromised.
>> >
>> > No it doesn't; the point about using a yubikey (or any other HSM
>> > type thing) is that the key is shielded inside the module so you
>> > get a signature back and the key can't be compromised even if the
>> > machine

Re: [Ksummit-discuss] bug-introducing patches

2018-05-08 Thread Justin Forbes

On Tue, May 8, 2018 at 3:55 PM, Sasha Levin
 wrote:
> On Tue, May 08, 2018 at 08:40:02PM +, Matthew Wilcox wrote:
>>I think your sample size omits some people. I run Debian Testing on my
>>laptop. That gets something akin to a Linus release pretty soon after he
>>releases it, and while it gets some amount of -stable patches, it
>>progresses to the next release fairly rapidly.
>
> Debian testing is pretty much a -stable tree, see the git log history:
>
> https://salsa.debian.org/kernel-team/linux/commits/sid
>
> It follows a current stable tree, and moves on to the next one once it's
> available (about a week after Linus releases a new kernel).
>
>>Added Ben to the cc for more updates.
>>
>>I think Fedora does something similar.
>
> Fedora's rawhide is just (daily?) builds of Linus's tree, they don't
> care what stage the tree is in at any point.

It is, but there is a branch point when Linus releases. If we are
working on a new Fedora release, such as F28, all testing stayed on
4.16.0 until stable updates were released. If there is no release
deadline nearing, we have a "stabilization" repository where people
are using and testing the .0 release until stable updates happen. In
either case, the Linus release is really only tested until the stable
.1 happens, but there are users and testers of .0.

>
> My point is that no one picks a release and sticks with it more than a
> week. If someone plans to use a release for longer term they use a
> -stable tree, and if they are interested in testing, they move on to the
> next release once it's available.
>
> There's no one, for example, who picked up vanilla v4.16 and plans to
> keep using it for a year.
>
> This leads to my point about rushing fixes: -stable releases for v4.16 are
> done weekly, there's no need to rush them in during v4.16-rc8 just to
> make some imaginary release no one will pick up.

Re: [Ksummit-discuss] bug-introducing patches

2018-05-08 Thread Justin Forbes

On Tue, May 8, 2018 at 3:55 PM, Sasha Levin
 wrote:
> On Tue, May 08, 2018 at 08:40:02PM +, Matthew Wilcox wrote:
>>I think your sample size omits some people. I run Debian Testing on my
>>laptop. That gets something akin to a Linus release pretty soon after he
>>releases it, and while it gets some amount of -stable patches, it
>>progresses to the next release fairly rapidly.
>
> Debian testing is pretty much a -stable tree, see the git log history:
>
> https://salsa.debian.org/kernel-team/linux/commits/sid
>
> It follows a current stable tree, and moves on to the next one once it's
> available (about a week after Linus releases a new kernel).
>
>>Added Ben to the cc for more updates.
>>
>>I think Fedora does something similar.
>
> Fedora's rawhide is just (daily?) builds of Linus's tree, they don't
> care what stage the tree is in at any point.

It is, but there is a branch point when Linus releases. If we are
working on a new Fedora release, such as F28, all testing stayed on
4.16.0 until stable updates were released. If there is no release
deadline nearing, we have a "stabilization" repository where people
are using and testing the .0 release until stable updates happen. In
either case, the Linus release is really only tested until the stable
.1 happens, but there are users and testers of .0.

>
> My point is that no one picks a release and sticks with it more than a
> week. If someone plans to use a release for longer term they use a
> -stable tree, and if they are interested in testing, they move on to the
> next release once it's available.
>
> There's no one, for example, who picked up vanilla v4.16 and plans to
> keep using it for a year.
>
> This leads to my point about rushing fixes: -stable releases for v4.16 are
> done weekly, there's no need to rush them in during v4.16-rc8 just to
> make some imaginary release no one will pick up.

Re: [Ksummit-discuss] bug-introducing patches

2018-05-08 Thread Justin Forbes

On Mon, May 7, 2018 at 9:34 PM, Sasha Levin
 wrote:
> On Thu, May 03, 2018 at 04:09:05PM -0700, Tony Lindgren wrote:
>>* Mark Brown  [180503 22:44]:
>>> On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
>>>
>>> > As for -next, me and others stopped reporting bugs in it, because when we 
>>> > do
>>> > we tend to get flamed for the "noise". Is anyone aware (or cares) that 
>>> > mips
>>> > and nds32 images don't build ? Soaking clothes in an empty bathtub won't 
>>> > make
>>> > them wet, and bugs in code which no one builds, much less tests or uses, 
>>> > won't
>>> > be found.
>>>
>>> You've been flamed for testing -next?  That's not been my experience and
>>> frankly it's pretty horrifying that it's happening.  Testing is pretty
>>> much the whole point of -next existing in the first place so you have to
>>> wonder why people are putting their trees there if they don't want
>>> testing.  I have seen a few issues with people reporting bugs on old
>>> versions of -next but otherwise...
>>
>>Yes I agree testing Linux next is very important. That's the best way for
>>maintainers to ensure a usable -rc1 after a merge window. And then for
>>the -rc cycle, there not much of need for chasing bugs to get things working.
>>
>>Bugs reported for Linux next often seem to get fixed or reverted faster
>>compared to the -rc cycle too. I think that's because people realize that
>>their code will not get merged until it's been fixed.
>>
>>So some daily testing of Linux next can save a lot scrambling after the
>>merge window :)
>>
>>Users don't usually upgrade kernels until after later -rc releases or only
>>after major releases so that probably explains some of the -rc cycle fixes.
>
> Tony, I'm curious, how many users are you aware of who actually run
> Linus's tree? All the users I've encountered so far on Azure seem to be
> running something based on -stable.

I couldn't tell you the number of users we have running rawhide
kernels (daily builds of Linus's tree), but it is a positive integer.
We do get bug reports on things, sometimes a day after Linus commits
them.

>
> I can't really get any solid statistics about that on my end both
> because I don't have visibility inside user VMs (I don't actually have
> prod access believe it or not), and even if I had it would probably be
> confidential, so I'm just basing this on reports from user's I've seen
> so far.
>
> I think that a question we should be asking ourselves is whether we
> should be basing our decisions here on the assumption that (pretty much)
> no one runs Linus's tree anymore?

Re: [Ksummit-discuss] bug-introducing patches

2018-05-08 Thread Justin Forbes

On Mon, May 7, 2018 at 9:34 PM, Sasha Levin
 wrote:
> On Thu, May 03, 2018 at 04:09:05PM -0700, Tony Lindgren wrote:
>>* Mark Brown  [180503 22:44]:
>>> On Wed, May 02, 2018 at 08:52:29PM -0700, Guenter Roeck wrote:
>>>
>>> > As for -next, me and others stopped reporting bugs in it, because when we 
>>> > do
>>> > we tend to get flamed for the "noise". Is anyone aware (or cares) that 
>>> > mips
>>> > and nds32 images don't build ? Soaking clothes in an empty bathtub won't 
>>> > make
>>> > them wet, and bugs in code which no one builds, much less tests or uses, 
>>> > won't
>>> > be found.
>>>
>>> You've been flamed for testing -next?  That's not been my experience and
>>> frankly it's pretty horrifying that it's happening.  Testing is pretty
>>> much the whole point of -next existing in the first place so you have to
>>> wonder why people are putting their trees there if they don't want
>>> testing.  I have seen a few issues with people reporting bugs on old
>>> versions of -next but otherwise...
>>
>>Yes I agree testing Linux next is very important. That's the best way for
>>maintainers to ensure a usable -rc1 after a merge window. And then for
>>the -rc cycle, there not much of need for chasing bugs to get things working.
>>
>>Bugs reported for Linux next often seem to get fixed or reverted faster
>>compared to the -rc cycle too. I think that's because people realize that
>>their code will not get merged until it's been fixed.
>>
>>So some daily testing of Linux next can save a lot scrambling after the
>>merge window :)
>>
>>Users don't usually upgrade kernels until after later -rc releases or only
>>after major releases so that probably explains some of the -rc cycle fixes.
>
> Tony, I'm curious, how many users are you aware of who actually run
> Linus's tree? All the users I've encountered so far on Azure seem to be
> running something based on -stable.

I couldn't tell you the number of users we have running rawhide
kernels (daily builds of Linus's tree), but it is a positive integer.
We do get bug reports on things, sometimes a day after Linus commits
them.

>
> I can't really get any solid statistics about that on my end both
> because I don't have visibility inside user VMs (I don't actually have
> prod access believe it or not), and even if I had it would probably be
> confidential, so I'm just basing this on reports from user's I've seen
> so far.
>
> I think that a question we should be asking ourselves is whether we
> should be basing our decisions here on the assumption that (pretty much)
> no one runs Linus's tree anymore?

Re: [Ksummit-discuss] bug-introducing patches

2018-05-03 Thread Justin Forbes

On Thu, May 3, 2018 at 11:02 AM, Sasha Levin
 wrote:
> On Thu, May 03, 2018 at 08:49:11AM -0700, Guenter Roeck wrote:
>>On Thu, May 03, 2018 at 02:55:36PM +, Sasha Levin wrote:
>>> On Wed, May 02, 2018 at 05:38:32PM -0700, Guenter Roeck wrote:
>>> >On 05/02/2018 05:06 PM, Theodore Y. Ts'o wrote:
>>> >>On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>>> >>>
>>> >>>Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of 
>>> >>>which
>>> >>>   - 245 carry a Fixes tag,
>>> >>>   - 196 carry a CC stable,
>>> >>>   - 395 contain the string "fix".
>>> >>>(non-mutually exclusive)
>>> >>>
>>> >>>That leaves us with 200 commits not falling in the bugfix category.
>>> >>
>>> >>Some non-bug fixes are allowed in -rc2.  So perhaps what might be
>>> >>interesting is to look at v4.16 (which is completed), and look at the
>>> >>distribution of commits:
>>> >>
>>> >>   * regressions fixes (for bugs introduced during the current
>>> >>   release cycle)
>>> >>   * "normal" bug fixes
>>> >>   * commits which don't touch code (e.g., spelling or
>>> >>   documentation-only fixes)
>>> >>   * other commits (features or cleanup fixes)
>>> >>
>>> >>at each rcX level.  The historic "standard" has been feature commits
>>> >>in -rc1 and -rc2 (tolerated, but ideally should before the merge
>>> >>window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
>>> >>regression fixes only.  It would be interesting to see how well we
>>> >>have been holding to the historical ideal.
>>> >>
>>> >>It would then be intersting to use Sasha's analysis to see whether
>>> >>there are more bug fixes caused by regression fixes versus normal bug
>>> >>fixes, and whether or not they are common when fixes come "out of
>>> >>cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
>>> >>
>>> >>Because if that last is the case, then the prescription is very simple
>>> >>and not controversial --- bug fixes found post -rc4 should be held to
>>> >>the next merge window.
>>> >>
>>> >
>>> >Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
>>> >unrealistic. Holding up the fix for the next SpeckHammer because it was not
>>> >ready before -rc4 ? I don't think so.
>>>
>>> For severe problems, the patch usually gets more than enough reviews and
>>> testing, so I don't see a need to soak it in -next more than some
>>> minimal amount of time to get bot coverage.
>>>
>>> However, these things show up only a few times per year. Most of the
>>> fixes even in late -rc cycles are for older bugs that aren't too
>>> critical. We can't base our decision on severe bugs that get exceptional
>>> treatment anyways (see PTI getting pushed in -stable).
>>>
>>> >Even when not counting severe problems, you are adding lots of additional 
>>> >work
>>> >for those who do and want to rely on stable releases to merge in bug fixes.
>>> >Sure, I am at times annoyed having to deal with a regression in a stable
>>> >release, but it very much beats digging through various mailing lists for
>>> >pending patches to fix CVEs, or for crashes seen in the field, just because
>>> >they are held hostage by some restrictive process. Even worse, I'd end up
>>> >picking the regressions anyway because I can _not_ wait those 4-6 weeks
>>> >plus the time it takes for the fixes to show up in a stable release.
>>>
>>> I think that for -stable we don't have a good idea how soon we want to
>>> merge patches in. On one hand enterprise distro folks complain we're
>>> jumping the gun, and on the other hand folks like yourself claim we're
>>> too slow :)
>>>
>>
>>You are misquoting me. I am saying that it would be a bad idea to hold up
>>bug fixes after -rc4, which is quite different to saying that patches
>>don't make it into stable releases fast enough. I am perfectly happy to
>>wait a week or so for a patch to soak in _mainline_ before being applied
>>to stable.
>
> Most bug fixes that go in at that point are fixes for previous released
> kernels, what's the harm in keeping them around for longer?
>
> I'm not saying that it should be some arbitrary rule for everyone, but
> just suggesting that maintainers should exercise more caution merging
> untested commits that don't even fix a current regression.
>
There is a balance here. In the past, one of the biggest complaints we
had as distro maintainers was that known regressions get reported, and
fixed, and then the maintainer would sit on the fix until the next
merge window. This happened even for trivial fixes. And not being in
tree does keep it out of stable.  This has improved greatly recently.
Perhaps things have over compensated, but I don' t think that putting
a blanket rule out there is the answer. Just perhaps some best
practices for test coverage.

> w.r.t stable, as you just said, you're fine with a week or two, the
> enterprise folks (as well as Ted, to some extend, in this thread)
> suggest that this should be a month+

I don' t have

Re: [Ksummit-discuss] bug-introducing patches

2018-05-03 Thread Justin Forbes

On Thu, May 3, 2018 at 11:02 AM, Sasha Levin
 wrote:
> On Thu, May 03, 2018 at 08:49:11AM -0700, Guenter Roeck wrote:
>>On Thu, May 03, 2018 at 02:55:36PM +, Sasha Levin wrote:
>>> On Wed, May 02, 2018 at 05:38:32PM -0700, Guenter Roeck wrote:
>>> >On 05/02/2018 05:06 PM, Theodore Y. Ts'o wrote:
>>> >>On Wed, May 02, 2018 at 10:41:56PM +0200, Geert Uytterhoeven wrote:
>>> >>>
>>> >>>Between v4.17-rc1 and v4.17-rc3, there are 660 non-merge commits, of 
>>> >>>which
>>> >>>   - 245 carry a Fixes tag,
>>> >>>   - 196 carry a CC stable,
>>> >>>   - 395 contain the string "fix".
>>> >>>(non-mutually exclusive)
>>> >>>
>>> >>>That leaves us with 200 commits not falling in the bugfix category.
>>> >>
>>> >>Some non-bug fixes are allowed in -rc2.  So perhaps what might be
>>> >>interesting is to look at v4.16 (which is completed), and look at the
>>> >>distribution of commits:
>>> >>
>>> >>   * regressions fixes (for bugs introduced during the current
>>> >>   release cycle)
>>> >>   * "normal" bug fixes
>>> >>   * commits which don't touch code (e.g., spelling or
>>> >>   documentation-only fixes)
>>> >>   * other commits (features or cleanup fixes)
>>> >>
>>> >>at each rcX level.  The historic "standard" has been feature commits
>>> >>in -rc1 and -rc2 (tolerated, but ideally should before the merge
>>> >>window), bug fixes / regressions in -rc3 and -rc4, and after -rc4,
>>> >>regression fixes only.  It would be interesting to see how well we
>>> >>have been holding to the historical ideal.
>>> >>
>>> >>It would then be intersting to use Sasha's analysis to see whether
>>> >>there are more bug fixes caused by regression fixes versus normal bug
>>> >>fixes, and whether or not they are common when fixes come "out of
>>> >>cycle" --- for example, a non-regression bug fix in -rc5 or -rc6.
>>> >>
>>> >>Because if that last is the case, then the prescription is very simple
>>> >>and not controversial --- bug fixes found post -rc4 should be held to
>>> >>the next merge window.
>>> >>
>>> >
>>> >Holding up even fixes for severe bugs for 4-6 weeks ? Seriously, that is
>>> >unrealistic. Holding up the fix for the next SpeckHammer because it was not
>>> >ready before -rc4 ? I don't think so.
>>>
>>> For severe problems, the patch usually gets more than enough reviews and
>>> testing, so I don't see a need to soak it in -next more than some
>>> minimal amount of time to get bot coverage.
>>>
>>> However, these things show up only a few times per year. Most of the
>>> fixes even in late -rc cycles are for older bugs that aren't too
>>> critical. We can't base our decision on severe bugs that get exceptional
>>> treatment anyways (see PTI getting pushed in -stable).
>>>
>>> >Even when not counting severe problems, you are adding lots of additional 
>>> >work
>>> >for those who do and want to rely on stable releases to merge in bug fixes.
>>> >Sure, I am at times annoyed having to deal with a regression in a stable
>>> >release, but it very much beats digging through various mailing lists for
>>> >pending patches to fix CVEs, or for crashes seen in the field, just because
>>> >they are held hostage by some restrictive process. Even worse, I'd end up
>>> >picking the regressions anyway because I can _not_ wait those 4-6 weeks
>>> >plus the time it takes for the fixes to show up in a stable release.
>>>
>>> I think that for -stable we don't have a good idea how soon we want to
>>> merge patches in. On one hand enterprise distro folks complain we're
>>> jumping the gun, and on the other hand folks like yourself claim we're
>>> too slow :)
>>>
>>
>>You are misquoting me. I am saying that it would be a bad idea to hold up
>>bug fixes after -rc4, which is quite different to saying that patches
>>don't make it into stable releases fast enough. I am perfectly happy to
>>wait a week or so for a patch to soak in _mainline_ before being applied
>>to stable.
>
> Most bug fixes that go in at that point are fixes for previous released
> kernels, what's the harm in keeping them around for longer?
>
> I'm not saying that it should be some arbitrary rule for everyone, but
> just suggesting that maintainers should exercise more caution merging
> untested commits that don't even fix a current regression.
>
There is a balance here. In the past, one of the biggest complaints we
had as distro maintainers was that known regressions get reported, and
fixed, and then the maintainer would sit on the fix until the next
merge window. This happened even for trivial fixes. And not being in
tree does keep it out of stable.  This has improved greatly recently.
Perhaps things have over compensated, but I don' t think that putting
a blanket rule out there is the answer. Just perhaps some best
practices for test coverage.

> w.r.t stable, as you just said, you're fine with a week or two, the
> enterprise folks (as well as Ted, to some extend, in this thread)
> suggest that this should be a month+

I don' t have an issue with some things

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-03 Thread Justin Forbes

On Wed, May 2, 2018 at 5:25 PM, Theodore Y. Ts'o  wrote:
> On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
>>
>> It is a Fedora patch we're carrying
>> https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
>> so yes, it is a Fedora specific use case.
>> From talking to the libgcrypt team, this is a FIPS mode requirement
>> to run power on self test at the library constructor and the self
>> test of libgrcypt ends up requiring a fully seeded RNG. Citation
>> is in section 9.10 of
>> https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
>
> Forgive me if this is a stupid question, but does Fedora need FIPS
> compliance?  Or is this something which is only required for RHEL?
>
> ("Here's to FIPS: the cause of, and solution to, all of Life's
> problems."  :-)
>
One of the advantages of carrying such things in Fedora is we find
these problems before RHEL does and hopefully there is a solution in
place before they ever even see it.

>From the rawhide end, I just brought in virtio-rng as inline vs
module, this works around the issue for lots of users, but not all.
GCE is still impacted, and a user came to complain about it already
last night.  And of course any other virt platform without virtio-rng,
or some hardware. Most hardware installs don't have dracut-fips so
they will boot, eventually.

Justin

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-03 Thread Justin Forbes

On Wed, May 2, 2018 at 5:25 PM, Theodore Y. Ts'o  wrote:
> On Wed, May 02, 2018 at 10:49:34AM -0700, Laura Abbott wrote:
>>
>> It is a Fedora patch we're carrying
>> https://src.fedoraproject.org/rpms/libgcrypt/blob/master/f/libgcrypt-1.6.2-fips-ctor.patch#_23
>> so yes, it is a Fedora specific use case.
>> From talking to the libgcrypt team, this is a FIPS mode requirement
>> to run power on self test at the library constructor and the self
>> test of libgrcypt ends up requiring a fully seeded RNG. Citation
>> is in section 9.10 of
>> https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Module-Validation-Program/documents/fips140-2/FIPS1402IG.pdf
>
> Forgive me if this is a stupid question, but does Fedora need FIPS
> compliance?  Or is this something which is only required for RHEL?
>
> ("Here's to FIPS: the cause of, and solution to, all of Life's
> problems."  :-)
>
One of the advantages of carrying such things in Fedora is we find
these problems before RHEL does and hopefully there is a solution in
place before they ever even see it.

>From the rawhide end, I just brought in virtio-rng as inline vs
module, this works around the issue for lots of users, but not all.
GCE is still impacted, and a user came to complain about it already
last night.  And of course any other virt platform without virtio-rng,
or some hardware. Most hardware installs don't have dracut-fips so
they will boot, eventually.

Justin

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-02 Thread Justin Forbes

On Tue, May 1, 2018 at 7:02 PM, Theodore Y. Ts'o <ty...@mit.edu> wrote:
> On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
>>
>> I have not reproduced in GCE myself.  We did get some confirmation
>> that removing dracut-fips does make the problem less dire (but I
>> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
>> better than not booting at all).  Specifically systemd calls libgcrypt
>> before it even opens the log with fips there, and this is before
>> virtio-rng modules could even load. Right now though, we are looking
>> at pretty much any possible options as the majority of people are
>> calling for me to backout the patches completely from rawhide.
>
> FWIW, Debian Testing is using systemd 238, and from what I can tell
> it's calling libgcrypt and it has the same (as near as I can tell)
> totally pointless hmac nonsense, and it's not a problem that I can
> see.  Of course, Debian and Fedora may have a different set of
> patches
>
Yes, Fedora libgcrypt is carrying a patch which makes it particularly
painful for us, we have reached out to the libgcrypt maintainer to
follow up on that end. But as I said before, even without that code
path (no dracut-fips) we are seeing some instances of 4 minute boots.
This is not really a workable user experience.  And are you sure that
every cloud platform and VM platform offers, makes it possible to
config virtio-rng?

Justin

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-02 Thread Justin Forbes

On Tue, May 1, 2018 at 7:02 PM, Theodore Y. Ts'o  wrote:
> On Tue, May 01, 2018 at 05:35:56PM -0500, Justin Forbes wrote:
>>
>> I have not reproduced in GCE myself.  We did get some confirmation
>> that removing dracut-fips does make the problem less dire (but I
>> wouldn't call a 4 minute boot a win, but booting in 4 minutes is
>> better than not booting at all).  Specifically systemd calls libgcrypt
>> before it even opens the log with fips there, and this is before
>> virtio-rng modules could even load. Right now though, we are looking
>> at pretty much any possible options as the majority of people are
>> calling for me to backout the patches completely from rawhide.
>
> FWIW, Debian Testing is using systemd 238, and from what I can tell
> it's calling libgcrypt and it has the same (as near as I can tell)
> totally pointless hmac nonsense, and it's not a problem that I can
> see.  Of course, Debian and Fedora may have a different set of
> patches
>
Yes, Fedora libgcrypt is carrying a patch which makes it particularly
painful for us, we have reached out to the libgcrypt maintainer to
follow up on that end. But as I said before, even without that code
path (no dracut-fips) we are seeing some instances of 4 minute boots.
This is not really a workable user experience.  And are you sure that
every cloud platform and VM platform offers, makes it possible to
config virtio-rng?

Justin

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Justin Forbes

On Tue, May 1, 2018 at 7:55 AM, Theodore Y. Ts'o <ty...@mit.edu> wrote:
> On Tue, May 01, 2018 at 06:52:47AM -0500, Justin Forbes wrote:
>>
>> We have also had reports that Fedora users are seeing this on Google
>> Compute Engine.
>
> Can you reproduce this yourself?  If so, could you confirm that
> removing the dracut-fips package makes the problem go away for you?
>

I have not reproduced in GCE myself.  We did get some confirmation
that removing dracut-fips does make the problem less dire (but I
wouldn't call a 4 minute boot a win, but booting in 4 minutes is
better than not booting at all).  Specifically systemd calls libgcrypt
before it even opens the log with fips there, and this is before
virtio-rng modules could even load. Right now though, we are looking
at pretty much any possible options as the majority of people are
calling for me to backout the patches completely from rawhide.

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Justin Forbes

On Tue, May 1, 2018 at 7:55 AM, Theodore Y. Ts'o  wrote:
> On Tue, May 01, 2018 at 06:52:47AM -0500, Justin Forbes wrote:
>>
>> We have also had reports that Fedora users are seeing this on Google
>> Compute Engine.
>
> Can you reproduce this yourself?  If so, could you confirm that
> removing the dracut-fips package makes the problem go away for you?
>

I have not reproduced in GCE myself.  We did get some confirmation
that removing dracut-fips does make the problem less dire (but I
wouldn't call a 4 minute boot a win, but booting in 4 minutes is
better than not booting at all).  Specifically systemd calls libgcrypt
before it even opens the log with fips there, and this is before
virtio-rng modules could even load. Right now though, we are looking
at pretty much any possible options as the majority of people are
calling for me to backout the patches completely from rawhide.

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Justin Forbes

On Mon, Apr 30, 2018 at 4:12 PM, Jeremy Cline  wrote:
> On 04/29/2018 06:05 PM, Theodore Y. Ts'o wrote:
>> On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
>>> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
 Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>>>
>>> Okay, but /dev/urandom isn't a solution to this problem because it isn't 
>>> usable
>>> until crng init is complete, so it suffers from the same init lag as
>>> /dev/random.
>>
>> It's more accurate to say that using /dev/urandom is no worse than
>> before (from a few years ago).  There are, alas, plenty of
>> distributions and user space application programmers that basically
>> got lazy using /dev/urandom, and assumed that there would be plenty of
>> entropy during early system startup.
>>
>> When they switched over the getrandom(2), the most egregious examples
>> of this caused pain (and they got fixed), but due to a bug in
>> drivers/char/random.c, if getrandom(2) was called after the entropy
>> pool was "half initialized", it would not block, but proceed.
>>
>> Is that exploitable?  Well, Jann and I didn't find an _obvious_ way to
>> exploit the short coming, which is this wasn't treated like an
>> emergency situation ala the embarassing situation we had five years
>> ago[1].
>>
>> [1] https://factorable.net/paper.html
>>
>> However, it was enough to make us be uncomfortable, which is why I
>> pushed the changes that I did.  At least on the devices we had at
>> hand, using the distributions that we typically use, the impact seemed
>> minimal.  Unfortuantely, there is no way to know for sure without
>> rolling out change and seeing who screams.  In the ideal world,
>> software would not require cryptographic randomness immediately after
>> boot, before the user logs in.  And ***really***, as in [1], softwaret
>> should not be generating long-term public keys that are essential to
>> the security of the box a few seconds immediately after the device is
>> first unboxed and plugged in.i
>>
>> What would be useful is if people gave reports that listed exactly
>> what laptop and distributions they are using.  Just "a high spec x86
>> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
>> running Debian testing is working just fine.  The year, model, make,
>> and CPU type plus what distribution (and distro version number) you
>> are running is useful, so I can assess how wide spread the unhappiness
>> is going to be, and what mitigation steps make sense.
>
> Fedora has started seeing some bug reports on this for Fedora 27[0] and
> I've asked reporters to include their hardware details.
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1572944
>

We have also had reports that Fedora users are seeing this on Google
Compute Engine.

Justin

Re: Linux messages full of `random: get_random_u32 called from`

2018-05-01 Thread Justin Forbes

On Mon, Apr 30, 2018 at 4:12 PM, Jeremy Cline  wrote:
> On 04/29/2018 06:05 PM, Theodore Y. Ts'o wrote:
>> On Sun, Apr 29, 2018 at 01:20:33PM -0700, Sultan Alsawaf wrote:
>>> On Sun, Apr 29, 2018 at 08:41:01PM +0200, Pavel Machek wrote:
 Umm. No. https://www.youtube.com/watch?v=xneBjc8z0DE
>>>
>>> Okay, but /dev/urandom isn't a solution to this problem because it isn't 
>>> usable
>>> until crng init is complete, so it suffers from the same init lag as
>>> /dev/random.
>>
>> It's more accurate to say that using /dev/urandom is no worse than
>> before (from a few years ago).  There are, alas, plenty of
>> distributions and user space application programmers that basically
>> got lazy using /dev/urandom, and assumed that there would be plenty of
>> entropy during early system startup.
>>
>> When they switched over the getrandom(2), the most egregious examples
>> of this caused pain (and they got fixed), but due to a bug in
>> drivers/char/random.c, if getrandom(2) was called after the entropy
>> pool was "half initialized", it would not block, but proceed.
>>
>> Is that exploitable?  Well, Jann and I didn't find an _obvious_ way to
>> exploit the short coming, which is this wasn't treated like an
>> emergency situation ala the embarassing situation we had five years
>> ago[1].
>>
>> [1] https://factorable.net/paper.html
>>
>> However, it was enough to make us be uncomfortable, which is why I
>> pushed the changes that I did.  At least on the devices we had at
>> hand, using the distributions that we typically use, the impact seemed
>> minimal.  Unfortuantely, there is no way to know for sure without
>> rolling out change and seeing who screams.  In the ideal world,
>> software would not require cryptographic randomness immediately after
>> boot, before the user logs in.  And ***really***, as in [1], softwaret
>> should not be generating long-term public keys that are essential to
>> the security of the box a few seconds immediately after the device is
>> first unboxed and plugged in.i
>>
>> What would be useful is if people gave reports that listed exactly
>> what laptop and distributions they are using.  Just "a high spec x86
>> laptop" isn't terribly useful, because *my* brand-new Dell XPS 13
>> running Debian testing is working just fine.  The year, model, make,
>> and CPU type plus what distribution (and distro version number) you
>> are running is useful, so I can assess how wide spread the unhappiness
>> is going to be, and what mitigation steps make sense.
>
> Fedora has started seeing some bug reports on this for Fedora 27[0] and
> I've asked reporters to include their hardware details.
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1572944
>

We have also had reports that Fedora users are seeing this on Google
Compute Engine.

Justin

Re: [PATCH 01/24] Add the ability to lock down access to the running kernel image

2018-04-12 Thread Justin Forbes

On Wed, Apr 11, 2018, 5:38 PM Linus Torvalds
 wrote:
>
> On Wed, Apr 11, 2018 at 2:05 PM, Jordan Glover
>  wrote:
> >>
> >> If that /dev/mem access prevention was just instead done as an even
> >> stricter mode of the existing CONFIG_STRICT_DEVMEM, it could just be
> >> enabled unconditionally.
> >
> > CONFIG_DEVMEM=n
>
> It's actually CONFIG_DEVMEM, CONFIG_DEVKMEM and CONFIG_DEVPORT, it's
> just not obvious from the patch.
>
> But the important part is this part:
>
> >> So I would seriously ask that the distros that have been using these
> >> patches look at which parts of lockdown they could make unconditional
> >> (because it doesn't break machines), and which ones need that escape
> >> clause.
>
> .. because I get the feeling that not a lot of people have actually
> been testing this, because "turn off secure boot" is such a universal
> thing when people boot Linux.
>
> So it's really the whole claim that distributions have been running
> for this for the last five years that I wonder about, and how often
> people end up being told: "just disable secure boot":.

Very rarely in my experience. And the one time that we sent a kernel
to updates-testing that was signed with the test key instead of the
real key, we had a surprisingly high number of reports from users that
it was broken before the update even got synched to mirrors.  So we
don't have actual numbers of users running active secure boot with
Fedora, but we do know it is more than we expected.  The majority of
people who do run into issues are those running out of tree modules,
who haven't imported any sort of key for local signing.  This isn't
like SELinux was at launch where it was so invasive that a large
number of users instinctively turned it off with every installation, I
would guess even people who turned it off in the past, don't even
think about it when they get a new machine and leave it on.

> But if people really don't need DEVMEM/DEVKMEM/DEVPORT, maybe we
> should just disable them in the default configs, and consider them
> legacy.
>
> I'm just surprised. I suspect a lot of people end up actually using
> devmem as a fallback for dmidecode etc. Maybe those people don't boot
> with EFI secure mode, but if so that just shows that this whole
> "hardening" is just security theater.
>
>   Linus

Re: [PATCH 01/24] Add the ability to lock down access to the running kernel image

2018-04-12 Thread Justin Forbes

On Wed, Apr 11, 2018, 5:38 PM Linus Torvalds
 wrote:
>
> On Wed, Apr 11, 2018 at 2:05 PM, Jordan Glover
>  wrote:
> >>
> >> If that /dev/mem access prevention was just instead done as an even
> >> stricter mode of the existing CONFIG_STRICT_DEVMEM, it could just be
> >> enabled unconditionally.
> >
> > CONFIG_DEVMEM=n
>
> It's actually CONFIG_DEVMEM, CONFIG_DEVKMEM and CONFIG_DEVPORT, it's
> just not obvious from the patch.
>
> But the important part is this part:
>
> >> So I would seriously ask that the distros that have been using these
> >> patches look at which parts of lockdown they could make unconditional
> >> (because it doesn't break machines), and which ones need that escape
> >> clause.
>
> .. because I get the feeling that not a lot of people have actually
> been testing this, because "turn off secure boot" is such a universal
> thing when people boot Linux.
>
> So it's really the whole claim that distributions have been running
> for this for the last five years that I wonder about, and how often
> people end up being told: "just disable secure boot":.

Very rarely in my experience. And the one time that we sent a kernel
to updates-testing that was signed with the test key instead of the
real key, we had a surprisingly high number of reports from users that
it was broken before the update even got synched to mirrors.  So we
don't have actual numbers of users running active secure boot with
Fedora, but we do know it is more than we expected.  The majority of
people who do run into issues are those running out of tree modules,
who haven't imported any sort of key for local signing.  This isn't
like SELinux was at launch where it was so invasive that a large
number of users instinctively turned it off with every installation, I
would guess even people who turned it off in the past, don't even
think about it when they get a new machine and leave it on.

> But if people really don't need DEVMEM/DEVKMEM/DEVPORT, maybe we
> should just disable them in the default configs, and consider them
> legacy.
>
> I'm just surprised. I suspect a lot of people end up actually using
> devmem as a fallback for dmidecode etc. Maybe those people don't boot
> with EFI secure mode, but if so that just shows that this whole
> "hardening" is just security theater.
>
>   Linus

Re: [PATCH 01/24] Add the ability to lock down access to the running kernel image

2018-04-11 Thread Justin Forbes

On Wed, Apr 11, 2018 at 1:09 PM, Linus Torvalds
 wrote:
> On Wed, Apr 11, 2018 at 9:24 AM, David Howells  wrote:
>> Provide a single call to allow kernel code to determine whether the system
>> should be locked down, thereby disallowing various accesses that might
>> allow the running kernel image to be changed, including:
>>
>>  - /dev/mem and similar
>>  - Loading of unauthorised modules
>>  - Fiddling with MSR registers
>>  - Suspend to disk managed by the kernel
>>  - Use of device DMA
>
> So what I stlll absolutely detest about  this series is that I think
> many of these things should simply be done as separate config options.
>
> For example, if the distro is sure that it doesn't need /dev/mem, then
> why the hell is  this tied to "lockdown" that then may have to be
> disabled because *other* changes may not be acceptable (eg people may
> need that device DMA, or whatever).
>
> If that /dev/mem access prevention was just instead done as an even
> stricter mode of the existing CONFIG_STRICT_DEVMEM, it could just be
> enabled unconditionally.
>
> So none of these patches raise my hackles per se. But what continues
> to makes me very very uncomfortable is how this is all tied together.
>
> Why is this one magical mode that then - because it has such a big
> impact - has to be enabled/disabled as a single magical mode and with
> very odd rules?
>
> I think a lot of people would be happier if this wasn't so incestuous
> and mixing together independent things under one name, and one flag.
>
> I think a lot of the secure boot problems were exacerbated by that mixup.
>
> So I would seriously ask that the distros that have been using these
> patches look at which parts of lockdown they could make unconditional
> (because it doesn't break machines), and which ones need that escape
> clause.
>

Optionally, it might make sense to add separate config options for
each of these pieces which can be unconditionally enabled, and a
separate option for secure boot which selects all of them? As much as
I hate select, it might make sense here.  Of course the flip side to
that, is users no longer have one big switch "turn off secure boot"
which turns it all off in case of trouble.

Justin

Re: [PATCH 01/24] Add the ability to lock down access to the running kernel image

2018-04-11 Thread Justin Forbes

On Wed, Apr 11, 2018 at 1:09 PM, Linus Torvalds
 wrote:
> On Wed, Apr 11, 2018 at 9:24 AM, David Howells  wrote:
>> Provide a single call to allow kernel code to determine whether the system
>> should be locked down, thereby disallowing various accesses that might
>> allow the running kernel image to be changed, including:
>>
>>  - /dev/mem and similar
>>  - Loading of unauthorised modules
>>  - Fiddling with MSR registers
>>  - Suspend to disk managed by the kernel
>>  - Use of device DMA
>
> So what I stlll absolutely detest about  this series is that I think
> many of these things should simply be done as separate config options.
>
> For example, if the distro is sure that it doesn't need /dev/mem, then
> why the hell is  this tied to "lockdown" that then may have to be
> disabled because *other* changes may not be acceptable (eg people may
> need that device DMA, or whatever).
>
> If that /dev/mem access prevention was just instead done as an even
> stricter mode of the existing CONFIG_STRICT_DEVMEM, it could just be
> enabled unconditionally.
>
> So none of these patches raise my hackles per se. But what continues
> to makes me very very uncomfortable is how this is all tied together.
>
> Why is this one magical mode that then - because it has such a big
> impact - has to be enabled/disabled as a single magical mode and with
> very odd rules?
>
> I think a lot of people would be happier if this wasn't so incestuous
> and mixing together independent things under one name, and one flag.
>
> I think a lot of the secure boot problems were exacerbated by that mixup.
>
> So I would seriously ask that the distros that have been using these
> patches look at which parts of lockdown they could make unconditional
> (because it doesn't break machines), and which ones need that escape
> clause.
>

Optionally, it might make sense to add separate config options for
each of these pieces which can be unconditionally enabled, and a
separate option for secure boot which selects all of them? As much as
I hate select, it might make sense here.  Of course the flip side to
that, is users no longer have one big switch "turn off secure boot"
which turns it all off in case of trouble.

Justin

Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-04 Thread Justin Forbes

On Wed, Apr 4, 2018 at 11:39 AM, Andy Lutomirski  wrote:
> On Wed, Apr 4, 2018 at 9:22 AM, Matthew Garrett  wrote:
>> On Wed, Apr 4, 2018 at 6:52 AM Theodore Y. Ts'o  wrote:
>>
>>> On Wed, Apr 04, 2018 at 02:33:37PM +0100, David Howells wrote:
>>> > Theodore Y. Ts'o  wrote:
>>> >
>>> > > Whoa.  Why doesn't lockdown prevent kexec?  Put another away, why
>>> > > isn't this a problem for people who are fearful that Linux could be
>>> > > used as part of a Windows boot virus in a Secure UEFI context?
>>> >
>>> > Lockdown mode restricts kexec to booting an authorised image (where the
>>> > authorisation may be by signature or by IMA).
>>
>>> If that's true, then Matthew's assertion that lockdown w/o secure boot
>>> is insecure goes away, no?
>>
>> If you don't have secure boot then an attacker with root can modify your
>> bootloader or kernel, and on next boot lockdown can be silently disabled.
>
> This has been rebutted over and over and over.  Secure boot is not the
> only verified boot mechanism in the world.  Other, better, much more
> auditable, and much simpler mechanisms have been around for a long,
> long time.
>
That is certainly the case, and one of the main reasons for the
secureboot patchset being split out and lockdown taking a different
name. The problem is, right now, secure boot is the only thing using
lockdown. I certainly wouldn't go through any effort to tie into it
with any other mechanism knowing that this patch set has been delayed
upstream for years. I would hope and expect that once lockdown is in
mainline, other verified boot mechanisms would leverage it as well.

>>> The fact that this Verified Boot on, lockdown off causes trouble
>>> points to a clear problem.   User owns the hardware they should have
>>> the right to defeat secureboot if they wish to.
>>
>> Which is why Shim allows you to disable validation if you prove physical
>> user presence.
>
> And that's a giant hack.  The actual feature should be that a user
> proves physical presence and thus disables lockdown *without*
> disabling verification.
>
> --Andy

Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-04 Thread Justin Forbes

On Wed, Apr 4, 2018 at 11:39 AM, Andy Lutomirski  wrote:
> On Wed, Apr 4, 2018 at 9:22 AM, Matthew Garrett  wrote:
>> On Wed, Apr 4, 2018 at 6:52 AM Theodore Y. Ts'o  wrote:
>>
>>> On Wed, Apr 04, 2018 at 02:33:37PM +0100, David Howells wrote:
>>> > Theodore Y. Ts'o  wrote:
>>> >
>>> > > Whoa.  Why doesn't lockdown prevent kexec?  Put another away, why
>>> > > isn't this a problem for people who are fearful that Linux could be
>>> > > used as part of a Windows boot virus in a Secure UEFI context?
>>> >
>>> > Lockdown mode restricts kexec to booting an authorised image (where the
>>> > authorisation may be by signature or by IMA).
>>
>>> If that's true, then Matthew's assertion that lockdown w/o secure boot
>>> is insecure goes away, no?
>>
>> If you don't have secure boot then an attacker with root can modify your
>> bootloader or kernel, and on next boot lockdown can be silently disabled.
>
> This has been rebutted over and over and over.  Secure boot is not the
> only verified boot mechanism in the world.  Other, better, much more
> auditable, and much simpler mechanisms have been around for a long,
> long time.
>
That is certainly the case, and one of the main reasons for the
secureboot patchset being split out and lockdown taking a different
name. The problem is, right now, secure boot is the only thing using
lockdown. I certainly wouldn't go through any effort to tie into it
with any other mechanism knowing that this patch set has been delayed
upstream for years. I would hope and expect that once lockdown is in
mainline, other verified boot mechanisms would leverage it as well.

>>> The fact that this Verified Boot on, lockdown off causes trouble
>>> points to a clear problem.   User owns the hardware they should have
>>> the right to defeat secureboot if they wish to.
>>
>> Which is why Shim allows you to disable validation if you prove physical
>> user presence.
>
> And that's a giant hack.  The actual feature should be that a user
> proves physical presence and thus disables lockdown *without*
> disabling verification.
>
> --Andy

Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-03 Thread Justin Forbes

On Tue, Apr 3, 2018 at 7:56 PM, Linus Torvalds
 wrote:
> On Tue, Apr 3, 2018 at 5:46 PM, Matthew Garrett  wrote:
>>
>> The generic distros have been shipping this policy for the past 5 years.
>
> .. so apparently it doesn't actually break things? Why not enable it
> by default then?
>
> And if "turn off secure boot" really is the accepted - and actuially
> used - workaround for the breakage, then
>

While there is very little breakage in the *years* we have been
shipping this in distro kernels, the accepted and used workaround has
always been "turn off secure boot" or sign/import your own keys,
depending on the problems encountered.

>WHY THE HELL DIDN'T YOU START OFF BY EXPLAINING THAT IN THE FIRST
> PLACE WHEN PEOPLE ASKED WHY THE TIE-IN EXISTED?
>
> Sorry for shouting, but really. We have a thread of just *how* many
> email messages that asked for the explanation for this? All we got was
> incomprehensible and illogical crap explanations.
>
> If there actually was a good explanation for the tie-in, it should
> have been front-and-center and explained as such.
>

Honestly, yes, the major distros have been shipping this patch set for
years now, and every time it comes to upstream, the same damn
arguments emerge.  I do not disagree that there are uses for lockdown
outside of secure boot, provided you have some other mechanism to
verify your chain, I believe chrome OS does. But the tie to secure
boot is because that is the use case that users have been using for
years, it was discussed at kernel summit quite a while ago, plans went
forward there seemed to be agreement, and when it comes time for a
pull request, people come out of the woodwork with an expectation that
it solves every problem or it doesn't need to exist. What is here is a
good starting point. I would expect that if it were merged, others
would build upon that and use much of the code already in place to
extend it. It is tied to secure boot because that is what has been
using this for years as it never seems to get upstream.  I am sure
that once it does finally land, it can and will be extended to other
things, but I don't think I would want to spend a lot of time trying
to leverage another external patch set that has been delayed upstream
so many times until it actually did land.
As for the ties to MS that come up every time, and have here as well,
there is no requirement on the MS signature. You can import your own
keys if you don't want them involved, I keep a "test key" imported for
actually running what I build locally.

Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-03 Thread Justin Forbes

On Tue, Apr 3, 2018 at 7:56 PM, Linus Torvalds
 wrote:
> On Tue, Apr 3, 2018 at 5:46 PM, Matthew Garrett  wrote:
>>
>> The generic distros have been shipping this policy for the past 5 years.
>
> .. so apparently it doesn't actually break things? Why not enable it
> by default then?
>
> And if "turn off secure boot" really is the accepted - and actuially
> used - workaround for the breakage, then
>

While there is very little breakage in the *years* we have been
shipping this in distro kernels, the accepted and used workaround has
always been "turn off secure boot" or sign/import your own keys,
depending on the problems encountered.

>WHY THE HELL DIDN'T YOU START OFF BY EXPLAINING THAT IN THE FIRST
> PLACE WHEN PEOPLE ASKED WHY THE TIE-IN EXISTED?
>
> Sorry for shouting, but really. We have a thread of just *how* many
> email messages that asked for the explanation for this? All we got was
> incomprehensible and illogical crap explanations.
>
> If there actually was a good explanation for the tie-in, it should
> have been front-and-center and explained as such.
>

Honestly, yes, the major distros have been shipping this patch set for
years now, and every time it comes to upstream, the same damn
arguments emerge.  I do not disagree that there are uses for lockdown
outside of secure boot, provided you have some other mechanism to
verify your chain, I believe chrome OS does. But the tie to secure
boot is because that is the use case that users have been using for
years, it was discussed at kernel summit quite a while ago, plans went
forward there seemed to be agreement, and when it comes time for a
pull request, people come out of the woodwork with an expectation that
it solves every problem or it doesn't need to exist. What is here is a
good starting point. I would expect that if it were merged, others
would build upon that and use much of the code already in place to
extend it. It is tied to secure boot because that is what has been
using this for years as it never seems to get upstream.  I am sure
that once it does finally land, it can and will be extended to other
things, but I don't think I would want to spend a lot of time trying
to leverage another external patch set that has been delayed upstream
so many times until it actually did land.
As for the ties to MS that come up every time, and have here as well,
there is no requirement on the MS signature. You can import your own
keys if you don't want them involved, I keep a "test key" imported for
actually running what I build locally.

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-09 Thread Justin Forbes

On Tue, Jan 9, 2018 at 8:02 PM, Dave Hansen  wrote:
> On 01/09/2018 05:06 PM, Thomas Gleixner wrote:
>> --- a/arch/x86/kernel/cpu/bugs.c
>> +++ b/arch/x86/kernel/cpu/bugs.c
>> @@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd {
>>   SPECTRE_V2_CMD_RETPOLINE,
>>   SPECTRE_V2_CMD_RETPOLINE_GENERIC,
>>   SPECTRE_V2_CMD_RETPOLINE_AMD,
>> + SPECTRE_V2_CMD_IBRS,
>>  };
>
> A few nits on this:
>
> IBRS should not default on anywhere, which goes double when retpolines
> are available.
>
> I think I'd also prefer that we separate the IBRS and retpoline enabling
> so that you can do both if you want.  They do nearly the same thing in
> practice, but I can't convince myself that you never ever need IBRS once
> retpolines are in place.

Fairly strong agreement here. IBRS being separately configurable gives
us an option for the paranoid, and allows distros to ship with it off
by default.

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-09 Thread Justin Forbes

On Tue, Jan 9, 2018 at 8:02 PM, Dave Hansen  wrote:
> On 01/09/2018 05:06 PM, Thomas Gleixner wrote:
>> --- a/arch/x86/kernel/cpu/bugs.c
>> +++ b/arch/x86/kernel/cpu/bugs.c
>> @@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd {
>>   SPECTRE_V2_CMD_RETPOLINE,
>>   SPECTRE_V2_CMD_RETPOLINE_GENERIC,
>>   SPECTRE_V2_CMD_RETPOLINE_AMD,
>> + SPECTRE_V2_CMD_IBRS,
>>  };
>
> A few nits on this:
>
> IBRS should not default on anywhere, which goes double when retpolines
> are available.
>
> I think I'd also prefer that we separate the IBRS and retpoline enabling
> so that you can do both if you want.  They do nearly the same thing in
> practice, but I can't convince myself that you never ever need IBRS once
> retpolines are in place.

Fairly strong agreement here. IBRS being separately configurable gives
us an option for the paranoid, and allows distros to ship with it off
by default.

Re: [PATCH v3 11/13] retpoline/taint: Taint kernel for missing retpoline in compiler

2018-01-04 Thread Justin Forbes

On Thu, Jan 4, 2018 at 8:37 AM, David Woodhouse  wrote:
> From: Andi Kleen 
>
> When the kernel or a module hasn't been compiled with a retpoline
> aware compiler, print a warning and set a taint flag.
>
> For modules it is checked at compile time, however it cannot
> check assembler or other non compiled objects used in the module link.
>
> Due to lack of better letter it uses taint option 'Z'
>

Is taint really the right thing to do here? Why not just do pr_info?

Re: [PATCH v3 11/13] retpoline/taint: Taint kernel for missing retpoline in compiler

2018-01-04 Thread Justin Forbes

On Thu, Jan 4, 2018 at 8:37 AM, David Woodhouse  wrote:
> From: Andi Kleen 
>
> When the kernel or a module hasn't been compiled with a retpoline
> aware compiler, print a warning and set a taint flag.
>
> For modules it is checked at compile time, however it cannot
> check assembler or other non compiled objects used in the module link.
>
> Due to lack of better letter it uses taint option 'Z'
>

Is taint really the right thing to do here? Why not just do pr_info?

Re: [PATCH 0/7] IBRS patch series

2018-01-04 Thread Justin Forbes

On Thu, Jan 4, 2018 at 11:56 AM, Tim Chen  wrote:
> This patch series enables the basic detection and usage of x86 indirect
> branch speculation feature.  It enables the indirect branch restricted
> speculation (IBRS) on kernel entry and disables it on exit.
> It enumerates the indirect branch prediction barrier (IBPB).
>
> The x86 IBRS feature requires corresponding microcode support.
> It mitigates the variant 2 vulnerability described in
> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>

Are there plans to make the corresponding microcode support available?

Re: [PATCH 0/7] IBRS patch series

2018-01-04 Thread Justin Forbes

On Thu, Jan 4, 2018 at 11:56 AM, Tim Chen  wrote:
> This patch series enables the basic detection and usage of x86 indirect
> branch speculation feature.  It enables the indirect branch restricted
> speculation (IBRS) on kernel entry and disables it on exit.
> It enumerates the indirect branch prediction barrier (IBPB).
>
> The x86 IBRS feature requires corresponding microcode support.
> It mitigates the variant 2 vulnerability described in
> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>

Are there plans to make the corresponding microcode support available?

Re: [PATCH] powerpc: fix distclean with Makefile.postlink

2017-05-08 Thread Justin Forbes

On Mon, May 8, 2017 at 8:50 AM, Horia Geantă  wrote:
> On 5/8/2017 2:57 PM, Michael Ellerman wrote:
>> Horia Geantă  writes:
>>
>>> Makefile.postlink always includes include/config/auto.conf, however
>>> this file is not present in a clean kernel tree, causing make to fail:
>>>
>>> arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file 
>>> or directory
>>> make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
>>> make: *** [vmlinuxclean] Error 2
>>>
>>> Change the inclusion such that file not being found does not trigger
>>> an error.
>>>
>>> Fixes: f188d0524d7e ("powerpc: Use the new post-link pass to check 
>>> relocations")
>>
>> I can't reproduce this. What exact steps are you doing? And what version
>> of Make?
>>
> Start with a clean kernel tree and then
> make distclean
> arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such
> file or directory
> make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
> make: *** [vmlinuxclean] Error 2
>
> make --version
> GNU Make 3.82
> Built for x86_64-redhat-linux-gnu
> Copyright (C) 2010  Free Software Foundation, Inc.
> [...]
>
> The fix is basically the same as:
> 6e5b95cdbd0e MIPS: Fix distclean with Makefile.postlink
>

Noticed it on the Fedora builds as well today.  This patch fixes it.

Tested-by: Justin M. Forbes

Re: [PATCH] powerpc: fix distclean with Makefile.postlink

2017-05-08 Thread Justin Forbes

On Mon, May 8, 2017 at 8:50 AM, Horia Geantă  wrote:
> On 5/8/2017 2:57 PM, Michael Ellerman wrote:
>> Horia Geantă  writes:
>>
>>> Makefile.postlink always includes include/config/auto.conf, however
>>> this file is not present in a clean kernel tree, causing make to fail:
>>>
>>> arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file 
>>> or directory
>>> make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
>>> make: *** [vmlinuxclean] Error 2
>>>
>>> Change the inclusion such that file not being found does not trigger
>>> an error.
>>>
>>> Fixes: f188d0524d7e ("powerpc: Use the new post-link pass to check 
>>> relocations")
>>
>> I can't reproduce this. What exact steps are you doing? And what version
>> of Make?
>>
> Start with a clean kernel tree and then
> make distclean
> arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such
> file or directory
> make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
> make: *** [vmlinuxclean] Error 2
>
> make --version
> GNU Make 3.82
> Built for x86_64-redhat-linux-gnu
> Copyright (C) 2010  Free Software Foundation, Inc.
> [...]
>
> The fix is basically the same as:
> 6e5b95cdbd0e MIPS: Fix distclean with Makefile.postlink
>

Noticed it on the Fedora builds as well today.  This patch fixes it.

Tested-by: Justin M. Forbes

Re: [PATCH 00/24] Kernel lockdown

2017-04-07 Thread Justin Forbes

On Wed, Apr 5, 2017 at 12:07 PM, David Howells <dhowe...@redhat.com> wrote:
>
> These patches provide a facility by which a variety of avenues by which
> userspace can feasibly modify the running kernel image can be locked down.
> These include:
>
>  (*) No unsigned modules and no modules for which can't validate the
>  signature.
>
>  (*) No use of ioperm(), iopl() and no writing to /dev/port.
>
>  (*) No writing to /dev/mem or /dev/kmem.
>
>  (*) No hibernation.
>
>  (*) Restrict PCI BAR access.
>
>  (*) Restrict MSR access.
>
>  (*) No kexec_load().
>
>  (*) Certain ACPI restrictions.
>
>  (*) Restrict debugfs interface to ASUS WMI.
>
> The lock-down can be configured to be triggered by the EFI secure boot
> status, provided the shim isn't insecure.  The lock-down can be lifted by
> typing SysRq+x on a keyboard attached to the system.
>
>
> The patches can be found here also:
>
> 
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=efi-lockdown
>
> They are dependent on the hwparam branch, which I posted separately.
>
> David
> ---
> Dave Young (1):
>   Copy secure_boot flag in boot params across kexec reboot
>
> David Howells (7):
>   Add the ability to lock down access to the running kernel image
>   efi: Lock down the kernel if booted in secure boot mode
>   Enforce module signatures if the kernel is locked down
>   scsi: Lock down the eata driver
>   Prohibit PCMCIA CIS storage when the kernel is locked down
>   Lock down TIOCSSERIAL
>   Lock down module params that specify hardware parameters (eg. ioport)
>
> Josh Boyer (3):
>   efi: Add EFI_SECURE_BOOT bit
>   hibernate: Disable when the kernel is locked down
>   acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
>
> Kyle McMartin (1):
>   Add a sysrq option to exit secure boot mode
>
> Lee, Chun-Yi (2):
>   kexec_file: Disable at runtime if securelevel has been set
>   bpf: Restrict kernel image access functions when the kernel is locked 
> down
>
> Linn Crosetto (2):
>   acpi: Disable ACPI table override if the kernel is locked down
>   acpi: Disable APEI error injection if the kernel is locked down
>
> Matthew Garrett (8):
>   Restrict /dev/mem and /dev/kmem when the kernel is locked down
>   kexec: Disable at runtime if the kernel is locked down
>   uswsusp: Disable when the kernel is locked down
>   PCI: Lock down BAR access when the kernel is locked down
>   x86: Lock down IO port access when the kernel is locked down
>   x86: Restrict MSR access when the kernel is locked down
>   asus-wmi: Restrict debugfs interface when the kernel is locked down
>   ACPI: Limit access to custom_method when the kernel is locked down
>
>
>  arch/x86/Kconfig  |   22 
>  arch/x86/kernel/ioport.c  |4 ++--
>  arch/x86/kernel/kexec-bzimage64.c |1 +
>  arch/x86/kernel/msr.c |7 ++
>  arch/x86/kernel/setup.c   |   40 
> -
>  drivers/acpi/apei/einj.c  |3 +++
>  drivers/acpi/custom_method.c  |3 +++
>  drivers/acpi/osl.c|2 +-
>  drivers/acpi/tables.c |5 +
>  drivers/char/mem.c|8 +++
>  drivers/input/misc/uinput.c   |1 +
>  drivers/pci/pci-sysfs.c   |9 
>  drivers/pci/proc.c|8 ++-
>  drivers/pci/syscall.c |2 +-
>  drivers/pcmcia/cistpl.c   |5 +
>  drivers/platform/x86/asus-wmi.c   |9 
>  drivers/scsi/eata.c   |7 ++
>  drivers/tty/serial/serial_core.c  |6 ++
>  drivers/tty/sysrq.c   |   19 --
>  include/linux/efi.h   |1 +
>  include/linux/input.h |5 +
>  include/linux/kernel.h|9 
>  include/linux/security.h  |   11 ++
>  include/linux/sysrq.h |8 ++-
>  kernel/debug/kdb/kdb_main.c   |2 +-
>  kernel/kexec.c|7 ++
>  kernel/kexec_file.c   |6 ++
>  kernel/module.c   |2 +-
>  kernel/params.c   |   27 -
>  kernel/power/hibernate.c  |2 +-
>  kernel/power/user.c   |3 +++
>  kernel/trace/bpf_trace.c      |   11 ++
>  security/Kconfig  |   15 ++
>  security/Makefile |3 +++
>  security/lock_down.c  |   40 
> +
>  35 files changed, 291 insertions(+), 22 deletions(-)
>  create mode 100644 security/lock_down.c
>

Tested-by: Justin Forbes <jfor...@fedoraproject.org>

Re: [PATCH 00/24] Kernel lockdown

2017-04-07 Thread Justin Forbes

On Wed, Apr 5, 2017 at 12:07 PM, David Howells  wrote:
>
> These patches provide a facility by which a variety of avenues by which
> userspace can feasibly modify the running kernel image can be locked down.
> These include:
>
>  (*) No unsigned modules and no modules for which can't validate the
>  signature.
>
>  (*) No use of ioperm(), iopl() and no writing to /dev/port.
>
>  (*) No writing to /dev/mem or /dev/kmem.
>
>  (*) No hibernation.
>
>  (*) Restrict PCI BAR access.
>
>  (*) Restrict MSR access.
>
>  (*) No kexec_load().
>
>  (*) Certain ACPI restrictions.
>
>  (*) Restrict debugfs interface to ASUS WMI.
>
> The lock-down can be configured to be triggered by the EFI secure boot
> status, provided the shim isn't insecure.  The lock-down can be lifted by
> typing SysRq+x on a keyboard attached to the system.
>
>
> The patches can be found here also:
>
> 
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=efi-lockdown
>
> They are dependent on the hwparam branch, which I posted separately.
>
> David
> ---
> Dave Young (1):
>   Copy secure_boot flag in boot params across kexec reboot
>
> David Howells (7):
>   Add the ability to lock down access to the running kernel image
>   efi: Lock down the kernel if booted in secure boot mode
>   Enforce module signatures if the kernel is locked down
>   scsi: Lock down the eata driver
>   Prohibit PCMCIA CIS storage when the kernel is locked down
>   Lock down TIOCSSERIAL
>   Lock down module params that specify hardware parameters (eg. ioport)
>
> Josh Boyer (3):
>   efi: Add EFI_SECURE_BOOT bit
>   hibernate: Disable when the kernel is locked down
>   acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
>
> Kyle McMartin (1):
>   Add a sysrq option to exit secure boot mode
>
> Lee, Chun-Yi (2):
>   kexec_file: Disable at runtime if securelevel has been set
>   bpf: Restrict kernel image access functions when the kernel is locked 
> down
>
> Linn Crosetto (2):
>   acpi: Disable ACPI table override if the kernel is locked down
>   acpi: Disable APEI error injection if the kernel is locked down
>
> Matthew Garrett (8):
>   Restrict /dev/mem and /dev/kmem when the kernel is locked down
>   kexec: Disable at runtime if the kernel is locked down
>   uswsusp: Disable when the kernel is locked down
>   PCI: Lock down BAR access when the kernel is locked down
>   x86: Lock down IO port access when the kernel is locked down
>   x86: Restrict MSR access when the kernel is locked down
>   asus-wmi: Restrict debugfs interface when the kernel is locked down
>   ACPI: Limit access to custom_method when the kernel is locked down
>
>
>  arch/x86/Kconfig  |   22 
>  arch/x86/kernel/ioport.c  |4 ++--
>  arch/x86/kernel/kexec-bzimage64.c |1 +
>  arch/x86/kernel/msr.c |7 ++
>  arch/x86/kernel/setup.c   |   40 
> -
>  drivers/acpi/apei/einj.c  |3 +++
>  drivers/acpi/custom_method.c  |3 +++
>  drivers/acpi/osl.c|2 +-
>  drivers/acpi/tables.c |5 +
>  drivers/char/mem.c|8 +++
>  drivers/input/misc/uinput.c   |1 +
>  drivers/pci/pci-sysfs.c   |9 
>  drivers/pci/proc.c|8 ++-
>  drivers/pci/syscall.c |2 +-
>  drivers/pcmcia/cistpl.c   |5 +
>  drivers/platform/x86/asus-wmi.c   |9 
>  drivers/scsi/eata.c   |7 ++
>  drivers/tty/serial/serial_core.c  |6 ++
>  drivers/tty/sysrq.c   |   19 --
>  include/linux/efi.h   |1 +
>  include/linux/input.h |5 +
>  include/linux/kernel.h|9 
>  include/linux/security.h  |   11 ++
>  include/linux/sysrq.h |8 ++-
>  kernel/debug/kdb/kdb_main.c   |2 +-
>  kernel/kexec.c|7 ++
>  kernel/kexec_file.c   |6 ++
>  kernel/module.c   |2 +-
>  kernel/params.c   |   27 -
>  kernel/power/hibernate.c  |2 +-
>  kernel/power/user.c   |3 +++
>  kernel/trace/bpf_trace.c      |   11 ++
>  security/Kconfig  |   15 ++
>  security/Makefile |3 +++
>  security/lock_down.c  |   40 
> +
>  35 files changed, 291 insertions(+), 22 deletions(-)
>  create mode 100644 security/lock_down.c
>

Tested-by: Justin Forbes

Re: [PATCH 00/24] Kernel lockdown

2017-04-07 Thread Justin Forbes

On Fri, Apr 7, 2017 at 10:59 AM, Austin S. Hemmelgarn
 wrote:
> On 2017-04-05 16:14, David Howells wrote:
>>
>>
>> These patches provide a facility by which a variety of avenues by which
>> userspace can feasibly modify the running kernel image can be locked down.
>> These include:
>>
>>  (*) No unsigned modules and no modules for which can't validate the
>>  signature.
>>
>>  (*) No use of ioperm(), iopl() and no writing to /dev/port.
>>
>>  (*) No writing to /dev/mem or /dev/kmem.
>>
>>  (*) No hibernation.
>>
>>  (*) Restrict PCI BAR access.
>>
>>  (*) Restrict MSR access.
>>
>>  (*) No kexec_load().
>>
>>  (*) Certain ACPI restrictions.
>>
>>  (*) Restrict debugfs interface to ASUS WMI.
>>
>> The lock-down can be configured to be triggered by the EFI secure boot
>> status, provided the shim isn't insecure.  The lock-down can be lifted by
>> typing SysRq+x on a keyboard attached to the system.
>
> This has already been mentioned both in response to previous versions of
> this patch set, and by at least 2 people in response to a specific patch in
> this posting, but for any kind of proper security analysis, you need to
> better clarify your threat model.  'Prevent modification to the running
> kernel image' is a decent start on this, but at least some of the patches
> don't explain very well _how_ what you're disabling could be used to modify
> the running kernel image.  Clarifying how something is a threat will help
> with verifying that you're correctly blocking the threat.

It is more than just preventing modification to the running kernel
image.  The idea is that everything is verified, from UEFI through the
bootloader, and into the kernel.

> Furthermore, why is the only way to enable this to boot in UEFI Secure Boot
> mode?  Almost all of the hardening done here has general utility in
> hardening regular systems, and as such I'd suggest adding a command line
> option to enable kernel lock-down (which would greatly simplify testing),
> and a kconfig option to enforce it at build-time.

The problem is, if the hand off doesn't happen from a secure firmware,
there is no guarantee the system has not been compromised. UEFI Secure
Boot mode attempts to give that promise, and an appropriate hand off.
That doesn't mean that there is no value in turning some of this on,
it is just of more limited effectiveness.

> In addition to all that, it would be nice to be able to disable all of the
> following at build time independent of the kernel lock-down state
> * The acpi_rsdp kernel parameter (I could easily see many distros building
> kernels with this disabled, it's insanely use-case specific).
> * IO port and resource reservation module parameters (this would actually be
> easier than having runtime blacklisting, and I could also easily see this
> being turned on by default by a number of distros).
> * TOICSERIAL (this one is more likely than the above two to break systems).
>
> And these would probably be useful as lockable sysctls that would be
> automatically locked disabled when the kernel is locked down:
> * ioperm/iopl (these can technically be blocked by seccomp or other means,
> but that is non-trivial to do).
> * Most of the other ACPI stuff (some of this is useful for troubleshooting,
> but is not normally used during regular operation).
> * PCI BAR access.

There are more patches to do some of these things.   Baby steps.

Re: [PATCH 00/24] Kernel lockdown

2017-04-07 Thread Justin Forbes

On Fri, Apr 7, 2017 at 10:59 AM, Austin S. Hemmelgarn
 wrote:
> On 2017-04-05 16:14, David Howells wrote:
>>
>>
>> These patches provide a facility by which a variety of avenues by which
>> userspace can feasibly modify the running kernel image can be locked down.
>> These include:
>>
>>  (*) No unsigned modules and no modules for which can't validate the
>>  signature.
>>
>>  (*) No use of ioperm(), iopl() and no writing to /dev/port.
>>
>>  (*) No writing to /dev/mem or /dev/kmem.
>>
>>  (*) No hibernation.
>>
>>  (*) Restrict PCI BAR access.
>>
>>  (*) Restrict MSR access.
>>
>>  (*) No kexec_load().
>>
>>  (*) Certain ACPI restrictions.
>>
>>  (*) Restrict debugfs interface to ASUS WMI.
>>
>> The lock-down can be configured to be triggered by the EFI secure boot
>> status, provided the shim isn't insecure.  The lock-down can be lifted by
>> typing SysRq+x on a keyboard attached to the system.
>
> This has already been mentioned both in response to previous versions of
> this patch set, and by at least 2 people in response to a specific patch in
> this posting, but for any kind of proper security analysis, you need to
> better clarify your threat model.  'Prevent modification to the running
> kernel image' is a decent start on this, but at least some of the patches
> don't explain very well _how_ what you're disabling could be used to modify
> the running kernel image.  Clarifying how something is a threat will help
> with verifying that you're correctly blocking the threat.

It is more than just preventing modification to the running kernel
image.  The idea is that everything is verified, from UEFI through the
bootloader, and into the kernel.

> Furthermore, why is the only way to enable this to boot in UEFI Secure Boot
> mode?  Almost all of the hardening done here has general utility in
> hardening regular systems, and as such I'd suggest adding a command line
> option to enable kernel lock-down (which would greatly simplify testing),
> and a kconfig option to enforce it at build-time.

The problem is, if the hand off doesn't happen from a secure firmware,
there is no guarantee the system has not been compromised. UEFI Secure
Boot mode attempts to give that promise, and an appropriate hand off.
That doesn't mean that there is no value in turning some of this on,
it is just of more limited effectiveness.

> In addition to all that, it would be nice to be able to disable all of the
> following at build time independent of the kernel lock-down state
> * The acpi_rsdp kernel parameter (I could easily see many distros building
> kernels with this disabled, it's insanely use-case specific).
> * IO port and resource reservation module parameters (this would actually be
> easier than having runtime blacklisting, and I could also easily see this
> being turned on by default by a number of distros).
> * TOICSERIAL (this one is more likely than the above two to break systems).
>
> And these would probably be useful as lockable sysctls that would be
> automatically locked disabled when the kernel is locked down:
> * ioperm/iopl (these can technically be blocked by seccomp or other means,
> but that is non-trivial to do).
> * Most of the other ACPI stuff (some of this is useful for troubleshooting,
> but is not normally used during regular operation).
> * PCI BAR access.

There are more patches to do some of these things.   Baby steps.

Re: [PATCH 00/16] Kernel lockdown

2016-11-16 Thread Justin Forbes

On Wed, Nov 16, 2016 at 3:47 PM, David Howells  wrote:
>
> These patches provide a facility by which a variety of avenues by which
> userspace can feasibly modify the running kernel image can be locked down.
> These include:
>

Bit surprised to see this.  Not that I am opposed to the patches
themselves.  These were pulled into my tree as the first step towards
consolidating the implementation used for secure boot, and I know
there is interest in using large parts outside of a secure boot
context as well, but there were a few changes to be made after our
discussions in Santa Fe. Those are going into
http://git.kernel.org/cgit/linux/kernel/git/jforbes/linux.git/log/?h=lockdown
I am completely happy to submit those changes as separate patches if
people want to take these.  They do actually work, and are being
shipped and supported by multiple distributions at this point.

Justin

Re: [PATCH 00/16] Kernel lockdown

2016-11-16 Thread Justin Forbes

On Wed, Nov 16, 2016 at 3:47 PM, David Howells  wrote:
>
> These patches provide a facility by which a variety of avenues by which
> userspace can feasibly modify the running kernel image can be locked down.
> These include:
>

Bit surprised to see this.  Not that I am opposed to the patches
themselves.  These were pulled into my tree as the first step towards
consolidating the implementation used for secure boot, and I know
there is interest in using large parts outside of a secure boot
context as well, but there were a few changes to be made after our
discussions in Santa Fe. Those are going into
http://git.kernel.org/cgit/linux/kernel/git/jforbes/linux.git/log/?h=lockdown
I am completely happy to submit those changes as separate patches if
people want to take these.  They do actually work, and are being
shipped and supported by multiple distributions at this point.

Justin

51 matches

Mail list logo