Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-23 Thread Jann Horn
On Tue, Mar 23, 2021 at 8:22 PM Mickaël Salaün  wrote:
> On 23/03/2021 18:49, Jann Horn wrote:
> > On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün  wrote:
> >> On 23/03/2021 01:13, Jann Horn wrote:
> >>>  On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün  wrote:
>  Using Landlock objects and ruleset, it is possible to tag inodes
>  according to a process's domain.
> >>> [...]
>  +static void release_inode(struct landlock_object *const object)
>  +   __releases(object->lock)
>  +{
>  +   struct inode *const inode = object->underobj;
>  +   struct super_block *sb;
>  +
>  +   if (!inode) {
>  +   spin_unlock(>lock);
>  +   return;
>  +   }
>  +
>  +   /*
>  +* Protects against concurrent use by hook_sb_delete() of the 
>  reference
>  +* to the underlying inode.
>  +*/
>  +   object->underobj = NULL;
>  +   /*
>  +* Makes sure that if the filesystem is concurrently unmounted,
>  +* hook_sb_delete() will wait for us to finish iput().
>  +*/
>  +   sb = inode->i_sb;
>  +   atomic_long_inc(_superblock(sb)->inode_refs);
>  +   spin_unlock(>lock);
>  +   /*
>  +* Because object->underobj was not NULL, hook_sb_delete() and
>  +* get_inode_object() guarantee that it is safe to reset
>  +* landlock_inode(inode)->object while it is not NULL.  It is 
>  therefore
>  +* not necessary to lock inode->i_lock.
>  +*/
>  +   rcu_assign_pointer(landlock_inode(inode)->object, NULL);
>  +   /*
>  +* Now, new rules can safely be tied to @inode with 
>  get_inode_object().
>  +*/
>  +
>  +   iput(inode);
>  +   if 
>  (atomic_long_dec_and_test(_superblock(sb)->inode_refs))
>  +   wake_up_var(_superblock(sb)->inode_refs);
>  +}
> >>> [...]
>  +static struct landlock_object *get_inode_object(struct inode *const 
>  inode)
>  +{
>  +   struct landlock_object *object, *new_object;
>  +   struct landlock_inode_security *inode_sec = 
>  landlock_inode(inode);
>  +
>  +   rcu_read_lock();
>  +retry:
>  +   object = rcu_dereference(inode_sec->object);
>  +   if (object) {
>  +   if (likely(refcount_inc_not_zero(>usage))) {
>  +   rcu_read_unlock();
>  +   return object;
>  +   }
>  +   /*
>  +* We are racing with release_inode(), the object is 
>  going
>  +* away.  Wait for release_inode(), then retry.
>  +*/
>  +   spin_lock(>lock);
>  +   spin_unlock(>lock);
>  +   goto retry;
>  +   }
>  +   rcu_read_unlock();
>  +
>  +   /*
>  +* If there is no object tied to @inode, then create a new one 
>  (without
>  +* holding any locks).
>  +*/
>  +   new_object = landlock_create_object(_fs_underops, 
>  inode);
>  +   if (IS_ERR(new_object))
>  +   return new_object;
>  +
>  +   /* Protects against concurrent get_inode_object() calls. */
>  +   spin_lock(>i_lock);
>  +   object = rcu_dereference_protected(inode_sec->object,
>  +   lockdep_is_held(>i_lock));
> >>>
> >>> rcu_dereference_protected() requires that inode_sec->object is not
> >>> concurrently changed, but I think another thread could call
> >>> get_inode_object() while we're in landlock_create_object(), and then
> >>> we could race with the NULL write in release_inode() here? (It
> >>> wouldn't actually be a UAF though because we're not actually accessing
> >>> `object` here.) Or am I missing a lock that prevents this?
> >>>
> >>> In v28 this wasn't an issue because release_inode() was holding
> >>> inode->i_lock (and object->lock) during the NULL store; but in v29 and
> >>> this version the NULL store in release_inode() moved out of the locked
> >>> region. I think you could just move the NULL store in release_inode()
> >>> back up (and maybe add a comment explaining the locking rules for
> >>> landlock_inode(...)->object)?
> >>>
> >>> (Or alternatively you could use rcu_dereference_raw() with a comment
> >>> explaining that the read pointer is only used to check for NULL-ness,
> >>> and that it is guaranteed that the pointer can't change if it is NULL
> >>> and we're holding the lock. But that'd be needlessly complicated, I
> >>> think.)
> >>
> >> To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in
> >> release_inode() or in hook_sb_delete(), the
> >> landlock_inode(inode)->object need to be non-NULL,
> >
> > Yes.
> >
> >> which implies that a
> >> call to 

Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-23 Thread Mickaël Salaün


On 19/03/2021 20:19, Mickaël Salaün wrote:
> 
> On 19/03/2021 19:57, Kees Cook wrote:
>> On Tue, Mar 16, 2021 at 09:42:47PM +0100, Mickaël Salaün wrote:
>>> From: Mickaël Salaün 
>>>
>>> Using Landlock objects and ruleset, it is possible to tag inodes
>>> according to a process's domain.  To enable an unprivileged process to
>>> express a file hierarchy, it first needs to open a directory (or a file)
>>> and pass this file descriptor to the kernel through
>>> landlock_add_rule(2).  When checking if a file access request is
>>> allowed, we walk from the requested dentry to the real root, following
>>> the different mount layers.  The access to each "tagged" inodes are
>>> collected according to their rule layer level, and ANDed to create
>>> access to the requested file hierarchy.  This makes possible to identify
>>> a lot of files without tagging every inodes nor modifying the
>>> filesystem, while still following the view and understanding the user
>>> has from the filesystem.
>>>
>>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
>>> keep the same struct inodes for the same inodes whereas these inodes are
>>> in use.
>>>
>>> This commit adds a minimal set of supported filesystem access-control
>>> which doesn't enable to restrict all file-related actions.  This is the
>>> result of multiple discussions to minimize the code of Landlock to ease
>>> review.  Thanks to the Landlock design, extending this access-control
>>> without breaking user space will not be a problem.  Moreover, seccomp
>>> filters can be used to restrict the use of syscall families which may
>>> not be currently handled by Landlock.
>>>
>>> Cc: Al Viro 
>>> Cc: Anton Ivanov 
>>> Cc: James Morris 
>>> Cc: Jann Horn 
>>> Cc: Jeff Dike 
>>> Cc: Kees Cook 
>>> Cc: Richard Weinberger 
>>> Cc: Serge E. Hallyn 
>>> Signed-off-by: Mickaël Salaün 
>>> Link: https://lore.kernel.org/r/20210316204252.427806-8-...@digikod.net
>>> [...]
>>> +   spin_lock(>s_inode_list_lock);
>>> +   list_for_each_entry(inode, >s_inodes, i_sb_list) {
>>> +   struct landlock_object *object;
>>> +
>>> +   /* Only handles referenced inodes. */
>>> +   if (!atomic_read(>i_count))
>>> +   continue;
>>> +
>>> +   /*
>>> +* Checks I_FREEING and I_WILL_FREE  to protect against a race
>>> +* condition when release_inode() just called iput(), which
>>> +* could lead to a NULL dereference of inode->security or a
>>> +* second call to iput() for the same Landlock object.  Also
>>> +* checks I_NEW because such inode cannot be tied to an object.
>>> +*/
>>> +   spin_lock(>i_lock);
>>> +   if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
>>> +   spin_unlock(>i_lock);
>>> +   continue;
>>> +   }
>>
>> This (and elsewhere here) seems like a lot of inode internals getting
>> exposed. Can any of this be repurposed into helpers? I see this test
>> scattered around the kernel a fair bit:
>>
>> $ git grep I_FREEING | grep I_WILL_FREE | grep I_NEW | wc -l
>> 9
> 
> Dealing with the filesystem is complex. Some helpers could probably be
> added, but with a series dedicated to the filesystem. I can work on that
> once this series is merged.
> 
>>
>>> +static inline u32 get_mode_access(const umode_t mode)
>>> +{
>>> +   switch (mode & S_IFMT) {
>>> +   case S_IFLNK:
>>> +   return LANDLOCK_ACCESS_FS_MAKE_SYM;
>>> +   case 0:
>>> +   /* A zero mode translates to S_IFREG. */
>>> +   case S_IFREG:
>>> +   return LANDLOCK_ACCESS_FS_MAKE_REG;
>>> +   case S_IFDIR:
>>> +   return LANDLOCK_ACCESS_FS_MAKE_DIR;
>>> +   case S_IFCHR:
>>> +   return LANDLOCK_ACCESS_FS_MAKE_CHAR;
>>> +   case S_IFBLK:
>>> +   return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
>>> +   case S_IFIFO:
>>> +   return LANDLOCK_ACCESS_FS_MAKE_FIFO;
>>> +   case S_IFSOCK:
>>> +   return LANDLOCK_ACCESS_FS_MAKE_SOCK;
>>> +   default:
>>> +   WARN_ON_ONCE(1);
>>> +   return 0;
>>> +   }
>>
>> I'm assuming this won't be reachable from userspace.
> 
> It should not, only a bogus kernel code could.
> 
>>
>>> [...]
>>> index a5d6ef334991..f8e8e980454c 100644
>>> --- a/security/landlock/setup.c
>>> +++ b/security/landlock/setup.c
>>> @@ -11,17 +11,24 @@
>>>  
>>>  #include "common.h"
>>>  #include "cred.h"
>>> +#include "fs.h"
>>>  #include "ptrace.h"
>>>  #include "setup.h"
>>>  
>>> +bool landlock_initialized __lsm_ro_after_init = false;
>>> +
>>>  struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
>>> .lbs_cred = sizeof(struct landlock_cred_security),
>>> +   .lbs_inode = sizeof(struct landlock_inode_security),
>>> +   .lbs_superblock = sizeof(struct landlock_superblock_security),
>>>  };
>>>  
>>>  static int __init landlock_init(void)
>>>  {
>>> landlock_add_cred_hooks();
>>> landlock_add_ptrace_hooks();
>>> +   

Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-23 Thread Mickaël Salaün


On 23/03/2021 18:49, Jann Horn wrote:
> On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün  wrote:
>> On 23/03/2021 01:13, Jann Horn wrote:
>>>  On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün  wrote:
 Using Landlock objects and ruleset, it is possible to tag inodes
 according to a process's domain.
>>> [...]
 +static void release_inode(struct landlock_object *const object)
 +   __releases(object->lock)
 +{
 +   struct inode *const inode = object->underobj;
 +   struct super_block *sb;
 +
 +   if (!inode) {
 +   spin_unlock(>lock);
 +   return;
 +   }
 +
 +   /*
 +* Protects against concurrent use by hook_sb_delete() of the 
 reference
 +* to the underlying inode.
 +*/
 +   object->underobj = NULL;
 +   /*
 +* Makes sure that if the filesystem is concurrently unmounted,
 +* hook_sb_delete() will wait for us to finish iput().
 +*/
 +   sb = inode->i_sb;
 +   atomic_long_inc(_superblock(sb)->inode_refs);
 +   spin_unlock(>lock);
 +   /*
 +* Because object->underobj was not NULL, hook_sb_delete() and
 +* get_inode_object() guarantee that it is safe to reset
 +* landlock_inode(inode)->object while it is not NULL.  It is 
 therefore
 +* not necessary to lock inode->i_lock.
 +*/
 +   rcu_assign_pointer(landlock_inode(inode)->object, NULL);
 +   /*
 +* Now, new rules can safely be tied to @inode with 
 get_inode_object().
 +*/
 +
 +   iput(inode);
 +   if (atomic_long_dec_and_test(_superblock(sb)->inode_refs))
 +   wake_up_var(_superblock(sb)->inode_refs);
 +}
>>> [...]
 +static struct landlock_object *get_inode_object(struct inode *const inode)
 +{
 +   struct landlock_object *object, *new_object;
 +   struct landlock_inode_security *inode_sec = landlock_inode(inode);
 +
 +   rcu_read_lock();
 +retry:
 +   object = rcu_dereference(inode_sec->object);
 +   if (object) {
 +   if (likely(refcount_inc_not_zero(>usage))) {
 +   rcu_read_unlock();
 +   return object;
 +   }
 +   /*
 +* We are racing with release_inode(), the object is going
 +* away.  Wait for release_inode(), then retry.
 +*/
 +   spin_lock(>lock);
 +   spin_unlock(>lock);
 +   goto retry;
 +   }
 +   rcu_read_unlock();
 +
 +   /*
 +* If there is no object tied to @inode, then create a new one 
 (without
 +* holding any locks).
 +*/
 +   new_object = landlock_create_object(_fs_underops, inode);
 +   if (IS_ERR(new_object))
 +   return new_object;
 +
 +   /* Protects against concurrent get_inode_object() calls. */
 +   spin_lock(>i_lock);
 +   object = rcu_dereference_protected(inode_sec->object,
 +   lockdep_is_held(>i_lock));
>>>
>>> rcu_dereference_protected() requires that inode_sec->object is not
>>> concurrently changed, but I think another thread could call
>>> get_inode_object() while we're in landlock_create_object(), and then
>>> we could race with the NULL write in release_inode() here? (It
>>> wouldn't actually be a UAF though because we're not actually accessing
>>> `object` here.) Or am I missing a lock that prevents this?
>>>
>>> In v28 this wasn't an issue because release_inode() was holding
>>> inode->i_lock (and object->lock) during the NULL store; but in v29 and
>>> this version the NULL store in release_inode() moved out of the locked
>>> region. I think you could just move the NULL store in release_inode()
>>> back up (and maybe add a comment explaining the locking rules for
>>> landlock_inode(...)->object)?
>>>
>>> (Or alternatively you could use rcu_dereference_raw() with a comment
>>> explaining that the read pointer is only used to check for NULL-ness,
>>> and that it is guaranteed that the pointer can't change if it is NULL
>>> and we're holding the lock. But that'd be needlessly complicated, I
>>> think.)
>>
>> To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in
>> release_inode() or in hook_sb_delete(), the
>> landlock_inode(inode)->object need to be non-NULL,
> 
> Yes.
> 
>> which implies that a
>> call to get_inode_object(inode) either "retry" (because release_inode is
>> only called by landlock_put_object, which set object->usage to 0) until
>> it creates a new object, or reuses the existing referenced object (and
>> increments object->usage).
> 
> But it can be that landlock_inode(inode)->object only becomes non-NULL
> after 

Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-23 Thread Jann Horn
On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün  wrote:
> On 23/03/2021 01:13, Jann Horn wrote:
> >  On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün  wrote:
> >> Using Landlock objects and ruleset, it is possible to tag inodes
> >> according to a process's domain.
> > [...]
> >> +static void release_inode(struct landlock_object *const object)
> >> +   __releases(object->lock)
> >> +{
> >> +   struct inode *const inode = object->underobj;
> >> +   struct super_block *sb;
> >> +
> >> +   if (!inode) {
> >> +   spin_unlock(>lock);
> >> +   return;
> >> +   }
> >> +
> >> +   /*
> >> +* Protects against concurrent use by hook_sb_delete() of the 
> >> reference
> >> +* to the underlying inode.
> >> +*/
> >> +   object->underobj = NULL;
> >> +   /*
> >> +* Makes sure that if the filesystem is concurrently unmounted,
> >> +* hook_sb_delete() will wait for us to finish iput().
> >> +*/
> >> +   sb = inode->i_sb;
> >> +   atomic_long_inc(_superblock(sb)->inode_refs);
> >> +   spin_unlock(>lock);
> >> +   /*
> >> +* Because object->underobj was not NULL, hook_sb_delete() and
> >> +* get_inode_object() guarantee that it is safe to reset
> >> +* landlock_inode(inode)->object while it is not NULL.  It is 
> >> therefore
> >> +* not necessary to lock inode->i_lock.
> >> +*/
> >> +   rcu_assign_pointer(landlock_inode(inode)->object, NULL);
> >> +   /*
> >> +* Now, new rules can safely be tied to @inode with 
> >> get_inode_object().
> >> +*/
> >> +
> >> +   iput(inode);
> >> +   if (atomic_long_dec_and_test(_superblock(sb)->inode_refs))
> >> +   wake_up_var(_superblock(sb)->inode_refs);
> >> +}
> > [...]
> >> +static struct landlock_object *get_inode_object(struct inode *const inode)
> >> +{
> >> +   struct landlock_object *object, *new_object;
> >> +   struct landlock_inode_security *inode_sec = landlock_inode(inode);
> >> +
> >> +   rcu_read_lock();
> >> +retry:
> >> +   object = rcu_dereference(inode_sec->object);
> >> +   if (object) {
> >> +   if (likely(refcount_inc_not_zero(>usage))) {
> >> +   rcu_read_unlock();
> >> +   return object;
> >> +   }
> >> +   /*
> >> +* We are racing with release_inode(), the object is going
> >> +* away.  Wait for release_inode(), then retry.
> >> +*/
> >> +   spin_lock(>lock);
> >> +   spin_unlock(>lock);
> >> +   goto retry;
> >> +   }
> >> +   rcu_read_unlock();
> >> +
> >> +   /*
> >> +* If there is no object tied to @inode, then create a new one 
> >> (without
> >> +* holding any locks).
> >> +*/
> >> +   new_object = landlock_create_object(_fs_underops, inode);
> >> +   if (IS_ERR(new_object))
> >> +   return new_object;
> >> +
> >> +   /* Protects against concurrent get_inode_object() calls. */
> >> +   spin_lock(>i_lock);
> >> +   object = rcu_dereference_protected(inode_sec->object,
> >> +   lockdep_is_held(>i_lock));
> >
> > rcu_dereference_protected() requires that inode_sec->object is not
> > concurrently changed, but I think another thread could call
> > get_inode_object() while we're in landlock_create_object(), and then
> > we could race with the NULL write in release_inode() here? (It
> > wouldn't actually be a UAF though because we're not actually accessing
> > `object` here.) Or am I missing a lock that prevents this?
> >
> > In v28 this wasn't an issue because release_inode() was holding
> > inode->i_lock (and object->lock) during the NULL store; but in v29 and
> > this version the NULL store in release_inode() moved out of the locked
> > region. I think you could just move the NULL store in release_inode()
> > back up (and maybe add a comment explaining the locking rules for
> > landlock_inode(...)->object)?
> >
> > (Or alternatively you could use rcu_dereference_raw() with a comment
> > explaining that the read pointer is only used to check for NULL-ness,
> > and that it is guaranteed that the pointer can't change if it is NULL
> > and we're holding the lock. But that'd be needlessly complicated, I
> > think.)
>
> To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in
> release_inode() or in hook_sb_delete(), the
> landlock_inode(inode)->object need to be non-NULL,

Yes.

> which implies that a
> call to get_inode_object(inode) either "retry" (because release_inode is
> only called by landlock_put_object, which set object->usage to 0) until
> it creates a new object, or reuses the existing referenced object (and
> increments object->usage).

But it can be that landlock_inode(inode)->object only becomes non-NULL
after get_inode_object() has checked
rcu_dereference(inode_sec->object).

> 

Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-23 Thread Mickaël Salaün


On 23/03/2021 01:13, Jann Horn wrote:
>  On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün  wrote:
>> Using Landlock objects and ruleset, it is possible to tag inodes
>> according to a process's domain.
> [...]
>> +static void release_inode(struct landlock_object *const object)
>> +   __releases(object->lock)
>> +{
>> +   struct inode *const inode = object->underobj;
>> +   struct super_block *sb;
>> +
>> +   if (!inode) {
>> +   spin_unlock(>lock);
>> +   return;
>> +   }
>> +
>> +   /*
>> +* Protects against concurrent use by hook_sb_delete() of the 
>> reference
>> +* to the underlying inode.
>> +*/
>> +   object->underobj = NULL;
>> +   /*
>> +* Makes sure that if the filesystem is concurrently unmounted,
>> +* hook_sb_delete() will wait for us to finish iput().
>> +*/
>> +   sb = inode->i_sb;
>> +   atomic_long_inc(_superblock(sb)->inode_refs);
>> +   spin_unlock(>lock);
>> +   /*
>> +* Because object->underobj was not NULL, hook_sb_delete() and
>> +* get_inode_object() guarantee that it is safe to reset
>> +* landlock_inode(inode)->object while it is not NULL.  It is 
>> therefore
>> +* not necessary to lock inode->i_lock.
>> +*/
>> +   rcu_assign_pointer(landlock_inode(inode)->object, NULL);
>> +   /*
>> +* Now, new rules can safely be tied to @inode with 
>> get_inode_object().
>> +*/
>> +
>> +   iput(inode);
>> +   if (atomic_long_dec_and_test(_superblock(sb)->inode_refs))
>> +   wake_up_var(_superblock(sb)->inode_refs);
>> +}
> [...]
>> +static struct landlock_object *get_inode_object(struct inode *const inode)
>> +{
>> +   struct landlock_object *object, *new_object;
>> +   struct landlock_inode_security *inode_sec = landlock_inode(inode);
>> +
>> +   rcu_read_lock();
>> +retry:
>> +   object = rcu_dereference(inode_sec->object);
>> +   if (object) {
>> +   if (likely(refcount_inc_not_zero(>usage))) {
>> +   rcu_read_unlock();
>> +   return object;
>> +   }
>> +   /*
>> +* We are racing with release_inode(), the object is going
>> +* away.  Wait for release_inode(), then retry.
>> +*/
>> +   spin_lock(>lock);
>> +   spin_unlock(>lock);
>> +   goto retry;
>> +   }
>> +   rcu_read_unlock();
>> +
>> +   /*
>> +* If there is no object tied to @inode, then create a new one 
>> (without
>> +* holding any locks).
>> +*/
>> +   new_object = landlock_create_object(_fs_underops, inode);
>> +   if (IS_ERR(new_object))
>> +   return new_object;
>> +
>> +   /* Protects against concurrent get_inode_object() calls. */
>> +   spin_lock(>i_lock);
>> +   object = rcu_dereference_protected(inode_sec->object,
>> +   lockdep_is_held(>i_lock));
> 
> rcu_dereference_protected() requires that inode_sec->object is not
> concurrently changed, but I think another thread could call
> get_inode_object() while we're in landlock_create_object(), and then
> we could race with the NULL write in release_inode() here? (It
> wouldn't actually be a UAF though because we're not actually accessing
> `object` here.) Or am I missing a lock that prevents this?
> 
> In v28 this wasn't an issue because release_inode() was holding
> inode->i_lock (and object->lock) during the NULL store; but in v29 and
> this version the NULL store in release_inode() moved out of the locked
> region. I think you could just move the NULL store in release_inode()
> back up (and maybe add a comment explaining the locking rules for
> landlock_inode(...)->object)?
> 
> (Or alternatively you could use rcu_dereference_raw() with a comment
> explaining that the read pointer is only used to check for NULL-ness,
> and that it is guaranteed that the pointer can't change if it is NULL
> and we're holding the lock. But that'd be needlessly complicated, I
> think.)

To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in
release_inode() or in hook_sb_delete(), the
landlock_inode(inode)->object need to be non-NULL, which implies that a
call to get_inode_object(inode) either "retry" (because release_inode is
only called by landlock_put_object, which set object->usage to 0) until
it creates a new object, or reuses the existing referenced object (and
increments object->usage). The worse case would be if
get_inode_object(inode) is called just before the
rcu_assign_pointer(landlock_inode(inode)->object, NULL) from
hook_sb_delete(), which would result in an object with a NULL underobj,
which is the expected behavior (and checked by release_inode).

The line rcu_assign_pointer(inode_sec->object, new_object) from
get_inode_object() can only be reached if the underlying inode doesn't
reference an object, 

Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-22 Thread Jann Horn
 On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün  wrote:
> Using Landlock objects and ruleset, it is possible to tag inodes
> according to a process's domain.
[...]
> +static void release_inode(struct landlock_object *const object)
> +   __releases(object->lock)
> +{
> +   struct inode *const inode = object->underobj;
> +   struct super_block *sb;
> +
> +   if (!inode) {
> +   spin_unlock(>lock);
> +   return;
> +   }
> +
> +   /*
> +* Protects against concurrent use by hook_sb_delete() of the 
> reference
> +* to the underlying inode.
> +*/
> +   object->underobj = NULL;
> +   /*
> +* Makes sure that if the filesystem is concurrently unmounted,
> +* hook_sb_delete() will wait for us to finish iput().
> +*/
> +   sb = inode->i_sb;
> +   atomic_long_inc(_superblock(sb)->inode_refs);
> +   spin_unlock(>lock);
> +   /*
> +* Because object->underobj was not NULL, hook_sb_delete() and
> +* get_inode_object() guarantee that it is safe to reset
> +* landlock_inode(inode)->object while it is not NULL.  It is 
> therefore
> +* not necessary to lock inode->i_lock.
> +*/
> +   rcu_assign_pointer(landlock_inode(inode)->object, NULL);
> +   /*
> +* Now, new rules can safely be tied to @inode with 
> get_inode_object().
> +*/
> +
> +   iput(inode);
> +   if (atomic_long_dec_and_test(_superblock(sb)->inode_refs))
> +   wake_up_var(_superblock(sb)->inode_refs);
> +}
[...]
> +static struct landlock_object *get_inode_object(struct inode *const inode)
> +{
> +   struct landlock_object *object, *new_object;
> +   struct landlock_inode_security *inode_sec = landlock_inode(inode);
> +
> +   rcu_read_lock();
> +retry:
> +   object = rcu_dereference(inode_sec->object);
> +   if (object) {
> +   if (likely(refcount_inc_not_zero(>usage))) {
> +   rcu_read_unlock();
> +   return object;
> +   }
> +   /*
> +* We are racing with release_inode(), the object is going
> +* away.  Wait for release_inode(), then retry.
> +*/
> +   spin_lock(>lock);
> +   spin_unlock(>lock);
> +   goto retry;
> +   }
> +   rcu_read_unlock();
> +
> +   /*
> +* If there is no object tied to @inode, then create a new one 
> (without
> +* holding any locks).
> +*/
> +   new_object = landlock_create_object(_fs_underops, inode);
> +   if (IS_ERR(new_object))
> +   return new_object;
> +
> +   /* Protects against concurrent get_inode_object() calls. */
> +   spin_lock(>i_lock);
> +   object = rcu_dereference_protected(inode_sec->object,
> +   lockdep_is_held(>i_lock));

rcu_dereference_protected() requires that inode_sec->object is not
concurrently changed, but I think another thread could call
get_inode_object() while we're in landlock_create_object(), and then
we could race with the NULL write in release_inode() here? (It
wouldn't actually be a UAF though because we're not actually accessing
`object` here.) Or am I missing a lock that prevents this?

In v28 this wasn't an issue because release_inode() was holding
inode->i_lock (and object->lock) during the NULL store; but in v29 and
this version the NULL store in release_inode() moved out of the locked
region. I think you could just move the NULL store in release_inode()
back up (and maybe add a comment explaining the locking rules for
landlock_inode(...)->object)?

(Or alternatively you could use rcu_dereference_raw() with a comment
explaining that the read pointer is only used to check for NULL-ness,
and that it is guaranteed that the pointer can't change if it is NULL
and we're holding the lock. But that'd be needlessly complicated, I
think.)


> +   if (unlikely(object)) {
> +   /* Someone else just created the object, bail out and retry. 
> */
> +   spin_unlock(>i_lock);
> +   kfree(new_object);
> +
> +   rcu_read_lock();
> +   goto retry;
> +   }
> +
> +   rcu_assign_pointer(inode_sec->object, new_object);
> +   /*
> +* @inode will be released by hook_sb_delete() on its superblock
> +* shutdown.
> +*/
> +   ihold(inode);
> +   spin_unlock(>i_lock);
> +   return new_object;
> +}


Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-19 Thread Mickaël Salaün


On 19/03/2021 19:57, Kees Cook wrote:
> On Tue, Mar 16, 2021 at 09:42:47PM +0100, Mickaël Salaün wrote:
>> From: Mickaël Salaün 
>>
>> Using Landlock objects and ruleset, it is possible to tag inodes
>> according to a process's domain.  To enable an unprivileged process to
>> express a file hierarchy, it first needs to open a directory (or a file)
>> and pass this file descriptor to the kernel through
>> landlock_add_rule(2).  When checking if a file access request is
>> allowed, we walk from the requested dentry to the real root, following
>> the different mount layers.  The access to each "tagged" inodes are
>> collected according to their rule layer level, and ANDed to create
>> access to the requested file hierarchy.  This makes possible to identify
>> a lot of files without tagging every inodes nor modifying the
>> filesystem, while still following the view and understanding the user
>> has from the filesystem.
>>
>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
>> keep the same struct inodes for the same inodes whereas these inodes are
>> in use.
>>
>> This commit adds a minimal set of supported filesystem access-control
>> which doesn't enable to restrict all file-related actions.  This is the
>> result of multiple discussions to minimize the code of Landlock to ease
>> review.  Thanks to the Landlock design, extending this access-control
>> without breaking user space will not be a problem.  Moreover, seccomp
>> filters can be used to restrict the use of syscall families which may
>> not be currently handled by Landlock.
>>
>> Cc: Al Viro 
>> Cc: Anton Ivanov 
>> Cc: James Morris 
>> Cc: Jann Horn 
>> Cc: Jeff Dike 
>> Cc: Kees Cook 
>> Cc: Richard Weinberger 
>> Cc: Serge E. Hallyn 
>> Signed-off-by: Mickaël Salaün 
>> Link: https://lore.kernel.org/r/20210316204252.427806-8-...@digikod.net
>> [...]
>> +spin_lock(>s_inode_list_lock);
>> +list_for_each_entry(inode, >s_inodes, i_sb_list) {
>> +struct landlock_object *object;
>> +
>> +/* Only handles referenced inodes. */
>> +if (!atomic_read(>i_count))
>> +continue;
>> +
>> +/*
>> + * Checks I_FREEING and I_WILL_FREE  to protect against a race
>> + * condition when release_inode() just called iput(), which
>> + * could lead to a NULL dereference of inode->security or a
>> + * second call to iput() for the same Landlock object.  Also
>> + * checks I_NEW because such inode cannot be tied to an object.
>> + */
>> +spin_lock(>i_lock);
>> +if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
>> +spin_unlock(>i_lock);
>> +continue;
>> +}
> 
> This (and elsewhere here) seems like a lot of inode internals getting
> exposed. Can any of this be repurposed into helpers? I see this test
> scattered around the kernel a fair bit:
> 
> $ git grep I_FREEING | grep I_WILL_FREE | grep I_NEW | wc -l
> 9

Dealing with the filesystem is complex. Some helpers could probably be
added, but with a series dedicated to the filesystem. I can work on that
once this series is merged.

> 
>> +static inline u32 get_mode_access(const umode_t mode)
>> +{
>> +switch (mode & S_IFMT) {
>> +case S_IFLNK:
>> +return LANDLOCK_ACCESS_FS_MAKE_SYM;
>> +case 0:
>> +/* A zero mode translates to S_IFREG. */
>> +case S_IFREG:
>> +return LANDLOCK_ACCESS_FS_MAKE_REG;
>> +case S_IFDIR:
>> +return LANDLOCK_ACCESS_FS_MAKE_DIR;
>> +case S_IFCHR:
>> +return LANDLOCK_ACCESS_FS_MAKE_CHAR;
>> +case S_IFBLK:
>> +return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
>> +case S_IFIFO:
>> +return LANDLOCK_ACCESS_FS_MAKE_FIFO;
>> +case S_IFSOCK:
>> +return LANDLOCK_ACCESS_FS_MAKE_SOCK;
>> +default:
>> +WARN_ON_ONCE(1);
>> +return 0;
>> +}
> 
> I'm assuming this won't be reachable from userspace.

It should not, only a bogus kernel code could.

> 
>> [...]
>> index a5d6ef334991..f8e8e980454c 100644
>> --- a/security/landlock/setup.c
>> +++ b/security/landlock/setup.c
>> @@ -11,17 +11,24 @@
>>  
>>  #include "common.h"
>>  #include "cred.h"
>> +#include "fs.h"
>>  #include "ptrace.h"
>>  #include "setup.h"
>>  
>> +bool landlock_initialized __lsm_ro_after_init = false;
>> +
>>  struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
>>  .lbs_cred = sizeof(struct landlock_cred_security),
>> +.lbs_inode = sizeof(struct landlock_inode_security),
>> +.lbs_superblock = sizeof(struct landlock_superblock_security),
>>  };
>>  
>>  static int __init landlock_init(void)
>>  {
>>  landlock_add_cred_hooks();
>>  landlock_add_ptrace_hooks();
>> +landlock_add_fs_hooks();
>> +landlock_initialized = true;
> 
> I think this landlock_initialized is logically separate from the optional
> 

Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-19 Thread Kees Cook
On Tue, Mar 16, 2021 at 09:42:47PM +0100, Mickaël Salaün wrote:
> From: Mickaël Salaün 
> 
> Using Landlock objects and ruleset, it is possible to tag inodes
> according to a process's domain.  To enable an unprivileged process to
> express a file hierarchy, it first needs to open a directory (or a file)
> and pass this file descriptor to the kernel through
> landlock_add_rule(2).  When checking if a file access request is
> allowed, we walk from the requested dentry to the real root, following
> the different mount layers.  The access to each "tagged" inodes are
> collected according to their rule layer level, and ANDed to create
> access to the requested file hierarchy.  This makes possible to identify
> a lot of files without tagging every inodes nor modifying the
> filesystem, while still following the view and understanding the user
> has from the filesystem.
> 
> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
> keep the same struct inodes for the same inodes whereas these inodes are
> in use.
> 
> This commit adds a minimal set of supported filesystem access-control
> which doesn't enable to restrict all file-related actions.  This is the
> result of multiple discussions to minimize the code of Landlock to ease
> review.  Thanks to the Landlock design, extending this access-control
> without breaking user space will not be a problem.  Moreover, seccomp
> filters can be used to restrict the use of syscall families which may
> not be currently handled by Landlock.
> 
> Cc: Al Viro 
> Cc: Anton Ivanov 
> Cc: James Morris 
> Cc: Jann Horn 
> Cc: Jeff Dike 
> Cc: Kees Cook 
> Cc: Richard Weinberger 
> Cc: Serge E. Hallyn 
> Signed-off-by: Mickaël Salaün 
> Link: https://lore.kernel.org/r/20210316204252.427806-8-...@digikod.net
> [...]
> + spin_lock(>s_inode_list_lock);
> + list_for_each_entry(inode, >s_inodes, i_sb_list) {
> + struct landlock_object *object;
> +
> + /* Only handles referenced inodes. */
> + if (!atomic_read(>i_count))
> + continue;
> +
> + /*
> +  * Checks I_FREEING and I_WILL_FREE  to protect against a race
> +  * condition when release_inode() just called iput(), which
> +  * could lead to a NULL dereference of inode->security or a
> +  * second call to iput() for the same Landlock object.  Also
> +  * checks I_NEW because such inode cannot be tied to an object.
> +  */
> + spin_lock(>i_lock);
> + if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
> + spin_unlock(>i_lock);
> + continue;
> + }

This (and elsewhere here) seems like a lot of inode internals getting
exposed. Can any of this be repurposed into helpers? I see this test
scattered around the kernel a fair bit:

$ git grep I_FREEING | grep I_WILL_FREE | grep I_NEW | wc -l
9

> +static inline u32 get_mode_access(const umode_t mode)
> +{
> + switch (mode & S_IFMT) {
> + case S_IFLNK:
> + return LANDLOCK_ACCESS_FS_MAKE_SYM;
> + case 0:
> + /* A zero mode translates to S_IFREG. */
> + case S_IFREG:
> + return LANDLOCK_ACCESS_FS_MAKE_REG;
> + case S_IFDIR:
> + return LANDLOCK_ACCESS_FS_MAKE_DIR;
> + case S_IFCHR:
> + return LANDLOCK_ACCESS_FS_MAKE_CHAR;
> + case S_IFBLK:
> + return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
> + case S_IFIFO:
> + return LANDLOCK_ACCESS_FS_MAKE_FIFO;
> + case S_IFSOCK:
> + return LANDLOCK_ACCESS_FS_MAKE_SOCK;
> + default:
> + WARN_ON_ONCE(1);
> + return 0;
> + }

I'm assuming this won't be reachable from userspace.

> [...]
> index a5d6ef334991..f8e8e980454c 100644
> --- a/security/landlock/setup.c
> +++ b/security/landlock/setup.c
> @@ -11,17 +11,24 @@
>  
>  #include "common.h"
>  #include "cred.h"
> +#include "fs.h"
>  #include "ptrace.h"
>  #include "setup.h"
>  
> +bool landlock_initialized __lsm_ro_after_init = false;
> +
>  struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
>   .lbs_cred = sizeof(struct landlock_cred_security),
> + .lbs_inode = sizeof(struct landlock_inode_security),
> + .lbs_superblock = sizeof(struct landlock_superblock_security),
>  };
>  
>  static int __init landlock_init(void)
>  {
>   landlock_add_cred_hooks();
>   landlock_add_ptrace_hooks();
> + landlock_add_fs_hooks();
> + landlock_initialized = true;

I think this landlock_initialized is logically separate from the optional
DEFINE_LSM "enabled" variable, but I thought I'd double check. :)

It seems like it's used here to avoid releasing superblocks before
landlock_init() is called? What is the scenario where that happens?

>   pr_info("Up and running.\n");
>   return 0;
>  }
> diff --git a/security/landlock/setup.h b/security/landlock/setup.h
> index 

Re: [PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-18 Thread James Morris


> This commit adds a minimal set of supported filesystem access-control
> which doesn't enable to restrict all file-related actions.

It would be great to get some more review/acks on this patch, particularly 
from VFS/FS folk.


-- 
James Morris




[PATCH v30 07/12] landlock: Support filesystem access-control

2021-03-16 Thread Mickaël Salaün
From: Mickaël Salaün 

Using Landlock objects and ruleset, it is possible to tag inodes
according to a process's domain.  To enable an unprivileged process to
express a file hierarchy, it first needs to open a directory (or a file)
and pass this file descriptor to the kernel through
landlock_add_rule(2).  When checking if a file access request is
allowed, we walk from the requested dentry to the real root, following
the different mount layers.  The access to each "tagged" inodes are
collected according to their rule layer level, and ANDed to create
access to the requested file hierarchy.  This makes possible to identify
a lot of files without tagging every inodes nor modifying the
filesystem, while still following the view and understanding the user
has from the filesystem.

Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
keep the same struct inodes for the same inodes whereas these inodes are
in use.

This commit adds a minimal set of supported filesystem access-control
which doesn't enable to restrict all file-related actions.  This is the
result of multiple discussions to minimize the code of Landlock to ease
review.  Thanks to the Landlock design, extending this access-control
without breaking user space will not be a problem.  Moreover, seccomp
filters can be used to restrict the use of syscall families which may
not be currently handled by Landlock.

Cc: Al Viro 
Cc: Anton Ivanov 
Cc: James Morris 
Cc: Jann Horn 
Cc: Jeff Dike 
Cc: Kees Cook 
Cc: Richard Weinberger 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
Link: https://lore.kernel.org/r/20210316204252.427806-8-...@digikod.net
---

Changes since v29:
* Remove a useless unlock/lock for the first loop walk in
  hook_sb_delete().  This also makes the code clearer but doesn't change
  the garantees for iput().
* Rename iput_inode to prev_inode, which shows its origin.

Changes since v28:
* Fix race conditions that could be caused by concurrent calls to
  release_inode() and hook_sb_delete().
* Avoid livelock when a lot of inodes are tagged.
* Improve concurrency locking and add comments to explain the specific
  lock rules.
* Add an inode_free_security hook to check that release_inode() and
  hook_sb_delete() do their job.
* Add early return to check_access_path() to check if the access request
  is empty.  This doesn't change the semantic.
* Reword the first description sentence (suggested by Serge Hallyn).

Changes since v27:
* Fix domains with layers of non-overlapping access rights (cf.
  layout1.non_overlapping_accesses test) thanks to a stack of access
  rights per layer (replacing ORed access rights).  This avoids
  too-restrictive domains.
* Cosmetic fixes and updates in comments and Kconfig.

Changes since v26:
* Check each rule of a path to enable a more permissive and pragmatic
  access control per layer.  Suggested by Jann Horn:
  
https://lore.kernel.org/lkml/cag48ez1o0vtweird3kqexof78wr+cmp5bgk5kh5cs7apepi...@mail.gmail.com/
* Rename check_access_path_continue() to unmask_layers() and make it
  return the new layer mask.
* Avoid double domain check in hook_file_open().
* In the documentation, add utime(2) as another example of unhandled
  syscalls.  Indeed, using `touch` to test write access may be tempting.
* Remove outdated comment about OverlayFS.
* Rename the landlock.h ifdef to align with most similar files.
* Fix spelling.

Changes since v25:
* Move build_check_layer() to ruleset.c, and add built-time checks for
  the fs_access_mask and access variables according to
  _LANDLOCK_ACCESS_FS_MASK.
* Move limits to a dedicated file and rename them:
  _LANDLOCK_ACCESS_FS_LAST and _LANDLOCK_ACCESS_FS_MASK.
* Set build_check_layer() as non-inline to trigger a warning if it is
  not called.
* Use BITS_PER_TYPE() macro.
* Rename function to landlock_add_fs_hooks().
* Cosmetic variable renames.

Changes since v24:
* Use the new struct landlock_rule and landlock_layer to not mix
  accesses from different layers.  Revert "Enforce deterministic
  interleaved path rules" from v24, and fix the layer check.  This
  enables to follow a sane semantic: an access is granted if, for each
  policy layer, at least one rule encountered on the pathwalk grants the
  access, regardless of their position in the layer stack (suggested by
  Jann Horn).  See layout1.interleaved_masked_accesses tests from
  tools/testing/selftests/landlock/fs_test.c for corner cases.
* Add build-time checks for layers.
* Use the new landlock_insert_rule() API.

Changes since v23:
* Enforce deterministic interleaved path rules.  To have consistent
  layered rules, granting access to a path implies that all accesses
  tied to inodes, from the requested file to the real root, must be
  checked.  Otherwise, stacked rules may result to overzealous
  restrictions.  By excluding the ability to add exceptions in the same
  layer (e.g. /a allowed, /a/b denied, and /a/b/c allowed), we get
  deterministic interleaved path rules.  This removes an optimization