Stephen O'Donnell created HDFS-15372:
----------------------------------------

             Summary: Files in snapshots no longer see attribute provider 
permissions
                 Key: HDFS-15372
                 URL: https://issues.apache.org/jira/browse/HDFS-15372
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


Given a cluster with an authorization provider configured (e.g. Sentry) where 
the paths covered by the provider are snapshottable, the way provider 
permissions and ACLs are applied to files in snapshots changed between the 2.x 
branch and Hadoop 3.0.

For example, suppose the snapshottable path /data is managed by Sentry. The 
ACLs below are provided by Sentry:

{code}
hadoop fs -getfacl -R /data
# file: /data
# owner: hive
# group: hive
user::rwx
group::rwx
other::--x

# file: /data/tab1
# owner: hive
# group: hive
user::rwx
group::---
group:flume:rwx
user:hive:rwx
group:hive:rwx
group:testgroup:rwx
mask::rwx
other::--x
/data/tab1
{code}

After taking a snapshot, the files in the snapshot do not see the provider 
permissions:

{code}
hadoop fs -getfacl -R /data/.snapshot
# file: /data/.snapshot
# owner: 
# group: 
user::rwx
group::rwx
other::rwx

# file: /data/.snapshot/snap1
# owner: hive
# group: hive
user::rwx
group::rwx
other::--x

# file: /data/.snapshot/snap1/tab1
# owner: hive
# group: hive
user::rwx
group::rwx
other::--x
{code}

However, before Hadoop 3.0 (when the attribute provider code was extensively 
refactored), files in snapshots did get the provider permissions.

The reason is this code in FSDirectory.java which ultimately calls the 
attribute provider and passes the path we want permissions for:

{code}
  INodeAttributes getAttributes(INodesInPath iip)
      throws IOException {
    INode node = FSDirectory.resolveLastINode(iip);
    int snapshot = iip.getPathSnapshotId();
    INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
    UserGroupInformation ugi = NameNode.getRemoteUser();
    INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);

    if (ap != null) {
      // permission checking sends the full components array including the
      // first empty component for the root.  however file status
      // related calls are expected to strip out the root component according
      // to TestINodeAttributeProvider.
      byte[][] components = iip.getPathComponents();
      components = Arrays.copyOfRange(components, 1, components.length);
      nodeAttrs = ap.getAttributes(components, nodeAttrs);
    }
    return nodeAttrs;
  }
{code}

The line:

{code}
INode node = FSDirectory.resolveLastINode(iip);
{code}

This picks the last resolved inode. If you then call node.getPathComponents() 
on it for a path like '/data/.snapshot/snap1/tab1', it returns /data/tab1. The 
snapshot path is resolved to its original location, but it's still the 
snapshot inode.

However, the logic passes iip.getPathComponents(), which returns 
"/data/.snapshot/snap1/tab1", to the provider.

The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
the provider only ever sees the path as "/data/tab1".

It is debatable which path should be passed to the provider in the case of 
snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as the behaviour 
has changed, I feel we should ensure the old behaviour is retained.
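
To make the difference between the two paths concrete, here is a minimal, 
self-contained sketch that maps a snapshot path onto its original location by 
dropping the ".snapshot" component and the snapshot name that follows it. It 
works on plain strings only. The class and method names are hypothetical, and 
this is not the HDFS code or the proposed fix, just an illustration of the 
mapping described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of HDFS.
public class SnapshotPathSketch {
  // Reserved directory name under which HDFS exposes snapshots.
  static final String DOT_SNAPSHOT = ".snapshot";

  /** Splits "/a/b" into ["a", "b"], dropping the empty root component. */
  static List<String> split(String path) {
    List<String> out = new ArrayList<>();
    for (String c : path.split("/")) {
      if (!c.isEmpty()) {
        out.add(c);
      }
    }
    return out;
  }

  /**
   * Returns the path with ".snapshot" and the snapshot name after it removed,
   * e.g. "/data/.snapshot/snap1/tab1" becomes "/data/tab1".
   * Paths without a ".snapshot" component are returned unchanged.
   */
  static String resolveSnapshotPath(String path) {
    List<String> in = split(path);
    List<String> out = new ArrayList<>();
    for (int i = 0; i < in.size(); i++) {
      if (DOT_SNAPSHOT.equals(in.get(i))) {
        i++; // also skip the snapshot name, if one is present
        continue;
      }
      out.add(in.get(i));
    }
    return "/" + String.join("/", out);
  }

  public static void main(String[] args) {
    // Path as requested by the client (what iip.getPathComponents() describes).
    String requested = "/data/.snapshot/snap1/tab1";
    // Path at the original location (what the 2.x provider effectively saw).
    System.out.println(requested + " -> " + resolveSnapshotPath(requested));
  }
}
```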

It would also be fairly easy to provide a config switch so that the provider 
receives either the full snapshot path or the resolved path.
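
A rough sketch of what such a switch could look like is shown here. It uses 
java.util.Properties as a stand-in for the Hadoop configuration, and the 
property name is invented for illustration; no such key exists in HDFS as far 
as this report is concerned. The default keeps the old (2.x) behaviour of 
handing the provider the resolved path.

```java
import java.util.Properties;

// Hypothetical illustration only; not part of HDFS.
public class ProviderPathSwitchSketch {
  // Invented property name, used purely for this example.
  static final String USE_SNAPSHOT_PATH_KEY =
      "example.attribute.provider.use.snapshot.path";

  /**
   * Chooses which path to hand to the attribute provider.
   * Defaults to the resolved path, matching the pre-3.0 behaviour.
   */
  static String pathForProvider(Properties conf, String requestedPath,
      String resolvedPath) {
    boolean useSnapshotPath =
        Boolean.parseBoolean(conf.getProperty(USE_SNAPSHOT_PATH_KEY, "false"));
    return useSnapshotPath ? requestedPath : resolvedPath;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    String requested = "/data/.snapshot/snap1/tab1";
    String resolved = "/data/tab1";

    // Switch unset: the provider is given the resolved path.
    System.out.println(pathForProvider(conf, requested, resolved));

    // Switch enabled: the provider is given the full snapshot path.
    conf.setProperty(USE_SNAPSHOT_PATH_KEY, "true");
    System.out.println(pathForProvider(conf, requested, resolved));
  }
}
```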



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org