This is exactly what I am seeing. OK, good, that makes me feel a bit better
(I am not crazy!). Before we file a JIRA, can anyone comment on what may be
happening here? Is this a bug or a feature? Since this is so new, I am not
really sure what the expected result is...
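
For anyone else trying to reproduce this, a quick way to see who owns each
metadata cache file under a table root is sketched below. It is only a sketch:
it assumes the table is reachable through a POSIX path (for example over MapR
NFS), and the function name and the example path are placeholders I made up,
not anything Drill provides.

```shell
# List owner, group, and mode of every Drill metadata cache file under a
# table root. list_metadata_files is a hypothetical helper, not a Drill tool.
list_metadata_files() {
  # -exec ... {} + batches the matches into as few ls calls as possible and
  # runs nothing at all when no cache file exists under the root.
  find "$1" -type f -name '.drill.parquet_metadata' -exec ls -l {} +
}

# Example with a placeholder NFS mount path:
# list_metadata_files /mapr/my.cluster.com/tmp/flows
```

Comparing the owner column against the user the drillbits run as and the user
you authenticated as should show quickly which directories have a cache file
written by the "wrong" user.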

On Wed, Nov 11, 2015 at 3:25 PM, Vince Gonzalez <vince.gonza...@gmail.com>
wrote:

> My files were owned by mapr:mapr. I changed the ownership of everything to
> ec2-user, and now get permission denied on the refresh table metadata
> command, even though impersonation is on and I authenticated as ec2-user.
> If impersonation is working correctly, then I'd expect this should work. Is
> this what you see?
>
> It's also kinda weird in that both users involved should have write access
> to the files - ec2-user is the owner, and mapr is the superuser on MFS.
>
> [ec2-user@ip-172-16-2-36 tmp]$ sudo -u mapr chown -R ec2-user:ec2-user .
> [ec2-user@ip-172-16-2-36 tmp]$ sqlline -u jdbc:drill: -n ec2-user -p mapr
> apache drill 1.2.0
> "a drill is a terrible thing to waste"
> 0: jdbc:drill:> select count(*) from dfs.`/tmp/flows`;
> +---------+
> | EXPR$0  |
> +---------+
> | 370280  |
> +---------+
> 1 row selected (6.452 seconds)
> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
>
> +--------+-----------------------------------------------------------------------------------------------------+
> |   ok   |                                               summary                                               |
> +--------+-----------------------------------------------------------------------------------------------------+
> | false  | Error: 2050.6796.144654 /tmp/flows/2015/11/11/15/01/20/.drill.parquet_metadata (Permission denied)  |
> +--------+-----------------------------------------------------------------------------------------------------+
> 1 row selected (3.253 seconds)
>
> $ ls -la flows/2015/11/11/15/01/20/.drill.parquet_metadata
> -rwxr-xr-x 1 ec2-user ec2-user 0 Nov 11 19:55 flows/2015/11/11/15/01/20/.drill.parquet_metadata
>
>
> Then I tried to CTAS and it works, but apparently impersonation does not:
>
> 0: jdbc:drill:> create table dfs.tmp.flows2 as select * from
> dfs.`/tmp/flows`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_1       | 81222                      |
> | 1_3       | 78255                      |
> | 1_0       | 113624                     |
> | 1_2       | 97179                      |
> +-----------+----------------------------+
> 4 rows selected (22.591 seconds)
> 0: jdbc:drill:> refresh table metadata dfs.tmp.flows2;
> +-------+--------------------------------------------------+
> |  ok   |                     summary                      |
> +-------+--------------------------------------------------+
> | true  | Successfully updated metadata for table flows2.  |
> +-------+--------------------------------------------------+
> 1 row selected (0.13 seconds)
>
> $ ls -la flows2/
> total 3499
> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
> -rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata
>
>
> Looks like a bug to me. Impersonation doesn't seem to be in force for
> REFRESH TABLE METADATA.
>
>
> On Wed, Nov 11, 2015 at 4:09 PM, John Omernik <j...@omernik.com> wrote:
>
> > I turned on MapR Auditing (this is a handy feature) and found that when I
> > run a query that gives me access denied (my query is select * from
> > table limit 1), per MapR the user I am logged in as (mapradm) is trying to
> > do a create operation on the .drill.parquet_metadata file, and I am
> > guessing it's failing with status 17 (not sure what this means; successes
> > appear to be "0"). What was interesting was that the "CREATE" was attempted
> > three times. Any thoughts on why a select * from table limit 1 would try
> > to initiate a create operation on the .drill.parquet_metadata file?
> >
> > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <j...@omernik.com> wrote:
> >
> > > I take it back.
> > >
> > > I went to run a query in the same session that had worked, and now I am
> > > getting permission denied.
> > >
> > > I do have a query running that creates new directories every 5 minutes;
> > > however, these aren't the directories that are giving me permission denied.
> > > Did you try running an aggregate query across all data? This is an
> > > interesting one to track down; I am not sure why I am getting the access
> > > denied now.
> > >
> > > The .drill.parquet_metadata file in the directory that I am getting the
> > > error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> > > me that both the user of the drillbits (mapr) and the user I am logged into
> > > sqlline as (mapradm) should be able to read the file... so why do I get an
> > > access denied when running a query? Any assistance would be valuable here,
> > > in that there are some great performance increases with the metadata
> > > caching, and I don't want to miss out on that.
> > >
> > > On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <j...@omernik.com> wrote:
> > >
> > >> All files are owned by mapr:mapr?
> > >>
> > >> I have a setup where mapr is the user running the drillbit, but then I
> > >> have a directory that is owned by another user, mapradm:mapradm on all
> > >> files. (Permissions on directories and files appear to be rwxr-xr-x.) When
> > >> I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
> > >> created as mapr:mapr with rwxr-xr-x.
> > >>
> > >> So:
> > >> Drillbit user: mapr
> > >> Directory (and subdirectories/files) owner: mapradm:mapradm
> > >> Directory permissions (all files and folders under main directory):
> > >> rwxr-xr-x
> > >>
> > >> I authenticated to Drill via sqlline as user mapradm (this user should be
> > >> able to read and write just fine to all directories).
> > >>
> > >> Now, one thing I did notice is that my mapr user was not in the mapradm
> > >> group and therefore didn't have write permissions anywhere. When I fixed
> > >> that on all nodes and then manually deleted the metadata files, things
> > >> seemed to be working. I wonder if that was my issue?
> > >>
> > >> Basically, the user running the drillbits needs to be able to write files
> > >> (the .drill.parquet_metadata) or something bad will happen :) I will do
> > >> more testing. This may be a good candidate for some documentation work to
> > >> explain what permissions are required to be able to query these.
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:
> > >>
> > >>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
> > >>> follow your reproduction faithfully.
> > >>>
> > >>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >>> apache drill 1.2.0
> > >>> "drill baby drill"
> > >>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >>> +-------+------------------------------------------------------+
> > >>> |  ok   |                       summary                        |
> > >>> +-------+------------------------------------------------------+
> > >>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >>> +-------+------------------------------------------------------+
> > >>> 1 row selected (32.27 seconds)
> > >>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >>> +---------------+---------------+
> > >>> |     srcIP     |     dstIP     |
> > >>> +---------------+---------------+
> > >>> | 172.16.2.152  | 172.16.1.58   |
> > >>> | 172.16.1.58   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> +---------------+---------------+
> > >>> 12 rows selected (5.654 seconds)
> > >>>
> > >>> And here's what my table structure looks like (as seen via MapR NFS):
> > >>>
> > >>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >>> /mapr/vgonzalez.drill/tmp/flows/
> > >>> └── 2015
> > >>>     └── 11
> > >>>         ├── 10
> > >>>         │   ├── 21
> > >>>         │   │   ├── 39
> > >>>         │   │   │   ├── 03
> > >>>         │   │   │   │   ├── _common_metadata
> > >>>         │   │   │   │   ├── _metadata
> > >>>         │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >>>         │   │   │   │   └── _SUCCESS
> > >>>         │   │   │   └── 20
> > >>>         │   │   │       ├── _common_metadata
> > >>>         │   │   │       ├── _metadata
> > >>>         │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >>>
> > >>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
> > >>>
> > >>> I have authentication and impersonation turned on, and the files are owned
> > >>> by mapr:mapr. Here's my drill-override.conf:
> > >>>
> > >>> drill.exec: {
> > >>>   cluster-id: "vgonzalez_drill-drillbits",
> > >>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > >>> }
> > >>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > >>> drill.exec { security.user.auth { enabled: true, packages += "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [ "login","sudo","sshd","password-auth" ] } }
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <j...@omernik.com> wrote:
> > >>>
> > >>> > Cool, looking forward to it.
> > >>> >
> > >>> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:
> > >>> >
> > >>> > > Hey John, I have a secure cluster and some parquet files. I'll try this
> > >>> > > out and report back.
> > >>> > >
> > >>> > > On Monday, November 9, 2015, John Omernik <j...@omernik.com> wrote:
> > >>> > >
> > >>> > > > Has anyone been able to try/test this? I am curious whether it's only
> > >>> > > > my issue or something more like a bug, so I can open a JIRA if needed.
> > >>> > > >
> > >>> > > > John
> > >>> > > >
> > >>> > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <j...@omernik.com> wrote:
> > >>> > > >
> > >>> > > > > If someone has authorization/authentication set up, to reproduce:
> > >>> > > > >
> > >>> > > > > Have a Parquet table with directories underneath the main one (I have
> > >>> > > > > directories per day).
> > >>> > > > >
> > >>> > > > > Then issue REFRESH TABLE METADATA on the root of the table as an
> > >>> > > > > authenticated user other than the drillbit user. (I am using mapr; I used
> > >>> > > > > my user to run the query, and yes, I have access to the data.)
> > >>> > > > >
> > >>> > > > > Then run a normal query and see what the result is.
> > >>> > > > >
> > >>> > > > > John
> > >>> > > > >
> > >>> > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > >>> > > > >
> > >>> > > > >> This doesn't make sense and seems like a bug.
> > >>> > > > >> I think the right behavior is for the Drillbit to access the cache as the
> > >>> > > > >> Drillbit user at query time (there is no user-level metadata cache in
> > >>> > > > >> Drill at this point).
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <j...@omernik.com> wrote:
> > >>> > > > >>
> > >>> > > > >> > I ran REFRESH TABLE METADATA on a table; it completed successfully.
> > >>> > > > >> >
> > >>> > > > >> > When I tried a subsequent query, I got an IOException: Permission Denied
> > >>> > > > >> > on .drill.parquet_metadata.
> > >>> > > > >> >
> > >>> > > > >> > I am running Drill with authentication. I ran the REFRESH TABLE METADATA
> > >>> > > > >> > as user X; it appears the .drill.parquet_metadata was created and is owned
> > >>> > > > >> > by the user the drillbits are running as, and is created with -rwxr-xr-x.
> > >>> > > > >> >
> > >>> > > > >> > My question is this: I can see why the file is owned by the drillbit
> > >>> > > > >> > user, and the file is created with all-can-read permissions, but why am I
> > >>> > > > >> > getting a permission denied when user X is trying to run a query?
> > >>> > > > >> >
> > >>> > > > >>
> > >>> > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
