This is exactly what I am seeing. OK, good, that makes me feel a bit better (I am not crazy!). Before we file a JIRA, can anyone comment on what may be happening here? Is this a bug or a feature? Since this is so new, I am not really sure what the expected result is...
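In case it helps anyone compare notes, here is a rough Python sketch for checking who owns each `.drill.parquet_metadata` file under a table directory, since the ownership of that file is what the quoted messages keep coming back to. This is only my assumption of a useful check, not anything that ships with Drill: the function names `find_metadata_caches` and `caches_not_owned_by` are made up for illustration, and it assumes the table is reachable as a normal POSIX path (for example over the MapR NFS mount).

```python
import os
import pwd
import stat

# File name taken from the error messages in this thread.
CACHE_NAME = ".drill.parquet_metadata"


def find_metadata_caches(table_root):
    """Yield (path, owner, mode_string) for each cache file under table_root.

    Hypothetical helper for illustration only; it just walks the directory
    tree and reports what the filesystem says about each cache file.
    """
    for dirpath, _dirnames, filenames in os.walk(table_root):
        if CACHE_NAME in filenames:
            path = os.path.join(dirpath, CACHE_NAME)
            info = os.stat(path)
            try:
                owner = pwd.getpwuid(info.st_uid).pw_name
            except KeyError:
                # No passwd entry for this uid; fall back to the number.
                owner = str(info.st_uid)
            yield path, owner, stat.filemode(info.st_mode)


def caches_not_owned_by(table_root, expected_owner):
    """Return the cache files whose owner differs from expected_owner."""
    return [
        (path, owner, mode)
        for path, owner, mode in find_metadata_caches(table_root)
        if owner != expected_owner
    ]


# Example call (path is the NFS path from Vince's message, user is the
# authenticated sqlline user):
# for row in caches_not_owned_by("/mapr/vgonzalez.drill/tmp/flows", "ec2-user"):
#     print(row)
```

If impersonation were applied to REFRESH TABLE METADATA, I would guess that the list comes back empty for the authenticated user; any rows showing the drillbit user instead would match what is reported below.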
On Wed, Nov 11, 2015 at 3:25 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:

> My files were owned by mapr:mapr. I changed the ownership of everything to
> ec2-user, and now get permission denied on the refresh table metadata
> command, even though impersonation is on and I authenticated as ec2-user.
> If impersonation is working correctly, then I'd expect this should work. Is
> this what you see?
>
> It's also kinda weird in that both users involved should have write access
> to the files - ec2-user is the owner, and mapr is the superuser on MFS.
>
> [ec2-user@ip-172-16-2-36 tmp]$ sudo -u mapr chown -R ec2-user:ec2-user .
> [ec2-user@ip-172-16-2-36 tmp]$ sqlline -u jdbc:drill: -n ec2-user -p mapr
> apache drill 1.2.0
> "a drill is a terrible thing to waste"
> 0: jdbc:drill:> select count(*) from dfs.`/tmp/flows`;
> +---------+
> | EXPR$0  |
> +---------+
> | 370280  |
> +---------+
> 1 row selected (6.452 seconds)
> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> +--------+-----------------------------------------------------------------------------------------------------+
> |   ok   |                                               summary                                               |
> +--------+-----------------------------------------------------------------------------------------------------+
> | false  | Error: 2050.6796.144654 /tmp/flows/2015/11/11/15/01/20/.drill.parquet_metadata (Permission denied)  |
> +--------+-----------------------------------------------------------------------------------------------------+
> 1 row selected (3.253 seconds)
>
> $ ls -la flows/2015/11/11/15/01/20/.drill.parquet_metadata
> -rwxr-xr-x 1 ec2-user ec2-user 0 Nov 11 19:55 flows/2015/11/11/15/01/20/.drill.parquet_metadata
>
> Then I tried to CTAS and it works, but apparently impersonation does not:
>
> 0: jdbc:drill:> create table dfs.tmp.flows2 as select * from dfs.`/tmp/flows`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_1       | 81222                      |
> | 1_3       | 78255                      |
> | 1_0       | 113624                     |
> | 1_2       | 97179                      |
> +-----------+----------------------------+
> 4 rows selected (22.591 seconds)
> 0: jdbc:drill:> refresh table metadata dfs.tmp.flows2;
> +-------+--------------------------------------------------+
> |  ok   |                     summary                      |
> +-------+--------------------------------------------------+
> | true  | Successfully updated metadata for table flows2.  |
> +-------+--------------------------------------------------+
> 1 row selected (0.13 seconds)
>
> $ ls -la flows2/
> total 3499
> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
> -rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata
>
> Looks like a bug to me. Impersonation doesn't seem to be in force for
> REFRESH TABLE METADATA.
>
> On Wed, Nov 11, 2015 at 4:09 PM, John Omernik <j...@omernik.com> wrote:
>
> > I turned on MapR Auditing (this is a handy feature) and found that when I
> > run a query that is giving me access denied (my query is select * from
> > table limit 1), per MapR the user I am logged in as (mapradm) is trying to
> > do a create operation on the .drill.parquet_metadata file, and I am
> > guessing it's failing with status: 17 (not sure what this means; successes
> > appear to be "0"). What was interesting was the "CREATE" being attempted
> > three times. Any thoughts on why a select * from tables limit 1 would try
> > to initiate a create operation on the .drill.parquet_metadata file?
> >
> > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <j...@omernik.com> wrote:
> >
> > > I take it back.
> > >
> > > I went to run a query, in the same session that had worked, and now I am
> > > getting permission denied.
> > >
> > > I do have a query running that creates new directories every 5 minutes;
> > > however, these aren't the directories that are giving me permission
> > > denied. Did you try running an aggregate query across all data? This is
> > > an interesting one to track down; not sure why I am getting the access
> > > denied now.
> > >
> > > The .drill.parquet_metadata file in the directory that I am getting the
> > > error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> > > me that both the user of the drillbits (mapr) and the user I am logged in
> > > as in sqlline (mapradm) should be able to read the file... so why do I
> > > get an access denied when running a query? Any assistance would be
> > > valuable here, in that there are some great performance increases with
> > > the metadata caching, and I don't want to miss out on that.
> > >
> > > On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <j...@omernik.com> wrote:
> > >
> > >> All files are owned by mapr:mapr?
> > >>
> > >> I have a setup where mapr is the user running the drillbit, but then I
> > >> have a directory that is owned by another user: mapradm:mapradm on all
> > >> files. (Permissions on directories and files appear to be rwxr-xr-x.)
> > >> When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
> > >> created as mapr:mapr with rwxr-xr-x.
> > >>
> > >> So
> > >> Drillbit User: mapr
> > >> Directory (and subdirectories/files) owner: mapradm:mapradm
> > >> Directory permissions (all files and folders under main directory):
> > >> rwxr-xr-x
> > >>
> > >> I authenticated to drill via sqlline as user mapradm (this user should
> > >> be able to read and write just fine to all directories).
> > >>
> > >> Now, one thing I did notice is my mapr user was not in the mapradm
> > >> group, and therefore didn't have write permissions anywhere... when I
> > >> fixed that on all nodes, and then manually deleted the metadata files,
> > >> things seem to be working. I wonder if that was my issue?
> > >>
> > >> Basically, the user running the drillbits needs to be able to write
> > >> files (the .drill.parquet_metadata) or something bad will happen :) I
> > >> will do more testing. This may be a good candidate for some
> > >> documentation work to understand what permissions are required to be
> > >> able to query these.
> > >>
> > >> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:
> > >>
> > >>> Hi John, I tried this and didn't find any issues. Let me know if I
> > >>> didn't follow your reproduction faithfully.
> > >>>
> > >>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >>> apache drill 1.2.0
> > >>> "drill baby drill"
> > >>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >>> +-------+------------------------------------------------------+
> > >>> |  ok   |                       summary                        |
> > >>> +-------+------------------------------------------------------+
> > >>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >>> +-------+------------------------------------------------------+
> > >>> 1 row selected (32.27 seconds)
> > >>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >>> +---------------+---------------+
> > >>> |     srcIP     |     dstIP     |
> > >>> +---------------+---------------+
> > >>> | 172.16.2.152  | 172.16.1.58   |
> > >>> | 172.16.1.58   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.73   | 172.16.2.152  |
> > >>> | 172.16.2.152  | 172.16.2.73   |
> > >>> +---------------+---------------+
> > >>> 12 rows selected (5.654 seconds)
> > >>>
> > >>> And here's what my table structure looks like (as seen via MapR NFS):
> > >>>
> > >>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >>> /mapr/vgonzalez.drill/tmp/flows/
> > >>> └── 2015
> > >>>     └── 11
> > >>>         ├── 10
> > >>>         │   ├── 21
> > >>>         │   │   ├── 39
> > >>>         │   │   │   ├── 03
> > >>>         │   │   │   │   ├── _common_metadata
> > >>>         │   │   │   │   ├── _metadata
> > >>>         │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >>>         │   │   │   │   └── _SUCCESS
> > >>>         │   │   │   └── 20
> > >>>         │   │   │       ├── _common_metadata
> > >>>         │   │   │       ├── _metadata
> > >>>         │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >>>
> > >>> My parquet was created in Spark, not Drill. Not sure if that's
> > >>> relevant.
> > >>>
> > >>> I have authentication and impersonation turned on, and the files are
> > >>> owned by mapr:mapr. Here's my drill-override.conf:
> > >>>
> > >>> drill.exec: {
> > >>>   cluster-id: "vgonzalez_drill-drillbits",
> > >>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > >>> }
> > >>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > >>> drill.exec { security.user.auth { enabled: true, packages +=
> > >>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [
> > >>> "login","sudo","sshd","password-auth" ] } }
> > >>>
> > >>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <j...@omernik.com> wrote:
> > >>>
> > >>> > Cool, looking forward to it.
> > >>> >
> > >>> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:
> > >>> >
> > >>> > > Hey John, I have a secure cluster and some parquet files, I'll try
> > >>> > > this out and report back.
> > >>> > >
> > >>> > > On Monday, November 9, 2015, John Omernik <j...@omernik.com> wrote:
> > >>> > >
> > >>> > > > Has anyone been able to try/test this? I am curious whether it's
> > >>> > > > a me-only issue or more of a bug, so I can open a JIRA if needed.
> > >>> > > >
> > >>> > > > John
> > >>> > > >
> > >>> > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <j...@omernik.com> wrote:
> > >>> > > >
> > >>> > > > > If someone has authorization/authentication set up, to
> > >>> > > > > reproduce:
> > >>> > > > >
> > >>> > > > > Have a Parquet table with directories underneath the main (I
> > >>> > > > > have directories per day).
> > >>> > > > >
> > >>> > > > > Then issue REFRESH TABLE METADATA on the root of the table,
> > >>> > > > > running as an authenticated user other than the drill bit user.
> > >>> > > > > (I am using mapr, I used my user to run the query, and yes I
> > >>> > > > > have access to the data.)
> > >>> > > > >
> > >>> > > > > Then run a normal query and see what the result is.
> > >>> > > > >
> > >>> > > > > John
> > >>> > > > >
> > >>> > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > >>> > > > >
> > >>> > > > >> This doesn't make sense and seems like a bug.
> > >>> > > > >> I think the right behavior is for the Drillbit to access the
> > >>> > > > >> cache as Drillbit user at the query time (there is no user
> > >>> > > > >> level metadata cache in Drill at this point).
> > >>> > > > >>
> > >>> > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <j...@omernik.com> wrote:
> > >>> > > > >>
> > >>> > > > >> > I ran REFRESH TABLE METADATA on a table, it completed
> > >>> > > > >> > successfully.
> > >>> > > > >> >
> > >>> > > > >> > When I tried a subsequent query, I got an IOException:
> > >>> > > > >> > Permission Denied on .drill.parquet_metadata.
> > >>> > > > >> >
> > >>> > > > >> > I am running drill with authentication. I ran the REFRESH
> > >>> > > > >> > TABLE METADATA as user X; it appears the
> > >>> > > > >> > .drill.parquet_metadata was created and owned by the user
> > >>> > > > >> > the drill bits are running as, and is created with
> > >>> > > > >> > -rwxr-xr-x.
> > >>> > > > >> >
> > >>> > > > >> > My question is this: I can see why the file is owned by the
> > >>> > > > >> > drill bit user, and the file is created with all-can-read
> > >>> > > > >> > permissions, but why am I getting a permission denied when
> > >>> > > > >> > user X is trying to run a query?