John, Vince
I am a little confused by this email thread.
From John's original description, I thought the issue was that the refresh
metadata command runs successfully (and the cache is created with the
Drillbit user as owner), but the query then fails for any user (even
though the user has permissions on the directory/dataset).

Per the latest discussion, it seems like you are hitting permission denied
when running the 'refresh metadata' command itself.

Just wanted to share what I think the right behavior here is. Feel free to
comment.

- When the refresh metadata command is run, the cache files get created with
the drillbit user as the owner (irrespective of who is running the command
and whether impersonation is turned on).
- When a select query comes in on the table, the corresponding cache file
is always accessed as the drillbit user (again irrespective of who is running
the query and whether impersonation is turned on).
- The cache file created through the refresh metadata command should deny
access to all users other than the drillbit user, so that no metadata
leaks to someone who opens the file directly on the file system (i.e. the
cache is for Drill's internal planning purposes and is not meant as a
user-level cache).

If the above is not happening, it seems like a bug.
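
One way to check the third point by hand is sketched below. This is only a
rough illustration, assuming the cache file is reachable through a
POSIX-style path (for example a MapR NFS mount). The function name
cache_file_report and the temporary stand-in file are hypothetical and are
not part of Drill.

```python
import os
import stat
import tempfile


def cache_file_report(path):
    """Return owner uid, mode string, and whether only the owner has access."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    return {
        "owner_uid": st.st_uid,
        "mode": stat.filemode(st.st_mode),
        # True only when no group or other permission bits are set.
        "owner_only": (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0,
    }


def demo():
    # Stand-in for a .drill.parquet_metadata file; not a real Drill cache.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, ".drill.parquet_metadata")
        with open(path, "w") as f:
            f.write("{}")
        os.chmod(path, 0o755)  # the rwxr-xr-x mode reported in this thread
        open_mode = cache_file_report(path)
        os.chmod(path, 0o600)  # owner-only access, as proposed above
        restricted = cache_file_report(path)
    return open_mode, restricted


if __name__ == "__main__":
    for report in demo():
        print(report)
```

With the rwxr-xr-x mode John reported, owner_only comes back False, which is
the metadata exposure the third bullet is meant to rule out.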

thanks
Neeraja

On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <[email protected]> wrote:

> MapR audit records print the errno value to indicate success/failure. Thus
> status 17 means errno 17 which means EEXIST. Looks like Drill is trying to
> create a file that already exists.
>
> I’ll defer to others as to why Drill might do that.
>
> Keys
> _______________________________
> Keys Botzum
> Senior Principal Technologist
> [email protected]
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
>
>
>
> On Nov 11, 2015, at 4:09 PM, John Omernik <[email protected]> wrote:
>
> > I turned on MapR Auditing (this is a handy feature) and found that when I
> > run a query that gives me access denied (my query is select * from
> > table limit 1), per MapR the user I am logged in as (mapradm) is trying to
> > do a create operation on the .drill.parquet_metadata file, and I am
> > guessing it's failing with status: 17 (not sure what this means;
> > successes appear to be "0"). What was interesting was the "CREATE" being
> > attempted three times. Any thoughts on why a select * from table limit 1
> > would try to initiate a create operation on the .drill.parquet_metadata
> > file?
> >
> > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <[email protected]> wrote:
> >
> >> I take it back.
> >>
> >> I went to run a query, in the same session that had worked, and now I am
> >> getting permission denied.
> >>
> >> I do have a query running that creates new directories every 5 minutes;
> >> however, these aren't the directories that are giving me permission
> >> denied. Did you try running an aggregate query across all data? This is
> >> an interesting one to track down; I am not sure why I am getting the
> >> access denied now.
> >>
> >> The .drill.parquet_metadata file in the directory that I am getting the
> >> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> >> me that both the user of the drillbits (mapr) and the user I am logged
> >> in as in sqlline (mapradm) should be able to read the file, so why do I
> >> get an access denied when running a query? Any assistance would be
> >> valuable here, in that there are some great performance increases with
> >> the metadata caching, and I don't want to miss out on that.
> >>
> >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <[email protected]> wrote:
> >>
> >>> All files are owned by mapr:mapr?
> >>>
> >>> I have a setup where mapr is the user running the drillbit, but then I
> >>> have a directory that is owned by another user, mapradm:mapradm on all
> >>> files. (Permissions on directories and files appear to be rwxr-xr-x.)
> >>> When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file
> >>> gets created as mapr:mapr with rwxr-xr-x.
> >>>
> >>> So
> >>> Drillbit User:mapr
> >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> >>> Directory permissions (all files and folders under main directory):
> >>> rwxr-xr-x
> >>>
> >>> I authenticated to drill via sqlline as user mapradm (this user should
> >>> be able to read and write just fine to all directories).
> >>>
> >>> Now, one thing I did notice is that my mapr user was not in the mapradm
> >>> group and therefore didn't have write permissions anywhere. When I fixed
> >>> that on all nodes and then manually deleted the metadata files, things
> >>> seemed to be working. I wonder if that was my issue?
> >>>
> >>> Basically, the user running the drillbits needs to be able to write
> >>> files (the .drill.parquet_metadata) or something bad will happen :) I
> >>> will do more testing. This may be a good candidate for some documentation
> >>> work to understand what permissions are required to be able to query
> >>> these.
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <[email protected]> wrote:
> >>>
> >>>> Hi John, I tried this and didn't find any issues. Let me know if I
> >>>> didn't follow your reproduction faithfully.
> >>>>
> >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> >>>> apache drill 1.2.0
> >>>> "drill baby drill"
> >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> >>>> +-------+------------------------------------------------------+
> >>>> |  ok   |                       summary                        |
> >>>> +-------+------------------------------------------------------+
> >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> >>>> +-------+------------------------------------------------------+
> >>>> 1 row selected (32.27 seconds)
> >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> >>>> +---------------+---------------+
> >>>> |     srcIP     |     dstIP     |
> >>>> +---------------+---------------+
> >>>> | 172.16.2.152  | 172.16.1.58   |
> >>>> | 172.16.1.58   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.73   | 172.16.2.152  |
> >>>> | 172.16.2.152  | 172.16.2.73   |
> >>>> +---------------+---------------+
> >>>> 12 rows selected (5.654 seconds)
> >>>>
> >>>> And here's what my table structure looks like (as seen via MapR NFS):
> >>>>
> >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> >>>> /mapr/vgonzalez.drill/tmp/flows/
> >>>> └── 2015
> >>>>    └── 11
> >>>>        ├── 10
> >>>>        │   ├── 21
> >>>>        │   │   ├── 39
> >>>>        │   │   │   ├── 03
> >>>>        │   │   │   │   ├── _common_metadata
> >>>>        │   │   │   │   ├── _metadata
> >>>>        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> >>>>        │   │   │   │   └── _SUCCESS
> >>>>        │   │   │   └── 20
> >>>>        │   │   │       ├── _common_metadata
> >>>>        │   │   │       ├── _metadata
> >>>>        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> >>>>
> >>>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
> >>>>
> >>>> I have authentication and impersonation turned on, and the files are
> >>>> owned by mapr:mapr. Here's my drill-override.conf:
> >>>>
> >>>> drill.exec: {
> >>>>  cluster-id: "vgonzalez_drill-drillbits",
> >>>>  zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> >>>> }
> >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> >>>> drill.exec { security.user.auth { enabled: true, packages +=
> >>>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles:
> >>>> [ "login","sudo","sshd","password-auth" ] } }
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <[email protected]> wrote:
> >>>>
> >>>>> Cool, looking forward to it.
> >>>>>
> >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <[email protected]> wrote:
> >>>>>
> >>>>>> Hey John, I have a secure cluster and some parquet files. I'll try
> >>>>>> this out and report back.
> >>>>>>
> >>>>>> On Monday, November 9, 2015, John Omernik <[email protected]> wrote:
> >>>>>>
> >>>>>>> Has anyone been able to try/test this? I am curious whether it's
> >>>>>>> only my issue or more of a bug, so I can open a JIRA if needed.
> >>>>>>>
> >>>>>>> John
> >>>>>>>
> >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> If someone has authorization/authentication setup, to reproduce:
> >>>>>>>>
> >>>>>>>> Have a Parquet table with directories underneath the main (I have
> >>>>>>>> directories per day)
> >>>>>>>>
> >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table, running
> >>>>>>>> as an authenticated user other than the drill bit user. (I am using
> >>>>>>>> mapr, I used my user to run the query, and yes, I have access to the
> >>>>>>>> data.)
> >>>>>>>>
> >>>>>>>> Then run a normal query and see what the result is.
> >>>>>>>>
> >>>>>>>> John
> >>>>>>>>
> >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> This doesn't make sense and seems like a bug.
> >>>>>>>>> I think the right behavior is for the Drillbit to access the cache
> >>>>>>>>> as the Drillbit user at query time (there is no user-level metadata
> >>>>>>>>> cache in Drill at this point).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> I ran REFRESH TABLE METADATA on a table; it completed successfully.
> >>>>>>>>>>
> >>>>>>>>>> When I tried a subsequent query, I got an IOException: Permission
> >>>>>>>>>> Denied on .drill.parquet_metadata.
> >>>>>>>>>>
> >>>>>>>>>> I am running drill with authentication. I ran the REFRESH TABLE
> >>>>>>>>>> METADATA as user X; it appears the .drill.parquet_metadata was
> >>>>>>>>>> created and owned by the user the drill bits are running as, and
> >>>>>>>>>> was created with -rwxr-xr-x.
> >>>>>>>>>>
> >>>>>>>>>> My question is this: I can see why the file is owned by the drill
> >>>>>>>>>> bit user, and the file is created with all-can-read permissions,
> >>>>>>>>>> but why am I getting a permission denied when user X is trying to
> >>>>>>>>>> run a query?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
>
>
