I turned on MapR Auditing (a handy feature) and found that when I run a
query that gives me access denied (the query is select * from table
limit 1), MapR reports that the user I am logged in as (mapradm) is trying
to do a create operation on the .drill.parquet_metadata file. I am
guessing it's failing with status: 17 (not sure what this means; successes
appear to be "0"). What was interesting was the "CREATE" being attempted
three times. Any thoughts on why a select * from table limit 1 would try
to initiate a create operation on the .drill.parquet_metadata file?
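
One possible way to read that status value, assuming (this is not confirmed
anywhere in this thread) that the audit log uses POSIX errno numbering, is to
look the number up with Python's standard errno module. The helper name
describe_status is made up for illustration. If the assumption holds, 17 would
be a "file already exists" failure on the create rather than a permission
failure.

```python
import errno
import os


def describe_status(status: int) -> str:
    """Translate an errno-style status code into 'NAME: message'.

    Assumption: the audit "status" field follows POSIX errno numbering,
    with 0 meaning success. This is a guess, not documented behavior.
    """
    if status == 0:
        return "OK"
    # errno.errorcode maps numeric codes to their symbolic names.
    name = errno.errorcode.get(status, "UNKNOWN")
    return f"{name}: {os.strerror(status)}"


# Look up the status seen in the audit log.
print(describe_status(17))
```

Under the same assumption, a plain permission failure would more likely show
up as 13 (EACCES), so it may be worth checking which of the two the failing
query actually logs.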

On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <[email protected]> wrote:

> I take it back.
>
> I went to run a query, in the same session that had worked, and now I am
> getting permission denied.
>
> I do have a query running that creates new directories every 5 minutes;
> however, these aren't the directories that are giving me permission denied.
> Did you try running an aggregate query across all data? This is an
> interesting one to track down; I'm not sure why I am getting the access
> denied now.
>
> The .drill.parquet_metadata file in the directory that I am getting the
> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> me that both the user of the drillbits (mapr) and the user I am logged in
> as in sqlline (mapradm) should be able to read the file, so why do I get
> an access denied when running a query? Any assistance would be valuable
> here, since there are some great performance increases with the metadata
> caching, and I don't want to miss out on that.
>
> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <[email protected]> wrote:
>
>> All files are owned by mapr:mapr?
>>
>> I have a setup where mapr is the user running the drillbit, but then I
>> have a directory that is owned by another user: mapradm:mapradm on all
>> files. (Permissions on directories and files appear to be rwxr-x-r-x.) When
>> I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
>> created as mapr:mapr with rwxr-xr-x.
>>
>> So
>> Drillbit User:mapr
>> Directory (and subdirectories/files) owner: mapradm:mapradm
>> Directory permissions (all files and folder under main directory)
>> rwxr-x-r-x
>>
>> I authenticated to drill via sqlline as user mapradm (this user should be
>> able to read and write just fine to all directories).
>>
>> Now, one thing I did notice is that my mapr user was not in the mapradm
>> group and therefore didn't have write permissions anywhere. When I fixed
>> that on all nodes and then manually deleted the metadata files, things
>> seem to be working. I wonder if that was my issue?
>>
>> Basically, the user running the drillbits needs to be able to write files
>> (the .drill.parquet_metadata) or something bad will happen :) I will do
>> more testing. This may be a good candidate for some documentation work to
>> explain what permissions are required to be able to query these.
>>
>>
>>
>>
>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <[email protected]
>> > wrote:
>>
>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
>>> follow your reproduction faithfully.
>>>
>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
>>> apache drill 1.2.0
>>> "drill baby drill"
>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
>>> +-------+------------------------------------------------------+
>>> |  ok   |                       summary                        |
>>> +-------+------------------------------------------------------+
>>> | true  | Successfully updated metadata for table /tmp/flows.  |
>>> +-------+------------------------------------------------------+
>>> 1 row selected (32.27 seconds)
>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
>>> +---------------+---------------+
>>> |     srcIP     |     dstIP     |
>>> +---------------+---------------+
>>> | 172.16.2.152  | 172.16.1.58   |
>>> | 172.16.1.58   | 172.16.2.152  |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> +---------------+---------------+
>>> 12 rows selected (5.654 seconds)
>>>
>>> And here's what my table structure looks like (as seen via MapR NFS):
>>>
>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
>>> /mapr/vgonzalez.drill/tmp/flows/
>>> └── 2015
>>>     └── 11
>>>         ├── 10
>>>         │   ├── 21
>>>         │   │   ├── 39
>>>         │   │   │   ├── 03
>>>         │   │   │   │   ├── _common_metadata
>>>         │   │   │   │   ├── _metadata
>>>         │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
>>>         │   │   │   │   └── _SUCCESS
>>>         │   │   │   └── 20
>>>         │   │   │       ├── _common_metadata
>>>         │   │   │       ├── _metadata
>>>         │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
>>>
>>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
>>>
>>> I have authentication and impersonation turned on, and the files are
>>> owned
>>> by mapr:mapr. Here's my drill-override.conf:
>>>
>>> drill.exec: {
>>>   cluster-id: "vgonzalez_drill-drillbits",
>>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
>>> }
>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
>>> drill.exec { security.user.auth { enabled: true, packages +=
>>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [
>>> "login","sudo","sshd","password-auth" ] } }
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <[email protected]> wrote:
>>>
>>> > Cool, looking forward to it.
>>> >
>>> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <
>>> [email protected]>
>>> > wrote:
>>> >
>>> > > Hey John, I have a secure cluster and some parquet files. I'll try
>>> > > this out and report back.
>>> > >
>>> > > On Monday, November 9, 2015, John Omernik <[email protected]> wrote:
>>> > >
>>> > > > Has anyone been able to try/test this? I am curious if it's only my
>>> > > > issue or something more of a bug, so I can open a JIRA if needed.
>>> > > >
>>> > > > John
>>> > > >
>>> > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <[email protected]> wrote:
>>> > > >
>>> > > > > If someone has authorization/authentication set up, to reproduce:
>>> > > > >
>>> > > > > Have a Parquet table with directories underneath the main one (I
>>> > > > > have directories per day).
>>> > > > >
>>> > > > > Then issue REFRESH TABLE METADATA on the root of the table, running
>>> > > > > as an authenticated user other than the drillbit user. (I am using
>>> > > > > mapr; I used my user to run the query, and yes, I have access to
>>> > > > > the data.)
>>> > > > >
>>> > > > > Then run a normal query and see what the result is.
>>> > > > >
>>> > > > > John
>>> > > > >
>>> > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <[email protected]> wrote:
>>> > > > >
>>> > > > >> This doesn't make sense and seems like a bug.
>>> > > > >> I think the right behavior is for the Drillbit to access the cache
>>> > > > >> as the Drillbit user at query time (there is no user-level
>>> > > > >> metadata cache in Drill at this point).
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <[email protected]> wrote:
>>> > > > >>
>>> > > > >> > I ran REFRESH TABLE METADATA on a table; it completed successfully.
>>> > > > >> >
>>> > > > >> > When I tried a subsequent query, I got an IOException: Permission
>>> > > > >> > Denied on .drill.parquet_metadata.
>>> > > > >> >
>>> > > > >> > I am running Drill with authentication. I ran the REFRESH TABLE
>>> > > > >> > METADATA as user X; it appears the .drill.parquet_metadata was
>>> > > > >> > created and owned by the user the drillbits are running as, and
>>> > > > >> > is created with -rwxr-x-r-x.
>>> > > > >> >
>>> > > > >> > My question is this: I can see why the file is owned by the
>>> > > > >> > drillbit user, and the file is created with all-can-read
>>> > > > >> > permissions, but why am I getting a permission denied when user X
>>> > > > >> > is trying to run a query?
>>> > > > >> >
>>> > > > >>
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
