For me it's very strange. If I delete all the .drill.parquet_metadata
files, I can create them with REFRESH TABLE METADATA and then run a query.
I can wait 5 minutes, come back, and run the same query, and then I get
permission denied. If I try to run REFRESH TABLE METADATA again, it too
fails with permission denied until I erase all the files.

What is strange here is that the .drill.parquet_metadata file is owned by
the drillbit user and has rwxr-xr-x permissions. Based on those
permissions, the non-drillbit user should STILL be able to read the file
with no issues. (This is not what your last bullet describes: these
permissions restrict others from writing, not from reading.)

In addition, when I run the query, it appears that the non-drillbit user
is trying to create the file, which, per Keys, already exists (and that
user doesn't have permission to write it).
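
Per Keys, the audit status 17 is errno 17, EEXIST. As a sketch of what
that error means (not of what Drill actually does internally), the Python
snippet below performs an exclusive create (O_CREAT | O_EXCL) twice on the
same path in a temporary directory: the first call succeeds and the second
fails with EEXIST. The helper `try_exclusive_create` is hypothetical, and
the value 17 is the Linux errno number.

```python
import errno
import os
import tempfile


def try_exclusive_create(path: str) -> int:
    """Create path only if it does not exist; return 0 on success, else the errno."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o755)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    # The file name is only a stand-in; nothing here talks to Drill or MapR-FS.
    cache = os.path.join(tmp, ".drill.parquet_metadata")
    first = try_exclusive_create(cache)   # file did not exist yet
    second = try_exclusive_create(cache)  # file now exists

print(first)                             # 0
print(second, errno.errorcode[second])   # 17 EEXIST on Linux
```

An ordinary open for reading would not fail this way on an existing file,
which is why a CREATE showing up in the audit log for a plain SELECT is
the surprising part.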

So, based on your description of the expected behavior, several things
are not happening correctly:

1. The file that is created is not restricted so that only the drillbit
user can read it.
2. When a query is run, the file is not accessed as the drillbit user; it's
not even just read as the authenticated user. Instead, the authenticated
user tries to overwrite the file (which makes very little sense to me for a
SELECT query).

The only thing that is (apparently) happening correctly is that the
initial REFRESH command creates the files as the drillbit user. However,
subsequent operations don't seem to be working right, so I am not sure
whether that should be a third item in the "things that appear broken"
list.

Using the MapR audit logs was very helpful here. If there is anything else
I can do to help test or troubleshoot this, let me know.




On Wed, Nov 11, 2015 at 4:54 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:

> OK, I'm seeing the behavior you describe, except for the last bullet: the
> permissions on the file allow anyone to read the cache file.
>
> $ ls -la
> total 3499
> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
> -rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata
>
> On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
>
> > John, Vince,
> > I am a little confused by this email thread.
> > From John's original description, I thought the issue was that the refresh
> > metadata command runs successfully (and the cache is created with the
> > Drillbit user as owner), but at query time the query fails for any user
> > (even though the user has permissions on the directory/dataset).
> >
> > Per the latest discussion, it seems like you are hitting permission denied
> > when running the 'refresh metadata' command itself.
> >
> > Just wanted to share what I think the right behavior here is. Feel free to
> > comment.
> >
> > - When the refresh metadata command is run, the cache files get created with
> >   the drillbit user as the owner (irrespective of who runs the command, even
> >   with impersonation turned on).
> > - When a select query comes in on the table, the corresponding cache file is
> >   always accessed as the drillbit user (irrespective of who runs the command,
> >   even with impersonation turned on).
> > - The cache file created through the refresh metadata command should deny
> >   access to all users other than the drillbit user, so that no metadata leaks
> >   to someone opening the file directly on the file system (i.e., the cache is
> >   for Drill's internal planning purposes and is not meant as a user-level
> >   cache).
> >
> > If the above is not happening, it seems like a bug.
> >
> > thanks
> > Neeraja
> >
> > On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <kbot...@maprtech.com> wrote:
> >
> > > MapR audit records print the errno value to indicate success or failure.
> > > Thus status 17 means errno 17, which is EEXIST. It looks like Drill is
> > > trying to create a file that already exists.
> > >
> > > I’ll defer to others as to why Drill might do that.
> > >
> > > Keys
> > > _______________________________
> > > Keys Botzum
> > > Senior Principal Technologist
> > > kbot...@mapr.com
> > > 443-718-0098
> > > MapR Technologies
> > > http://www.mapr.com
> > >
> > >
> > >
> > > On Nov 11, 2015, at 4:09 PM, John Omernik <j...@omernik.com> wrote:
> > >
> > > > I turned on MapR auditing (this is a handy feature) and found that when I
> > > > run a query that gives me access denied (my query is select * from table
> > > > limit 1), per MapR the user I am logged in as (mapradm) is trying to do a
> > > > create operation on the .drill.parquet_metadata file, and I'm guessing it's
> > > > failing with status: 17 (not sure what this means; successes appear to be
> > > > "0"). What was interesting was the CREATE being attempted three times. Any
> > > > thoughts on why a select * from table limit 1 would try to initiate a
> > > > create operation on the .drill.parquet_metadata file?
> > > >
> > > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <j...@omernik.com> wrote:
> > > >
> > > >> I take it back.
> > > >>
> > > >> I went to run a query in the same session that had worked, and now I am
> > > >> getting permission denied.
> > > >>
> > > >> I do have a query running that creates new directories every 5 minutes;
> > > >> however, these aren't the directories that are giving me permission
> > > >> denied. Did you try running an aggregate query across all the data? This
> > > >> is an interesting one to track down; I'm not sure why I am getting the
> > > >> access denied now.
> > > >>
> > > >> The .drill.parquet_metadata file in the directory that I am getting the
> > > >> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> > > >> me that both the user running the drillbits (mapr) and the user I am
> > > >> logged in as in sqlline (mapradm) should be able to read the file... so
> > > >> why do I get access denied when running a query? Any assistance would be
> > > >> valuable here, since the metadata caching gives some great performance
> > > >> increases and I don't want to miss out on that.
> > > >>
> > > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <j...@omernik.com> wrote:
> > > >>
> > > >>> All files are owned by mapr:mapr?
> > > >>>
> > > >>> I have a setup where mapr is the user running the drillbit, but I have a
> > > >>> directory that is owned by another user, with mapradm:mapradm on all
> > > >>> files. (Permissions on directories and files appear to be rwxr-xr-x.)
> > > >>> When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
> > > >>> created as mapr:mapr with rwxr-xr-x.
> > > >>>
> > > >>> So:
> > > >>> Drillbit user: mapr
> > > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> > > >>> Directory permissions (all files and folders under the main directory):
> > > >>> rwxr-xr-x
> > > >>>
> > > >>> I authenticated to Drill via sqlline as user mapradm (this user should be
> > > >>> able to read and write just fine in all directories).
> > > >>>
> > > >>> Now, one thing I did notice is that my mapr user was not in the mapradm
> > > >>> group and therefore didn't have write permissions anywhere. When I fixed
> > > >>> that on all nodes and then manually deleted the metadata files, things
> > > >>> seemed to be working. I wonder if that was my issue?
> > > >>>
> > > >>> Basically, the user running the drillbits needs to be able to write files
> > > >>> (the .drill.parquet_metadata) or something bad will happen :) I will do
> > > >>> more testing. This may be a good candidate for some documentation work,
> > > >>> to explain what permissions are required to be able to query these.
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:
> > > >>>
> > > >>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
> > > >>>> follow your reproduction faithfully.
> > > >>>>
> > > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > > >>>> apache drill 1.2.0
> > > >>>> "drill baby drill"
> > > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > > >>>> +-------+------------------------------------------------------+
> > > >>>> |  ok   |                       summary                        |
> > > >>>> +-------+------------------------------------------------------+
> > > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > > >>>> +-------+------------------------------------------------------+
> > > >>>> 1 row selected (32.27 seconds)
> > > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > > >>>> +---------------+---------------+
> > > >>>> |     srcIP     |     dstIP     |
> > > >>>> +---------------+---------------+
> > > >>>> | 172.16.2.152  | 172.16.1.58   |
> > > >>>> | 172.16.1.58   | 172.16.2.152  |
> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > > >>>> +---------------+---------------+
> > > >>>> 12 rows selected (5.654 seconds)
> > > >>>>
> > > >>>> And here's what my table structure looks like (as seen via MapR NFS):
> > > >>>>
> > > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > > >>>> /mapr/vgonzalez.drill/tmp/flows/
> > > >>>> └── 2015
> > > >>>>    └── 11
> > > >>>>        ├── 10
> > > >>>>        │   ├── 21
> > > >>>>        │   │   ├── 39
> > > >>>>        │   │   │   ├── 03
> > > >>>>        │   │   │   │   ├── _common_metadata
> > > >>>>        │   │   │   │   ├── _metadata
> > > >>>>        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > > >>>>        │   │   │   │   └── _SUCCESS
> > > >>>>        │   │   │   └── 20
> > > >>>>        │   │   │       ├── _common_metadata
> > > >>>>        │   │   │       ├── _metadata
> > > >>>>        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > > >>>>
> > > >>>> My Parquet files were created in Spark, not Drill. Not sure if that's
> > > >>>> relevant.
> > > >>>>
> > > >>>> I have authentication and impersonation turned on, and the files are
> > > >>>> owned by mapr:mapr. Here's my drill-override.conf:
> > > >>>>
> > > >>>> drill.exec: {
> > > >>>>   cluster-id: "vgonzalez_drill-drillbits",
> > > >>>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > > >>>> }
> > > >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > > >>>> drill.exec {
> > > >>>>   security.user.auth {
> > > >>>>     enabled: true,
> > > >>>>     packages += "org.apache.drill.exec.rpc.user.security",
> > > >>>>     impl: "pam",
> > > >>>>     pam_profiles: [ "login", "sudo", "sshd", "password-auth" ]
> > > >>>>   }
> > > >>>> }
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <j...@omernik.com> wrote:
> > > >>>>
> > > >>>>> Cool, looking forward to it.
> > > >>>>>
> > > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <vince.gonza...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Hey John, I have a secure cluster and some Parquet files. I'll try this
> > > >>>>>> out and report back.
> > > >>>>>>
> > > >>>>>> On Monday, November 9, 2015, John Omernik <j...@omernik.com> wrote:
> > > >>>>>>
> > > >>>>>>> Has anyone been able to try/test this? I am curious whether the issue is
> > > >>>>>>> just me or more of a bug, so I can open a JIRA if needed.
> > > >>>>>>>
> > > >>>>>>> John
> > > >>>>>>>
> > > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <j...@omernik.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> If someone has authorization/authentication set up, to reproduce:
> > > >>>>>>>>
> > > >>>>>>>> Have a Parquet table with directories underneath the main one (I have
> > > >>>>>>>> directories per day).
> > > >>>>>>>>
> > > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table, running as
> > > >>>>>>>> an authenticated user other than the drillbit user. (I am using mapr.
> > > >>>>>>>> I used my own user to run the query, and yes, I have access to the
> > > >>>>>>>> data.)
> > > >>>>>>>>
> > > >>>>>>>> Then run a normal query and see what the result is.
> > > >>>>>>>>
> > > >>>>>>>> John
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> This doesn't make sense and seems like a bug.
> > > >>>>>>>>> I think the right behavior is for the Drillbit to access the cache as
> > > >>>>>>>>> the Drillbit user at query time (there is no user-level metadata cache
> > > >>>>>>>>> in Drill at this point).
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <j...@omernik.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, and it completed successfully.
> > > >>>>>>>>>>
> > > >>>>>>>>>> When I tried a subsequent query, I got an IOException: Permission
> > > >>>>>>>>>> Denied on .drill.parquet_metadata.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I am running Drill with authentication. I ran REFRESH TABLE METADATA
> > > >>>>>>>>>> as user X; it appears the .drill.parquet_metadata file was created and
> > > >>>>>>>>>> is owned by the user the drillbits run as, and it is created with
> > > >>>>>>>>>> -rwxr-xr-x.
> > > >>>>>>>>>>
> > > >>>>>>>>>> My question is this: I can see why the file is owned by the drillbit
> > > >>>>>>>>>> user, and the file is created with all-can-read permissions, but why
> > > >>>>>>>>>> am I getting permission denied when user X tries to run a query?
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > >
> > >
> >
>
