I've created https://issues.apache.org/jira/browse/DRILL-4143

The output from Drill and the Markup interpreter on Jira apparently had a
family argument at Thanksgiving, and don't agree on all things... Looking
at the JIRA, while it's not pretty, it still conveys what I am going for.
Please review and let me know if I left anything out from this thread; I
tried to summarize and provide a reproduction plan.
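
For anyone checking the JIRA against the thread below, two facts it leans on
can be verified with a short Python sketch. This is only an illustration, not
Drill or MapR code. The helper names describe_status and others_can_read are
made up for this example, and it assumes the audit "status" field is a plain
Linux errno value, as Keys describes below.

```python
import errno
import os
import stat


def describe_status(status: int) -> str:
    """Map an audit 'status' value to an errno name (0 is treated as success)."""
    if status == 0:
        return "OK"
    # errno.errorcode maps numeric errno values to their symbolic names.
    return errno.errorcode.get(status, "UNKNOWN")


def others_can_read(mode: int) -> bool:
    """Return True if the 'other' permission class has the read bit set."""
    return bool(mode & stat.S_IROTH)


# rwxr-xr-x is octal 755, the mode reported for .drill.parquet_metadata.
CACHE_MODE = 0o755

print(describe_status(17))   # symbolic errno name for the failing CREATE
print(os.strerror(17))       # platform's human-readable message for errno 17
print(stat.filemode(stat.S_IFREG | CACHE_MODE))
print(others_can_read(CACHE_MODE))
```

Status 17 resolves to EEXIST, and mode 755 leaves the file readable by users
other than the owner, which is why a permission-denied error on a plain SELECT
is surprising.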

On Thu, Nov 26, 2015 at 11:04 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Yes, please do.
> On Nov 25, 2015 7:07 AM, "John Omernik" <j...@omernik.com> wrote:
>
> > Should we do a JIRA on this? It seems important...
> >
> > On Wed, Nov 11, 2015 at 5:15 PM, John Omernik <j...@omernik.com> wrote:
> >
> > > For me it's very strange. If I delete all the .drill.parquet_metadata
> > > files, I can create them and then run a query. If I wait 5 minutes and
> > > come back and run the same query, I get the permission denied. If I try
> > > to run the REFRESH METADATA again, it too fails with permission denied
> > > until I erase all the files.
> > >
> > > What is strange here is that the .drill.parquet_metadata file is owned
> > > by the drillbit user and has rwxr-xr-x. Based on those permissions, the
> > > non-drillbit user should STILL be able to read the file with no issues.
> > > (This is not what your last bullet describes; instead, it restricts
> > > others from writing, not reading.)
> > >
> > > In addition, when I try to run the query, it appears that the
> > non-drillbit
> > > user is trying to issue a file create, and per Keys, it's already there
> > > (and they don't have permissions to write).
> > >
> > > There are a number of things that are not happening correctly, then,
> > > based on your understanding/description of what's happening:
> > >
> > > 1. The file that is created is not limited in reading to the drillbit
> > > user.
> > > 2. When a query is run, the file is not accessed by the drillbit user;
> > > it's not even accessed by the authenticated user. Instead, the
> > > authenticated user tries to overwrite the file (which makes very little
> > > sense to me on a select query).
> > >
> > > The only thing that is (apparently) happening correctly is the initial
> > > REFRESH command is creating the files as the drillbit user, however,
> > > subsequent operations don't seem to be working right... so I am not
> sure
> > if
> > > that is a 3rd bullet in the "things that appear broken" list.
> > >
> > > Using the Drill Audit logs was very helpful here, if there is anything
> > > else I can do to help test/troubleshoot this, let me know.
> > >
> > >
> > >
> > >
> > > On Wed, Nov 11, 2015 at 4:54 PM, Vince Gonzalez <
> > vince.gonza...@gmail.com>
> > > wrote:
> > >
> > >> Ok, I'm seeing the behavior you describe except for the last bullet -
> > the
> > >> permissions on the file would allow for anyone to read the cache file.
> > >>
> > >> $ ls -la
> > >> total 3499
> > >> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
> > >> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
> > >> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
> > >> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
> > >> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
> > >> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
> > >> *-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata*
> > >>
> > >> On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <
> > >> nrentachint...@maprtech.com> wrote:
> > >>
> > >> > John, Vince
> > >> > I am a little confused by this email thread.
> > >> > From the original description by John, I thought the issue was that the
> > >> > refresh metadata command runs successfully (and the cache is created
> > >> > with the Drillbit user as owner), but at query time it fails for any
> > >> > user (even though the user has permissions on the directory/dataset).
> > >> >
> > >> > Per the latest discussion, it seems like you are hitting permission
> > >> denied
> > >> > when running 'refresh metadata' command itself.
> > >> >
> > >> > Just wanted to share what I think the right behavior here is. Feel
> > free
> > >> to
> > >> > comment.
> > >> >
> > >> > - When the refresh metadata command is run, the cache files get created
> > >> > with the drillbit user as the owner (irrespective of who is running the
> > >> > command and whether impersonation is turned on).
> > >> > - When a select query comes in on the table, the corresponding cache
> > >> > file is always accessed as the drillbit user (irrespective of who is
> > >> > running the command and whether impersonation is turned on).
> > >> > - The cache file created through the refresh metadata command should
> > >> > restrict access for any user other than the drillbit user (so there is
> > >> > no leakage of metadata to someone going to the file system and opening
> > >> > the file, i.e., the cache is for Drill's internal planning purposes and
> > >> > is not meant as a user-level cache).
> > >> >
> > >> > If the above is not happening, it seems like a bug.
> > >> >
> > >> > thanks
> > >> > Neeraja
> > >> >
> > >> > On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <kbot...@maprtech.com>
> > wrote:
> > >> >
> > >> > > MapR audit records print the errno value to indicate success/failure.
> > >> > > Thus status 17 means errno 17, which means EEXIST. Looks like Drill is
> > >> > > trying to create a file that already exists.
> > >> > >
> > >> > > I’ll defer to others as to why Drill might do that.
> > >> > >
> > >> > > Keys
> > >> > > _______________________________
> > >> > > Keys Botzum
> > >> > > Senior Principal Technologist
> > >> > > kbot...@mapr.com
> > >> > > 443-718-0098
> > >> > > MapR Technologies
> > >> > > http://www.mapr.com
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Nov 11, 2015, at 4:09 PM, John Omernik <j...@omernik.com>
> wrote:
> > >> > >
> > >> > > > I turned on MapR Auditing (this is a handy feature) and found that
> > >> > > > when I run a query that gives me access denied (my query is select *
> > >> > > > from table limit 1), per MapR the user I am logged in as (mapradm) is
> > >> > > > trying to do a create operation on the .drill.parquet_metadata file,
> > >> > > > and I am guessing it's failing with status: 17 (not sure what this
> > >> > > > means; successes appear to be "0"). What was interesting was the
> > >> > > > "CREATE" being attempted three times. Any thoughts on why a select *
> > >> > > > from table limit 1 would try to initiate a create operation on the
> > >> > > > .drill.parquet_metadata file?
> > >> > > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <j...@omernik.com
> >
> > >> > wrote:
> > >> > > >
> > >> > > >> I take it back.
> > >> > > >>
> > >> > > >> I went to run a query, in the same session that had worked, and
> > >> now I
> > >> > am
> > >> > > >> getting permission denied.
> > >> > > >>
> > >> > > >> I do have a query running that creates new directories every 5
> > >> > > >> minutes; however, these aren't the directories that are giving me
> > >> > > >> permission denied. Did you try running an aggregate query across all
> > >> > > >> data? This is an interesting one to track down; I'm not sure why I am
> > >> > > >> getting the access denied now.
> > >> > > >>
> > >> > > >> The .drill.parquet_metadata file in the directory that I am getting
> > >> > > >> the error on is owned by mapr:mapr and has rwxr-xr-x permissions. This
> > >> > > >> tells me that both the user of the drillbits (mapr) and the user I am
> > >> > > >> logged in as in sqlline (mapradm) should be able to read the file...
> > >> > > >> so why do I get an access denied when running a query? Any assistance
> > >> > > >> would be valuable here, in that there are some great performance
> > >> > > >> increases with the metadata caching, and I don't want to miss out on
> > >> > > >> that.
> > >> > > >>
> > >> > > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <
> j...@omernik.com>
> > >> > wrote:
> > >> > > >>
> > >> > > >>> All files are owned by mapr:mapr?
> > >> > > >>>
> > >> > > >>> I have a setup where mapr is the user running the drillbit, but then
> > >> > > >>> I have a directory that is owned by another user, mapradm:mapradm on
> > >> > > >>> all files. (Permissions on directories and files appear to be
> > >> > > >>> rwxr-xr-x.) When I run the REFRESH TABLE METADATA, the
> > >> > > >>> .drill.parquet_metadata file gets created as mapr:mapr with
> > >> > > >>> rwxr-xr-x.
> > >> > > >>>
> > >> > > >>> So
> > >> > > >>> Drillbit User:mapr
> > >> > > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> > >> > > >>> Directory permissions (all files and folders under main directory):
> > >> > > >>> rwxr-xr-x
> > >> > > >>>
> > >> > > >>> I authenticated to drill via sqlline as user mapradm (this
> user
> > >> > should
> > >> > > be
> > >> > > >>> able to read and write just fine to all directories).
> > >> > > >>>
> > >> > > >>> Now, one thing I did notice is that my mapr user was not in the
> > >> > > >>> mapradm group and therefore didn't have write permissions anywhere.
> > >> > > >>> When I fixed that on all nodes and then manually deleted the metadata
> > >> > > >>> files, things seemed to be working. I wonder if that was my issue?
> > >> > > >>>
> > >> > > >>> Basically, the user running the drillbits needs to be able to write
> > >> > > >>> files (the .drill.parquet_metadata) or something bad will happen :) I
> > >> > > >>> will do more testing. This may be a good candidate for some
> > >> > > >>> documentation work to understand what permissions are required to be
> > >> > > >>> able to query these.
> > >> > > >>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <
> > >> > > vince.gonza...@gmail.com
> > >> > > >>>> wrote:
> > >> > > >>>
> > >> > > >>>> Hi John, I tried this and didn't find any issues. Let me know
> > if
> > >> I
> > >> > > didn't
> > >> > > >>>> follow your reproduction faithfully.
> > >> > > >>>>
> > >> > > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >> > > >>>> apache drill 1.2.0
> > >> > > >>>> "drill baby drill"
> > >> > > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >> > > >>>> +-------+------------------------------------------------------+
> > >> > > >>>> |  ok   |                       summary                        |
> > >> > > >>>> +-------+------------------------------------------------------+
> > >> > > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >> > > >>>> +-------+------------------------------------------------------+
> > >> > > >>>> 1 row selected (32.27 seconds)
> > >> > > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >> > > >>>> +---------------+---------------+
> > >> > > >>>> |     srcIP     |     dstIP     |
> > >> > > >>>> +---------------+---------------+
> > >> > > >>>> | 172.16.2.152  | 172.16.1.58   |
> > >> > > >>>> | 172.16.1.58   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >> > > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >> > > >>>> +---------------+---------------+
> > >> > > >>>> 12 rows selected (5.654 seconds)
> > >> > > >>>>
> > >> > > >>>> And here's what my table structure looks like (as seen via
> MapR
> > >> > NFS):
> > >> > > >>>>
> > >> > > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >> > > >>>> /mapr/vgonzalez.drill/tmp/flows/
> > >> > > >>>> └── 2015
> > >> > > >>>>    └── 11
> > >> > > >>>>        ├── 10
> > >> > > >>>>        │   ├── 21
> > >> > > >>>>        │   │   ├── 39
> > >> > > >>>>        │   │   │   ├── 03
> > >> > > >>>>        │   │   │   │   ├── _common_metadata
> > >> > > >>>>        │   │   │   │   ├── _metadata
> > >> > > >>>>        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >> > > >>>>        │   │   │   │   └── _SUCCESS
> > >> > > >>>>        │   │   │   └── 20
> > >> > > >>>>        │   │   │       ├── _common_metadata
> > >> > > >>>>        │   │   │       ├── _metadata
> > >> > > >>>>        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >> > > >>>>
> > >> > > >>>> My parquet was created in Spark, not Drill. Not sure if
> that's
> > >> > > relevant.
> > >> > > >>>>
> > >> > > >>>> I have authentication and impersonation turned on, and the
> > files
> > >> are
> > >> > > >>>> owned
> > >> > > >>>> by mapr:mapr. Here's my drill-override.conf:
> > >> > > >>>>
> > >> > > >>>> drill.exec: {
> > >> > > >>>>  cluster-id: "vgonzalez_drill-drillbits",
> > >> > > >>>>  zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > >> > > >>>> }
> > >> > > >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > >> > > >>>> drill.exec { security.user.auth { enabled: true, packages +=
> > >> > > >>>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles:
> > >> > > >>>> [ "login","sudo","sshd","password-auth" ] } }
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <
> > j...@omernik.com>
> > >> > > wrote:
> > >> > > >>>>
> > >> > > >>>>> Cool, looking forward to it.
> > >> > > >>>>>
> > >> > > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <
> > >> > > >>>> vince.gonza...@gmail.com>
> > >> > > >>>>> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> Hey John, I have a secure cluster and some parquet files,
> > I'll
> > >> try
> > >> > > >>>> this
> > >> > > >>>>> out
> > >> > > >>>>>> and report back.
> > >> > > >>>>>>
> > >> > > >>>>>> On Monday, November 9, 2015, John Omernik <
> j...@omernik.com>
> > >> > wrote:
> > >> > > >>>>>>
> > >> > > >>>>>>> Has anyone been able to try/test this? I am curious whether it's
> > >> > > >>>>>>> an issue only for me or more of a bug, so I can open a JIRA if
> > >> > > >>>>>>> needed.
> > >> > > >>>>>>>
> > >> > > >>>>>>> John
> > >> > > >>>>>>>
> > >> > > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <j...@omernik.com> wrote:
> > >> > > >>>>>>>
> > >> > > >>>>>>>> If someone has authorization/authentication set up, to reproduce:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Have a Parquet table with directories underneath the main
> > >> > > >>>>>>>> directory (I have directories per day).
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table,
> > >> > > >>>>>>>> running as an authenticated user other than the drillbit user. (I
> > >> > > >>>>>>>> am using mapr, and I used my user to run the query, and yes, I
> > >> > > >>>>>>>> have access to the data.)
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Then run a normal query and see what the result is.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> John
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>> This doesn't make sense and seems like a bug.
> > >> > > >>>>>>>>> I think the right behavior is for the Drillbit to access
> > the
> > >> > > >>>> cache
> > >> > > >>>>> as
> > >> > > >>>>>>>>> Drillbit user at the query time (there is no user level
> > >> > metadata
> > >> > > >>>>> cache
> > >> > > >>>>>>> in
> > >> > > >>>>>>>>> Drill at this point).
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <j...@omernik.com> wrote:
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, it completed
> > >> > > >>>>> successfully.
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> When I tried a subsequent query, I got an IOException: Permission
> > >> > > >>>>>>>>>> Denied on .drill.parquet_metadata.
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> I am running Drill with authentication. I ran the REFRESH TABLE
> > >> > > >>>>>>>>>> METADATA as user X; it appears the .drill.parquet_metadata was
> > >> > > >>>>>>>>>> created and is owned by the user the drillbits are running as, and
> > >> > > >>>>>>>>>> is created with -rwxr-xr-x.
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> My question is this: I can see why the file is owned by the
> > >> > > >>>>>>>>>> drillbit user, and the file is created with all-can-read
> > >> > > >>>>>>>>>> permissions, but why am I getting a permission denied when user X
> > >> > > >>>>>>>>>> is trying to run a query?
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>
> > >> > > >>>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
