I have added JIRA items for these (RANGER-413, 414, 415, 416). Please let me 
know whether the descriptions are detailed enough to be taken as requirement 
statements.

Thanks
Sethukumar

From: Sethukumar Ramachandran
Sent: Thursday, April 16, 2015 9:07 AM
To: [email protected]; [email protected]
Subject: RE: Some Apache Ranger queries/thoughts

Sure, and I am happy to contribute (anything from taking up requirements to 
coding). Please let me know. Give me a few days to start creating the JIRAs. 
Then we can refine the requirements and get started.


Thanks
Sethukumar

From: Don Bosco Durai [mailto:[email protected]] On Behalf Of Don Bosco 
Durai
Sent: Thursday, April 16, 2015 4:59 AM
To: [email protected]; [email protected]
Subject: Re: Some Apache Ranger queries/thoughts

Hi Sethukumar

Your requests are reasonable. Let's start with creating the JIRA. Also, if you 
are planning to make some specific contribution, then let us know.

Thanks

Bosco


From: Sethukumar Ramachandran <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, April 14, 2015 at 9:01 PM
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: RE: Some Apache Ranger queries/thoughts

Thanks Durai for the responses. I'm happy to contribute to Ranger in whatever 
way I can. I shall create JIRAs with detailed descriptions/requirements for 
these items: (1) eliminating multiple entries for a single event, (2) auditable 
actions in HDFS and Hive (it would be really nice if this were based on some 
configurable patterns), and (3) Ranger capturing the exact nature of an event 
(update, create, delete, permission modified, ACL created, etc.).

On the fourth item, I am not referring to policy changes exactly (Ranger 
already keeps track of the old value and new value for any kind of policy 
change) but to any changes happening in HDFS and Hive, which can be defined in 
some fashion. For example, in HDFS we need to audit file/folder creation, 
modification, and deletion, user creation, user permission changes, ACL 
changes, Hive grants and revokes, etc., just to list some of them (I can go 
into detail in the JIRA with exact requirements). For these kinds of changes we 
are required to keep track of what changed, from what value to what value, by 
whom, and when. If such a change attempt resulted in failure, that also needs 
to be audited.
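
To make the requested fields concrete, here is a minimal sketch of what one 
such change record could carry. It is an illustration only: the class name, 
field names, and action strings are hypothetical and are not Ranger's audit 
schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ChangeAuditRecord:
    """Hypothetical shape of one change event (not Ranger's actual schema)."""
    resource: str             # e.g. an HDFS path or a Hive table name
    action: str               # e.g. "create", "delete", "permission_modified"
    user: str                 # who attempted the change
    timestamp: datetime       # when the attempt was made
    old_value: Optional[str]  # value before the change, if any
    new_value: Optional[str]  # value requested by the change, if any
    succeeded: bool           # failed attempts are recorded as well


def describe(record: ChangeAuditRecord) -> str:
    """Render a record as a single human-readable audit line."""
    outcome = "SUCCESS" if record.succeeded else "FAILURE"
    return (
        f"{record.timestamp.isoformat()} {record.user} {record.action} "
        f"{record.resource}: {record.old_value!r} -> {record.new_value!r} "
        f"[{outcome}]"
    )
```

A failed permission change would then still produce a record, with 
succeeded set to False and both the old and the requested value preserved.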


I hope this outlines the requirements. I shall start creating JIRAs for these. 
Please let me know in what way I can contribute to this.


Thanks
Sethukumar

From: Don Bosco Durai [mailto:[email protected]] On Behalf Of Don Bosco 
Durai
Sent: Wednesday, April 15, 2015 6:44 AM
To: [email protected]; [email protected]
Subject: Re: Some Apache Ranger queries/thoughts

Hi Sethukumar

Thanks for your input. My responses are inline.

Regards

Bosco


From: Sethukumar Ramachandran <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, April 14, 2015 at 2:48 AM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Some Apache Ranger queries/thoughts

Hello all,

We are using HDP 2.2 and have set up Apache Ranger along with it on Ubuntu 
12.04. We are not able to fulfill our audit-related requirements through 
Ranger. At present, the following are the items we were not able to achieve 
with Ranger. Please let us know whether we are missing something, or whether 
there are ways to improve.



1. As part of our audit requirements we are required to capture 
PermissionDenied type of exceptions (or any exceptions for that matter) in 
HDFS and GRANT related issues in Hive. At present we are not able to capture 
these in Ranger, but the HDFS audit logs and hiveserver logs have some relevant 
information on this. As a single point of information for audit-related data, 
we would like to have these in Ranger rather than looking around in those logs. 
How can we do this with Ranger?
Bosco: This is our ultimate goal. With Hive we might be auditing all user level 
activities. With HDFS, we are auditing all file access related actions. Would 
you be able to list out the actions you want to audit? This will help us to 
scope the work. Please create a JIRA to track this.


2. Both the HDFS and Hive plugins for Ranger capture multiple audit entries 
for the same event, and this is a bit of an overhead from an auditing 
perspective. Is it possible to have a single and clear audit entry in Ranger 
for a particular auditable event? Is there some configuration available for 
this to work?
Bosco: In the release under development (Apache Ranger 0.5), the HDFS audit has 
been optimized to only one call per request. For Hive, we are just capturing 
one action per request. I am not sure whether you are referring to the "USE" 
action. Anyway, for Hive, it would be good if you could let us know which ones 
are duplicates. We can look into it.


3. If we have an HDFS read, write or delete operation, we get multiple entries 
in the Ranger audit. But we are not able to figure out the exact nature of the 
change that happened in HDFS by looking through the Ranger audit trail records. 
The same is the case for Hive-related operations. The resource name that Ranger 
captures is sometimes vague and points to the /tmp folder and the like.
Bosco: Hopefully, eliminating the multiple entries will ease some of your pain. 
Regarding Hive access to HDFS, since Hive creates a lot of temporary 
intermediate files, there is a lot of noise. Your concerns are valid. I feel 
we should extend our UI search to be smarter and help the admin users 
suppress (filter out) accesses to /tmp folders and similar transient resources. 
Can you help us document and track the requirement by creating a JIRA? FYI, 
we are moving our audits to Solr. This gives a lot more search and filter 
capabilities, and you can also use Banana (or other BI tools) to write your own 
custom audit dashboard. That might be interesting to you.
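
The kind of suppression described above can be sketched as a simple 
post-filter over exported audit records. This is only an illustration under 
stated assumptions: the "resource" field name, the prefix list, and both 
function names are hypothetical, and this is not how the Ranger UI or Solr 
implements filtering.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical list of path prefixes treated as transient noise.
TRANSIENT_PREFIXES: Tuple[str, ...] = ("/tmp",)


def is_transient(resource: str,
                 prefixes: Tuple[str, ...] = TRANSIENT_PREFIXES) -> bool:
    """Return True if the path is a transient directory or lies beneath one.

    Matching is done on whole path components, so "/tmpfiles/a" is not
    treated as being under "/tmp".
    """
    path = resource.rstrip("/") or "/"
    return any(path == p or path.startswith(p + "/") for p in prefixes)


def suppress_transient(records: Iterable[Dict[str, str]],
                       prefixes: Tuple[str, ...] = TRANSIENT_PREFIXES
                       ) -> List[Dict[str, str]]:
    """Keep only the records whose resource is not under a transient prefix."""
    return [r for r in records if not is_transient(r["resource"], prefixes)]
```

Making the prefix list configurable would let an admin add other scratch 
locations without touching the filter itself.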


4. If there is a change in HDFS or Hive (grants, data delete/update), we are 
required to store the old value and new value along with who made the change, 
when the change was made, and whether it was successful or not. But this is 
not happening now. How can we achieve this with Ranger?
Bosco: Assuming you are referring to policy changes, all Hive related policy 
changes (Ranger UI, Ranger REST or Hive GRANT/REVOKE) are logged into Ranger. 
You can check them from Ranger -> Audit -> Admin tab. For HDFS, all policy 
changes done via Ranger UI and Ranger REST are logged in Ranger.




Thanks & Regards,
Sethukumar Ramachandran
