Thanks Durai for the responses. I'm happy to contribute to Ranger in whatever way I can. I shall create JIRAs with detailed descriptions/requirements for these items: (1) eliminating multiple entries for a single event, (2) auditable actions in HDFS and Hive (it would be really nice if this were based on some configurable patterns), and (3) Ranger capturing the exact nature of the event (update, create, delete, permission modified, ACL created, etc.).
On the fourth item, I am not referring to policy changes (Ranger already keeps track of the old value and new value for any kind of policy change) but to changes happening in HDFS and Hive themselves, which could be defined in some fashion. For example, in HDFS we need to audit file/folder creation, modification and deletion, user creation, user permission changes, ACL changes, Hive grants and revokes, etc., just to list some of them (I can go into detail in the JIRA with exact requirements). For these kinds of changes we are required to keep track of what changed, from what value to what value, by whom and when. If such a change attempt results in failure, that also needs to be audited.

Hope this outlines the requirements. I shall start creating JIRAs for these; please let me know how else I can contribute.

Thanks
Sethukumar

From: Don Bosco Durai [mailto:[email protected]] On Behalf Of Don Bosco Durai
Sent: Wednesday, April 15, 2015 6:44 AM
To: [email protected]; [email protected]
Subject: Re: Some Apache Ranger queries/thoughts

Hi Sethukumar

Thanks for your input. My responses are inline.

Regards
Bosco

From: Sethukumar Ramachandran <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, April 14, 2015 at 2:48 AM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Some Apache Ranger queries/thoughts

Hello all,

We are using HDP 2.2 and have set up Apache Ranger along with it on Ubuntu 12.04. We are not able to fulfill our audit-related requirements through Ranger. At present there are the following items that we were not able to achieve through Ranger. Please let us know whether we are missing something, or whether there are ways to improve.

1.
As part of our audit requirements we are required to capture PermissionDenied-type exceptions (or any exceptions, for that matter) in HDFS and GRANT-related issues in Hive. At present we are not able to capture these in Ranger, but the HDFS audit logs and HiveServer logs have some relevant information on this. As a single point of information for audit-related data, we would like to have these in Ranger rather than looking around in those logs. How can we do this with Ranger?

Bosco: This is our ultimate goal. With Hive we might be auditing all user-level activities. With HDFS, we are auditing all file-access-related actions. Would you be able to list out the actions you want to audit? This will help us to scope the work. Please create a JIRA to track this.

2. Both the HDFS and Hive plugins for Ranger capture multiple audit entries for the same event, and this is a bit of an overhead from an auditing perspective. Is it possible to have a single, clear audit entry in Ranger for a particular auditable event? Is there some configuration available for this?

Bosco: In the release under development (Apache Ranger 0.5), the HDFS audit has been optimized to only one call per request. For Hive, we are just capturing one action per request. I am not sure whether you are referring to the "USE" action. Anyway, for Hive, it would be good if you can let us know which ones are duplicates. We can look into it.

3. If we have an HDFS read, write or delete operation, we get multiple entries in the Ranger audit. But we are not able to figure out the exact nature of the change that happened in HDFS by looking through the Ranger audit trail records. The same is true for Hive-related operations. The resource name that Ranger captures is sometimes vague and points to the /tmp folder and the like.

Bosco: Hopefully, eliminating the multiple entries will ease some of your pain. Regarding Hive access to HDFS, since Hive creates a lot of temporary intermediate files, there is a lot of noise. Your concerns are valid.
I feel we should extend our UI search to be smarter and help admin users suppress (filter out) accesses to /tmp folders and similar transient resources. Can you help us document and track the requirement by creating a JIRA? FYI, we are moving our audits to Solr. This gives a lot more search and filter capabilities, and you can also use Banana (or other BI tools) to write your own custom audit dashboard. That might be interesting to you.

4. If there is a change in HDFS or Hive (grants, data delete/update), we are required to store the old value and new value along with who made the change, when the change was made and whether it was successful or not. But this is not happening now. How can we achieve this with Ranger?

Bosco: Assuming you are referring to policy changes, all Hive-related policy changes (Ranger UI, Ranger REST or Hive GRANT/REVOKE) are logged in Ranger. You can check them from Ranger -> Audit -> Admin tab. For HDFS, all policy changes done via Ranger UI and Ranger REST are logged in Ranger.

Thanks & Regards,
Sethukumar Ramachandran
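To illustrate the suppression idea from the reply to item 3 (filtering out accesses to /tmp and similar transient resources), here is a minimal sketch in Python. It is not Ranger code and does not use any Ranger API: the record fields (`user`, `resource`, `access`), the pattern list and the function names are all hypothetical, chosen only to show the kind of pattern-based filter being discussed.

```python
from fnmatch import fnmatchcase

# Hypothetical glob patterns for transient resources; not a Ranger setting.
TRANSIENT_PATTERNS = ["/tmp", "/tmp/*"]


def is_transient(resource, patterns=TRANSIENT_PATTERNS):
    """Return True if the resource path matches any transient pattern."""
    return any(fnmatchcase(resource, pattern) for pattern in patterns)


def suppress_transient(records, patterns=TRANSIENT_PATTERNS):
    """Keep only the audit records whose resource is not transient."""
    return [r for r in records if not is_transient(r["resource"], patterns)]


# Made-up audit records, for illustration only.
records = [
    {"user": "alice", "resource": "/tmp/hive/alice/scratch", "access": "write"},
    {"user": "alice", "resource": "/data/sales/part-0000", "access": "read"},
]

filtered = suppress_transient(records)
for record in filtered:
    print(record["user"], record["access"], record["resource"])
```

With the audits in Solr, the same effect would presumably be achieved with a search-side filter rather than in application code; the sketch only shows the intended behaviour.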
