Hello, I am replying to myself, as I am happy to say that I have solved my problem and have been able to access Hive tables from SparkSQL with Ranger enabled. Policies defined in Ranger are properly enforced in Spark.
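For anyone wanting to reproduce this, the classpath part of the steps below can be sketched in shell. Note this is only a sketch: the lib directory is an assumption for a typical HDP layout, and the actual jar files in a Ranger distribution carry version suffixes, so adjust both to your installation.

```shell
# Sketch only: build the extra driver classpath from the four Ranger jars.
# The lib directory below is an assumption (typical HDP layout); adjust it,
# and the jar file names/versions, to match your Ranger distribution.
RANGER_LIB=/usr/hdp/current/ranger-hive-plugin/lib

CLASSPATH=""
for jar in ranger-hive-plugin ranger-plugins-common ranger-plugins-audit guava; do
  # Append each jar, inserting ':' only between entries.
  CLASSPATH="${CLASSPATH:+$CLASSPATH:}$RANGER_LIB/$jar.jar"
done

# Pass the result to Spark, e.g.:
#   spark-sql --driver-class-path "$CLASSPATH"
echo "$CLASSPATH"
```

The same string can equally be set via spark.driver.extraClassPath in spark-defaults.conf instead of the command-line argument.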
So here is how to do it (assuming you have been able to make it work without Ranger):

- Check that you have set hive.security.authorization.manager=org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory
- Get ranger-hive-security.xml and ranger-hive-audit.xml from your Ranger Hive plugin folder and copy them into your Spark conf directory.
- Add these jars from your Ranger distribution to your classpath (or use the --driver-class-path argument for Spark): ranger-hive-plugin, ranger-plugins-common, ranger-plugins-audit, guava

That's all. It should work.

The only thing which bothers me a little bit now is that SparkSQL does not handle 'doAs=F'. It is not surprising, considering Spark is run by the user and not by a server process owned by a system user. So I am afraid it will be an issue with Ranger, as all tables written with Hive will be owned by hive, but all tables written with Spark will be owned by the user who wrote them. We will have to find a solution for that.

Regards,

Julien

2016-01-19 13:38 GMT+01:00 Julien Carme <[email protected]>:

> Hello,
>
> Thanks Madhan and Bosco for your answers.
>
> I am using HDP 2.3 and installed Ranger from Ambari. I suppose Ambari does
> run enable-hive-plugin, as Ranger does work correctly with Hive when I use
> Hive through HiveServer2. It is only when I try to use it from Spark
> (using SparkSQL) that it does not work.
>
> SparkSQL does not use HiveServer2, but it does not use the Hive CLI either
> (at least not directly). The Hive engine is not used at all. SparkSQL is a
> standalone SQL engine which is part of Spark; it reads Hive tables directly
> from where they are stored, using metadata it gets from HCAT. At least that
> is my understanding.
>
> Until recently, SparkSQL was ignoring Ranger, just like the Hive CLI, and
> it was working (I could access Hive data from Spark on a cluster with
> Ranger up, but of course Ranger rules were ignored).
> But since a recent update, SparkSQL now clearly does interact with Ranger,
> as I get Ranger exceptions when I use SparkSQL. I think that it gets the
> value of hive.security.authorization.manager (which on my system is a
> Ranger class) and instantiates this class in order to comply with the
> security rules defined by that class. I am no expert in Spark internals or
> Ranger; these are just assumptions.
>
> I have solved multiple classpath (Ranger jar not found) and configuration
> file (xa-secure.xml?) issues in order to reach the point where I am now.
> Now I don't get missing-class or missing-file exceptions, but it still does
> not work, and I get the issue described in my previous mail (see below).
>
> I will try to continue my investigations. If I make progress I will post
> it here. But any additional help would be appreciated.
>
> Best regards,
>
> Julien
>
>
> 2016-01-18 22:24 GMT+01:00 Don Bosco Durai <[email protected]>:
>
>> Ideally, Ranger shouldn't be in play when the Hive CLI is used. If I am
>> not wrong, Spark is using the Hive CLI API.
>>
>> To avoid this issue, I thought we only update hiveserver2.properties.
>> Julien, I assume you are using the standard enable-plugin scripts.
>>
>> Thanks
>>
>> Bosco
>>
>>
>> From: Madhan Neethiraj <[email protected]> on behalf of Madhan
>> Neethiraj <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Monday, January 18, 2016 at 9:54 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Spark + Hive + Ranger
>>
>> Julien,
>>
>> The Ranger Hive plugin requires additional configuration, like the
>> location of Ranger Admin, the name of the service containing policies
>> for Hive, etc. Such configurations (in files named ranger-*.xml) are
>> created when the enable-hive-plugin.sh script is run with appropriate
>> values in install.properties. This script also updates hive-site.xml
>> with the necessary changes, like registering Ranger as the authorizer
>> in hive.security.authorization.manager.
>> If you haven't installed the plugin using enable-hive-plugin.sh, please
>> do so and let us know the result.
>>
>> Hope this helps.
>>
>> Madhan
>>
>>
>> From: Julien Carme <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Monday, January 18, 2016 at 9:27 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Spark + Hive + Ranger
>>
>> Hello,
>>
>> I am trying to access Hive from Spark in a Hadoop cluster where I use
>> Ranger to control Hive access.
>>
>> As Ranger is installed, I have set up Hive accordingly:
>>
>> hive.security.authorization.manager=
>> org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory
>>
>> When I run Spark and ask it to access a Hive table, it uses this class,
>> but I get several errors:
>>
>> 16/01/18 17:51:50 INFO provider.AuditProviderFactory: No v3 audit
>> configuration found. Trying v2 audit configurations
>> 16/01/18 17:51:50 ERROR util.PolicyRefresher:
>> PolicyRefresher(serviceName=null): failed to refresh policies.
>> Will continue to use last known version of policies (-1)
>> com.sun.jersey.api.client.ClientHandlerException:
>> java.lang.IllegalArgumentException: URI is not absolute
>>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>>     at com.sun.jersey.api.client.Client.handle(Client.java:648)
>>     at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>>     at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>>     at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
>>     at org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:71)
>>     at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:205)
>>
>> --
>>
>> And then (but it is not clear at all that the two errors are connected):
>>
>> 16/01/18 17:51:50 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>> mode
>> 16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
>> filterListCmdObjects: Internal error: null RangerAccessResult object
>> received back from isAccessAllowed()!
>> 16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
>> filterListCmdObjects: Internal error: null RangerAccessResult object
>> received back from isAccessAllowed()!
>> 16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
>> filterListCmdObjects: Internal error: null RangerAccessResult object
>> received back from isAccessAllowed()!
>>
>> --
>>
>> And then the access to Hive tables fails.
>>
>> I am not sure where to go from there. Any help would be appreciated.
>>
>> Best Regards,
>>
>> Julien
