Re: pyspark connect to spark thrift server port

Artemis User Fri, 21 Oct 2022 07:49:19 -0700

I think there is some confusion here between the metastore and the actual Hive database. Spark (as well as Apache Hive) requires two databases for Hive DB operations. The metastore is used for storing metadata only (e.g., schema info), whereas the actual Hive database, accessible through the Thrift server, is used by applications. Hive keeps its metadata in a separate server because of distributed database operations.

My previous message referred to how to secure the metastore database, not the actual Hive tables. It looks like you are looking for how to secure access to Hive, not the metastore (the metastore isn't used by general users), and your current configuration wasn't set up with the right user access control. Hive actually supports a role-based access model, just like other RDBMSs. You may refer to the Hive admin guide for more details (https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization). You can use beeline, or SQL scripts run via beeline, to set user privileges and roles.
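
As a rough illustration of the role setup described above, the statements for beeline could be generated with a small script like this. The helper, the role name, the user, and the table names are all made up for the example and are not part of Hive or Spark. It also assumes SQL standard based authorization is enabled and that the statements are run by a user holding the admin role.

```python
def hive_grant_statements(role, user, tables, privilege="SELECT"):
    """Build HiveQL statements that create a role, grant it one privilege
    on each table, and assign the role to a user.

    Hypothetical helper: identifiers are inserted as-is, so only pass
    trusted names.
    """
    statements = [f"CREATE ROLE {role}"]
    for table in tables:
        statements.append(f"GRANT {privilege} ON TABLE {table} TO ROLE {role}")
    statements.append(f"GRANT {role} TO USER {user}")
    return statements


if __name__ == "__main__":
    # Example names only; print one statement per line for a beeline script.
    for stmt in hive_grant_statements("analyst", "comet", ["sales.orders"]):
        print(stmt + ";")
```

The printed statements can be saved to a file and passed to beeline as a SQL script.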


On 10/21/22 1:27 AM, [email protected] wrote:

Hello Artemis,
Understood. If I give the Hive metastore URI to anyone to connect using pyspark, port 9083 is open to anyone, without any authentication. The only way pyspark is able to connect to Hive is through port 9083, not through port 10000.

On Friday, October 21, 2022 at 04:06:38 AM GMT+8, Artemis User <[email protected]> wrote:
By default, Spark uses Apache Derby (running in embedded mode, with store content defined in local files) for hosting the Hive metastore. You can externalize the metastore on a JDBC-compliant database (e.g., PostgreSQL) and use the authentication provided by that database. The JDBC configuration should be defined in a hive-site.xml file in the Spark conf directory. Please see the metastore admin guide for more details, including an init script for setting up your metastore (https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration).
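
A minimal sketch of what the JDBC part of such a hive-site.xml might contain, generated here with Python so the values are easy to swap. The host, database name, user, and password are placeholders, and the helper function is hypothetical; only the four javax.jdo.option.* property names are the standard metastore connection settings.

```python
import xml.etree.ElementTree as ET


def build_hive_site(jdbc_url, driver, user, password):
    """Return hive-site.xml content holding the metastore JDBC settings."""
    props = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder connection details for a PostgreSQL-backed metastore.
    print(build_hive_site(
        "jdbc:postgresql://db.example.com:5432/metastore",
        "org.postgresql.Driver",
        "hive",
        "change-me",
    ))
```

The output would go into hive-site.xml in the Spark conf directory, with the matching JDBC driver jar available on the classpath.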
On 10/20/22 4:31 AM, [email protected] wrote:

Currently my pyspark code is able to connect to the Hive metastore at port 9083. However, using this approach I can't put in place any security mechanism like LDAP or SQL authentication control. Is there any way to connect from pyspark to the Spark Thrift server on port 10000 without exposing the Hive metastore URL to pyspark? I would like to authenticate the user before allowing them to execute Spark SQL, and users should only be allowed to query the databases and tables that they have access to.
Thank you,
comet
