attilapiros opened a new pull request, #43064:
URL: https://github.com/apache/spark/pull/43064

   
   ### What changes were proposed in this pull request?
   
   Support the Hive 4.0 metastore, where partition filters can be pushed down even for CHAR and VARCHAR partition columns.
   
   **Hive 4.0 is still in beta, which is why this PR is a work in progress.** 
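The following is a simplified, hypothetical model of version-gated pushdown. It is not Spark's `HiveShim` code, and every type and function name in it is made up for illustration. It assumes that an older metastore only receives predicates on STRING and integral partition columns, and that CHAR/VARCHAR predicates are added to the metastore filter string only when the metastore major version is 4 or newer.

```scala
// Hypothetical sketch: names and types are illustrative, not Spark's HiveShim API.

// Simplified model of a partition column's declared Hive type.
sealed trait PartColType
case object StringCol extends PartColType
case object IntCol extends PartColType
final case class CharCol(length: Int) extends PartColType
final case class VarcharCol(length: Int) extends PartColType

// An equality predicate on a partition column, e.g. partcolumn1 = 'partcolumn1_v1'.
final case class PartEquals(column: String, colType: PartColType, value: String)

object MetastoreFilterSketch {
  // Assumption: CHAR/VARCHAR predicates are only pushed to a Hive 4.x (or newer) metastore.
  def canPushDown(colType: PartColType, hiveMajorVersion: Int): Boolean = colType match {
    case StringCol | IntCol         => true
    case CharCol(_) | VarcharCol(_) => hiveMajorVersion >= 4
  }

  // Renders the pushable predicates as one filter string joined with "and".
  // Predicates left out here would have to be evaluated by Spark after fetching
  // the partitions. String values are assumed not to contain double quotes.
  def toFilter(preds: Seq[PartEquals], hiveMajorVersion: Int): String = {
    val q = '"'
    preds
      .filter(p => canPushDown(p.colType, hiveMajorVersion))
      .map { p =>
        p.colType match {
          case IntCol => s"${p.column} = ${p.value}"
          case _      => s"${p.column} = ${q}${p.value}${q}"
        }
      }
      .mkString(" and ")
  }
}
```

In this model, with the CHAR and VARCHAR predicates from the test below, version 3 yields an empty filter, so nothing is pushed down. With version 4, the filter covers both partition columns.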
   
   ### Why are the changes needed?
   
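   Supporting more Hive metastore versions benefits our users. With Hive 4.0 they also get a performance improvement: partition filters on CHAR and VARCHAR columns can be evaluated by the metastore, so fewer partitions need to be fetched.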
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Hive 4.0 becomes a supported metastore version, and the documentation is updated accordingly.
   
   ### How was this patch tested?
   
   #### Manually
   
   I used the `apache/hive:4.0.0-beta-1` Docker image to start a metastore and a HiveServer2 (along with a hadoop3 Docker image).
   
   Created a table:
   ```
   CREATE EXTERNAL TABLE testTable1 (
     column1 String
   ) PARTITIONED BY (partColumn1 CHAR(30), partColumn2 VARCHAR(30))
   LOCATION 'hdfs://hadoop3:8020/tmp/hive_external/';
   ```
   
   Inserted some values in beeline:
   
   ```
   insert into table testtable1 values
     ("column1_v1", "partcolumn1_v1", "partcolumn2_v1"),
     ("column1_v2", "partcolumn1_v2", "partcolumn2_v2");
   ```
   
   Started Spark in the HiveServer2 container with:
   ```
   ./bin/spark-shell \
     --conf spark.sql.hive.metastore.version=4.0.0 \
     --conf spark.sql.hive.metastore.jars="/opt/hive/lib/*"
   ```
   
   Ran the query:
   ```
   scala> sql("select * from testtable1 where partcolumn1 = 'partcolumn1_v1' and partcolumn2 = 'partcolumn2_v1'").show
   Hive Session ID = 6846fe0e-968a-474d-afec-4f67b3a2a274
   +----------+--------------------+--------------+
   |   column1|         partcolumn1|   partcolumn2|
   +----------+--------------------+--------------+
   |column1_v1|partcolumn1_v1   ...|partcolumn2_v1|
   +----------+--------------------+--------------+
   ```
   
   Then checked the HMS calls in `/tmp/hive/hive.log` in the metastore container:
   ```
   ...
   2023-09-22T21:06:34,293  INFO [Metastore-Handler-Pool: Thread-1356] HiveMetaStore.audit: ugi=hive       ip=172.30.0.5   cmd=source:172.30.0.5 get_partitions_by_filter : tbl=hive.default.testtable1
   ...
   ```
   
   The log contains the expected `get_partitions_by_filter` call, which shows that the partition filter was pushed down to the metastore.
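The check can also be done without reading the log by hand. The following is a minimal Scala sketch that does this. `AuditLogCheck` and its methods are hypothetical helpers, not part of Spark or Hive. The sketch assumes that each audit entry occupies a single line in `hive.log`, shaped like the excerpt shown in the log output.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper for checking the HMS audit log; not part of Spark or Hive.
object AuditLogCheck {
  // True if some audit line records get_partitions_by_filter for the given table.
  // Assumes one entry per line, shaped like:
  //   ... HiveMetaStore.audit: ... get_partitions_by_filter : tbl=<catalog>.<db>.<table>
  def sawFilterCall(lines: Iterator[String], qualifiedTable: String): Boolean =
    lines.exists { line =>
      val tokens = line.split("\\s+")
      line.contains("HiveMetaStore.audit:") &&
      tokens.contains("get_partitions_by_filter") &&
      // Exact token match, so "testtable1" does not also match "testtable10".
      tokens.contains(s"tbl=$qualifiedTable")
    }

  // Reads a log file (e.g. /tmp/hive/hive.log) and closes it afterwards.
  def sawFilterCallInFile(path: String, qualifiedTable: String): Boolean =
    Using.resource(Source.fromFile(path))(src => sawFilterCall(src.getLines(), qualifiedTable))
}
```

For example, `AuditLogCheck.sawFilterCallInFile("/tmp/hive/hive.log", "hive.default.testtable1")` should return `true` after running the query above against the Hive 4.0 metastore.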
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
