Hi, 

you can also load other data sources without Hive by using Spark's read API
(spark.read.format(...)) to create a Spark DataFrame. From there you can
combine the results with your Delta / Iceberg tables in the DataFrame world.
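
For example, a minimal sketch (the paths, the column name customer_id, and the
Delta package setup are assumptions for illustration, not from the original mail):
```
from pyspark.sql import SparkSession

# assumes the Delta Lake package is on the classpath,
# e.g. spark-submit --packages io.delta:delta-core_2.12:<version>
spark = SparkSession.builder.appName("no-hive-example").getOrCreate()

# Delta table read directly by path, no metastore lookup involved
orders = spark.read.format("delta").load("/data/lake/orders")

# some other source without a transactional metadata layer, e.g. plain CSV
customers = (spark.read.format("csv")
             .option("header", "true")
             .load("/data/raw/customers"))

# combine both in the DataFrame world
joined = orders.join(customers, on="customer_id", how="inner")
joined.createOrReplaceTempView("orders_enriched")
spark.sql("SELECT customer_id, count(*) FROM orders_enriched "
          "GROUP BY customer_id").show()
```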

The use case for Hive is to have a common abstraction layer when you want to do
data tagging and access management under one roof, using tools like Apache Ranger /
Apache Atlas.



Br,

Dennis 

Sent from my iPhone

> On 25.04.2021 at 15:37, krchia <kangren.c...@gmail.com> wrote:
> 
> Does it make sense to keep a Hive installation when your parquet files come
> with a transactional metadata layer like Delta Lake / Apache Iceberg?
> 
> My understanding from this:
> https://github.com/delta-io/delta/issues/85
> 
> is that Hive is no longer necessary other than discovering where the table
> is stored. Hence, we can simply do something like:
> ```
> df = spark.read.delta($LOCATION)
> df.createOrReplaceTempView("myTable")
> res = spark.sql("select * from myTable")
> ```
> and this approach still gets all the benefits of having the metadata for
> partition discovery / SQL optimization? With Delta, the Hive metastore
> should only store a pointer from the table name to the path of the table,
> and all other metadata will come from the Delta log, which will be processed
> in Spark.
> 
> One reason i can think of keeping Hive is to keep track of other data
> sources that don't necessarily have a Delta / Iceberg transactional metadata
> layer. But i'm not sure if it's still worth it, are there any use cases i
> might have missed out on keeping a Hive installation after migrating to
> Delta / Iceberg?
> 
> Please correct me if i've used any terms wrongly.
> 
> 
> 
