Hi, you can also load other data sources without Hive by using Spark's read formats to get a Spark DataFrame. From there you can combine the results in the DataFrame world.
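For example, a minimal sketch in PySpark (the paths /data/events and /data/users and the join key user_id are hypothetical, and the Delta read assumes the delta-spark package is configured on the cluster):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("no-hive-example").getOrCreate()

# Read a Delta table directly by path -- no Hive metastore lookup involved.
events = spark.read.format("delta").load("/data/events")

# Read a plain Parquet dataset the same way, also without Hive.
users = spark.read.parquet("/data/users")

# Combine both sources in the DataFrame world.
events.join(users, on="user_id", how="inner").show()
```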
The use case for Hive is to have a common abstraction layer when you want to do data tagging and access management under one roof, using tools like Apache Ranger / Apache Atlas (a minimal sketch of registering a table for this follows below the quoted message).

Br, Dennis

Sent from my iPhone

> On 25.04.2021 at 15:37, krchia <kangren.c...@gmail.com> wrote:
>
> Does it make sense to keep a Hive installation when your parquet files come
> with a transactional metadata layer like Delta Lake / Apache Iceberg?
>
> My understanding from this:
> https://github.com/delta-io/delta/issues/85
>
> is that Hive is no longer necessary other than for discovering where the
> table is stored. Hence, we can simply do something like:
> ```
> df = spark.read.format("delta").load(LOCATION)
> df.createOrReplaceTempView("myTable")
> res = spark.sql("select * from myTable")
> ```
> and this approach still gets all the benefits of having the metadata for
> partition discovery / SQL optimization? With Delta, the Hive metastore
> should only store a pointer from the table name to the path of the table,
> and all other metadata will come from the Delta log, which will be
> processed in Spark.
>
> One reason I can think of to keep Hive is to keep track of other data
> sources that don't necessarily have a Delta / Iceberg transactional
> metadata layer. But I'm not sure if it's still worth it. Are there any use
> cases I might have missed out on for keeping a Hive installation after
> migrating to Delta / Iceberg?
>
> Please correct me if I've used any terms wrongly.
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
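A minimal sketch of that kind of registration, assuming a Delta table already written to the hypothetical path /data/events:

```
# Register an existing path-based Delta table under a name in the Hive
# metastore. The metastore holds only the name -> path pointer; schema
# and partition metadata still come from the Delta log.
spark.sql("CREATE TABLE events USING DELTA LOCATION '/data/events'")

# Once the table exists by name, governance tools such as Apache Ranger
# can attach access policies to it, and plain SQL works against it.
spark.sql("SELECT * FROM events").show()
```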