Thanks Abhishek. Will it work on a Hive ACID table that has not been compacted, i.e. a table having both base and delta files?
Let's say we have a Hive ACID table customer:

Create table customer (customer_id int, customer_name string, customer_email string)
clustered by (customer_id) into 10 buckets
stored as orc
location '/test/customer'
tblproperties ('transactional'='true');

And the table's HDFS path has the below directories:

/test/customer/base_15234/
/test/customer/delta_1234_456

That means the table has updates and major compaction has not run. Will the Spark reader work?

Thank you,
Naresh

On Fri, Jul 26, 2019 at 7:38 AM Abhishek Somani <abhisheksoman...@gmail.com> wrote:

> Hi All,
>
> We at Qubole <https://www.qubole.com/> have open sourced a datasource
> that will enable users to work on their Hive ACID Transactional Tables
> <https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions>
> using Spark.
>
> Github: https://github.com/qubole/spark-acid
>
> Hive ACID tables allow users to work on their data transactionally, and
> also give them the ability to delete, update and merge data efficiently
> without having to rewrite all of their data in a table, partition or file.
> We believe that being able to work on these tables from Spark is a much
> desired value add, as is also apparent in
> https://issues.apache.org/jira/browse/SPARK-15348 and
> https://issues.apache.org/jira/browse/SPARK-16996, with multiple people
> looking for it. Currently the datasource supports reading from these ACID
> tables only, and we are working on adding the ability to write into these
> tables via Spark as well.
>
> The datasource is also available as a Spark package, and instructions on
> how to use it are available on the Github page
> <https://github.com/qubole/spark-acid>.
>
> We welcome your feedback and suggestions.
>
> Thanks,
> Abhishek Somani

--
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/
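[Editor's note] For readers following the thread, a minimal sketch of reading such a table through the spark-acid datasource, based on the usage shown in the project's README linked above. This is not a definitive implementation: the table name "default.customer", the app name, and the assumption that the spark-acid package is already on the classpath (e.g. via `--packages`) are all illustrative; it also requires a running Spark installation with access to a Hive metastore, so it cannot be run standalone.

```scala
// Hedged sketch, assuming the spark-acid package is on the classpath and a
// Hive metastore is reachable; "default.customer" is a hypothetical table.
import org.apache.spark.sql.SparkSession

object ReadAcidTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-acid-read")     // assumed app name
      .enableHiveSupport()            // the datasource needs the Hive metastore
      .getOrCreate()

    // Per the README, the table is addressed through the "HiveAcid" format
    // with a "table" option rather than through spark.table(...).
    val df = spark.read
      .format("HiveAcid")
      .options(Map("table" -> "default.customer"))
      .load()

    df.show()
    spark.stop()
  }
}
```

Whether this returns correct results for an uncompacted table with both base and delta directories is exactly the question posed above, and is best confirmed against the project's documentation.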