Hi Ayan,

I may be wrong about this, but I think Delta data files are stored in Parquet format. I am sure you have already checked this, though. Am I missing something?
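One way to look at the second issue is to read the schema that Spark recorded in the table properties. Here is a minimal Python sketch, assuming the property values from the DDL quoted below have been copied into a dict by hand (with the backslash escapes removed). The helper name `schema_from_tblproperties` is made up for illustration, and the code relies only on the `numParts` / `part.N` keys visible in that DDL:

```python
import json


def schema_from_tblproperties(props):
    """Join the schema JSON stored under spark.sql.sources.schema.part.N
    (hypothetical helper, not part of Spark or Delta Lake) and parse it."""
    num_parts = int(props["spark.sql.sources.schema.numParts"])
    raw = "".join(
        props[f"spark.sql.sources.schema.part.{i}"] for i in range(num_parts)
    )
    return json.loads(raw)


# Values copied by hand from the TBLPROPERTIES in the quoted Hive DDL.
props = {
    "spark.sql.sources.provider": "DELTA",
    "spark.sql.sources.schema.numParts": "1",
    "spark.sql.sources.schema.part.0": '{"type":"struct","fields":[]}',
}

schema = schema_from_tblproperties(props)
field_names = [field["name"] for field in schema["fields"]]

print(props["spark.sql.sources.provider"], schema["type"], field_names)
# prints: DELTA struct []
```

With these values the field list is empty, so the metastore entry itself carries no column information for Hive to use. That would be consistent with the placeholder `col` column in the DDL, and I assume the real schema is kept only in the Delta transaction log at the table path.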

Regards,
Gourav Sengupta

On Fri, Jun 21, 2019 at 6:39 AM ayan guha <guha.a...@gmail.com> wrote:

> Hi
>
> We used spark.sql to create a table using DELTA. We also have a Hive
> metastore attached to the Spark session. Hence, a table gets created in
> the Hive metastore. We then tried to query the table from Hive. We faced
> the following issues:
>
> 1. SERDE is SequenceFile, should have been Parquet
> 2. Schema fields are not passed.
>
> Essentially the Hive DDL looks like:
>
> CREATE TABLE `TABLE NAME`(
>   `col` array<string> COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'path'='WASB PATH')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
> LOCATION
>   'WASB PATH'
> TBLPROPERTIES (
>   'spark.sql.create.version'='2.4.0',
>   'spark.sql.sources.provider'='DELTA',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>   'transient_lastDdlTime'='1556544657')
>
> Is this expected? And will the use case be supported in future releases?
>
> We are now experimenting
>
> Best
>
> Ayan
>
> On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <liwen....@databricks.com>
> wrote:
>
>> Hi James,
>>
>> Right now we don't have plans for having a catalog component as part of
>> Delta Lake, but we are looking to support Hive metastore and also DDL
>> commands in the near future.
>>
>> Thanks,
>> Liwen
>>
>> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios
>> <jamescotrots...@gmail.com> wrote:
>>
>>> Is there a plan to have a business catalog component for the Data Lake?
>>> If not, how would someone make a proposal to create an open source
>>> project related to that? I would be interested in building out an open
>>> source data catalog that would use the Hive metadata store as a baseline
>>> for technical metadata.
>>>
>>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen....@databricks.com>
>>> wrote:
>>>
>>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>>
>>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>>> https://docs.delta.io/0.2.0/quick-start.html
>>>>
>>>> To view the release notes:
>>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>>
>>>> This release introduces two main features:
>>>>
>>>> *Cloud storage support*
>>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>>> For configuration instructions, please see:
>>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>>
>>>> *Improved concurrency*
>>>> Delta Lake now allows concurrent append-only writes while still
>>>> ensuring serializability. For concurrency control in Delta Lake, please
>>>> see: https://docs.delta.io/0.2.0/delta-concurrency.html
>>>>
>>>> We have also greatly expanded the test coverage as part of this release.
>>>>
>>>> We would like to acknowledge all community members for contributing to
>>>> this release.
>>>>
>>>> Best regards,
>>>> Liwen Sun
>
> --
> Best Regards,
> Ayan Guha