Hi
We used spark.sql to create a table using DELTA. We also have a Hive
metastore attached to the Spark session, so the table gets created in
the Hive metastore. We then tried to query the table from Hive and ran
into the following issues:
1. The storage format is SequenceFile (with LazySimpleSerDe); it should
   have been Parquet.
2. Schema fields are not passed.
Essentially, the Hive DDL looks like:

CREATE TABLE `TABLE NAME`(
  `col` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'path'='WASB PATH')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'WASB PATH'
TBLPROPERTIES (
  'spark.sql.create.version'='2.4.0',
  'spark.sql.sources.provider'='DELTA',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
  'transient_lastDdlTime'='1556544657')

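
To make issue 2 concrete, here is a minimal sketch (plain Python, no Spark needed) of how the schema recorded in the table properties can be inspected. It assumes the properties are copied by hand from the DDL above, with the JSON unescaped; the helper name schema_from_properties is made up for illustration and is not part of Spark or Delta Lake.

```python
import json

# Table properties copied from the DDL in this message (JSON shown unescaped).
tbl_properties = {
    "spark.sql.create.version": "2.4.0",
    "spark.sql.sources.provider": "DELTA",
    "spark.sql.sources.schema.numParts": "1",
    "spark.sql.sources.schema.part.0": '{"type":"struct","fields":[]}',
}


def schema_from_properties(props):
    """Hypothetical helper: join the schema JSON split across part.N properties."""
    num_parts = int(props["spark.sql.sources.schema.numParts"])
    raw = "".join(
        props[f"spark.sql.sources.schema.part.{i}"] for i in range(num_parts)
    )
    return json.loads(raw)


schema = schema_from_properties(tbl_properties)
field_names = [field["name"] for field in schema["fields"]]

# An empty list here means no column information was stored in the metastore.
print(schema["type"], field_names)
```

With the values above, the field list comes back empty, which matches the placeholder `col` array<string> column that Hive shows.
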
Is this expected? And will the use case be supported in future releases?
We are now experimenting
Best
Ayan
On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <[email protected]> wrote:
> Hi James,
>
> Right now we don't have plans for having a catalog component as part of
> Delta Lake, but we are looking to support Hive metastore and also DDL
> commands in the near future.
>
> Thanks,
> Liwen
>
> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
> [email protected]> wrote:
>
>> Is there a plan to have a business catalog component for the Data Lake?
>> If not, how would someone make a proposal to create an open source project
>> related to that? I would be interested in building out an open source data
>> catalog that would use the Hive metadata store as a baseline for technical
>> metadata.
>>
>>
>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <[email protected]>
>> wrote:
>>
>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>
>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>> https://docs.delta.io/0.2.0/quick-start.html
>>>
>>> To view the release notes:
>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>
>>> This release introduces two main features:
>>>
>>> *Cloud storage support*
>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>> For configuration instructions, please see:
>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>
>>> *Improved concurrency*
>>> Delta Lake now allows concurrent append-only writes while still ensuring
>>> serializability. For concurrency control in Delta Lake, please see:
>>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>>
>>> We have also greatly expanded the test coverage as part of this release.
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release.
>>>
>>> Best regards,
>>> Liwen Sun
>>>
>>>
--
Best Regards,
Ayan Guha