Re: Kafka Hudi pipeline design

2020-07-21 Thread Lian Jiang
an coordinate themselves by using some locking mechanism? IMHO, it is ok to sacrifice some performance to make the concurrent writing correct. Appreciate your insight. Regards Lian On Tue, Jul 21, 2020 at 2:13 AM Balaji Varadarajan wrote: > Please see answers inline... > > On

Kafka Hudi pipeline design

2020-07-19 Thread Lian Jiang
Hi, I have a Kafka topic and use a Kafka S3 connector to dump its data into S3 hourly in Parquet format. These Parquet files are partitioned by ingestion time, and each record has fields that are deeply nested JSON. Each record is a monolithic piece of data containing multiple events, each with its own event time

Re: hudi dependency conflicts for test

2020-05-26 Thread Lian Jiang
0 at 11:02 PM Lian Jiang wrote: > > > The root cause is that I need to use java 8 instead of the default java > 11 > > in intellij. Thanks everyone for helping and cheers! > > > > On Thu, May 21, 2020 at 1:09 PM Lian Jiang > wrote: > > > > > The examples

Re: hudi dependency conflicts for test

2020-05-21 Thread Lian Jiang
The root cause is that I need to use Java 8 instead of the default Java 11 in IntelliJ. Thanks everyone for helping and cheers! On Thu, May 21, 2020 at 1:09 PM Lian Jiang wrote: > The examples in quick start work for me in spark-shell. I am trying to use > scala unit test to make these ex

Re: hudi dependency conflicts for test

2020-05-21 Thread Lian Jiang
> tableName,
> "hoodie.datasource.write.precombine.field" -> "timestamp"
> )
>
> val inputDF = spark.range(0, 5).
>    withColumn("key", $"id").
>    withColumn("data", lit("data")).
>    withColumn("timestamp",

Re: hudi dependency conflicts for test

2020-05-21 Thread Lian Jiang
or racing , Raymond! > > On Thu, May 21, 2020 at 10:08 AM Shiyan Xu > wrote: > > > Hi Lian, it appears that you need to have spark-avro_2.11:2.4.4 in your > > classpath. > > > > > > > > On Thu, May 21, 2020 at 10:04 AM Lian Jiang > wrote: > > >

Re: hudi dependency conflicts for test

2020-05-21 Thread Lian Jiang
ee any difference that could cause this issue. As it works > with 0.5.2, I am assuming you are not blocked. Let us know otherwise. > Balaji.V On Wednesday, May 20, 2020, 01:17:08 PM PDT, Lian Jiang < > jiangok2...@gmail.com> wrote: > > Thanks Vinoth. > > Below dependency

hudi roadmap and user feedback

2020-05-20 Thread Lian Jiang
Hi, I am interested in the roadmap of Apache Hudi, since it affects AWS EMR support. I hope that once Apache Hudi has a formal release (e.g. v1.0), EMR's Hudi support will also be declared GA soon. https://hudi.apache.org/roadmap is a little out of date and I see a recent vote about hudi graduatio

Re: hudi dependency conflicts for test

2020-05-20 Thread Lian Jiang
testCompile group: 'org.mockito', name: 'mockito-scala_2.11', version: '1.5.12'
compile group: 'org.apache.iceberg', name: 'iceberg-api', version: '0.8.0-incubating'
Cheers!
On Wed, May 20, 2020 at 5:00 AM Vinoth Chandar wrote:

hudi dependency conflicts for test

2020-05-18 Thread Lian Jiang
Hi, I am using Hudi in a Scala Gradle project:
dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.4'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.4'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11