Re: About github issue 639

2019-05-08 Thread Jun Zhu
Yes, fixed. On Thu, May 9, 2019 at 6:13 AM Vinoth Chandar wrote: > Images don't render on the mailing list. :( > Seems like the issue is fixed now? > > On Tue, May 7, 2019 at 10:15 PM Jun Zhu wrote: > > > Hi, > > I ran the new code pulled from the master branch and compared it with another > > stream

Re: About github issue 639

2019-05-08 Thread Vinoth Chandar
Images don't render on the mailing list. :( Seems like the issue is fixed now? On Tue, May 7, 2019 at 10:15 PM Jun Zhu wrote: > Hi, > I ran the new code pulled from the master branch and compared it with another > stream job that runs Hudi 0.4.5 from Maven. Both run every 10 minutes. > The roll-back wor

Re: About github issue 639

2019-05-07 Thread Jun Zhu
Hi, I ran the new code pulled from the master branch and compared it with another streaming job that runs Hudi 0.4.5 from Maven. Both run every 10 minutes. The rollback worked. Top is 0.4.5, bottom is 0.4.6 [image: Screen Shot 2019-05-08 at 1.06.17 PM.png] [image: Screen Shot 2019-05-08 at 1.06.54 PM.png] And

Re: About github issue 639

2019-05-03 Thread Vinoth Chandar
No worries. This just landed on master, you can give it a shot. You'll also end up picking up interval tree based filtering for global index, which will speed things along a lot, FYI. Have a good holiday! Thanks, Vinoth On Fri, May 3, 2019 at 7:19 PM Jun Zhu wrote: > Hi team, > I will try that,

Re: About github issue 639

2019-05-03 Thread Jun Zhu
Hi team, I will try that, thank you so much. Sorry for the late reply, I just had a holiday in China 😅. Thanks, Jun On Wed, May 1, 2019 at 7:08 PM Vinoth Chandar wrote: > Hi Jun, > > I was able to track that the HoodieSparkSQLWriter (common path for > streaming sink and batch datasource) ends up callin

Re: About github issue 639

2019-05-01 Thread Vinoth Chandar
Hi Jun, I was able to track that the HoodieSparkSQLWriter (common path for streaming sink and batch datasource) ends up calling DataSourceUtils.createHoodieClient, which creates the client as follows: return new HoodieWriteClient<>(jssc, writeConfig); There is a third parameter that denotes wheth

Re: About github issue 639

2019-04-30 Thread vbal...@apache.org
Hi Jun, You had mentioned that you are seeing the log message "insert failed with 1 errors". Did you see any exception stack traces before this message? You can also take a look at the Spark UI to see the stdout/stderr of failed tasks (if present). Also, it looks like if you also enable "trace" level

Re: About github issue 639

2019-04-30 Thread Vinoth Chandar
Hi Jun, Basically, you are saying the streaming path leaves some inflights behind. Let me see if I can reproduce it. If you have a simple test case, please share. Thanks, Vinoth On Tue, Apr 30, 2019 at 1:04 AM Jun Zhu wrote: > Hi Vinoth, > In the Spark streaming log I find "2019-04-30 03:26:11 ERROR > H

Re: About github issue 639

2019-04-30 Thread Jun Zhu
Hi Vinoth, In the Spark streaming log I find "2019-04-30 03:26:11 ERROR HoodieSparkSQLWriter:182 - insert failed with 1 errors :" (no further error logs follow), during which the commit ends as inflight and is not cleaned. Just for feedback, we can dedup data correctly in batch mode. Should add more logic for excep

Re: About github issue 639

2019-04-29 Thread Vinoth Chandar
Another option to try would be setting spark.sql.hive.convertMetastoreParquet=false, if you are querying via the Hive table registered by Hudi. On Sat, Apr 27, 2019 at 7:02 PM Jun Zhu wrote: > Thanks for the explanation, Vinoth. The code was the same as listed in > https://github.com/apache/incubator-hudi/iss

Re: About github issue 639

2019-04-27 Thread Jun Zhu
Thanks for the explanation, Vinoth. The code was the same as listed in https://github.com/apache/incubator-hudi/issues/639, with the table format set via `.option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)`. The resulting data was stored on AWS S3. I will try more o

Re: About github issue 639

2019-04-26 Thread Vinoth Chandar
Hi, >>The duplicates were found in inflight commit parquet files. Wondering if this was expected? Spark shell should not even be reading in-flight parquet files. Can you double check if the Spark access is properly configured? http://hudi.apache.org/querying_data.html#spark Inflight should be roll
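The point that readers should skip files from in-flight commits can be illustrated with a small Python sketch. This is a toy model only, not Hudi's reader logic: the function `visible_files` and the `<id>_<instant>.parquet` file naming are hypothetical and chosen for illustration.

```python
def visible_files(file_names, completed_instants):
    """Keep only files whose instant (assumed to be the last '_' field) completed."""
    visible = []
    for name in file_names:
        stem = name.rsplit(".", 1)[0]       # strip the extension
        instant = stem.rsplit("_", 1)[-1]   # hypothetical naming: <id>_<instant>.parquet
        if instant in completed_instants:
            visible.append(name)
    return visible


files = ["f1_001.parquet", "f1_002.parquet", "f2_002.parquet", "f1_003.parquet"]
committed = {"001", "002"}                  # "003" is still inflight in this example
readable = visible_files(files, committed)
```

A reader that lists the storage directory directly, without a filter of this kind, would also pick up `f1_003.parquet`, which is one way duplicates could appear in query results.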

Re: About github issue 639

2019-04-26 Thread Jun Zhu
Hi Vinoth, Some follow-up questions about this thread. Here is what I found after running for a few days: in the .hoodie folder, perhaps due to the retention policy, there is an obvious line (listed at the end of this email). Before it the cleaned commits were archived; I find duplication when querying the inflight commit correspond

Re: About github issue 639

2019-04-18 Thread Vinoth Chandar
Hi Jun, Responses below. >>1. Some file inflight may never reach commit? Yes. The next attempt at writing will first issue a rollback to clean up such partial/leftover files before it begins the new commit. >>2. In occasion which inflight and parquet file generated by inflight still exist
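The rollback-before-next-commit behavior described in answer 1 can be sketched as a toy model in Python. Everything here is an assumption made for illustration: `ToyTimeline`, its methods, and the instant and file names are hypothetical and do not correspond to Hudi classes or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTimeline:
    """Toy stand-in for a table's commit timeline; not a Hudi class."""
    completed: set = field(default_factory=set)     # instants that reached commit
    inflight: set = field(default_factory=set)      # instants started but not completed
    data_files: dict = field(default_factory=dict)  # instant -> file names written

    def start_commit(self, instant):
        """Roll back leftover inflight instants, then open a new one.

        Returns the sorted list of instants that were rolled back.
        """
        rolled_back = sorted(self.inflight)
        for old in rolled_back:
            # Drop partial files written by the failed attempt.
            self.data_files.pop(old, None)
        self.inflight.clear()
        self.inflight.add(instant)
        self.data_files[instant] = []
        return rolled_back

    def write_file(self, instant, name):
        self.data_files[instant].append(name)

    def complete(self, instant):
        """Mark an inflight instant as committed."""
        self.inflight.remove(instant)
        self.completed.add(instant)


timeline = ToyTimeline()
timeline.start_commit("001")
timeline.write_file("001", "a_001.parquet")
timeline.complete("001")

timeline.start_commit("002")
timeline.write_file("002", "a_002.parquet")  # writer fails here; "002" never completes

rolled_back = timeline.start_commit("003")   # next attempt cleans up "002" first
```

In this model, the files of the failed instant only disappear when the next write begins, so a failed write with no later write leaves the inflight instant and its files in place.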

About github issue 639

2019-04-17 Thread Jun Zhu
Hi, Link: https://github.com/apache/incubator-hudi/issues/639 Sorry, I failed to open https://lists.apache.org/list.html?dev@hudi.apache.org. I have some follow-up questions for issue 639: So, the sequence of events is . We write parquet files and then upon > successful writing of all attempted parque