[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-09-04 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-687479902 @luffyd Thanks for reporting. I created a ticket to track this: https://issues.apache.org/jira/browse/HUDI-1270

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-670078060 > ``` > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 66.614 s <<< FAILURE! - in org.apache.hudi.functional.TestCOWDataSource > [ERROR] org.apach

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-03 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-668398894 > to be clear, you are saying it should all be working correctly? assuming you may not have conflicts with #1807 , can you please rebase this off latest master? Yes the cus

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-03 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-668363766 > small summary on what the follow-up work here is? @vinothchandar For the 0.6.0 release, the only remaining item is incremental pulling. I am currently working on it and will proba
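Incremental pulling returns only the records written after a given commit instant. The sketch below assumes each record carries Hudi's `_hoodie_commit_time` meta column and follows the documented semantics that data with an instant time greater than the begin instant is fetched. It also assumes instant times are fixed-width timestamp strings, so string comparison orders them chronologically. `incremental_pull` is a hypothetical name; this is an illustration over in-memory rows, not the PR's implementation.

```python
from typing import Dict, List

Record = Dict[str, str]


def incremental_pull(records: List[Record], begin_instant: str) -> List[Record]:
    """Hypothetical sketch: keep records committed strictly after begin_instant."""
    result = []
    for rec in records:
        # Assumption: instant times are fixed-width timestamp strings,
        # so lexicographic comparison matches chronological order.
        if rec["_hoodie_commit_time"] > begin_instant:
            result.append(rec)
    return result


rows = [
    {"_hoodie_commit_time": "20200801120000", "id": "a"},
    {"_hoodie_commit_time": "20200802120000", "id": "b"},
    {"_hoodie_commit_time": "20200803120000", "id": "c"},
]
print([r["id"] for r in incremental_pull(rows, "20200801120000")])
```

The begin instant is exclusive here, so pulling from the first commit returns only the two later records.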

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-31 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-667466538 Tested on a 100GB MOR table. A few partitions have log files made up entirely of duplicate upserts; the others have parquet files only. For partitions with only parquet files, the `SNAPSHOT` query is a
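A MOR snapshot query combines each file group's base (parquet) records with the records in its log files, which is why the test above distinguishes partitions with log files from parquet-only ones. The following is a simplified sketch of that merge for one file group. It assumes records are matched by a single key field and that the latest log entry for a key wins; deletes and custom payload merge logic are ignored. `snapshot_merge` and the `id` key are hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def snapshot_merge(base_records: Iterable[Record],
                   log_records: Iterable[Record],
                   key_field: str = "id") -> List[Record]:
    """Hypothetical sketch of a merge-on-read snapshot for one file group:
    log records replace base records with the same key, later log entries
    win, and log records with new keys are appended."""
    # Keep only the latest log entry per key (later entries overwrite earlier).
    latest_log: Dict[object, Record] = {}
    for rec in log_records:
        latest_log[rec[key_field]] = rec

    merged: List[Record] = []
    for rec in base_records:
        # pop() so each log record is emitted at most once.
        merged.append(latest_log.pop(rec[key_field], rec))
    # Remaining log records are inserts with no base-file counterpart.
    merged.extend(latest_log.values())
    return merged


base = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
log = [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}, {"id": 2, "v": "b3"}]
print(snapshot_merge(base, log))
```

With an empty log the base records come back unchanged, which corresponds to the parquet-only case.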

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-30 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-666091024 @bvaradar I tested on the Spark 2.4.0 CDH release with a small dataset and found a broadcast configuration issue. Pushed a new commit with the fix. Now this works fine on my cluster

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-29 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-665798153 @bvaradar Thanks for trying this out. `java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile` looks strange. I will try it out on my production

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-26 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-664055034 Added support for `PrunedFilteredScan`. Please review this PR again. Thank you!
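Spark's `PrunedFilteredScan` lets a relation implement `buildScan(requiredColumns, filters)`. Spark can then request only the columns it needs and pass down simple source filters such as `EqualTo` or `GreaterThan`. Unless the relation reports them as handled through `unhandledFilters`, Spark still re-applies those filters afterwards, so the pushdown is an optimization. The Python below only mimics that contract over in-memory rows and is not Spark's API. The `(op, column, value)` tuples are a hypothetical stand-in for `org.apache.spark.sql.sources.Filter`, and `build_scan` is a hypothetical name.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
# Hypothetical stand-in for Spark source filters: (op, column, value).
Filter = Tuple[str, str, Any]

_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "EqualTo": lambda a, b: a == b,
    "GreaterThan": lambda a, b: a > b,
    "LessThan": lambda a, b: a < b,
}


def build_scan(rows: Sequence[Row], required_columns: Sequence[str],
               filters: Sequence[Filter]) -> List[Tuple[Any, ...]]:
    """Apply the filters this sketch understands, then project only the
    required columns in the requested order."""
    out = []
    for row in rows:
        # Unsupported operators are skipped (left for the engine to re-check),
        # mirroring how Spark re-applies filters a source does not handle.
        if all(_OPS[op](row[col], val) for op, col, val in filters if op in _OPS):
            out.append(tuple(row[c] for c in required_columns))
    return out


rows = [{"id": 1, "ts": 10, "v": "a"},
        {"id": 2, "ts": 20, "v": "b"},
        {"id": 3, "ts": 30, "v": "c"}]
print(build_scan(rows, ["v", "id"], [("GreaterThan", "ts", 15)]))
```

Projecting before returning rows is the column-pruning half. Evaluating predicates inside the scan is the filter-pushdown half.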

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-20 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-661642358 @vinothchandar @umehrot2 Ready for review. Thanks!