Re: [IMP] Understanding present state and planning ahead

2019-05-01 Thread Vinoth Chandar
Another weekly bump! It'd help greatly if you can take the survey (we have some interesting results already). Please also share your use-case at https://github.com/apache/incubator-hudi/issues/661 so we can incorporate it into the powered-by page. This would be much appreciated, since knowi

Re: DISCUSS HUDI-106 Dynamic bloom filters

2019-05-01 Thread nishith agarwal
That's a good pointer. Let me take this up and look into it.

-Nishith

On Sat, Apr 27, 2019 at 10:55 PM Vinoth Chandar wrote:
> https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/util/bloom/DynamicBloomFilter.html
> is something a team mate pointed to me.
> Cannot find anything else

Re: NPE for Merge On Read use case in quickstart

2019-05-01 Thread Bhavani Sudha Saktheeswaran
Hi Tristan,

You might want to include "--schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider" in the spark-submit command. I also faced a similar issue when I tried the Docker demo. I think there is a PR pending for the docs that includes this change.

Thanks,
Sudha

On Wed, May

NPE for Merge On Read use case in quickstart

2019-05-01 Thread Baker, Tristan
Hi,

Been working through the quickstart here: https://hudi.apache.org/docker_demo.html

I get an NPE when running the merge on read Spark job. Here’s the spark-submit command (copied from the quickstart instructions): https://gist.github.com/tcbakes/4a11cff217fb8a98205b4cc46cd29750

Here’s the

[DISCUSS] HIP 3: Timeline Service with Incremental File System View Syncing 

2019-05-01 Thread vbal...@apache.org
Hi All,

I created a HIP for file-system view caching and metadata management: https://cwiki.apache.org/confluence/display/HUDI/HIP-3

Please review and let me know your comments.

Thanks,
Balaji.V

Re: Starting point for contribution

2019-05-01 Thread Abhishek Sharma
Thanks. I am assigning this Jira to myself.

On Wed, May 1, 2019 at 7:21 AM Vinoth Chandar wrote:
> sg. https://issues.apache.org/jira/browse/HUDI-101 is a very simple starter task
>
> On Wed, May 1, 2019 at 4:05 AM Abhishek Sharma wrote:
> > Thanks, VBalaji & Vinoth. Will go through t

Re: Starting point for contribution

2019-05-01 Thread Vinoth Chandar
sg. https://issues.apache.org/jira/browse/HUDI-101 is a very simple starter task

On Wed, May 1, 2019 at 4:05 AM Abhishek Sharma wrote:
> Thanks, VBalaji & Vinoth. Will go through the Jira list and will pick one.
>
> Thanks
>
> On Tue, Apr 30, 2019 at 10:39 PM Vinoth Chandar wrote:
> > +1 if y

Re: multi-partitioned hudi table | partitions not created

2019-05-01 Thread Vinoth Chandar
I recommend using the HiveSync tool to manage the registration rather than doing it manually. Otherwise, what you see is expected behavior: part1, part2 will be in the file if they were in the data frame.

On Mon, Apr 29, 2019 at 11:11 PM SATISH SIDNAKOPPA <satish.sidnakoppa...@gmail.com> wrote:
> files

Re: About github issue 639

2019-05-01 Thread Vinoth Chandar
Hi Jun,

I was able to track that the HoodieSparkSQLWriter (common path for streaming sink and batch datasource) ends up calling DataSourceUtils.createHoodieClient, which creates the client as follows:

return new HoodieWriteClient<>(jssc, writeConfig);

There is a third parameter that denotes wheth

Re: Starting point for contribution

2019-05-01 Thread Abhishek Sharma
Thanks, VBalaji & Vinoth. Will go through the Jira list and will pick one.

Thanks

On Tue, Apr 30, 2019 at 10:39 PM Vinoth Chandar wrote:
> +1 if you can find a JIRA that interests you, we'd be happy to discuss it
> over the mailing list before you begin working and offer some guidance if
> needed