Re: [DISCUSS] Hudi Reverse Streamer

2023-08-21 Thread Pratyaksh Sharma
Jul 11, 2023 at 2:18 PM Pratyaksh Sharma > wrote: > > > Update: I will be raising the initial draft of RFC in the next couple of > > days. > > > > On Thu, Jun 15, 2023 at 2:28 AM Rajesh Mahindra > > wrote: > > > > > Great. We also need it for u

Re: [DISCUSS] Hudi Reverse Streamer

2023-07-11 Thread Pratyaksh Sharma
Update: I will be raising the initial draft of RFC in the next couple of days. On Thu, Jun 15, 2023 at 2:28 AM Rajesh Mahindra wrote: > Great. We also need it for use cases of loading data into warehouses, and > would love to help. > > On Wed, Jun 14, 2023 at 9:06 AM Pratyaksh Sha

Re: [DISCUSS] Hudi Reverse Streamer

2023-06-14 Thread Pratyaksh Sharma
you expressed interest? > > > >On Mon, Apr 10, 2023 at 7:32 PM Léo Biscassi > wrote: > > > >> +1 > >> This would be great! > >> > >> Cheers, > >> > >> On Mon, Apr 3, 2023 at 3:00 PM Pratyaksh Sharma > >> wrote:

Re: [DISCUSS] Should we support a service to manage all deltastreamer jobs?

2023-06-14 Thread Pratyaksh Sharma
Hi, Personally I am in favour of creating such a UI where monitoring and managing configurations is just a click away. That makes life a lot easier for users. So +1 on the proposal. I remember the work for it had started long back around 2019. You can check this RFC

Re: [DISCUSS] Hudi Reverse Streamer

2023-04-03 Thread Pratyaksh Sharma
Hi Vinoth, I am aligned with the first reason that you mentioned. Better to have a separate tool to take care of this. On Mon, Apr 3, 2023 at 9:01 PM Vinoth Chandar wrote: > +1 > > I was thinking that we add a new utility and NOT extend DeltaStreamer by > adding a Sink interface, for the

Re: [DISCUSS] Hudi Reverse Streamer

2023-03-31 Thread Pratyaksh Sharma
+1 to this. I can help drive some of this work. On Fri, Mar 31, 2023 at 10:09 AM Prashant Wason wrote: > Could be useful. Also, may be useful for backup / replication scenario > (keeping a copy of data in alternate/cloud DC). > > HoodieDeltaStreamer already has the concept of "sources". This

Re: [DISCUSS] Merging Nov and Dec community sync calls

2022-11-17 Thread Pratyaksh Sharma
+1 as well. On Thu, Nov 17, 2022 at 9:57 PM sagar sumit wrote: > +1 > > On Thu, Nov 17, 2022 at 9:44 AM Sivabalan wrote: > > > +1 makes sense. > > > > On Wed, 16 Nov 2022 at 17:40, Y Ethan Guo wrote: > > > > > +1 on having a single community sync all on Dec 14 during the holiday > > > season.

Re: [DISCUSS] Build tool upgrade

2022-10-03 Thread Pratyaksh Sharma
My two cents. I have seen open source projects take more than 20-25 minutes for building on maven, so I guess we are fine for now. But we can definitely investigate and try to optimize if we can. On Sun, Oct 2, 2022 at 9:33 AM Shiyan Xu wrote: > Yes, Vinoth, agree on the efforts and impact

Re: [DISCUSS]: Integrate column stats index with all query engines

2022-08-10 Thread Pratyaksh Sharma
Time: Thursday, August 11, 2022, 12:11 PM > To: "dev" > Subject: Re: [DISCUSS]: Integrate column stats index with all query engines > > > > +1 for this. > > Suggested new reviewers on the RFC. > https://github.com/apache/hudi/pull/6345/files#r943073339 > > On Wed, Aug 10, 2022

[DISCUSS]: Integrate column stats index with all query engines

2022-08-10 Thread Pratyaksh Sharma
Hello community, With the introduction of multi modal index in Hudi, there is a lot of scope for improvement on the querying side. There are 2 major ways of reducing the data scan at the time of querying - partition pruning and file pruning. While with the latest developments in the community,

Re: need help with Hudi Delete

2022-07-15 Thread Pratyaksh Sharma
g", > "logicalType" : "timestamp-micros" > }, "null" ] > }, { > "name" : "date_updated_utc", > "type" : [ { > "type" : "long", > "logicalType" : "

Re: need help with Hudi Delete

2022-07-15 Thread Pratyaksh Sharma
Hi, Hudi is complaining because '_hoodie_is_soft_deleted' is present in the parquet file's schema but is not present in your incoming schema. From my experience, I would say it is a standard practice to add an extra field which acts as a marker for soft deletion and needs to be persisted with

Re: [VOTE] Monthly Community Sync Time

2022-05-18 Thread Pratyaksh Sharma
I would go with 8 AM PT. If that is not feasible, then 8.30 AM. On Wed, May 18, 2022 at 7:14 AM Vinoth Govindarajan < vinoth.govindara...@gmail.com> wrote: > +1 > > I vote for 9 am as well. > > > > On Tue, May 17, 2022, 1:31 PM Vinoth Chandar < > mail.vinoth.chan...@gmail.com> > wrote: > > > +1

Re: [DISCUSS] Hudi community sync time

2022-04-28 Thread Pratyaksh Sharma
I would propose 8 AM or 8.30 AM PST though since 9 AM PST will clash with my other meetings. But happy to go with time that suits most of the folks. On Thu, Apr 28, 2022 at 3:31 AM Vinoth Govindarajan < vinoth.govindara...@gmail.com> wrote: > +1 for 9 am PST call, the current time is super early

Re: [ANNOUNCE] New Apache Hudi Committer - Zhaojing Yu

2022-04-01 Thread Pratyaksh Sharma
Congratulations Zhaojing! On Thu, Mar 31, 2022 at 8:27 PM Vinoth Chandar wrote: > Congrats! > > On Thu, Mar 31, 2022 at 4:06 AM leesf wrote: > > > Congrats! > > > > Vino Yang wrote on Thu, Mar 31, 2022 at 17:03: > > > > > Congrats! > > > > > > Best, > > > Vino > > > > > > Gary Li wrote on Fri, Mar 25, 2022 at 19:11: >

Re: Can you please add osskall...@gmail.com as a contributor

2022-02-23 Thread Pratyaksh Sharma
Sure, please go ahead with that jira. Someone from the community will give you permissions soon. :) On Wed, Feb 23, 2022 at 6:46 PM rajeh kalluri wrote: > I am a newbie and would like to contribute where I can. > > Currently looking at https://issues.apache.org/jira/browse/HUDI-96 and > would

Re: [VOTE] Release 0.10.1, release candidate #2

2022-01-24 Thread Pratyaksh Sharma
+1 - Compilation OK - Validation script OK On Sun, Jan 23, 2022 at 8:09 PM Nishith wrote: > +1 binding > > -Nishith > > > On Jan 22, 2022, at 7:49 PM, Vinoth Chandar wrote: > > > > +1 (binding) > > > > Ran my rc checks on updated link and changing my vote to a +1 > > > >> On Sat, Jan 22,

Re: [DISCUSS] Hudi Community Communication Updates

2021-11-10 Thread Pratyaksh Sharma
Hi Rajesh, I do not have any strong opinions for/against point #1. Point #2 definitely seems useful to me. I hope messages from #general channel will be formatted as respective threads in either case - if the thread started on the same day or if some reply comes on some ongoing thread. On Tue,

Re: Monthly or Bi-Monthly Dev meeting?

2021-10-22 Thread Pratyaksh Sharma
y Li wrote: > > > > > > > >> Hi Vinoth, > > > >> > > > >> Summertime 8 AM PST was 11 PM in China so I guess it works for some > > > forks, > > > >> but switching to wintertime it was 12 AM in China. It

Re: HoodieMultiTableDeltaStreamer failing due to missing file path delimiter

2021-10-08 Thread Pratyaksh Sharma
Hi Philip, I checked the configs that you are passing and it all looks good. Indeed the problem is the absence of forward slash which should not happen in general. Can you try printing the configs once and see if the configFile path is getting passed properly? Also as a workaround, you can

Re: Monthly or Bi-Monthly Dev meeting?

2021-10-05 Thread Pratyaksh Sharma
ain. Thanks Vinoth for bringing it up ! > > > > > > On Thu, Sep 23, 2021 at 12:14 PM Sivabalan wrote: > > > > > > > > +1 on monthly meet up. > > > > > > > > On Thu, Sep 23, 2021 at 11:01 AM vino yang > >

Re: Monthly or Bi-Monthly Dev meeting?

2021-09-23 Thread Pratyaksh Sharma
Monthly should be good. Been a long time since we connected in these meetings. :) On Thu, Sep 23, 2021 at 7:02 PM Vinoth Chandar < mail.vinoth.chan...@gmail.com> wrote: > 1 hour monthly is what I was proposing to be specific. > > On Thu, Sep 23, 2021 at 6:30 AM Gary Li wrote: > > > +1 for

Re: [ANNOUNCE] Apache Hudi 0.9.0 released

2021-09-01 Thread Pratyaksh Sharma
Great news! This one really feels like a major release with so many good features getting added. :) On Wed, Sep 1, 2021 at 7:19 AM Udit Mehrotra wrote: > The Apache Hudi team is pleased to announce the release of Apache Hudi > 0.9.0. > > This release comes almost 5 months after 0.8.0. It

Re: [DISCUSS] Enable Github Discussions

2021-08-11 Thread Pratyaksh Sharma
+1 I have never used it, but we can try this out. :) On Thu, Jul 15, 2021 at 9:43 AM Vinoth Chandar wrote: > Hi all, > > I would like to propose that we explore the use of github discussions. Few > other apache projects have also been trying this out. > > Please chime in > > Thanks > Vinoth >

Re: Website redesign

2021-08-10 Thread Pratyaksh Sharma
Hi Vinoth, Listing all the blog posts in the sidebar using 'blogSidebarCount: ALL' option looks good to me. We can wait for some time to see if anyone has a different opinion on this. :) On Mon, Aug 9, 2021 at 2:33 AM Vinoth Govindarajan < vinoth.govindara...@gmail.com> wrote: > Hi Pratyaksh, >

Re: Website redesign

2021-08-04 Thread Pratyaksh Sharma
Hi Vinoth, One small feedback. I feel the navigation to newer posts or older posts on Blog (https://hudi.apache.org/blog) page should be present at the top as well along with the current bottom position. Right now, one has to scroll all the way to the bottom to be able to navigate which is

Re: [DISCUSS] Hudi is the data lake platform

2021-07-29 Thread Pratyaksh Sharma
Guess we should rebrand Hudi on README.md file as well - https://github.com/apache/hudi#readme? This page still mentions the following - "Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud

Re: [DISCUSS] Improving hudi user experience by providing more ways to configure hudi jobs

2021-05-22 Thread Pratyaksh Sharma
+1 from my side. Introducing new configs based on types definitely improves user experience as compared to supplying full class names. We just need to define the enums properly. On Sat, May 22, 2021 at 9:13 AM wangxianghu wrote: > Hi community: > > > > Here I want to start a discussion about

Re: Welcome new committers and PMC Members!

2021-05-12 Thread Pratyaksh Sharma
Congratulations Gary and Wenning! Well deserved! On Wed, May 12, 2021 at 8:46 PM leesf wrote: > Congratulations Gary and Wenning > > Nishith wrote on Wed, May 12, 2021 at 11:23 AM: > > > Congratulations Gary and Wenning! > > > > -Nishith > > > > > On May 11, 2021, at 7:18 PM, vino yang wrote: > > > > > >

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Pratyaksh Sharma
Definitely we are doing much more than only ingesting and managing data over DFS. +1 from my side as well. :) On Tue, Apr 13, 2021 at 10:02 PM Susu Dong wrote: > I love this rebranding. Totally agree. +1 > > On Wed, Apr 14, 2021 at 1:25 AM Raymond Xu > wrote: > > > +1 The vision looks

Re: Apache Hudi 0.8.0 Released

2021-04-09 Thread Pratyaksh Sharma
Great news! On Fri, Apr 9, 2021 at 11:42 AM Sivabalan wrote: > Awesome! Great job Gary on the release work! > > On Fri, Apr 9, 2021 at 1:59 AM Gary Li wrote: > > > Thanks Vinoth. > > > > The page for 0.8.0 is ready > > https://hudi.apache.org/docs/0.8.0-spark_quick-start-guide.html. > > The

Re: Community Sync Meeting

2021-02-11 Thread Pratyaksh Sharma
10, 2021, at 9:22 PM, Pratyaksh Sharma > wrote: > > > > +1 > > > > It was easier to attend the meetings when we had them on a regular basis. > > Now if someone missed one meeting, he is prone to lose the track of when > > the next meeting is. :

Re: Community Sync Meeting

2021-02-10 Thread Pratyaksh Sharma
+1 It was easier to attend the meetings when we had them on a regular basis. Now if someone missed one meeting, he is prone to lose the track of when the next meeting is. :) On Thu, Feb 11, 2021 at 12:44 AM Raymond Xu wrote: > Vinoth, I think this could be caused by the extra step of checking

Re: Congrats to our newest committers!

2021-01-27 Thread Pratyaksh Sharma
Congratulations both of you! On Wed, Jan 27, 2021 at 8:43 PM Vinoth Chandar wrote: > Congrats both! Well deserved indeed! Glad to have you on the community. > > On Wed, Jan 27, 2021 at 7:00 AM Shi ShaoFeng > wrote: > > > Congratulations, Wang Xianghu and Li Wei! > > > > On 2021/1/27

Re: [DISCUSS] Support multiple ordering fields

2021-01-19 Thread Pratyaksh Sharma
Hi, We can use a transformer to combine multiple ordering fields. However, a custom Comparable implementation is not possible in that case. So overall a +1 from my side as well. On Tue, Jan 19, 2021 at 1:58 PM 刘金辉 <965147...@qq.com> wrote: > +1, Currently we have encountered such
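
To illustrate what a custom Comparable over several ordering fields could look like, here is a minimal sketch. The class name CompositeOrderingValue and the fields updatedAtMillis and sequenceNo are hypothetical and chosen only for illustration; this is not a Hudi class, and it does not show how Hudi would wire such a value into its payload handling.

```java
import java.util.Comparator;

// Hypothetical composite ordering value: orders by timestamp first,
// then by a sequence number to break ties. Not part of Hudi.
public final class CompositeOrderingValue implements Comparable<CompositeOrderingValue> {
    private final long updatedAtMillis;
    private final long sequenceNo;

    // Compare the primary field first, then the secondary field.
    private static final Comparator<CompositeOrderingValue> ORDER =
        Comparator.comparingLong((CompositeOrderingValue v) -> v.updatedAtMillis)
            .thenComparingLong(v -> v.sequenceNo);

    public CompositeOrderingValue(long updatedAtMillis, long sequenceNo) {
        this.updatedAtMillis = updatedAtMillis;
        this.sequenceNo = sequenceNo;
    }

    @Override
    public int compareTo(CompositeOrderingValue other) {
        return ORDER.compare(this, other);
    }

    public static void main(String[] args) {
        CompositeOrderingValue older = new CompositeOrderingValue(100L, 1L);
        CompositeOrderingValue newer = new CompositeOrderingValue(100L, 2L);
        // Same timestamp, so the sequence number decides the order.
        System.out.println(older.compareTo(newer) < 0);
    }
}
```

A single concatenated field produced by a transformer compares as one value, whereas a class like this keeps each field typed and lets the tie-breaking rule be stated explicitly.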

Re: Accomplishments and Roadmap.

2020-12-14 Thread Pratyaksh Sharma
We can have this as part of our bi weekly meeting. :) On Tue, Dec 15, 2020 at 10:24 AM Gary Li wrote: > +1 > > Gary Li > > From: Sivabalan > Sent: Tuesday, December 15, 2020 3:34:00 AM > To: dev@hudi.apache.org > Subject: Re: Accomplishments and Roadmap. > >

Re: Congrats to our newest committers!

2020-12-03 Thread Pratyaksh Sharma
Congratulations Satish and Prashant! On Fri, Dec 4, 2020 at 12:22 AM Vinoth Chandar wrote: > Hi all, > > I am really happy to announce our newest set of committers. > > *Satish Kotha*: Satish has ramped very quickly across our entire code base > and contributed bug fixes and also drove large,

Re: [DISCUSS] New Community Weekly Sync up Time

2020-09-15 Thread Pratyaksh Sharma
Hi, Just wanted to confirm the time for this week's sync up. @Vinoth Chandar On Thu, Sep 10, 2020 at 1:58 AM Pratyaksh Sharma wrote: > Great. I request others to also please chime in so that we can finalise > the time for sync up. > > On Wed, Sep 9, 2020 at 9:00 AM Balaji Varadara

Re: [DISCUSS] Standardizing Java date time APIs

2020-09-14 Thread Pratyaksh Sharma
> Also quote from the "About" section in https://www.joda.org/joda-time/ > > > Joda-Time is the *de facto* standard date and time library for Java prior > to Java SE 8. Users are now asked to migrate to java.time (JSR-310). > > Another motive to do 5) :) > > On Sat

Re: [DISCUSS] Standardizing Java date time APIs

2020-09-13 Thread Pratyaksh Sharma
Hi Raymond, I have a question here. Does java.time.format.DateTimeFormatter support parsing multiple input date formats like joda DateTimeFormatter does? Support for multiple input date formats was the reason we migrated from SimpleDateFormat to joda formatter. Please let us know. On Sun, Sep
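
For reference on the question above, one common way to accept several input date formats with java.time is to chain optional sections on a DateTimeFormatterBuilder. The sketch below is only an illustration: the class name MultiFormatDates and the three patterns are assumptions, not formats taken from this thread or from Hudi.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical helper showing multi-format parsing with java.time.
public class MultiFormatDates {
    // Each optional section is tried in turn; a section that does not
    // match the input is skipped and parsing continues with the next one.
    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
        .appendOptional(DateTimeFormatter.ofPattern("yyyy-MM-dd"))
        .appendOptional(DateTimeFormatter.ofPattern("yyyy/MM/dd"))
        .appendOptional(DateTimeFormatter.ofPattern("dd.MM.yyyy"))
        .toFormatter();

    public static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-09-13"));
        System.out.println(parse("2020/09/13"));
        System.out.println(parse("13.09.2020"));
    }
}
```

All three calls in main resolve to the same LocalDate. Input that matches none of the patterns raises a DateTimeParseException, so callers still need to handle that case.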

Re: Hadoop & hive 3.1 Support

2020-09-10 Thread Pratyaksh Sharma
Hi Selvaraj, Currently Hudi works with Hadoop 2.7.3 and Hive 2.3.1. Jiras are already filed for extending support for hadoop 3.x and hive 3.x. 1. https://issues.apache.org/jira/browse/HUDI-6 (hive 3.x) 2. https://issues.apache.org/jira/browse/HUDI-259 (hadoop 3.x) The work on HUDI-259 is

Re: Congrats to our newest committers!

2020-09-09 Thread Pratyaksh Sharma
>>>> His most notable contributions are towards driving large parts of > > >> the > > >>>> > > >>>>>> implementation of RFC-12, Hive/Spark integration points. He has > > >> also > > >>

Re: [DISCUSS] New Community Weekly Sync up Time

2020-09-09 Thread Pratyaksh Sharma
; > On Tue, Sep 8, 2020, 5:09 PM Vinoth Chandar wrote: > > > Anyone else wants to chime in for a new time, that works for > everyone? > > > > Personally, I can do this time. > > > > love to hear more inputs. > > > > On

Re: Hudi CLI AWS Glue & S3 Tables

2020-09-09 Thread Pratyaksh Sharma
Hi Adam, I have not used the CLI tool much, but s3 filesystem is already supported in Hudi. You may check the following class to see the list of file systems already supported - https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java .

Re: [DISCUSS] Formalizing the release process

2020-09-08 Thread Pratyaksh Sharma
Missed this thread, the plan looks good to me as well. On Wed, Sep 9, 2020 at 8:31 AM Vinoth Chandar wrote: > Would love to understand the general skepticism a bit more. > Is it rooted more on hitting those in the short term? or even in the longer > run with a better test infrastructure in

[DISCUSS] New Community Weekly Sync up Time

2020-09-02 Thread Pratyaksh Sharma
Hi everyone, Currently we are having weekly sync ups between 9 PM - 10 PM PST on Tuesdays. Since I have switched my job last to last month (in India), this time is exactly clashing with the daily standup time at my current org. This is the reason I have not been able to attend the syncups for

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

2020-08-24 Thread Pratyaksh Sharma
Great news! :) On Tue, Aug 25, 2020 at 10:09 AM Vinoth Chandar wrote: > - announce > > Folks, please keep the follow ups to dev@ and users@ > > > > On Mon, Aug 24, 2020 at 9:26 PM vino yang wrote: > > > Great news! > > > > Thanks to Bhavani Sudha for driving the release! And thanks to every

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-08-21 Thread Pratyaksh Sharma
This is a good option to have. :) On Thu, Aug 20, 2020 at 11:25 PM Vinoth Chandar wrote: > IIRC _hoodie_record_key was supposed to be this standardized key field. :) > Anyways, it's good to provide this option to the user. > So +1 for RFC/further discussion. > > To level set, I want to also share

Re: [Question] How to use Hudi for migrating a historical mysql table?

2020-08-21 Thread Pratyaksh Sharma
Hi Gurudatt, You can use Debezium for migrating historical data as well. Using Debezium will enable you to migrate existing as well as new data using DeltaStreamer. I have used it in my previous org for the same use case. On Fri, Aug 21, 2020 at 12:30 PM wowtua...@gmail.com wrote: > > You can

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-19 Thread Pratyaksh Sharma
Hi Allen, Yes, it's a serialization (runtime) issue. I am working on fixing it. On Wed, Aug 19, 2020 at 7:04 PM Sivabalan wrote: > That would be of great help Allen. Much appreciated. > > On Wed, Aug 19, 2020 at 9:30 AM Allen Underwood > wrote: > > > Thanks Sivabalan, > > > > That's

Re: DISCUSS code, config, design walk through sessions

2020-07-30 Thread Pratyaksh Sharma
> > > > > Typo: date TBD (not data :)) > > > > > > > > > > > > > > On Tue, Jul 14, 2020 at 11:20 AM Adam Feldman < > > afeldm...@gmail.com > > > > > > > > > > wrote: > > > > >

Re: DISCUSS code, config, design walk through sessions

2020-07-13 Thread Pratyaksh Sharma
> > > > > > On Sun, Jul 12, 2020 at 4:12 PM Ranganath Tirumala < > > ranganath.tirum...@gmail.com> wrote: > > > > > So, Is this confirmed for 14th July 9:30pm PST? > > > > > > On Sat, 11 Jul 2020 at 14:32, Gurudatt Kulkarni >

Re: DISCUSS code, config, design walk through sessions

2020-07-10 Thread Pratyaksh Sharma
@Vinoth Chandar Time zones are indeed tricky. Maybe we can do a poll again to decide on the time for these sessions given the community size has increased much more now as compared to last time we decided on weekly sync timings? This might help all the new members of our community as well. :) On

Re: DISCUSS code, config, design walk through sessions

2020-07-06 Thread Pratyaksh Sharma
This is a great idea and really helpful one. On Mon, Jul 6, 2020 at 1:09 PM wrote: > +1 > It can also attract more partners to join us. > > > > On 07/06/2020 15:34, Ranganath Tirumala wrote: > +1 > > On Mon, 6 Jul 2020 at 16:59, David Sheard < > david.she...@datarefactory.com.au> > wrote: > > >

Re: [DISCUSS] Make delete marker configurable?

2020-06-28 Thread Pratyaksh Sharma
The suggestion looks good to me as well. On Sun, Jun 28, 2020 at 8:17 AM Sivabalan wrote: > +1, I just left it as a todo for future patch when I worked on it. > > On Sat, Jun 27, 2020 at 8:32 PM Bhavani Sudha > wrote: > > > Hi Raymond, > > > > I am trying to understand the use case . Can you

Re: TLP Announcement

2020-06-04 Thread Pratyaksh Sharma
That is great news. On Thu, Jun 4, 2020 at 7:58 PM Vinoth Chandar wrote: > Hello all, > > The ASF press release announcing Apache Hudi as TLP is live! Thanks for all > your contributions! We could not have achieved that without such a > great community effort! > > Please help spread the

Re: Apache Hudi Graduation vote on general@incubator

2020-05-22 Thread Pratyaksh Sharma
That is great news! Congratulations to the entire community. :) On Fri, May 22, 2020 at 10:03 PM Gary Li wrote: > Huge congrats to the Hudi community! Great job! > > On Fri, May 22, 2020 at 9:30 AM Vinoth Chandar wrote: > > > Folks, I am very happy to share that the graduation resolution has

Re: Subscribing to commits@

2020-05-22 Thread Pratyaksh Sharma
Yeah, I have also subscribed. On Fri, May 22, 2020 at 1:07 AM Lamber Ken wrote: > thanks, very useful to me. > > +1 recommended it to anyone. > > On 2020/02/28 01:00:36, Vinoth Chandar wrote: > > Folks, > > > > Realized some folks may not have noticed this. But > >

Re: Unable to run hudi-cli integration tests

2020-05-19 Thread Pratyaksh Sharma
Hi hddong, Thank you for your help. Looks like brew installation of spark was the issue. I set up spark on my machine using spark binaries, and it runs fine now. On Mon, May 18, 2020 at 9:02 PM Pratyaksh Sharma wrote: > Hi hddong, > > The concerned test in my

Re: Unable to run hudi-cli integration tests

2020-05-18 Thread Pratyaksh Sharma
n master branch and check if the exception exist. > > Pratyaksh Sharma wrote on Mon, May 18, 2020 at 1:30 AM: > > > Hi, > > > > For me also the test runs but looking at the error, it looks like no work > > or deduping is done, which is strange. Here is the error -> > > >

Re: Unable to run hudi-cli integration tests

2020-05-17 Thread Pratyaksh Sharma
due to some config of > local spark. > I had got this Exception before, but it run successfully after `mvn clean > package ...`. > > Regards > hddong > > Pratyaksh Sharma wrote on Sun, May 17, 2020 at 8:42 PM: > > > Hi hddong, > > > > Strange but nothing seems to

Re: Unable to run hudi-cli integration tests

2020-05-17 Thread Pratyaksh Sharma
Hi hddong, Strange but nothing seems to work for me. I tried doing mvn clean and then run travis tests. Also I tried running the command `mvn clean package -DskipTests -DskipITs -Pspark-shade-unbundle-avro` first and then run the test using `mvn -Dtest=ITTestRepairsCommand#testDeduplicateWithReal

Unable to run hudi-cli integration tests

2020-05-16 Thread Pratyaksh Sharma
Hi, If I try to run integration tests defined in hudi-cli package using the command - ./scripts/run_travis_tests.sh integration They always fail with the below error - ERROR org.springframework.shell.core.SimpleExecutionStrategy - Command failed java.lang.reflect.UndeclaredThrowableException

Re: [DISCUSS] Bug bash?

2020-05-08 Thread Pratyaksh Sharma
We can include this issue in our bug bash - https://github.com/apache/incubator-hudi/issues/1599. On Fri, May 8, 2020 at 12:51 AM Pratyaksh Sharma wrote: > Missed this thread. Happy to volunteer in fixing as many bugs as possible. > :) > > On Thu, May 7, 2020 at 7:53 PM Siva

Re: [DISCUSS] Bug bash?

2020-05-07 Thread Pratyaksh Sharma
Missed this thread. Happy to volunteer in fixing as many bugs as possible. :) On Thu, May 7, 2020 at 7:53 PM Sivabalan wrote: > sure. thanks for the detailed pointers. Will work on it. > > On Thu, May 7, 2020 at 1:50 AM Vinoth Chandar wrote: > > > siva, That would be great. Next step is to put

Re: [VOTE] Apache Hudi graduation to top level project

2020-05-07 Thread Pratyaksh Sharma
+1 Would love to see Hudi as a TLP. On Thu, May 7, 2020 at 10:51 PM Luciano Resende wrote: > +1 > > On Wed, May 6, 2020 at 1:58 PM Vinoth Chandar wrote: > > > > Hello all, > > > > Per our discussion on the dev mailing list ( > > >

Re: Extracting a partition field from something like a timestamp using Deltastreamer / Configurations?

2020-05-04 Thread Pratyaksh Sharma
Hi Vinoth, I have already raised a Jira for this some time back - https://issues.apache.org/jira/browse/HUDI-859. I am waiting for https://github.com/apache/incubator-hudi/pull/1433 to be closed before I can start working on it. :) On Mon, May 4, 2020 at 9:53 PM Vinoth Chandar wrote: > Hi

Re: [DISCUSS] Next Release timeline

2020-04-22 Thread Pratyaksh Sharma
Major release looks good to me. On Wed, Apr 22, 2020 at 2:29 PM Bhavani Sudha wrote: > Hello all, > > I wanted to kick start the discussion on timeline and logistics for the > next release. Here are couple things we need to figure out. > >1. Should the next release be a minor or major

Re: [DISCUSS] moving blog from cwiki to website

2020-04-21 Thread Pratyaksh Sharma
+1 I have seen other Apache projects having blogs on their website like Apache Pinot. On Wed, Apr 22, 2020 at 11:05 AM Bhavani Sudha Saktheeswaran wrote: > +1 > > On Tue, Apr 21, 2020 at 10:23 PM tison wrote: > > > Hi Vinoth, > > > > +1 for moving blogs. > > > > cwiki looks belong to

Re: Manual deletion of a parquet file

2020-04-14 Thread Pratyaksh Sharma
taking this up, Pratyaksh! > > On Tue, Apr 14, 2020 at 2:58 AM Pratyaksh Sharma > wrote: > > > Hi Vinoth, > > > > Thank you for your guidance. > > > > I went through the code for RepairsCommand in Hudi-cli package which > > internally calls DedupeSparkJob.sca

Re: Manual deletion of a parquet file

2020-04-14 Thread Pratyaksh Sharma
ect first commit to actually fail since > files got deleted midway into writing. > - if both of them indeed succeeded, then it's just the duplicates > > > Thanks > Vinoth > > > > > > On Mon, Apr 13, 2020 at 6:12 AM Pratyaksh Sharma > wrote: > > > Hi, >

Manual deletion of a parquet file

2020-04-13 Thread Pratyaksh Sharma
Hi, From my experience so far of working with Hudi, I understand that Hudi is not designed to handle concurrent writes from 2 different sources, for example when 2 instances of HoodieDeltaStreamer are simultaneously running and writing to the same dataset. I have experienced such a case can result in

Re: New Committer: lamber-ken

2020-04-08 Thread Pratyaksh Sharma
Congratulations lamberken! On Wed, Apr 8, 2020 at 11:10 AM Jiayi Liao wrote: > Congratulations! > > Best, > Jiayi Liao > > On Wed, Apr 8, 2020 at 12:15 PM tison wrote: > > > Congrats lamber! > > > > Best, > > tison. > > > > > > vino yang wrote on Wed, Apr 8, 2020 at 11:45 AM: > > > > > Congrats lamber! Well

Re: New PPMC Member : Bhavani Sudha

2020-04-08 Thread Pratyaksh Sharma
Congratulations Sudha! On Wed, Apr 8, 2020 at 9:16 AM vino yang wrote: > Congrats sudha, well deserved! > > Best, > Vino > > leesf wrote on Wed, Apr 8, 2020 at 9:31 AM: > > > Congrats sudha, well deserved! > > > > Balaji Varadarajan wrote on Wed, Apr 8, 2020 at 6:55 AM: > > > > > Congratulations Sudha :) Well deserved.

Re: Bring back support for spark 2.3?

2020-03-21 Thread Pratyaksh Sharma
vro... (spark-avro 2.4.4 should interplay with spark 2.3?) > > Thanks > VInoth > > On Sun, Feb 23, 2020 at 2:02 PM Pratyaksh Sharma > wrote: > > > Hi, > > > > As discussed in last to last week's weekly sync, I want to put forward > this > > poin

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-03-21 Thread Pratyaksh Sharma
fic custom > > generator. If we are anticipating more such classes for specialized > types, > > you can use a generic way to support overriding key-generator for > > individual partition-fields once and for all. > > Balaji.VOn Monday, February 24, 2020,

Re: [Online Meetup] Apache Kylin × Apache Hudi Meetup, Mar. 14, 2020

2020-03-21 Thread Pratyaksh Sharma
No worries :) On Sat, Mar 21, 2020 at 2:47 PM leesf wrote: > sorry pratyaksh, it may mainly for chinese developers. :) > > Pratyaksh Sharma wrote on Fri, Mar 20, 2020 at 6:05 PM: > > > Cannot follow it properly due to language issues. :) > > > > On Fri, Mar 20, 2020 at 3:05

Re: [Online Meetup] Apache Kylin × Apache Hudi Meetup, Mar. 14, 2020

2020-03-20 Thread Pratyaksh Sharma
Cannot follow it properly due to language issues. :) On Fri, Mar 20, 2020 at 3:05 PM leesf wrote: > PDF is now available. Please check it out. > > Using Apache Hudi to build the next-generation data lake and its > application in medical big data >

Re: Need clarity on these test cases in TestHoodieDeltaStreamer

2020-03-20 Thread Pratyaksh Sharma
@Balaji As part of bug fix, the addition of 200 records has been removed from our code base. I guess there is no need of documenting this. If you still feel there is a need, please let me know. Will do the needful. On Wed, Mar 4, 2020 at 1:07 PM Pratyaksh Sharma wrote: > Sure. I will share

Re: [DISCUSS] Consider defaultValue of field when writing to Hudi dataset

2020-03-20 Thread Pratyaksh Sharma
https://issues.apache.org/jira/browse/HUDI-727 tracks this. On Tue, Feb 25, 2020 at 2:23 PM Pratyaksh Sharma wrote: > Hi Vinoth, > > > in avro you define it as an optional field (union of type and null).. > Yes that is correct. But imagine if someone does not want to populate >

Re: Schema Reference in HudiDeltaStreamer

2020-03-16 Thread Pratyaksh Sharma
"string"],"default":null} > > On Mon, Mar 16, 2020 at 2:22 PM Pratyaksh Sharma > wrote: > > > How have you mentioned the field in your schema file? Is it a nullable > > field or is it having default value? > > > > On Mon, Mar 16, 2020

Re: Schema Reference in HudiDeltaStreamer

2020-03-16 Thread Pratyaksh Sharma
erged schema, It is not working. > I didn't try HiveSync Tool for this. Is there any option to refer glue? > > > On Mon, Mar 16, 2020 at 12:56 PM Pratyaksh Sharma > wrote: > > > Hi Raghvendra, > > > > As mentioned in the FAQ, this error occurs when your schema h

Re: Schema Reference in HudiDeltaStreamer

2020-03-16 Thread Pratyaksh Sharma
ma could you please help me into this? > > Thanks > Raghvendra > > On Sun, 15 Mar 2020 at 6:08 PM, Pratyaksh Sharma > wrote: > > > This might help - Caused by: org.apache.parquet.io > .InvalidRecordException: > > Parquet/Avro schema mismatch: Avro field 'col1' no

Re: Schema Reference in HudiDeltaStreamer

2020-03-15 Thread Pratyaksh Sharma
This might help - Caused by: org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'col1' not found

Re: Need clarity on these test cases in TestHoodieDeltaStreamer

2020-03-03 Thread Pratyaksh Sharma
e: > >> I will sync up with Pratyaksh offline on this. >> >> On Thu, Feb 27, 2020 at 11:24 PM Pratyaksh Sharma >> wrote: >> >> > Hi Balaji, >> > >> > Right now I am facing some different issue in the same test case. The >> >

Re: Need clarity on these test cases in TestHoodieDeltaStreamer

2020-02-27 Thread Pratyaksh Sharma
tyaksh, would you mind opening a PR to documenting it. > Balaji.V > > Sent from Yahoo Mail for iPhone > > > On Wednesday, February 26, 2020, 11:14 PM, Pratyaksh Sharma < > pratyaks...@gmail.com> wrote: > > Hi, > > I figured out the issue yesterday. Thank you for h

Re: Need clarity on these test cases in TestHoodieDeltaStreamer

2020-02-26 Thread Pratyaksh Sharma
> > I don't remember the reason behind this. > Sivabalan, Can you explain the reason when you get a chance. > Thanks,Balaji.V > On Wednesday, February 26, 2020, 06:03:53 AM PST, Pratyaksh Sharma < > pratyaks...@gmail.com> wrote: > > Anybody got a chance to look at th

Re: Need clarity on these test cases in TestHoodieDeltaStreamer

2020-02-26 Thread Pratyaksh Sharma
Anybody got a chance to look at this? On Mon, Feb 24, 2020 at 1:04 AM Pratyaksh Sharma wrote: > Hi, > > While working on one of my PRs, I am stuck with the following test cases > in TestHoodieDeltaStreamer - > 1. testUpsertsCOWContinuousMode > 2. testUpsertsMORContinuou

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-25 Thread Pratyaksh Sharma
> On Mon, Feb 24, 2020 at 4:34 AM Pratyaksh Sharma > wrote: > > > Hi Vinoth, > > > > Have added few more issues which I faced while adopting Hudi. Please > have a > > look. > > > > I guess everyone in community should make it a habit to try adding

Re: [DISCUSS] Consider defaultValue of field when writing to Hudi dataset

2020-02-25 Thread Pratyaksh Sharma
vroUtils.java#L124 > seems like it's being copied over? > > On Mon, Feb 24, 2020 at 4:21 AM Pratyaksh Sharma > wrote: > > > Hi, > > > > Currently we recommend users to evolve schema in backwards compatible > way. > > When one is trying to evolve sche

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-24 Thread Pratyaksh Sharma
errors and like Siva mentioned, it will be easier to fix common issues. On Fri, Feb 21, 2020 at 11:33 PM Vinoth Chandar wrote: > Thanks Pratyaksh! Do you have any suggestions on priming this page with > many more common issues? > > On Thu, Feb 20, 2020 at 12:54 AM Pratyaksh Sha

[DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-02-24 Thread Pratyaksh Sharma
Hi, We have TimestampBasedKeyGenerator for defining custom partition paths and we have ComplexKeyGenerator for supporting having combination of fields as record key or partition key. However we do not have support for the case where one wants to have combination of fields as record key along
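
To make the requested combination concrete, here is a minimal sketch of deriving a multi-field record key together with a timestamp-based partition path. Everything in it is hypothetical: CompositeKeySketch, recordKey, partitionPath, the "name:value" key layout, and the yyyy/MM/dd path pattern are assumptions for illustration, and the code does not use or reproduce Hudi's key generator classes.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: composite record key plus a date-formatted partition path.
public class CompositeKeySketch {
    // Assumed output layout for the partition path, evaluated in UTC.
    private static final DateTimeFormatter PARTITION_FORMAT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    // Joins the chosen fields as "name:value" pairs separated by commas.
    static String recordKey(Map<String, Object> record, String... keyFields) {
        return Arrays.stream(keyFields)
            .map(field -> field + ":" + record.get(field))
            .collect(Collectors.joining(","));
    }

    // Formats an epoch-millisecond timestamp field as a partition path.
    static String partitionPath(long epochMillis) {
        return PARTITION_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("region", "in");
        record.put("created_at", 0L);

        System.out.println(recordKey(record, "id", "region"));
        System.out.println(partitionPath((Long) record.get("created_at")));
    }
}
```

The point of the sketch is only that the two concerns are independent: the record key can be built from any set of fields while the partition path is computed from a single timestamp field.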

Bring back support for spark 2.3?

2020-02-23 Thread Pratyaksh Sharma
Hi, As discussed in last to last week's weekly sync, I want to put forward this point on our mailing list also. Since with 0.5.1 release, we have upgraded spark to 2.4 in our master branch, we are facing difficulties after rebasing our codebase with master. At our organisation we are using spark

Need clarity on these test cases in TestHoodieDeltaStreamer

2020-02-23 Thread Pratyaksh Sharma
Hi, While working on one of my PRs, I am stuck with the following test cases in TestHoodieDeltaStreamer - 1. testUpsertsCOWContinuousMode 2. testUpsertsMORContinuousMode For both of them, at line [1] and [2], we are adding 200 to totalRecords while asserting record count and distance count

Multiple clean instants with same timestamp

2020-02-23 Thread Pratyaksh Sharma
Hi, I recently came across a strange issue for table T. For the same timestamp, 2 clean instants were present in .hoodie folder, one of them in completed state and other one in inflight state. As a result, if I try to run cleaner or DeltaStreamer for this table T, it was failing with the below

Re: updatePartitionsToTable() is time consuming and redundant.

2020-02-19 Thread Pratyaksh Sharma
Hi Balaji, We are using Hadoop 3.1.0. Here is the output of the function you wanted to see - Path is : /data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191117 Is Absolute :true Stripped Path =/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191117 Stripped path does not

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-17 Thread Pratyaksh Sharma
igh level).. > Any volunteers to drive this? we can keep that updated as new issues come > up here or elsewhere. > > On Sun, Feb 9, 2020 at 3:25 AM Pratyaksh Sharma > wrote: > > > This would a valuable addition to our FAQs page. I also have few errors > > that I faced

Re: Re: Please welcome our new PPMCs and Committer

2020-02-16 Thread Pratyaksh Sharma
Congratulations Leesf, Vino and Siva. Well deserved all of you. :) On Sat, Feb 15, 2020 at 6:05 PM leesf wrote: > Thanks you guys, it is really a great honor for me and I am very excited. > Really happy to participate in such a nice community, also will continue > making hudi a better data lake

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-09 Thread Pratyaksh Sharma
This would be a valuable addition to our FAQs page. I also have a few errors that I faced while adopting Hudi; I can help add all of them. On Sun, Feb 9, 2020 at 4:12 PM vino yang wrote: > +1 from my side, it is valuable. > > Best, > Vino > > leesf wrote on Sun, Feb 9, 2020 at 11:25 AM: > > > Hi Sivabalan, > >

Re: Regards to Uber Schema Registry ( Hive Schema + Schema Registry )

2020-02-07 Thread Pratyaksh Sharma
eated with original target schema and newSchema simply includes hoodie metadata fields. So I feel this check is redundant. On Fri, Feb 7, 2020 at 2:08 PM Pratyaksh Sharma wrote: > @Vinoth Chandar How does re-ordering affect here like > you mentioned? Parquet files access fields by name rathe
