Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-01 Thread Satish Kotha
Got it. I'll look into implementation choices for creating a new data source. Appreciate all the feedback. On Mon, Jun 1, 2020 at 7:53 PM Vinoth Chandar wrote: > >Is it to separate data and metadata access? > Correct. We already have modes for querying data using format("hudi"). I > feel it

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-01 Thread Vinoth Chandar
>Is it to separate data and metadata access? Correct. We already have modes for querying data using format("hudi"). I feel it will get very confusing to mix data and metadata in the same source. For example, a lot of the options we support for data may not even make sense for the TimelineRelation. >This

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-01 Thread Satish Kotha
Thanks for the feedback. What is the advantage of doing spark.read.format("hudi-timeline").load(basepath) as opposed to adding a new relation? Is it to separate data and metadata access? Are you looking for similar functionality as HoodieDatasourceHelpers? > This class seems like a list of static

Re: [DISSCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-06-01 Thread Vinoth Chandar
Great! I left some comments on the PR around licensing and maintenance overhead. On Sun, May 31, 2020 at 11:51 PM Lamber Ken wrote: > Hi folks, > > Learned from travis and github actions api docs these days, I used my > project as a demo[1], > the demo pull request will always fail, please use

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-01 Thread Vinoth Chandar
Also, please take a look at https://issues.apache.org/jira/browse/HUDI-309. This was an effort to make the timeline more generalized for querying (for a different purpose), but it is good to revisit now. On Sun, May 31, 2020 at 11:04 PM vbal...@apache.org wrote: > > I strongly recommend using a

Re: How to extend the timeline server schema to accommodate business metadata

2020-06-01 Thread Vinoth Chandar
Hi Mario, Thanks for the detailed explanation. Hudi already allows extra metadata to be written atomically with each commit, i.e., with each write operation. In fact, that is how we track checkpoints for our delta streamer tool. It may not solve the need for querying the data together with this information.

Re: How to extend the timeline server schema to accommodate business metadata

2020-06-01 Thread Mario de Sá Vera
Hi Balaji, business metadata are all types of info related to the business where the Hudi solution is being used, from a COB (i.e., close-of-business date) related to that commit to any qualifier that might be useful to associate with that commit id. If we enable the

Re: [DISSCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-06-01 Thread Lamber Ken
Hi folks, After reading the Travis and GitHub Actions API docs over the past few days, I set up my project as a demo [1]. The demo pull request will always fail; please use the "rerun tests" command, and it will rerun the tests automatically. If you are interested, try it. Best, Lamber-Ken [1]

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-01 Thread vbal...@apache.org
I strongly recommend using a separate datasource relation (option 1) to query timeline. It is elegant and fits well with spark APIs. Thanks. Balaji.V On Saturday, May 30, 2020, 01:18:45 PM PDT, Vinoth Chandar wrote: Hi Satish, Are you looking for similar functionality as