RE: Question about the Payload in Hudi

2019-05-17 Thread FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1)
Hi, I am very interested in fixing this behavior. In fact, I have implemented a new Payload for our use case that can upload both the delta and the parquet record. However, there are still some problems in that implementation. Going back to question 3: in the Payload, there are three

Re: Upgrade HUDI to Hive 2.x

2019-05-17 Thread Vinoth Chandar
I am in favor of deprecating Hive 1.x unless someone has a strong objection. Most cloud offerings, such as EMR and Dataproc, support Hive 2.x, and Hive 3.x adoption is going to grow. This seems like a move in the right direction. /thanks/vinoth On Fri, May 17, 2019 at 11:55 AM nishith agarwal wrote: > All, >

Upgrade HUDI to Hive 2.x

2019-05-17 Thread nishith agarwal
All, is anyone using Hudi with Hive 1.x? Currently, Hudi depends on Hive 1.x and works against Hive 2.x by using specific profiles. There are backward-incompatible changes in the HiveRecordReader between Hive 1.x and Hive 2.x. I'm planning to upgrade to Hive 2.x, which would essentially

Re: Read RO table in Spark as hive table | No records returned

2019-05-17 Thread Vinoth Chandar
Glad you got it working. Is there any reason you are not using the Hive sync tool to manage table creation and registration in Hive? On Fri, May 17, 2019 at 7:04 AM satish.sidnakoppa...@gmail.com <satish.sidnakoppa...@gmail.com> wrote: > > > On 2019/05/17 12:45:26, satish.sidnakoppa...@gmail.com <

Re: Question about the Payload in Hudi

2019-05-17 Thread Vinoth Chandar
Hi, what you mentioned is correct.
    @Override
    public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema)
        throws IOException {
      // combining strategy here trivially ignores currentValue on disk and writes this record
      return getInsertValue(schema);
    }
I think we
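To see what that combining strategy means for the DF1/DF2 example in this thread, here is a minimal standalone sketch. It assumes ts is the ordering field and that DF2 has already been reduced to {id:1, ts:2}. It deliberately avoids Hudi and Avro types: Row, overwriteWithLatest and keepLargerTs are hypothetical names, and keepLargerTs is only an illustrative ordering-aware alternative, not a Hudi class.

```java
import java.util.Optional;

public class CombineSketch {

    // Hypothetical stand-in for an Avro record: record key `id` plus ordering field `ts`.
    public static final class Row {
        public final int id;
        public final long ts;

        public Row(int id, long ts) {
            this.id = id;
            this.ts = ts;
        }

        @Override
        public String toString() {
            return "{id:" + id + ", ts:" + ts + "}";
        }
    }

    // Same idea as the snippet above: ignore the stored value and write the incoming one.
    public static Optional<Row> overwriteWithLatest(Row current, Row incoming) {
        return Optional.of(incoming);
    }

    // Hypothetical alternative: keep whichever record has the larger ts (incoming wins ties).
    public static Optional<Row> keepLargerTs(Row current, Row incoming) {
        return Optional.of(incoming.ts >= current.ts ? incoming : current);
    }

    public static void main(String[] args) {
        Row onDisk = new Row(1, 9);   // DF1, already written
        Row incoming = new Row(1, 2); // DF2 after de-duplication by ts
        System.out.println(overwriteWithLatest(onDisk, incoming).get()); // {id:1, ts:2}
        System.out.println(keepLargerTs(onDisk, incoming).get());        // {id:1, ts:9}
    }
}
```

The overwrite strategy writes ts:2 even though the stored record has the larger ts:9, which is the behavior being asked about; a payload that wants the latest event time to win has to compare against currentValue instead of ignoring it.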

Re: Question about the Payload in Hudi

2019-05-17 Thread Vinoth Chandar
Thanks for the clear example. Let me check this out and get back shortly. On Thu, May 16, 2019 at 5:29 PM Yanjia Li wrote: > Hello Vinoth, > > I could add an example here to clarify this question. > > We have DF1{id:1, ts: 9} and DF2{id:1, ts:1; id:1, ts:2}. We save DF1 > first, then upsert DF2
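In this example the two DF2 rows share id 1, so they are collapsed into a single row within the batch before anything is compared with DF1 on disk. Here is a minimal standalone sketch of that step, assuming ts is the precombine (ordering) field and that the incoming batch is deduplicated before it is merged with storage. Row, preCombine and dedupe are hypothetical names that only imitate the idea behind a payload's preCombine; they are not Hudi's classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreCombineSketch {

    // Hypothetical stand-in for a Hudi record: record key `id` plus ordering field `ts`.
    public static final class Row {
        public final int id;
        public final long ts;

        public Row(int id, long ts) {
            this.id = id;
            this.ts = ts;
        }

        @Override
        public String toString() {
            return "{id:" + id + ", ts:" + ts + "}";
        }
    }

    // Keep the row with the larger ordering value; on a tie, keep the row seen first.
    static Row preCombine(Row seen, Row next) {
        return next.ts > seen.ts ? next : seen;
    }

    // Collapse an incoming batch to one row per key, preserving first-seen key order.
    public static List<Row> dedupe(List<Row> batch) {
        Map<Integer, Row> byKey = new LinkedHashMap<>();
        for (Row r : batch) {
            byKey.merge(r.id, r, PreCombineSketch::preCombine);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        // DF2 from the example: two versions of id 1 in the same upsert batch.
        List<Row> df2 = List.of(new Row(1, 1), new Row(1, 2));
        System.out.println(dedupe(df2)); // [{id:1, ts:2}]
    }
}
```

With DF1's {id:1, ts:9} already on disk, the surviving {id:1, ts:2} is what then reaches combineAndGetUpdateValue, which decides between the stored and incoming versions.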

Re: Read RO table in Spark as hive table | No records returned

2019-05-17 Thread satish . sidnakoppa . it
On 2019/05/17 12:45:26, satish.sidnakoppa...@gmail.com wrote: > > > On 2019/05/17 12:37:10, satish.sidnakoppa...@gmail.com > wrote: > > Hi Team, > > > > Data is returned when queried from hive. > > But not in spark ,Could you assist in finding the gap. > > > > Details below > > > >

Re: Read RO table in Spark as hive table | No records returned

2019-05-17 Thread satish . sidnakoppa . it
On 2019/05/17 12:37:10, satish.sidnakoppa...@gmail.com wrote: > Hi Team, > > Data is returned when queried from hive. > But not in spark ,Could you assist in finding the gap. > > Details below > > **Approach 1 --- > successful > >

Read RO table in Spark as hive table | No records returned

2019-05-17 Thread satish . sidnakoppa . it
Hi Team, data is returned when the table is queried from Hive, but not from Spark. Could you help me find the gap? Details below: **Approach 1 --- successful select * from emp_cow limit 2; 20190503171506 20190503171506_0_4244 default