Thanks everyone for helping out Prakash!
On Thu, Jul 16, 2020 at 10:24 AM Sivaprakash wrote:
> Great !!
>
> Got it working !!
>
> 'hoodie.datasource.write.recordkey.field': 'COL1,COL2',
> 'hoodie.datasource.write.keygenerator.class':
> 'org.apache.hudi.keygen.ComplexKeyGenerator',
>
> Thank you.
Great !!
Got it working !!
'hoodie.datasource.write.recordkey.field': 'COL1,COL2',
'hoodie.datasource.write.keygenerator.class':
'org.apache.hudi.keygen.ComplexKeyGenerator',
Thank you.
On Thu, Jul 16, 2020 at 7:10 PM Adam Feldman wrote:
> Hi Sivaprakash,
> To be able to specify multiple keys
Hi Sivaprakash,
To be able to specify multiple keys in comma-separated notation, you
must also set the KEYGENERATOR_CLASS_OPT_KEY to
classOf[ComplexKeyGenerator].getName. Please see the description here:
https://hudi.apache.org/docs/writing_data.html#datasource-writer.
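For anyone finding this thread later, here is a minimal PySpark sketch of how those two options might be wired into a write. The table name, base path, precombine column (ts) and partition column (region) are placeholders, not from this thread:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-composite-key").getOrCreate()
df = spark.read.parquet("/tmp/source")  # placeholder source data

hudi_options = {
    "hoodie.table.name": "my_table",  # placeholder table name
    "hoodie.datasource.write.recordkey.field": "COL1,COL2",  # composite record key
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.ComplexKeyGenerator",  # needed for multi-field keys
    "hoodie.datasource.write.partitionpath.field": "region",  # placeholder partition column
    "hoodie.datasource.write.precombine.field": "ts",  # placeholder precombine column
    "hoodie.datasource.write.operation": "upsert",
}

(df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("/tmp/hudi/my_table"))  # placeholder base path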
Note: RECORDKEY_FIELD_OPT_KEY
Looks like this property does the trick
Property: hoodie.datasource.write.recordkey.field, Default: uuid
Record key field. Value to be used as the recordKey component of HoodieKey.
Actual value will be obtained by invoking .toString() on the field value.
Nested fields can be specified using the dot notation.
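As a small illustration of that last sentence, a nested field could be referenced with dot notation in the record key; the struct and field names below are hypothetical:

# Hypothetical schema: a struct column "customer" with a nested "id" field.
hudi_options = {
    "hoodie.datasource.write.recordkey.field": "customer.id",  # nested field via dot notation
    # ... remaining write options as in the other examples ...
}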
Hello Balaji
Thank you for your info !!
I tried those options, but here is what I find (I'm trying to understand how Hudi
internally manages its files):
First Write
1.
('NR001', 'YXXXTRE', 'YXXXTRE_445343')
('NR002', 'TRE', 'TRE_445343')
('NR003', 'YZZZTRE', 'YZZZTRE_445343')
Commit time for
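To make that experiment easier to reproduce, here is a sketch of what the first write and a read-back of Hudi's per-record commit metadata could look like. The column names (ID, CODE, REF), the key/precombine/partition choices and the paths are guesses for illustration only:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-first-write").getOrCreate()

# The three sample rows from the first write; column names are hypothetical.
first_batch = spark.createDataFrame(
    [("NR001", "YXXXTRE", "YXXXTRE_445343"),
     ("NR002", "TRE", "TRE_445343"),
     ("NR003", "YZZZTRE", "YZZZTRE_445343")],
    ["ID", "CODE", "REF"],
)

options = {
    "hoodie.table.name": "experiment",  # placeholder
    "hoodie.datasource.write.recordkey.field": "ID",  # hypothetical key column
    "hoodie.datasource.write.precombine.field": "REF",  # hypothetical precombine column
    "hoodie.datasource.write.partitionpath.field": "CODE",  # hypothetical partition column
    "hoodie.datasource.write.operation": "upsert",
}

(first_batch.write.format("hudi")
    .options(**options)
    .mode("overwrite")  # first write creates the table
    .save("/tmp/hudi/experiment"))

# Every record carries Hudi metadata such as _hoodie_commit_time,
# which shows which commit last touched the row.
# (Glob over the partition directories; newer Hudi versions also accept the plain base path.)
(spark.read.format("hudi").load("/tmp/hudi/experiment/*")
    .select("_hoodie_commit_time", "ID", "CODE", "REF")
    .show(truncate=False))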
Hi Sivaprakash,
Uniqueness of records is determined by the record key you specify to Hudi. Hudi
supports filtering out existing records (by record key). By default, it would
upsert all incoming records.
Please look at
https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoesHudihandledu
Yes, I'm updating the 10 records that I mentioned from Step 1. But I re-write the whole
dataset the second time as well. I see that commit_time is getting updated for
all 50 records (which feels normal to me), but I'm not sure how to see/prove to
myself that the data is not growing (to 100 records; actually it should be
only 50).
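One way to convince yourself is to read the table back after the second upsert and check the row count and record keys. A sketch, reusing the hypothetical options, path and ID column from the earlier example; second_batch stands in for the re-written 50 records:

from pyspark.sql import functions as F

# second_batch: DataFrame holding the 50 re-written records (not shown here).
(second_batch.write.format("hudi")
    .options(**options)  # same record key and options as the first write
    .mode("append")  # subsequent upserts use append mode
    .save("/tmp/hudi/experiment"))

table = spark.read.format("hudi").load("/tmp/hudi/experiment/*")

# The row count should still be 50, not 100: records whose key already
# exists are updated in place rather than added again.
print(table.count())

# No record key should appear more than once.
table.groupBy("ID").count().filter(F.col("count") > 1).show()

# Since all 50 records were in the upsert batch, they all carry the latest
# _hoodie_commit_time, but no extra rows are created.
table.select("_hoodie_commit_time").distinct().show()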
Hi Sivaprakash,
Not an expert here either, but for your second question: yes, I believe
when writing a delta to the table you must identify the actual delta yourself
and only write the new/changed/removed records. I guess we could put in a
request for Hudi to take care of this, but two possible issues
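On the point about identifying the delta yourself, here is one rough way it might be done in PySpark before the upsert. The column names and paths are hypothetical, and deleted records are not handled:

# Current state of the Hudi table and the full new snapshot (placeholder paths).
existing = (spark.read.format("hudi").load("/tmp/hudi/experiment/*")
    .select("ID", "CODE", "REF"))
snapshot = spark.read.parquet("/tmp/new_snapshot").select("ID", "CODE", "REF")

# Keep only rows that are new or whose values changed, so the upsert
# touches just the real delta instead of the whole snapshot.
delta = snapshot.exceptAll(existing)

(delta.write.format("hudi")
    .options(**options)  # same options as before
    .mode("append")
    .save("/tmp/hudi/experiment"))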
Hi Sivaprakash,
So I'm by no means an expert on this, but I think you might find what
you're looking for here:
https://hudi.apache.org/docs/concepts.html
I'm not sure I fully understand the Step 2 you mentioned - "I'm writing 50
records out of which only 10 records have been changed" - does that mean
t
This might be a basic question - I'm experimenting with Hudi (PySpark). I
have used the Insert/Upsert options to write deltas into my data lake. However,
one thing is not clear to me:
Step 1:- I write 50 records
Step 2:- I'm writing 50 records out of which only *10 records have been
changed* (I'm using upsert