Hi,
What design can be used/implemented when we re-ingest the data
without affecting incremental queries?
- Is it possible to maintain a delta dataset across partitions
(hoodie.datasource.write.partitionpath.field)? In my case it is a date.
- Can I do a snapshot query on a
Hello
What I see is: if we want to implement GDPR (
https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIdeleterecordsinthedatasetusingHudi)
then old versions of the commit files should be removed (otherwise an
incremental query with point-in-time options can still read the data which
was deleted).
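To make that concrete, the cleaner can be configured to retain only the latest file version per file group, so older commits containing the deleted records become unreadable. A minimal sketch (the retention value and the way options are merged into a write are illustrative assumptions, not from this thread):

```python
# Sketch: Hudi cleaner options that aggressively purge older file
# versions, so point-in-time/incremental reads cannot see deleted data.
# The retention count below is an illustrative assumption.
cleaner_options = {
    # Keep only a fixed number of file versions per file group.
    'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
    'hoodie.cleaner.fileversions.retained': '1',
}

# These would be merged into the normal Hudi write options, e.g.:
# df.write.format("hudi").options(**write_options, **cleaner_options) \
#     .mode("append").save(base_path)
```

Note that until the cleaner actually runs and removes those file versions, the older data remains physically present on storage.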
2020 at 5:07 PM Balaji Varadarajan
wrote:
> Hi Sivaprakash,
> You can configure cleaner to clean the older file versions which contain
> those records to be deleted. You can take a look at
> https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-WhatdoestheHudicleanerdo
>
Hello
Do we have any option to delete a record from every partition? Meaning, I
want to completely wipe out a particular record from the complete dataset
(first commit, all the changes, delta commits, etc.).
Currently, when I delete, it affects only the last commit, but if I do an
incremental query on th
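For reference, a hard delete in Hudi is issued by writing the keys to remove with the 'delete' operation; a hedged sketch, where the table name, key field, and partition field are assumptions based on this thread:

```python
# Sketch: issuing a hard delete for specific record keys via Hudi's
# 'delete' write operation. Table/field names here are assumptions.
delete_options = {
    'hoodie.table.name': 'my_table',
    'hoodie.datasource.write.operation': 'delete',
    'hoodie.datasource.write.recordkey.field': 'NR001',
    'hoodie.datasource.write.partitionpath.field': 'date',
}

# A DataFrame holding just the keys to delete would then be written:
# keys_df.write.format("hudi").options(**delete_options) \
#     .mode("append").save(base_path)
```

This removes the record from the latest snapshot only; as discussed above, older commit files still contain it until the cleaner reclaims those file versions.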
Great!!
Got it working!!
'hoodie.datasource.write.recordkey.field': 'COL1,COL2',
'hoodie.datasource.write.keygenerator.class':
'org.apache.hudi.keygen.ComplexKeyGenerator',
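Putting that working composite-key configuration into context, here is a minimal PySpark-style sketch; the table name, partition field, and operation are illustrative assumptions:

```python
# Sketch: write options using ComplexKeyGenerator so the record key is
# the combination of COL1 and COL2 (comma-separated, per this thread).
# Table name and partition field are assumptions.
write_options = {
    'hoodie.table.name': 'my_table',
    'hoodie.datasource.write.recordkey.field': 'COL1,COL2',
    'hoodie.datasource.write.keygenerator.class':
        'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.datasource.write.partitionpath.field': 'date',
    'hoodie.datasource.write.operation': 'upsert',
}

# df.write.format("hudi").options(**write_options) \
#     .mode("append").save(base_path)
```

The key point is that multiple record key fields require both the comma-separated field list and the ComplexKeyGenerator class together.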
Thank you.
On Thu, Jul 16, 2020 at 7:10 PM Adam Feldman wrote:
> Hi Si
dot notation, e.g. a.b.c
However, I couldn't provide more than one column like this: COL1.COL2
'hoodie.datasource.write.recordkey.field': 'COL1.COL2'
Anything wrong with the syntax? (tried with a comma as well)
On Thu, Jul 16, 2020 at 6:41 PM Sivaprakash
wrote:
> Hello B
ates 3 records again? I thought it would
create only the 2nd record
- Trying to understand the storage volume efficiency here
- Does some configuration have to be enabled to fix this?
Configuration that I use:
- COPY_ON_WRITE, Append, Upsert
- First Column (NR001) is configured as
*hoodie
should be
only 50 records).
On Thu, Jul 16, 2020 at 4:01 PM Allen Underwood
wrote:
> Hi Sivaprakash,
>
> So I'm by no means an expert on this, but I think you might find what
> you're looking for here:
> https://hudi.apache.org/docs/concepts.html
>
> I'm no
This might be a basic question - I'm experimenting with Hudi (PySpark). I
have used the Insert/Upsert options to write deltas into my data lake.
However, one thing is not clear to me.
Step 1: I write 50 records.
Step 2: I'm writing 50 records, out of which only *10 records have been
changed* (I'm using upsert
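The two-step experiment described above could be sketched roughly like this (field names, table name, and path are assumptions; on COPY_ON_WRITE, the upsert matches rows by record key and rewrites the affected files rather than appending duplicates, while older file versions remain until the cleaner removes them):

```python
# Sketch of the experiment above (PySpark-style; names are assumptions).
# Step 1 inserts 50 records; step 2 upserts a batch where 10 changed.
hudi_options = {
    'hoodie.table.name': 'experiment_table',
    'hoodie.datasource.write.recordkey.field': 'NR001',
    'hoodie.datasource.write.partitionpath.field': 'date',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
}

# Step 1: initial insert of 50 records.
# df_50.write.format("hudi") \
#     .options(**hudi_options,
#              **{'hoodie.datasource.write.operation': 'insert'}) \
#     .mode("overwrite").save(base_path)

# Step 2: upsert the same 50 records, of which 10 changed. Hudi matches
# on the record key, so the snapshot still holds 50 records, but
# COPY_ON_WRITE rewrites whole files and keeps prior versions on disk.
# df_50_v2.write.format("hudi") \
#     .options(**hudi_options,
#              **{'hoodie.datasource.write.operation': 'upsert'}) \
#     .mode("append").save(base_path)
```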