Hello,
Thanks.
In fact I use HDP 2.6.5 and the previous ORC version, with transactionid and
the update flag, for example.
Sorry, with the row__id it would have been easier.
So, here is the content of the ORC files (from hive --orcfiledump):
hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000
{"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}
hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000
{"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
=> When I execute a SELECT statement on the PK, I see 2 records: the update is
not linked to the original row.
Still, the metadata looks good.
The only difference I see is that I mix operations within the ORC files.
When I perform a MERGE statement, for example, 2 directories are created:
hdfs://XXXX/delta_0199073_0199073_0000
hdfs://XXXX/delta_0199073_0199073_0002
The first one contains the updates (operation:1) and the second one the inserts
(operation:0).
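For illustration, my MERGE statements look roughly like this (table and column
names here are placeholders, not the real ones):

MERGE INTO target_table AS t
USING staging_table AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET ts = s.ts
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.ts);

So each WHEN branch seems to end up in its own delta directory: the WHEN
MATCHED ... UPDATE branch in the first one, the WHEN NOT MATCHED ... INSERT
branch in the second one.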
And here are the row__id values:
(related file: hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000)
row__id {"transactionid":198994,"bucketid":0,"rowid":14}
ts 1580751316822642886 (5h35)
id 764925
(related file: hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000 )
row__id {"transactionid":198994,"bucketid":0,"rowid":14}
ts 1580757191368914630 (7h13)
id 764925
=> And I get one duplicated value if I execute a query based on the id
(primary key).
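For reference, the duplicate shows up with a simple lookup on the primary key
(my_table stands for the real table name):

SELECT row__id, ts, id
FROM my_table
WHERE id = 764925;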
So I don't understand, because transactionid = originalTransaction and all the
metadata seems to be good.
It is probably a problem with the sort, but I follow the rule that data are
ordered by originalTransaction, bucketId, rowId ascending and currentTransaction
descending. It works pretty well except for some tables with lots of updates.
The only thing I can see at the moment is the fact that I mix different types
of operations in one bucket. The MERGE query, for example, creates different
directories (one per operation).
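To illustrate the rule with the records above: as I understand it, after the
merge across the deltas the reader should see the events for
(originalTransaction=198994, bucket=0, rowId=14) with the highest
currentTransaction first, i.e.

{"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
{"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}

so that only the latest version of the row is returned.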
David
On 2020/02/05 12:22:28, Peter Vary <[email protected]> wrote:
> Hi David,
>
> There is no tombstone for the updated record.
> In ACID v2 there is no update operation for the rows, only insert and delete,
> so an update is handled as a delete of the old row plus an insert of the new
> (independent) row.
> The deletes are stored in the delete delta directories, and those files do not
> have to contain the {row} struct at the end.
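>
> For illustration, such a delete event looks like the operation:2 records in
> your earlier dump, with no row data attached (in ACID v2 it would just sit in
> a delete_delta directory):
>
> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}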
>
> Hope this helps,
> Peter
>
> > On Feb 5, 2020, at 09:39, David Morin <[email protected]> wrote:
> >
> > Hi,
> >
> > It works pretty well, but problems still occur sometimes.
> > Do we have to separate operations?
> >
> > Here is the content of the ORC files:
> > hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000
> > {"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}
> >
> > hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000
> > {"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}
> >
> > => When I execute a SELECT statement on the PK, I see 2 records: the
> > update is not linked to the original row.
> > Still, the metadata looks good.
> > The only difference I see is that I mix operations within the ORC files.
> > When I perform a MERGE statement, for example, 2 directories are created:
> > hdfs://XXXX/delta_0199073_0199073_0000
> > hdfs://XXXX/delta_0199073_0199073_0002
> > The first one contains the updates (operation:1) and the second one the
> > inserts (operation:0).
> >
> > Thanks for your help
> > David
> >
> > On 2019/12/01 16:57:08, David Morin <[email protected]> wrote:
> >> Hi Peter,
> >>
> >> At the moment I have a pipeline based on Flink to write ORC files. These
> >> ORC files can be read from Hive thanks to external tables, and then a MERGE
> >> statement (triggered by Oozie) pushes the data into tables managed by Hive
> >> (transactional tables => ORC). The Hive version is 2.1 because this is the
> >> one provided by HDP 2.6.5.
> >> We've developed a system that writes Hive delta files for the managed
> >> tables directly from Flink.
> >> The current streaming APIs for Hive 2 are not suitable for our needs and we
> >> cannot use the new Hive 3 streaming API yet. This system uses the Flink
> >> state to store the Hive metadata (originalTransaction, bucket, rowId, ...).
> >> Thanks for your reply because yes, when the files are ordered by
> >> originalTransaction, bucket, rowId, it works! I just have to use 1
> >> transaction instead of 2 at the moment and it will be ok.
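> >>
> >> For context, the two tables look roughly like this (names, columns and
> >> bucket count are placeholders; in Hive 2 the transactional table has to be
> >> bucketed):
> >>
> >> CREATE EXTERNAL TABLE staging_table (id BIGINT, ts BIGINT)
> >> STORED AS ORC
> >> LOCATION 'hdfs://XXXX/staging';
> >>
> >> CREATE TABLE target_table (id BIGINT, ts BIGINT)
> >> CLUSTERED BY (id) INTO 8 BUCKETS
> >> STORED AS ORC
> >> TBLPROPERTIES ('transactional'='true');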
> >>
> >> Thanks
> >> David
> >>
> >> On 2019/11/29 11:18:05, Peter Vary <[email protected]> wrote:
> >>> Hi David,
> >>>
> >>> Not entirely sure what you are doing here :), my guess is that you are
> >>> trying to write ACID tables outside of Hive. Am I right? What is the
> >>> exact use-case? There might be better solutions out there than writing
> >>> the files by hand.
> >>>
> >>> As for your question below: Yes, the files should be ordered by the
> >>> originalTransaction, bucket, rowId triple, otherwise you will get wrong
> >>> results.
> >>>
> >>> Thanks,
> >>> Peter
> >>>
> >>>> On Nov 19, 2019, at 13:30, David Morin <[email protected]> wrote:
> >>>>
> >>>> Here are more details about the ORC content and the fact that we have
> >>>> duplicate rows:
> >>>>
> >>>> /delta_0011365_0011365_0000/bucket_00003
> >>>>
> >>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}
> >>>> {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11365,"row":{"TS":1574156027915075038,"cle":5216,...}}
> >>>>
> >>>>
> >>>> /delta_0011368_0011368_0000/bucket_00003
> >>>>
> >>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}
> >>>> {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}
> >>>>
> >>>> /delta_0011369_0011369_0000/bucket_00003
> >>>>
> >>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":1,"currentTransaction":11369,"row":{"TS":1574157407855174144,"cle":5216,...}}
> >>>> {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":0,"currentTransaction":11369,"row":{"TS":1574157407855265906,"cle":5218,...}}
> >>>>
> >>>> +-------------------------------------------------+-------+--+
> >>>> | row__id | cle |
> >>>> +-------------------------------------------------+-------+--+
> >>>> | {"transactionid":11367,"bucketid":0,"rowid":0} | 5209 |
> >>>> | {"transactionid":11369,"bucketid":0,"rowid":0} | 5211 |
> >>>> | {"transactionid":11369,"bucketid":1,"rowid":0} | 5210 |
> >>>> | {"transactionid":11369,"bucketid":2,"rowid":0} | 5214 |
> >>>> | {"transactionid":11369,"bucketid":2,"rowid":1} | 5215 |
> >>>> | {"transactionid":11365,"bucketid":3,"rowid":0} | 5218 |
> >>>> | {"transactionid":11365,"bucketid":3,"rowid":1} | 5216 |
> >>>> | {"transactionid":11369,"bucketid":3,"rowid":1} | 5216 |
> >>>> | {"transactionid":11369,"bucketid":3,"rowid":0} | 5218 |
> >>>> | {"transactionid":11369,"bucketid":4,"rowid":0} | 5217 |
> >>>> | {"transactionid":11369,"bucketid":4,"rowid":1} | 5213 |
> >>>> | {"transactionid":11369,"bucketid":7,"rowid":0} | 5212 |
> >>>> +-------------------------------------------------+-------+--+
> >>>>
> >>>> As you can see, we have duplicate rows for "cle" values 5216 and 5218.
> >>>> Do we have to keep the rowIds ordered? Because this is the only
> >>>> difference I have noticed, based on some tests with beeline.
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Nov 19, 2019 at 00:18, David Morin <[email protected]>
> >>>> wrote:
> >>>> Hello,
> >>>>
> >>>> I'm trying to understand the purpose of the rowId column inside the ORC
> >>>> delta files:
> >>>> {"transactionid":11359,"bucketid":5,"rowid":0}
> >>>> ORC view:
> >>>> {"operation":0,"originalTransaction":11359,"bucket":5,"rowId":0,"currentTransaction":11359,"row":...}
> >>>> I use HDP 2.6 => Hive 2
> >>>>
> >>>> I want to be idempotent with an INSERT / DELETE / INSERT sequence.
> >>>> Do we have to keep the same rowId?
> >>>> It seems that when the rowId is changed during the second INSERT, I get
> >>>> a duplicate row.
> >>>> For me, I can create a new rowId for the new transaction during the
> >>>> second INSERT, but that seems to generate duplicate records.
> >>>>
> >>>> Regards,
> >>>> David
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
>
>