Orc files in HDFS: NullPointerException (RunLengthIntegerReaderV2)

2019-02-11 Thread David Morin
Hello, I'm facing an error when I try to read my Orc files from Hive (external table), from Pig, or with hive --orcfiledump. These files are generated by Flink using the Orc Java API with vectorized columns. If I create these files locally (/tmp/...) and then push them to HDFS, I can read the content of

Re: Read Hive ACID tables in Spark or Pig

2019-03-11 Thread David Morin
Hi, I've just implemented a pipeline to synchronize data between MySQL and Hive (transactional + bucketed tables) on an HDP cluster. I've used Orc files, but without ACID properties. We then created external tables on the HDFS directories that contain these delta Orc files. Then, MERGE INTO queries

How to update Hive ACID tables in Flink

2019-03-11 Thread David Morin
Hello, I've just implemented a pipeline based on Apache Flink to synchronize data between MySQL and Hive (transactional + bucketed tables) on an HDP cluster. Flink jobs run on YARN. I've used Orc files, but without ACID properties. We then created external tables on the HDFS directories that contai

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
s designed for this case, though it only handles insert (not update), > so if you need updates you'd have to do the merge as you are currently > doing. > > Alan. > > On Mon, Mar 11, 2019 at 2:09 PM David Morin > wrote: > >> Hello, >> >> I've just

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
; > Alan. > > On Tue, Mar 12, 2019 at 12:24 PM David Morin > wrote: > >> Thanks Alan. >> Yes, the problem in fact was that this streaming API does not handle >> update and delete. >> I've used native Orc files and the next step I've planned to do

Hive Major Compaction fails (cleaning step)

2019-08-25 Thread David Morin
Hello, I've been trying "ALTER TABLE (table_name) COMPACT 'MAJOR'" on my Hive 2 environment, but it always fails (HDP 2.6.5 precisely). It seems that the merged base file is created but the delta is not deleted. I found that it was because the HiveMetastore Client can't connect to the metastore bec

Re: Hive Major Compaction fails (cleaning step)

2019-08-25 Thread David Morin
Aug 2019 at 07:51, David Morin wrote: > Hello, > I've been trying "ALTER TABLE (table_name) COMPACT 'MAJOR'" on my Hive 2 > environment, but it always fails (HDP 2.6.5 precisely). It seems that the > merged base file is created but the delta is not delet

Re: Hive Major Compaction fails (cleaning step)

2019-08-25 Thread David Morin
Sorry, the same link in English: http://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/ On Mon, 26 Aug 2019 at 08:35, David Morin wrote: > Here is a link related to Hive 3: > http://www.adaltas.com/fr/2019/07/25/hive-3-fonctionnalites-conseils-astuces/ > The author sug

Re: Hive Major Compaction fails (cleaning step)

2019-08-26 Thread David Morin
elong to hive. Weird, isn't it? So this is a workaround, but a rather crappy one. I'm open to any more suitable solution. On Mon, 26 Aug 2019 at 08:51, David Morin wrote: > Sorry, the same link in English: > http://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/ >

Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
Hello, in production I use HDP 2.6.5 with Hive 2.1.0. We use transactional tables and we are trying to ingest data in a streaming way (even though we still use Hive 2). I've read some docs, but I would like some clarifications concerning the use of locks with transactional tables. Do we have to use l

Re: Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
is changes in Hive 3, where update and delete also take shared locks and > a first committer wins strategy is employed instead. > > Alan. > > On Mon, Sep 9, 2019 at 8:29 AM David Morin > wrote: > >> Hello, >> >> I use in production HDP 2.6.5 with Hive 2.1.0

Re: Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
> > Alan. > > On Mon, Sep 9, 2019 at 10:55 AM David Morin > wrote: > >> Thanks Alan, >> >> When you say "you just can't have two simultaneous deletes in the same >> partition", does simultaneous mean within the same transaction? >> If I create 2

ORC: duplicate record - rowid meaning ?

2019-11-18 Thread David Morin
Hello, I'm trying to understand the purpose of the rowid column inside an ORC delta file: {"transactionid":11359,"bucketid":5,"*rowid*":0} Orc view: {"operation":0,"originalTransaction":11359,"bucket":5,"*rowId*":0,"currentTransaction":11359,"row":...} I use HDP 2.6, i.e. Hive 2. If I want to be idempot
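The fields in the Orc view above identify each ACID event. As a minimal sketch (hypothetical helper names, plain Python dicts standing in for ORC rows; this is not an ORC or Hive API), an idempotent writer could treat (originalTransaction, bucket, rowId) as the logical row and add operation and currentTransaction to tell an original insert apart from a later change to the same row. The assumption is that a replayed event carries exactly the same five values.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical sketch: each dict mimics one ACID event as shown in the
# Orc view above; nothing here reads or writes real ORC files.
RowKey = Tuple[int, int, int]


def row_key(event: Dict) -> RowKey:
    """(originalTransaction, bucket, rowId): identifies the logical row."""
    return (event["originalTransaction"], event["bucket"], event["rowId"])


def event_key(event: Dict) -> Tuple[int, int, int, int, int]:
    """Row key plus operation and currentTransaction: identifies one event.

    Assumption: a replay repeats all five values, while a later update or
    delete of the same row differs in operation and/or currentTransaction.
    """
    return (event["operation"], *row_key(event), event["currentTransaction"])


def dedupe_events(events: Iterable[Dict]) -> List[Dict]:
    """Keep the first occurrence of each event, dropping replays."""
    seen = set()
    unique = []
    for event in events:
        key = event_key(event)
        if key in seen:
            continue  # same event already emitted: skip the replay
        seen.add(key)
        unique.append(event)
    return unique
```

Keying on the full event rather than on the triple alone avoids collapsing an insert and a subsequent update of the same row into one.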

Re: ORC: duplicate record - rowid meaning ?

2019-11-19 Thread David Morin
tid":3,"rowid":0} | *5218* |
| {"transactionid":11365,"bucketid":3,"rowid":1} | *5216* |
| {"transactionid":11369,"bucketid":3,"rowid":1} | *5216* |
| {"transactionid":11369,"bucketid":
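In the output above, the same value (5216) appears under two different ROW__IDs. A minimal sketch for spotting such duplicates, assuming the second column is a business key that should be unique (its name is not shown in the snippet); the helper is hypothetical and the data is limited to the two complete rows visible above:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RowId = Tuple[int, int, int]  # (transactionid, bucketid, rowid)


def find_duplicates(rows: List[Tuple[RowId, int]]) -> Dict[int, List[RowId]]:
    """Group ROW__IDs by key and keep only keys seen under several ROW__IDs."""
    by_key: Dict[int, List[RowId]] = defaultdict(list)
    for row_id, key in rows:
        by_key[key].append(row_id)
    return {key: ids for key, ids in by_key.items() if len(ids) > 1}


rows = [((11365, 3, 1), 5216), ((11369, 3, 1), 5216)]
print(find_duplicates(rows))  # {5216: [(11365, 3, 1), (11369, 3, 1)]}
```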

Re: ORC: duplicate record - rowid meaning ?

2019-12-01 Thread David Morin
your question below: Yes, the files should be ordered by: > originalTransaction, bucket, rowId triple, otherwise you will get wrong > results. > > Thanks, > Peter > > > On Nov 19, 2019, at 13:30, David Morin wrote: > > > > here after more d
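Peter's ordering requirement can be applied as a sort on each batch before it is written to a delta file. A minimal sketch, assuming events are plain dicts with the field names from the Orc view above; the helpers are hypothetical and only order the events, they do not write ORC:

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: ordering only, no ORC writer involved.


def acid_sort_key(event: Dict) -> Tuple[int, int, int]:
    """Ordering key: (originalTransaction, bucket, rowId)."""
    return (event["originalTransaction"], event["bucket"], event["rowId"])


def sort_for_delta(events: List[Dict]) -> List[Dict]:
    """Return a new list ordered by the ACID triple, ready to be written."""
    return sorted(events, key=acid_sort_key)


def is_sorted_for_delta(events: List[Dict]) -> bool:
    """Check that consecutive events never go backwards on the triple."""
    keys = [acid_sort_key(e) for e in events]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

Checking with `is_sorted_for_delta` just before writing makes an out-of-order batch fail loudly instead of producing the wrong results Peter mentions.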

compaction and stripes size

2019-12-08 Thread David Morin
Hi, When major compactions are performed on Hive tables stored in the Orc format, are the Orc stripes rewritten? I know that records are not updated (some are ignored, but none are updated), but regarding stripe size, do major compactions affect it? For e

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
73_0199073_ hdfs:///delta_0199073_0199073_0002 The first one contains updates (operation:1) and the second one contains inserts (operation:0). Thanks for your help, David On 2019/12/01 16:57:08, David Morin wrote: > Hi Peter, > > At the moment I have a pipeline based on Flink to wri

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
he rows. Only insert and delete. So update > is handled as delete (old) row, insert (new/independent) row. > The delete is stored in the delete delta directories, and the files do not > have to contain the {row} struct at the end. > > Hope this helps, > Peter > > > On
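Following Peter's description, an update can be split into a delete event for the old row and an independent insert event for the new one. A minimal sketch with hypothetical helper and argument names; the operation codes (0 = insert, 2 = delete) are assumed to match the ones Hive's ORC record updater uses (the update code 1 appears in the snippet above), and the two events would go to a delete_delta and a delta directory respectively:

```python
from typing import Dict, Tuple

# Assumed ACID operation codes (0 = insert, 1 = update, 2 = delete).
INSERT_OP = 0
DELETE_OP = 2


def split_update(old_key: Tuple[int, int, int], new_row: Dict, write_id: int,
                 bucket: int, new_row_id: int) -> Tuple[Dict, Dict]:
    """Return (delete_event, insert_event) replacing one logical update."""
    orig_txn, old_bucket, old_row_id = old_key
    delete_event = {
        "operation": DELETE_OP,
        # the delete points at the identity of the row being replaced
        "originalTransaction": orig_txn,
        "bucket": old_bucket,
        "rowId": old_row_id,
        "currentTransaction": write_id,
        "row": None,  # delete events need not carry the {row} struct
    }
    insert_event = {
        "operation": INSERT_OP,
        # the new row gets a fresh identity under the current transaction
        "originalTransaction": write_id,
        "bucket": bucket,
        "rowId": new_row_id,
        "currentTransaction": write_id,
        "row": new_row,
    }
    return delete_event, insert_event
```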

Re: ORC: duplicate record - rowid meaning ?

2020-02-06 Thread David Morin
be nice to hear back from you if you found something. > > Thanks, > Peter > > > On Feb 5, 2020, at 16:55, David Morin wrote: > > > > Hello, > > > > Thanks. > > In fact I use HDP 2.6.5 and previous Orc version with transactionid for > > e

Re: ORC: duplicate record - rowid meaning ?

2020-02-25 Thread David Morin
On Thu, 6 Feb 2020 at 12:12, David Morin wrote: > ok, Peter > No problem. Thx > I'll keep you posted > > On 2020/02/06 09:42:39, Peter Vary wrote: > > Hi David, > > > > I'm more familiar with ACID v2 :( > > What I would do is to run an update oper

compaction issue: Compaction cannot compact above this txnid

2020-06-01 Thread David Morin
Hi, I have a compaction issue on my cluster. When I force a compaction (major) on one table I get this error in Metastore logs: 2020-06-01 19:49:35,512 ERROR [-78]: compactor.CompactorMR (CompactorMR.java:run(264)) - No delta files or original files found to compact in hdfs://...hive/wareh

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
paction for the current database/table On 2020/06/01 20:13:08, David Morin wrote: > Hi, > > I have a compaction issue on my cluster. When I force a compaction (major) on > one table I get this error in Metastore logs: > > 2020-06-01 19:49:35,512 ERROR [-78]: compactor.Com

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
s looks very confusing when looking at > the logs." > > Thanks, > Peter > > On Jun 2, 2020, at 11:44, David Morin wrote: > > I don't get it. > The transaction id in the error message "No delta files or original files > found to compact in hdfs://... wi

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
. 2 Jun 2020 at 12:57, Peter Vary wrote: > Hi David, > > You do not really need to run compaction every time. > Is it possible to wait for the compaction to start automatically next time? > > Thanks, > Peter > > On Jun 2, 2020, at 12:51, David Morin wrote: > > Th

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
ta-access/content/understanding-administering-compactions.html > > Also if everything else fails, you can still issue the ALTER TABLE command > periodically using crontab. Running an extra compaction will not hurt > much. > > Thanks, > Peter > > On Jun 2, 2020, at 14:2
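Peter's crontab fallback can be scripted. A minimal sketch, assuming beeline is installed and on the PATH; the helper names, JDBC URL and table names are hypothetical placeholders. Note that ALTER TABLE ... COMPACT only queues the request and the compactor runs it asynchronously, so a successful exit does not mean the compaction has finished.

```python
import shlex
import subprocess
from typing import List


def compaction_command(jdbc_url: str, table: str, kind: str = "MAJOR") -> List[str]:
    """Build a beeline invocation that requests a compaction of `table`."""
    kind = kind.upper()
    if kind not in ("MAJOR", "MINOR"):
        raise ValueError("compaction type must be MAJOR or MINOR")
    statement = f"ALTER TABLE {table} COMPACT '{kind}'"
    # beeline -u <JDBC URL> -e <statement> runs one statement and exits
    return ["beeline", "-u", jdbc_url, "-e", statement]


def run_compaction(jdbc_url: str, table: str, kind: str = "MAJOR") -> None:
    """Run the command; raises CalledProcessError if beeline exits non-zero."""
    cmd = compaction_command(jdbc_url, table, kind)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

A small script calling `run_compaction(...)` could then be scheduled with a crontab line such as `0 3 * * * python3 /path/to/compact.py` (schedule and path hypothetical).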