>0.14. Acid tables have been a real pain for us. We don't believe they are
>production ready. At least in our use cases, Tez crashes for assorted
>reasons or only assigns 1 mapper to the partition. Having delta files and
>no base files borks mapper assignments.

Some of the chicken-egg problems for those were solved recently in HIVE-10114.

Then TEZ-1993 is coming out in the next version of Tez, into which we're plugging in HIVE-7428 (no fix yet).

Currently delta-only splits have 0 bytes as the "file size", so they get grouped together to make up a 16 MB chunk (or rather, a single huge 0-sized split).

Those patches are the result of me shaving the yak from the "1 mapper" issue.

After that, the writer has to follow up on HIVE-9933 to get the locality of files fixed.

>name are left scattered about, borking queries. Latency is higher with
>streaming than writing to an orc file in hdfs, forcing obscene quantities
>of buckets and orc files smaller than any reasonable orc stripe / hdfs
>block size. The compactor hangs seemingly at random for no reason we've
>been able to discern.

I haven't seen these issues yet, but I am not dealing with a large volume insert rate, so I haven't produced latency issues there.

I work on Hive performance, and since I haven't seen too many bugs filed, I haven't paid attention to the performance of ACID.

Please file bugs when you find them, so that they appear on the radar for folks like me.

I'm poking about because I want a live stream into LLAP to work seamlessly & return sub-second query results when queried (pre-cache/stage & merge etc).

>An orc file without a footer is junk data (or, at least, the last stripe
>is junk data). I suppose my question should have been 'what will the hive
>query do when it encounters this? Skip the stripe / file? Error out the
>query? Something else?'

It should throw an exception, because that's a corrupt ORC file.

The trucking demo uses Storm without ACID - this is likely to get better once we use Apache Falcon to move the data around.

Cheers,
Gopal