Grant Overby (groverby) wrote:
> Thanks for the link to the hive streaming bolt. We rolled our own bolt
> many moons ago to utilize hive streaming. We’ve tried it against 0.13 and
> 0.14. Acid tables have been a real pain for us. We don’t believe they are
> production ready. At least in our use cases, Tez crashes for assorted
> reasons or only assigns 1 mapper to the partition. Having delta files and
> no base files borks mapper assignments. Files containing flush in their
> name are left scattered about, borking queries. Latency is higher with
> streaming than writing to an orc file in hdfs, forcing obscene quantities
> of buckets and orc files smaller than any reasonable orc stripe / hdfs
> block size. The compactor hangs seemingly at random for no reason we’ve
> been able to discern.

The issues with flush files borking queries have been resolved in Hive 1.0. I haven't seen any issues with the compactor hanging at random. Could you expand on what parts hung? If you have a reproducible case, it would be great to file a JIRA so we can fix it.
Alan.

>
> An orc file without a footer is junk data (or, at least, the last stripe
> is junk data). I suppose my question should have been 'what will the hive
> query do when it encounters this? Skip the stripe / file? Error out the
> query? Something else?'
>
> Grant Overby
> Software Engineer
> Cisco.com <http://www.cisco.com/>
> grove...@cisco.com
>
> On 4/14/15, 4:23 PM, "Gopal Vijayaraghavan" <gop...@apache.org> wrote:
>
>>> What will Hive do if querying an external table containing orc files
>>> that are still being written to?
>>
>> Doing that directly won't work at all, because ORC files are only readable
>> after the footer is written out, which won't happen for any open files.
>>
>>> I won't be able to test these scenarios till tomorrow and would like to
>>> have some idea of what to expect this afternoon.
>>
>> If I remember correctly, your previous question was about writing ORC from
>> Storm.
>>
>> If you're on a recent version of Storm, I'd advise you to look at
>> storm-hive/
>>
>> https://github.com/apache/storm/tree/master/external/storm-hive
>>
>> Or alternatively, there's a "hortonworks trucking demo" which does a
>> partition insert instead.
>>
>> Cheers,
>> Gopal
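Gopal's point that an ORC file only becomes readable once its footer is written follows from where the metadata lives. ORC keeps its footer and postscript at the end of the file, so a reader starts from the tail to locate the stripes.

The Python sketch below illustrates this with a deliberately simplified, hypothetical "toy" format. The `TOY` magic bytes, `ToyWriter`, and `read_all` are all made up, and the layout is not ORC's real encoding. The sketch shows why a file that is still open, with rows already written but no footer yet, cannot be read at all.

```python
import io
import json
import struct

# Hypothetical magic bytes for a toy format; this is NOT the real ORC layout.
MAGIC = b"TOY"


class ToyWriter:
    """Appends row batches ("stripes"); the footer is only written on close()."""

    def __init__(self, buf):
        self.buf = buf
        self.buf.write(MAGIC)  # header magic
        self.stripes = []      # (offset, length) of each stripe, kept in memory

    def write_stripe(self, rows):
        data = json.dumps(rows).encode()
        self.stripes.append((self.buf.tell(), len(data)))
        self.buf.write(data)

    def close(self):
        # The tail is footer bytes, then a 4-byte footer length, then the magic.
        # Readers locate everything starting from the end of the file.
        footer = json.dumps({"stripes": self.stripes}).encode()
        self.buf.write(footer)
        self.buf.write(struct.pack(">I", len(footer)))
        self.buf.write(MAGIC)


def read_all(data: bytes):
    """Reads every row by first locating the footer from the file's tail."""
    if len(data) < 2 * len(MAGIC) + 4 or not data.endswith(MAGIC):
        raise ValueError("no footer: file is still open or truncated")
    end = len(data) - len(MAGIC)
    (footer_len,) = struct.unpack(">I", data[end - 4:end])
    footer = json.loads(data[end - 4 - footer_len:end - 4])
    rows = []
    for offset, length in footer["stripes"]:
        rows.extend(json.loads(data[offset:offset + length]))
    return rows


buf = io.BytesIO()
writer = ToyWriter(buf)
writer.write_stripe([1, 2])
writer.write_stripe([3])
try:
    read_all(buf.getvalue())  # rows are on "disk", but there is no footer yet
except ValueError as err:
    print(err)                # no footer: file is still open or truncated
writer.close()
print(read_all(buf.getvalue()))  # [1, 2, 3]
```

The stripe offsets exist only in the footer, so the reader has no way to find the rows already written until `close()` appends it. This toy reader simply raises an error in that case. It does not model whether Hive itself errors out or skips such a file.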