I would recommend looking at Hive strict mode first. It makes users think about their queries by throwing an error for certain operations that are often unexpectedly expensive, such as a full table scan over a partitioned fact table when only a subset of partitions is needed. It also rejects Cartesian products, which is what a JOIN without an ON predicate produces.
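
As a rough sketch of what that looks like in practice: the snippet below turns strict mode on for a session and then tries a join with no ON predicate. The table and column names (orders, customers, customer_id, id, name) are made up for illustration, and the exact error text depends on your Hive version, so treat this as an assumption-laden example rather than a recipe.

```sql
-- Enable strict mode for the current session.
-- (It can also be set cluster-wide in hive-site.xml.)
SET hive.mapred.mode=strict;

-- Hypothetical tables. With strict mode on, a join with no ON predicate
-- is a Cartesian product and should be rejected before any job is launched.
SELECT * FROM orders JOIN customers;

-- The same join with an explicit predicate is allowed.
SELECT o.*, c.name
FROM orders o
JOIN customers c ON (o.customer_id = c.id);
```

Note that a user can still switch the setting back to nonstrict in their own session, so this is a guard rail against accidents rather than a hard limit; HDFS quotas remain useful as a backstop.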
http://my.safaribooksonline.com/book/databases/hadoop/9781449326944/10dot-tuning/strict_mode_tuning_html

On Apr 23, 2014 9:51 AM, "Thomas Larsson" <[email protected]> wrote:

> Hello.
>
> We recently had a user that ran an ad-hoc hive query with a JOIN clause
> without an ON-predicate, resulting in a huge resultset that then resulted
> in our hdfs storage becoming full.
>
> I am wondering what support and strategies there are to help limit the
> damage that ad-hoc queries like this can do. I have looked at HDFS quotas
> which might be of some help. Is there anything else.
>
> Any tips and good links would be appreciated.
>
> Best Regards
> Thomas
