I have written Nagios checks that watch the JobTracker UI and report when jobs take too long.
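A minimal sketch of that idea is below (assumptions: a Hadoop 2 cluster, so it polls the YARN ResourceManager REST API rather than scraping the old JobTracker pages; the host, port and thresholds are placeholders, not my exact script):

    #!/usr/bin/env python3
    """Nagios-style check: WARN/CRIT if any running YARN application has been
    running longer than a threshold. Sketch only -- host, port and thresholds
    are placeholders."""
    import json
    import sys
    from urllib.request import urlopen

    RM_APPS = "http://resourcemanager.example.com:8088/ws/v1/cluster/apps?states=RUNNING"
    WARN_MS = 60 * 60 * 1000        # warn after 1 hour
    CRIT_MS = 3 * 60 * 60 * 1000    # critical after 3 hours

    def main():
        try:
            data = json.load(urlopen(RM_APPS, timeout=10))
        except Exception as exc:
            print("UNKNOWN - cannot reach ResourceManager: %s" % exc)
            sys.exit(3)

        apps = (data.get("apps") or {}).get("app") or []
        # elapsedTime is reported by the RM in milliseconds
        slow = [a for a in apps if a["elapsedTime"] >= WARN_MS]
        if not slow:
            print("OK - %d running apps, none over %d min" % (len(apps), WARN_MS // 60000))
            sys.exit(0)

        worst = max(a["elapsedTime"] for a in slow)
        state, code = ("CRITICAL", 2) if worst >= CRIT_MS else ("WARNING", 1)
        names = ", ".join("%s(%s)" % (a["id"], a["user"]) for a in slow)
        print("%s - %d long-running apps, worst %d min: %s"
              % (state, len(slow), worst // 60000, names))
        sys.exit(code)

    if __name__ == "__main__":
        main()

It plugs into Nagios like any other check; the exit codes (0/1/2/3) follow the usual plugin convention.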
On Thu, Sep 1, 2016 at 11:08 AM, Loïc Chanel <[email protected]> wrote:

> On the topic of timeouts, if I may say, they are a dangerous way to deal
> with requests, as a "good" request may last longer than an "evil" one.
> Be sure timeouts won't kill any important job before putting them into
> place. You can set these things in the component (Tez, MapReduce ...)
> parameters, but not directly in YARN. At least that was the case when I
> tried this (one year ago).
>
> Regards,
>
> Loïc CHANEL
> System & virtualization engineer
> TO - XaaS Ind - Worldline (Villeurbanne, France)
>
> 2016-09-01 16:52 GMT+02:00 Stephen Sprague <[email protected]>:
>
>> > rogue queries
>>
>> so this really isn't limited to just Hive, is it? any DBMS perhaps has
>> to contend with this, even malicious rogue queries as a matter of fact.
>>
>> timeouts are a cheap way for systems to handle this - assuming time is
>> related to resource. i'm sure beeline or whatever client you use has a
>> timeout feature.
>>
>> maybe one could write a separate service - say a governor - that watches
>> over YARN (or HDFS or whatever resource is scarce) - and terminates the
>> process if it goes beyond a threshold. think OOM killer.
>>
>> but, yeah, i admittedly don't know of something out there already you
>> can just tap into, but YARN's Resource Manager seems to be the place i'd
>> research for starters. Just look at its name. :)
>>
>> my unsolicited 2 cents.
>>
>> On Wed, Aug 31, 2016 at 10:24 PM, ravi teja <[email protected]> wrote:
>>
>>> Thanks Mich,
>>>
>>> Unfortunately we have many insert queries.
>>> Are there any other ways?
>>>
>>> Thanks,
>>> Ravi
>>>
>>> On Wed, Aug 31, 2016 at 9:45 PM, Mich Talebzadeh
>>> <[email protected]> wrote:
>>>
>>>> Try this:
>>>>
>>>> hive.limit.optimize.fetch.max
>>>>
>>>> - Default Value: 50000
>>>> - Added In: Hive 0.8.0
>>>>
>>>> Maximum number of rows allowed for a smaller subset of data for simple
>>>> LIMIT, if it is a fetch query. Insert queries are not restricted by
>>>> this limit.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> Disclaimer: Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary
>>>> damages arising from such loss, damage or destruction.
>>>>
>>>> On 31 August 2016 at 13:42, ravi teja <[email protected]> wrote:
>>>>
>>>>> Hi Community,
>>>>>
>>>>> Many users run ad-hoc Hive queries on our platform.
>>>>> Some rogue queries managed to fill up the HDFS space, causing
>>>>> mainstream queries to fail.
>>>>>
>>>>> We wanted to limit the data generated by these ad-hoc queries.
>>>>> We are aware of the strict param which limits the data being scanned,
>>>>> but it is of less help as a huge number of user tables aren't
>>>>> partitioned.
>>>>>
>>>>> Is there a way we can limit the data generated from Hive per query,
>>>>> like a Hive parameter for setting HDFS quotas for the job-level
>>>>> *scratch* directory, or any other approach?
>>>>> What's the general approach to guardrail such multi-tenant cases?
>>>>>
>>>>> Thanks in advance,
>>>>> Ravi
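On Ravi's original question about quotas on the scratch directory: I'm not aware of a Hive parameter that caps it per query, but one blunt guardrail is an HDFS space quota on each user's scratch directory, so a runaway query dies with a quota-exceeded error instead of filling the cluster. A rough sketch of that idea (assumptions: per-user scratch dirs live under /tmp/hive - check hive.exec.scratchdir for your layout - the 500g cap is arbitrary, and dfsadmin needs HDFS superuser rights):

    #!/usr/bin/env python3
    """Rough guardrail sketch: put an HDFS space quota on every per-user Hive
    scratch directory. The root path and the 500g cap are assumptions --
    check hive.exec.scratchdir for where scratch dirs actually live."""
    import subprocess

    SCRATCH_ROOT = "/tmp/hive"   # hive.exec.scratchdir; one subdir per user below it
    SPACE_QUOTA = "500g"         # counts raw bytes, i.e. after replication

    def user_dirs(root):
        # `hdfs dfs -ls` prints one entry per line; directories start with 'd'
        # and the path is the last whitespace-separated field.
        out = subprocess.check_output(["hdfs", "dfs", "-ls", root], text=True)
        return [line.split()[-1] for line in out.splitlines() if line.startswith("d")]

    if __name__ == "__main__":
        for path in user_dirs(SCRATCH_ROOT):
            subprocess.check_call(["hdfs", "dfsadmin", "-setSpaceQuota", SPACE_QUOTA, path])
            print("set %s space quota on %s" % (SPACE_QUOTA, path))

Once a user's scratch dir hits the cap their query fails with a DSQuotaExceededException, which is noisy but contained; legitimate heavy jobs then need the quota raised or cleared (hdfs dfsadmin -clrSpaceQuota).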
