I have written Nagios checks that watch the JobTracker UI and report when jobs take too long.
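A minimal sketch of that idea is below (assumptions: a Hadoop 2 cluster, so it polls the YARN ResourceManager REST API rather than scraping the old JobTracker pages; the host, port and thresholds are placeholders, not my exact script):

    #!/usr/bin/env python3
    """Nagios-style check: WARN/CRIT if any running YARN application has been
    running longer than a threshold. Sketch only -- host, port and thresholds
    are placeholders."""
    import json
    import sys
    from urllib.request import urlopen

    RM_APPS = "http://resourcemanager.example.com:8088/ws/v1/cluster/apps?states=RUNNING"
    WARN_MS = 60 * 60 * 1000        # warn after 1 hour
    CRIT_MS = 3 * 60 * 60 * 1000    # critical after 3 hours

    def main():
        try:
            data = json.load(urlopen(RM_APPS, timeout=10))
        except Exception as exc:
            print("UNKNOWN - cannot reach ResourceManager: %s" % exc)
            sys.exit(3)

        apps = (data.get("apps") or {}).get("app") or []
        # elapsedTime is reported by the RM in milliseconds
        slow = [a for a in apps if a["elapsedTime"] >= WARN_MS]
        if not slow:
            print("OK - %d running apps, none over %d min" % (len(apps), WARN_MS // 60000))
            sys.exit(0)

        worst = max(a["elapsedTime"] for a in slow)
        state, code = ("CRITICAL", 2) if worst >= CRIT_MS else ("WARNING", 1)
        names = ", ".join("%s(%s)" % (a["id"], a["user"]) for a in slow)
        print("%s - %d long-running apps, worst %d min: %s"
              % (state, len(slow), worst // 60000, names))
        sys.exit(code)

    if __name__ == "__main__":
        main()

It plugs into Nagios like any other check; the exit codes (0/1/2/3) follow the usual plugin convention.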
On Thu, Sep 1, 2016 at 11:08 AM, Loïc Chanel <[email protected]> wrote:

> On the topic of timeouts, if I may say, they are a dangerous way to deal
> with requests, as a "good" request may last longer than an "evil" one.
> Be sure timeouts won't kill any important job before putting them into
> place. You can set these things in the component (Tez, MapReduce ...)
> parameters, but not directly in YARN. At least that was the case when I
> tried this (one year ago).
>
> Regards,
>
> Loïc CHANEL
> System & virtualization engineer
> TO - XaaS Ind - Worldline (Villeurbanne, France)
>
> 2016-09-01 16:52 GMT+02:00 Stephen Sprague <[email protected]>:
>
>> > rogue queries
>>
>> so this really isn't limited to just Hive, is it? any DBMS perhaps has
>> to contend with this, even malicious rogue queries as a matter of fact.
>>
>> timeouts are a cheap way for systems to handle this - assuming time is
>> related to resource. i'm sure beeline or whatever client you use has a
>> timeout feature.
>>
>> maybe one could write a separate service - say a governor - that watches
>> over YARN (or HDFS or whatever resource is scarce) - and terminates the
>> process if it goes beyond a threshold. think OOM killer.
>>
>> but, yeah, i admittedly don't know of something out there already you
>> can just tap into, but YARN's Resource Manager seems to be the place i'd
>> research for starters. Just look at its name. :)
>>
>> my unsolicited 2 cents.
>>
>> On Wed, Aug 31, 2016 at 10:24 PM, ravi teja <[email protected]> wrote:
>>
>>> Thanks Mich,
>>>
>>> Unfortunately we have many insert queries.
>>> Are there any other ways?
>>>
>>> Thanks,
>>> Ravi
>>>
>>> On Wed, Aug 31, 2016 at 9:45 PM, Mich Talebzadeh
>>> <[email protected]> wrote:
>>>
>>>> Try this:
>>>>
>>>> hive.limit.optimize.fetch.max
>>>>
>>>> - Default Value: 50000
>>>> - Added In: Hive 0.8.0
>>>>
>>>> Maximum number of rows allowed for a smaller subset of data for simple
>>>> LIMIT, if it is a fetch query. Insert queries are not restricted by
>>>> this limit.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> Disclaimer: Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary
>>>> damages arising from such loss, damage or destruction.
>>>>
>>>> On 31 August 2016 at 13:42, ravi teja <[email protected]> wrote:
>>>>
>>>>> Hi Community,
>>>>>
>>>>> Many users run ad-hoc Hive queries on our platform.
>>>>> Some rogue queries managed to fill up the HDFS space, causing
>>>>> mainstream queries to fail.
>>>>>
>>>>> We wanted to limit the data generated by these ad-hoc queries.
>>>>> We are aware of the strict param which limits the data being scanned,
>>>>> but it is of less help as a huge number of user tables aren't
>>>>> partitioned.
>>>>>
>>>>> Is there a way we can limit the data generated from Hive per query,
>>>>> like a Hive parameter for setting HDFS quotas for the job-level
>>>>> *scratch* directory, or any other approach?
>>>>> What's the general approach to guardrail such multi-tenant cases?
>>>>>
>>>>> Thanks in advance,
>>>>> Ravi
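On Ravi's original question about quotas on the scratch directory: I'm not aware of a Hive parameter that caps it per query, but one blunt guardrail is an HDFS space quota on each user's scratch directory, so a runaway query dies with a quota-exceeded error instead of filling the cluster. A rough sketch of that idea (assumptions: per-user scratch dirs live under /tmp/hive - check hive.exec.scratchdir for your layout - the 500g cap is arbitrary, and dfsadmin needs HDFS superuser rights):

    #!/usr/bin/env python3
    """Rough guardrail sketch: put an HDFS space quota on every per-user Hive
    scratch directory. The root path and the 500g cap are assumptions --
    check hive.exec.scratchdir for where scratch dirs actually live."""
    import subprocess

    SCRATCH_ROOT = "/tmp/hive"   # hive.exec.scratchdir; one subdir per user below it
    SPACE_QUOTA = "500g"         # counts raw bytes, i.e. after replication

    def user_dirs(root):
        # `hdfs dfs -ls` prints one entry per line; directories start with 'd'
        # and the path is the last whitespace-separated field.
        out = subprocess.check_output(["hdfs", "dfs", "-ls", root], text=True)
        return [line.split()[-1] for line in out.splitlines() if line.startswith("d")]

    if __name__ == "__main__":
        for path in user_dirs(SCRATCH_ROOT):
            subprocess.check_call(["hdfs", "dfsadmin", "-setSpaceQuota", SPACE_QUOTA, path])
            print("set %s space quota on %s" % (SPACE_QUOTA, path))

Once a user's scratch dir hits the cap their query fails with a DSQuotaExceededException, which is noisy but contained; legitimate heavy jobs then need the quota raised or cleared (hdfs dfsadmin -clrSpaceQuota).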
