Re: Adding impala daemons on servers without local HDFS storage

Fawze Abujaber Thu, 19 Apr 2018 10:44:21 -0700

That's very cool. I will give it a try and run a test to see the impact of
this.


I don't see a way to make the change across all the pools, since the default
settings in Cloudera Manager do not include a default query options field.

On Thu, Apr 19, 2018 at 8:12 PM, Tim Armstrong <[email protected]>
wrote:

> https://impala.apache.org/docs/build/html/topics/impala_admission.html
> also has some examples of setting default query options for pools.
>
> If you're using Cloudera Manager, that has a nice UI for configuring
> resource pools that's more convenient than XML config files:
> https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_resource_pools.html#concept_xkk_l1d_wr__impala_dynamic_pool_settings
>
> On Thu, Apr 19, 2018 at 10:09 AM, Lars Volker <[email protected]> wrote:
>
>> You can find documentation on the -default_query_options flag here:
>> https://impala.apache.org/docs/build/html/topics/impala_config_options.html
>>
>> Keep in mind that setting replica_preference to REMOTE will make Impala
>> ignore any locality when deciding where to schedule a read. Even within the
>> group of impalads that have local storage attached, Impala will pick a
>> randomized assignment, optimizing for the number of bytes read by each
>> node. There is currently no logic to schedule a fraction of the reads
>> locally and assign the rest to remote impalads (such a scenario wasn't part
>> of the considerations when working on the scheduler).
>>
>>
>>
>> On Thu, Apr 19, 2018 at 9:47 AM, Fawze Abujaber <[email protected]>
>> wrote:
>>
>>> Thanks, Tim, for your quick response, as usual.
>>>
>>> Can you send me documentation on how to do that, or a detailed example
>>> of how to do it globally and per pool?
>>>
>>> Again, I much appreciate your readiness to help.
>>>
>>> On Thu, 19 Apr 2018 at 19:43 Tim Armstrong <[email protected]>
>>> wrote:
>>>
>>>> We have a way to set global and per-pool defaults for query options.
>>>> You can set default query options via the --default_query_options startup
>>>> flag or, if you have resource pools set up, you can set default query option
>>>> values for queries submitted to each resource pool (including the default
>>>> pool).
>>>>
>>>> On Tue, Apr 17, 2018 at 3:27 AM, Fawze Abujaber <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Tim,
>>>>>
>>>>> That means that I cannot disable this across the Impala cluster and I
>>>>> need to manage this at the query level, right?
>>>>>
>>>>> Is there any configuration at the cluster level to disable this?
>>>>>
>>>>> On Wed, Apr 4, 2018 at 3:44 AM, Tim Armstrong <[email protected]> wrote:
>>>>>
>>>>>> I agree with Jim's answers.
>>>>>>
>>>>>> You may run into challenges if you have some Impala daemons that have
>>>>>> local DataNodes and some that do not have local DataNodes. By default
>>>>>> Impala always chooses a daemon with a local copy of the data, which would
>>>>>> mean that daemons without a co-located DataNode might never get fragments
>>>>>> scheduled on them. We do have a knob that lets you disable locality-based
>>>>>> scheduling:
>>>>>> https://impala.apache.org/docs/build/html/topics/impala_replica_preference.html
>>>>>> but that may be too blunt an instrument.
>>>>>>
>>>>>> On Tue, Apr 3, 2018 at 11:34 AM, Jim Apple <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I think the answers are:
>>>>>>>
>>>>>>> 1. It depends on your workload and your network. I know some users
>>>>>>> run with ONLY remote reads and still get performance they are happy with.
>>>>>>> Your existing nodes will continue to be able to short-circuit read.
>>>>>>>
>>>>>>> 2. This is highly workload-dependent. You want to try and avoid
>>>>>>> spilling, obviously, but if your spinning disk can write 200MB/s, a
>>>>>>> 600GB disk would take 3000 seconds, which is 50 minutes, to fill up.
>>>>>>>
>>>>>>> 3. I think the impalads are smart enough to not try and do a
>>>>>>> short-circuit read on data that isn't local.
>>>>>>>
>>>>>>> On Tue, Apr 3, 2018 at 10:22 AM, Fawze Abujaber <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have reached a point in my cluster where I don't need more storage
>>>>>>>> for HDFS but I do need to add processing power. I'm using YARN, Spark, and
>>>>>>>> Impala on the normal nodes for processing.
>>>>>>>>
>>>>>>>> My questions:
>>>>>>>>
>>>>>>>> 1- How much will data locality impact Impala performance, as I
>>>>>>>> know Impala relies on data locality in its processing?
>>>>>>>>
>>>>>>>> 2- I have an OS disk with 600GB. Will this be enough for
>>>>>>>> spilling to disk when needed? Does it depend on other factors? The Impala
>>>>>>>> daemon memory limit is 35GB.
>>>>>>>>
>>>>>>>> 3- Should I disable *HDFS Short Circuit Read* on these nodes?
>>>>>>>>
>>>>>>>> I will be happy to get more recommendations on this.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Take Care
>>>>>>>> Fawze Abujaber
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Take Care
>>>>> Fawze Abujaber
>>>>>
>>>>
>>>> --
>>> Take Care
>>> Fawze Abujaber
>>>
>>
>>
>
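
As a quick check of the spill-disk estimate quoted in the thread (a 600GB disk
written at 200MB/s), here is a minimal Python sketch. It assumes decimal units
(1GB = 10^9 bytes) and a constant sequential write rate. The function name is
only illustrative, and real spill throughput will vary with the workload.

```python
# Rough estimate only: assumes a constant write rate and decimal units.
GB = 10**9  # bytes per gigabyte (assumed decimal)
MB = 10**6  # bytes per megabyte (assumed decimal)


def seconds_to_fill(disk_bytes, write_bytes_per_sec):
    """Return how long it takes to fill a disk at a sustained write rate."""
    return disk_bytes / write_bytes_per_sec


# Figures taken from the thread: 600GB OS disk, 200MB/s spinning disk.
fill_seconds = seconds_to_fill(600 * GB, 200 * MB)
fill_minutes = fill_seconds / 60

print(f"{fill_seconds:.0f} s = {fill_minutes:.0f} min")  # 3000 s = 50 min
```

Plugging in a measured write rate for the actual disk gives a more realistic
figure for how long sustained spilling could continue before the disk is full.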


-- 
Take Care
Fawze Abujaber
