That's very cool. I will give it a try and run a test to see the impact of this.
I don't see a way to make the change across all the pools, since the default
settings in Cloudera Manager have no default query option.

On Thu, Apr 19, 2018 at 8:12 PM, Tim Armstrong <[email protected]> wrote:

> https://impala.apache.org/docs/build/html/topics/impala_admission.html
> also has some examples of setting default query options for pools.
>
> If you're using Cloudera Manager, that has a nice UI for configuring
> resource pools that's more convenient than XML config files:
> https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_resource_pools.html#concept_xkk_l1d_wr__impala_dynamic_pool_settings
>
> On Thu, Apr 19, 2018 at 10:09 AM, Lars Volker <[email protected]> wrote:
>
>> You can find documentation on the -default_query_options flag here:
>> https://impala.apache.org/docs/build/html/topics/impala_config_options.html
>>
>> Keep in mind that setting replica_preference to REMOTE will make Impala
>> ignore any locality when deciding where to schedule a read. Even within the
>> group of impalads that have local storage attached, Impala will pick a
>> randomized assignment, optimizing for the number of bytes read by each
>> node. There is currently no logic to schedule a fraction of the reads
>> locally and assign the rest to remote impalads (such a scenario wasn't part
>> of the considerations when working on the scheduler).
>>
>> On Thu, Apr 19, 2018 at 9:47 AM, Fawze Abujaber <[email protected]> wrote:
>>
>>> Thanks Tim for your quick response, as usual.
>>>
>>> Can you send me documentation on how to do that, or a detailed example
>>> of how to do that globally and per pool?
>>>
>>> Again, I much appreciate your readiness to help.
>>>
>>> On Thu, 19 Apr 2018 at 19:43 Tim Armstrong <[email protected]> wrote:
>>>
>>>> We have a way to set global and per-pool defaults for query options.
>>>> You can set default query options via the --default_query_options startup
>>>> flag or, if you have resource pools set up, you can set default query option
>>>> values for queries submitted to each resource pool (including the default
>>>> pool).
>>>>
>>>> On Tue, Apr 17, 2018 at 3:27 AM, Fawze Abujaber <[email protected]> wrote:
>>>>
>>>>> Thanks Tim,
>>>>>
>>>>> That means that I cannot disable this across the Impala cluster and I
>>>>> need to manage this at the query level, right?
>>>>>
>>>>> Is there any configuration at the cluster level to disable this?
>>>>>
>>>>> On Wed, Apr 4, 2018 at 3:44 AM, Tim Armstrong <[email protected]> wrote:
>>>>>
>>>>>> I agree with Jim's answers.
>>>>>>
>>>>>> You may run into challenges if you have some Impala daemons that have
>>>>>> local DataNodes and some that do not have local DataNodes. By default
>>>>>> Impala always chooses a daemon with a local copy of the data, which would
>>>>>> mean that daemons without a co-located DataNode might never get fragments
>>>>>> scheduled on them. We do have a knob that lets you disable locality-based
>>>>>> scheduling
>>>>>> https://impala.apache.org/docs/build/html/topics/impala_replica_preference.html
>>>>>> but that may be too blunt an instrument.
>>>>>>
>>>>>> On Tue, Apr 3, 2018 at 11:34 AM, Jim Apple <[email protected]> wrote:
>>>>>>
>>>>>>> I think the answers are:
>>>>>>>
>>>>>>> 1. It depends on your workload and your network. I know some users
>>>>>>> run with ONLY remote reads and still get performance they are happy
>>>>>>> with. Your existing nodes will continue to be able to short-circuit read.
>>>>>>>
>>>>>>> 2. This is highly workload-dependent. You want to try and avoid
>>>>>>> spilling, obviously, but if your spinning disk can write 200MB/s it
>>>>>>> would take 3000 seconds, which is 50 minutes, to fill up.
>>>>>>>
>>>>>>> 3. I think the impalads are smart enough to not try and do a
>>>>>>> short-circuit read on data that isn't local.
>>>>>>>
>>>>>>> On Tue, Apr 3, 2018 at 10:22 AM, Fawze Abujaber <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have reached a point in my cluster where I don't need more storage
>>>>>>>> for HDFS and I need to add processing power. I'm using YARN, Spark and
>>>>>>>> Impala on the normal nodes for processing.
>>>>>>>>
>>>>>>>> My questions:
>>>>>>>>
>>>>>>>> 1- How much will data locality impact Impala performance, as I
>>>>>>>> know Impala relies on data locality in its processing?
>>>>>>>>
>>>>>>>> 2- I have an OS disk with 600GB. Will this be enough to be used to
>>>>>>>> spill to disk when needed? Is it dependent on other factors? The Impala
>>>>>>>> daemon memory limit is 35GB.
>>>>>>>>
>>>>>>>> 3- Should I disable the *HDFS Short Circuit Read* on these nodes?
>>>>>>>>
>>>>>>>> I will be happy to get more recommendations on this.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Take Care
>>>>>>>> Fawze Abujaber
>>>>>
>>>>> --
>>>>> Take Care
>>>>> Fawze Abujaber
>>>
>>> --
>>> Take Care
>>> Fawze Abujaber

--
Take Care
Fawze Abujaber
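
For reference, the fill-time estimate in Jim's answer 2 (a 600GB disk written at 200MB/s) can be reproduced with a short calculation. This is only a sketch: the function name is made up for illustration, it assumes decimal units (1 GB = 1000 MB), and it assumes the whole disk is free for spilling at a sustained write rate, which a shared OS disk is unlikely to offer in practice.

```python
def seconds_to_fill(disk_gb: float, write_mb_per_s: float) -> float:
    """Seconds needed to fill a disk at a sustained write rate.

    Hypothetical helper for illustration; assumes 1 GB = 1000 MB and
    that the full disk capacity is available for spill data.
    """
    return disk_gb * 1000 / write_mb_per_s


# Figures taken from the thread: 600GB OS disk, 200MB/s spinning disk.
secs = seconds_to_fill(600, 200)
print(f"{secs:.0f} s = {secs / 60:.0f} min")  # prints "3000 s = 50 min"
```

Plugging in the free space actually left on the OS disk, rather than its full 600GB, gives a more realistic lower figure.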
