i tried "set mapred.map.tasks=30" , it does not work, it seems shark does
not support this setting.
i also tried "SET mapred.max.split.size=64000000", it does not work,too.
is there other way to control task number in shark CLI ?



2014-05-26 10:38 GMT+08:00 Aaron Davidson <ilike...@gmail.com>:

> You can try setting "mapred.map.tasks" to get Hive to do the right thing.
>
>
> On Sun, May 25, 2014 at 7:27 PM, qingyang li <liqingyang1...@gmail.com> wrote:
>
>> Hi, Aaron, thanks for sharing.
>>
>> I am using Shark to execute queries, and the table is created on Tachyon.
>> I don't think I can use RDD#repartition() in the Shark CLI.
>> Does Shark support "SET mapred.max.split.size" to control the file size?
>> If yes, then after I create the table I can control the file number, and
>> thereby the task number.
>> If not, does anyone know another way to control the task number in the
>> Shark CLI?
>>
>>
>> 2014-05-26 9:36 GMT+08:00 Aaron Davidson <ilike...@gmail.com>:
>>
>>> How many partitions are in your input data set? A possibility is that
>>> your input data has 10 unsplittable files, so you end up with 10
>>> partitions. You could improve this by using RDD#repartition().
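>>>
>>> A minimal sketch of that from the Spark shell (the Tachyon path and the
>>> target partition count below are made-up values for illustration):
>>>
>>>   val rdd = sc.textFile("tachyon://master:19998/path/to/table")  // hypothetical path
>>>   val repartitioned = rdd.repartition(40)  // shuffles the data into 40 partitions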
>>>
>>> Note that mapPartitionsWithIndex is sort of the "main processing loop"
>>> for many Spark functions. It iterates through all the elements of the
>>> partition and does some computation (probably running your user code) on
>>> them.
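>>>
>>> As a rough sketch of the shape of that loop (the function body here is
>>> just an illustration, not your actual user code):
>>>
>>>   val tagged = rdd.mapPartitionsWithIndex { (index, iter) =>
>>>     // called once per partition; iter walks every element in it
>>>     iter.map(line => "partition " + index + ": " + line)
>>>   }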
>>>
>>> You can see the number of partitions in your RDD by visiting the Spark
>>> driver web interface. To access this, visit port 8080 on the host running
>>> your Standalone Master (assuming you're running standalone mode), which
>>> will have a link to the application web interface. The Tachyon master also
>>> has a useful web interface, available at port 19999.
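>>>
>>> If you have a handle on the RDD itself (e.g. from the Spark shell), a
>>> quick programmatic check is also possible:
>>>
>>>   println(rdd.partitions.length)  // number of partitions in this RDD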
>>>
>>>
>>> On Sun, May 25, 2014 at 5:43 PM, qingyang li <liqingyang1...@gmail.com> wrote:
>>>
>>>> Hi, Mayur, thanks for replying.
>>>> I know a Spark application should take all cores by default. My question
>>>> is: how do I set the task number on each core?
>>>> If it is one task per slice, how can I set the slice file size?
>>>>
>>>>
>>>> 2014-05-23 16:37 GMT+08:00 Mayur Rustagi <mayur.rust...@gmail.com>:
>>>>
>>>>> How many cores do you see on your Spark master (port 8080)?
>>>>> By default, a Spark application should take all cores when you launch
>>>>> it, unless you have set the max cores configuration.
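>>>>>
>>>>> For reference, a minimal sketch of capping cores when the context is
>>>>> created (the master URL and the cap of 8 are made-up values; in
>>>>> standalone mode the property is spark.cores.max):
>>>>>
>>>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>>>
>>>>>   val conf = new SparkConf()
>>>>>     .setMaster("spark://master:7077")  // hypothetical master URL
>>>>>     .set("spark.cores.max", "8")       // app takes at most 8 cores
>>>>>   val sc = new SparkContext(conf)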
>>>>>
>>>>>
>>>>> Mayur Rustagi
>>>>> Ph: +1 (760) 203 3257
>>>>> http://www.sigmoidanalytics.com
>>>>>  @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 22, 2014 at 4:07 PM, qingyang li <liqingyang1...@gmail.com> wrote:
>>>>>
>>>>>> My aim in setting the task number is to increase the query speed, and
>>>>>> I have also found that "mapPartitionsWithIndex at
>>>>>> Operator.scala:333 <http://192.168.1.101:4040/stages/stage?id=17>" is
>>>>>> costing much time. So my other question is: how do I tune
>>>>>> mapPartitionsWithIndex <http://192.168.1.101:4040/stages/stage?id=17>
>>>>>> to bring that cost down?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-05-22 18:09 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:
>>>>>>
>>>>>>> I have added SPARK_JAVA_OPTS+="-Dspark.default.parallelism=40 " in
>>>>>>> shark-env.sh, but I find there are only 10 tasks on the cluster and 2
>>>>>>> tasks on each machine.
>>>>>>>
>>>>>>>
>>>>>>> 2014-05-22 18:07 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:
>>>>>>>
>>>>>>>> I have added SPARK_JAVA_OPTS+="-Dspark.default.parallelism=40 " in
>>>>>>>> shark-env.sh.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-05-22 17:50 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:
>>>>>>>>
>>>>>>>>> I am using Tachyon as the storage system and Shark to query a big
>>>>>>>>> table. I have 5 machines in a Spark cluster, with 4 cores on each
>>>>>>>>> machine.
>>>>>>>>> My questions are:
>>>>>>>>> 1. How do I set the task number on each core?
>>>>>>>>> 2. Where can I see how many partitions an RDD has?
>>>>>>>>>
