Re: Number of executors change during job running

2016-05-02 Thread Vikash Pareek
Hi Bill, You can try DirectStream and increase the number of partitions on the Kafka topic; the input DStream will then have one partition per Kafka partition, without any repartitioning. Can you please share your event timeline chart from the Spark UI? You need to tune your configuration to your computation. Spark UI
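A minimal sketch of that direct approach in Scala (the broker address, topic name, and batch interval are placeholders, and this assumes the Spark 1.x spark-streaming-kafka API):

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setAppName("DirectStreamExample")
  val ssc = new StreamingContext(conf, Seconds(10))
  // With the direct stream, the input DStream gets one Spark partition
  // per Kafka partition, so no up-front repartitioning is needed.
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("events"))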

Re: Number of executors in spark-1.6 and spark-1.5

2016-04-10 Thread Vikash Pareek
Hi Talebzadeh, Thanks for your quick response. >>in 1.6, how many executors do you see for each node? I have 1 executor for 1 node with SPARK_WORKER_INSTANCES=1. >>in standalone mode how are you increasing the number of worker instances. Are you starting another slave on each node? No, I am not
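For reference, in standalone mode the number of workers per machine is set in conf/spark-env.sh; a sketch, with illustrative values only:

  export SPARK_WORKER_INSTANCES=2   # start two workers on this node
  export SPARK_WORKER_CORES=4       # cores each worker can hand to executors
  export SPARK_WORKER_MEMORY=8g     # memory each worker can hand to executors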

Re: Number of executors in spark-1.6 and spark-1.5

2016-04-10 Thread Mich Talebzadeh
Hi, in 1.6, how many executors do you see for each node? In standalone mode, how are you increasing the number of worker instances? Are you starting another slave on each node? HTH, Dr Mich Talebzadeh

Re: Number of executors in Spark - Kafka

2016-01-21 Thread Cody Koeninger
6 Kafka partitions will result in 6 Spark partitions, not 6 Spark RDDs. The question of whether you will have a backlog isn't just a matter of having 1 executor per partition. If a single executor can process all of the partitions fast enough to complete a batch in under the required time, you
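A quick way to verify that one-to-one mapping is to print the partition count of each micro-batch; a sketch, assuming stream is a Kafka direct input DStream as above:

  // Each batch RDD should report as many partitions as the Kafka topic has.
  stream.foreachRDD { rdd => println(s"partitions in this batch: ${rdd.partitions.size}") }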

Re: number of executors in sparkR.init()

2015-12-25 Thread Felix Cheung
The equivalent of spark-submit --num-executors should be spark.executor.instances when used in SparkConf: http://spark.apache.org/docs/latest/running-on-yarn.html Could you try setting that with sparkR.init()?
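A sketch of that call in SparkR (the master and the executor count are placeholders; spark.executor.instances only applies when running on YARN):

  sc <- sparkR.init(master = "yarn-client",
                    sparkEnvir = list(spark.executor.instances = "5"))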

Re: number of executors in sparkR.init()

2015-12-25 Thread Franc Carter
Thanks, that works. Cheers On 26 December 2015 at 16:53, Felix Cheung wrote: > The equivalent of spark-submit --num-executors should be > spark.executor.instances > when used in SparkConf: > http://spark.apache.org/docs/latest/running-on-yarn.html > > Could you try

Re: number of executors

2015-05-18 Thread edward cui
Oh BTW, it's Spark 1.3.1 on Hadoop 2.4, AMI 3.6. Sorry for leaving out this information. Appreciate any help! Ed 2015-05-18 12:53 GMT-04:00 edward cui edwardcu...@gmail.com: I actually have the same problem, but I am not sure whether it is a Spark problem or a YARN problem. I set up a

Re: number of executors

2015-05-18 Thread Sandy Ryza
On Mon, May 18, 2015 at 9:07 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Xiaohe, All Spark options must go before the jar or they won't take effect. -Sandy On Sun, May 17, 2015 at 8:59 AM, xiaohe lan zombiexco...@gmail.com wrote: Sorry, both of them are assigned tasks actually. Aggregated Metrics by Executor: Executor ID | Address | Task Time | Total Tasks | Failed

Re: number of executors

2015-05-18 Thread Sandy Ryza
Hi Xiaohe, All Spark options must go before the jar or they won't take effect. -Sandy On Sun, May 17, 2015 at 8:59 AM, xiaohe lan zombiexco...@gmail.com wrote: Sorry, both of them are assigned tasks actually. Aggregated Metrics by Executor: Executor ID | Address | Task Time | Total Tasks | Failed

Re: number of executors

2015-05-18 Thread edward cui
I actually have the same problem, but I am not sure whether it is a Spark problem or a YARN problem. I set up a five-node cluster on AWS EMR and started the YARN daemon on the master (the NodeManager is not started on the master by default, and I don't want to waste any resources since I have to pay).

Re: number of executors

2015-05-18 Thread xiaohe lan
Yeah, I read that page before, but it does not mention that the options should come before the application jar. Actually, if I put the --class option before the application jar, I will get a ClassNotFoundException. Anyway, thanks again Sandy. On Tue, May 19, 2015 at 11:06 AM, Sandy Ryza

Re: number of executors

2015-05-18 Thread xiaohe lan
Hi Sandy, Thanks for your information. Yes, spark-submit --master yarn --num-executors 5 --executor-cores 4 target/scala-2.10/simple-project_2.10-1.0.jar --class scala.SimpleApp is working awesomely. Is there any documentation pointing to this? Thanks, Xiaohe On Tue, May 19, 2015 at 12:07 AM,

Re: number of executors

2015-05-18 Thread Sandy Ryza
Awesome! It's documented here: https://spark.apache.org/docs/latest/submitting-applications.html -Sandy On Mon, May 18, 2015 at 8:03 PM, xiaohe lan zombiexco...@gmail.com wrote: Hi Sandy, Thanks for your information. Yes, spark-submit --master yarn --num-executors 5 --executor-cores 4
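The general form shown on that page puts all options before the application jar and application arguments after it:

  ./bin/spark-submit \
    --class <main-class> \
    --master <master-url> \
    --deploy-mode <deploy-mode> \
    --conf <key>=<value> \
    ... # other options
    <application-jar> \
    [application-arguments]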

Re: number of executors

2015-05-17 Thread Akhil Das
Did you try the --executor-cores param? While you submit the job, do a ps aux | grep spark-submit and see the exact command parameters. Thanks Best Regards On Sat, May 16, 2015 at 12:31 PM, xiaohe lan zombiexco...@gmail.com wrote: Hi, I have a 5-node YARN cluster and I used spark-submit to submit

Re: number of executors

2015-05-17 Thread xiaohe lan
Sorry, both of them are assigned tasks actually. Aggregated Metrics by Executor: Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input Size / Records | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk): 1 | host1:6184 | 1.7 min | 505640.0 MB / 12318400382.3 MB / 121007701630.4

Re: number of executors

2015-05-17 Thread xiaohe lan
bash-4.1$ ps aux | grep SparkSubmit
xilan 1704 13.2 1.2 5275520 380244 pts/0 Sl+ 08:39 0:13 /scratch/xilan/jdk1.8.0_45/bin/java -cp

Re: number of executors

2015-05-16 Thread Ted Yu
What Spark release are you using? Can you check the driver log to see if there is some clue there? Thanks On Sat, May 16, 2015 at 12:01 AM, xiaohe lan zombiexco...@gmail.com wrote: Hi, I have a 5-node YARN cluster; I used spark-submit to submit a simple app. spark-submit --master yarn

Re: Number of Executors per worker process

2015-03-02 Thread Spico Florin
Hello! Thank you very much for your response. In the book Learning Spark I found the following sentence: "Each application will have at most one executor on each worker." So a worker can have one executor process spawned or none (perhaps the number depends on the workload distribution). Best

Re: Number of Executors per worker process

2015-02-26 Thread Jeffrey Jedele
Hi Spico, Yes, I think an executor core in Spark is basically a thread in a worker pool. It's recommended to have one executor core per physical core on your machine for best performance, but I think in theory you can create as many threads as your OS allows. For deployment: There seems to be

Re: Number of executors and tasks

2014-11-26 Thread Akhil Das
1. On HDFS, files are treated as ~64 MB blocks. When you put the same file in a local file system (ext3/ext4) it will be treated differently (in your case it looks like ~32 MB), and that's why you are seeing 9 output files. 2. You could set --num-executors to increase the number of executor
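A quick way to see how many partitions (and hence map tasks) an input actually produces; a sketch, with a placeholder path:

  // One task per partition; the block size of the source filesystem drives this.
  val rdd = sc.textFile("hdfs:///path/to/input")
  println(rdd.partitions.size)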

Re: Number of executors and tasks

2014-11-26 Thread Akhil Das
This one would give you a better understanding: http://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors Thanks Best Regards On Wed, Nov 26, 2014 at 10:32 PM, Akhil Das ak...@sigmoidanalytics.com wrote: 1. On HDFS, files are treated as ~64 MB in

Re: Number of executors change during job running

2014-07-16 Thread Bill Jay
Hi Tathagata, I have tried the repartition method. The reduce stage first had 2 executors and then it had around 85 executors. I specified repartition(300), and each executor was given 2 cores when I submitted the job. This shows that repartition does increase the number of executors. However,
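The pattern under discussion, as a sketch (the input stream name is a placeholder):

  // Shuffle the Kafka input into 300 partitions before the heavy stages.
  val repartitioned = kafkaStream.repartition(300)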

Re: Number of executors change during job running

2014-07-14 Thread Bill Jay
Hi Tathagata, It seems repartition does not necessarily force Spark to distribute the data into different executors. I have launched a new job which uses repartition right after I received data from Kafka. For the first two batches, the reduce stage used more than 80 executors. Starting from the

Re: Number of executors change during job running

2014-07-14 Thread Tathagata Das
Can you give me a screenshot of the stages page in the web UI, the Spark logs, and the code that is causing this behavior? This seems quite weird to me. TD On Mon, Jul 14, 2014 at 2:11 PM, Bill Jay bill.jaypeter...@gmail.com wrote: Hi Tathagata, It seems repartition does not necessarily

Re: Number of executors change during job running

2014-07-11 Thread Praveen Seluka
If I understand correctly, you cannot change the number of executors at runtime, right? (Correct me if I am wrong.) It is defined when we start the application and stays fixed. Do you mean the number of tasks? On Fri, Jul 11, 2014 at 6:29 AM, Tathagata Das tathagata.das1...@gmail.com wrote: Can you try

Re: Number of executors change during job running

2014-07-11 Thread Bill Jay
Hi Praveen, I did not change the total number of executors. I specified 300 as the number of executors when I submitted the jobs. However, for some stages, the number of executors is very small, leading to long calculation times even for a small data set. That means not all executors were used for

Re: Number of executors change during job running

2014-07-11 Thread Bill Jay
Hi Tathagata, I also tried passing the number of partitions as a parameter to functions such as groupByKey. It seems the number of executors is around 50 instead of 300, which is the number of executors I specified in the submission script. Moreover, the running time of different executors is

Re: Number of executors change during job running

2014-07-11 Thread Tathagata Das
Can you show us the program that you are running? If you are setting the number of partitions in the XYZ-ByKey operation to 300, then there should be 300 tasks for that stage, distributed over the 50 executors allocated to your context. However, the data distribution may be skewed, in which case you
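A sketch of passing the partition count explicitly (pairs is a placeholder pair RDD):

  val grouped = pairs.groupByKey(300)          // 300 tasks in the shuffle stage
  val counts  = pairs.reduceByKey(_ + _, 300)  // same idea for reduceByKey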

Re: Number of executors change during job running

2014-07-11 Thread Bill Jay
Hi Tathagata, Below is my main function. I omit some filtering and data conversion functions. These functions are just a one-to-one mapping, which should not increase the running time. The only reduce function I have here is groupByKey. There are 4 topics in my Kafka brokers and two of the

Re: Number of executors change during job running

2014-07-11 Thread Bill Jay
Hi folks, I just ran another job that only received data from Kafka, did some filtering, and then saved the results as text files in HDFS. There was no reduce work involved. Surprisingly, the number of executors for the saveAsTextFiles stage was also 2, although I specified 300 executors in the job

Re: Number of executors change during job running

2014-07-11 Thread Tathagata Das
Aah, I get it now. That is because the input data stream is replicated on two machines, so by locality the data is processed on those two machines. So the map stage on the data uses 2 executors, but in the reduce stage (after groupByKey) the saveAsTextFiles would use 300 tasks. And the default

Re: Number of executors change during job running

2014-07-11 Thread Bill Jay
Hi Tathagata, Do you mean that the data is not shuffled until the reduce stage? Does that mean groupBy still only uses 2 machines? I think I used repartition(300) after I read the data from Kafka into a DStream. It seems that it did not guarantee that the map or reduce stages will be run on 300

Re: Number of executors change during job running

2014-07-10 Thread Tathagata Das
Are you specifying the number of reducers in all the DStream ByKey operations? If the number of reducers is not set, then the number of reducers used in the stages can keep changing across batches. TD On Wed, Jul 9, 2014 at 4:05 PM, Bill Jay bill.jaypeter...@gmail.com wrote: Hi all, I have a
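A sketch of fixing the reducer count on a DStream (pairStream is a placeholder pair DStream):

  // Without the explicit 300, the reducer count can vary from batch to batch.
  val reduced = pairStream.reduceByKey((a, b) => a + b, 300)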

Re: Number of executors change during job running

2014-07-10 Thread Bill Jay
Hi Tathagata, I set the default parallelism to 300 in my configuration file. Sometimes there are more executors in a job. However, it is still slow. And I further observed that most executors take less than 20 seconds, but two of them take much longer, such as 2 minutes. The data size is very small

Re: Number of executors change during job running

2014-07-10 Thread Tathagata Das
Can you try setting the number of partitions in all the shuffle-based DStream operations explicitly? It may be the case that the default parallelism (that is, spark.default.parallelism) is probably not being respected. Regarding the unusual delay, I would look at the task details of that stage
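For reference, a sketch of pinning the default parallelism, either at submit time or in the driver (application-specific values are placeholders):

  spark-submit --conf spark.default.parallelism=300 ... <application-jar>

  // or equivalently in code, before the context is created
  val conf = new SparkConf().set("spark.default.parallelism", "300")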