Question on Spark worker node core settings

2014-02-01 Thread Wisc Forum
Hi, I am currently working on visualizing the behavior of tasks running on Spark. I have found two problems that are hard to solve. 1. I set SPARK_WORKER_CORES to 2 in my SPARK_HOME/conf/spark-env.sh file, but when I start the cluster I still see more than two cores running on some nodes, both on
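
The snippet cuts off before the details, but one possible explanation (not necessarily the cause in this thread): SPARK_WORKER_CORES only limits how many cores each standalone worker offers to the master for scheduling; it does not pin executor JVM threads to physical cores, so OS-level monitoring can still show activity on more than two cores. If the goal is to cap what one application consumes, the application side can also set spark.cores.max. A minimal sketch, assuming Spark 0.9 or later (SparkConf API) and a hypothetical master URL:

    import org.apache.spark.{SparkConf, SparkContext}

    object WorkerCoreCapSketch {
      def main(args: Array[String]): Unit = {
        // SPARK_WORKER_CORES in conf/spark-env.sh caps what each worker
        // offers; spark.cores.max caps the total cores this one application
        // may claim across the whole standalone cluster.
        val conf = new SparkConf()
          .setAppName("WorkerCoreCapSketch")
          .setMaster("spark://master-host:7077")  // hypothetical master URL
          .set("spark.cores.max", "2")

        val sc = new SparkContext(conf)
        // ... job code ...
        sc.stop()
      }
    }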

Spark performance on Amazon EC2

2013-11-25 Thread Wisc Forum
>> Hi, Spark support: I am working on a research project which uses Spark on Amazon EC2 as the cluster foundation for distributed computation. The project basically consumes some image and text files, projects these files to different features, and indexes them for later query.

Spark performance on Amazon EC2

2013-11-24 Thread Wisc Forum
Hi, Spark support: I am working on a research project which uses Spark on Amazon EC2 as the cluster foundation for distributed computation. The project basically consumes some image and text files, projects these files to different features, and indexes them for later query. Currently the image
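
The snippet describes the general data flow (load files, project them to features, keep an index for later queries). A minimal sketch of that shape in Spark's Scala API, with extractFeatures and the input path as hypothetical stand-ins for the project's real code and data:

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    object FeatureIndexSketch {
      // Hypothetical feature extraction; the real project would replace this.
      def extractFeatures(line: String): Array[Double] =
        Array(line.length.toDouble)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[2]", "FeatureIndexSketch")

        // Load text input, keep each record alongside its features, and
        // persist the result so later queries reuse it instead of
        // recomputing the whole pipeline.
        val indexed = sc.textFile("hdfs:///data/text/*")   // hypothetical path
          .map(line => (line, extractFeatures(line)))
          .persist(StorageLevel.MEMORY_AND_DISK)

        // Example query against the persisted index.
        val hits = indexed.filter { case (_, features) => features(0) > 100 }.count()
        println("records with long lines: " + hits)

        sc.stop()
      }
    }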

Re: How to add more worker nodes to a Spark cluster on EC2

2013-11-17 Thread Wisc Forum
slaves. If you do this, then you can just copy all the Spark configuration from the master by using a command like this (assuming you installed from the ec2 scripts): ~/spark-ec2/copy-file ~/spark/conf/ and then everyone should be happy. On Sun, Nov 17, 2013 at 1

How to add more worker nodes to a Spark cluster on EC2

2013-11-17 Thread Wisc Forum
Hi, I have a job that runs on Spark on EC2. The cluster currently contains 1 master node and 2 worker nodes. I am planning to add several other worker nodes to the cluster. How should I do that so the master node knows about the new worker nodes? I couldn't find documentation on this on Spark's site

Spark map function question

2013-10-21 Thread Wisc Forum
Hi, we have tried integrating Spark with our existing code and have run into an issue. When we use the function below (where func is a function to process elem) rdd.map{ elem => {func.apply(elem)} } in the log, I see that the apply function is applied a few times for the same element el
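
The rest of the thread is cut off, but the usual explanation for this behavior is that RDD transformations are lazy and are recomputed for every action (and again on task retries or speculative execution), so func can run more than once per element unless the mapped RDD is cached. A small sketch of that effect, with func as a hypothetical stand-in for the thread's real processing function:

    import org.apache.spark.SparkContext

    object MapRecomputeSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[2]", "MapRecomputeSketch")
        val rdd = sc.parallelize(1 to 5)

        // Hypothetical stand-in for func; logs every invocation.
        val func = (elem: Int) => { println("applying func to " + elem); elem * 2 }

        // map is lazy: each action below triggers a fresh computation,
        // so func is applied again to every element for the second action.
        val mapped = rdd.map(elem => func(elem))
        mapped.count()
        mapped.collect()

        // Caching keeps the first computation's results, so later actions
        // reuse them instead of re-applying func.
        val cached = rdd.map(elem => func(elem)).cache()
        cached.count()
        cached.collect()

        sc.stop()
      }
    }

If the duplicate applications show up even within a single action, task retries and speculative execution are other things worth checking in the driver and worker logs.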