Hi,
I am currently working on visualizing the behavior of tasks running on Spark, and I have
found two problems that are hard to solve.
1. I set SPARK_WORKER_CORES to 2 in my SPARK_HOME/conf/spark-env.sh file,
but when I start the cluster I still see more than two cores in use on some
nodes, both on
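A note on the first problem: SPARK_WORKER_CORES is read by each worker from its own spark-env.sh at startup, so the edited file has to be present on every worker node and the workers restarted before the limit takes effect. If the goal is instead to cap how many cores a single application may take, standalone mode has the per-application setting spark.cores.max. The sketch below only illustrates that setting; the app name and master URL are placeholders, not values from this thread.

import org.apache.spark.{SparkConf, SparkContext}

object CappedCoresSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master URL: assumptions, not taken from the thread.
    val conf = new SparkConf()
      .setAppName("CappedCoresSketch")
      .setMaster("spark://ec2-master:7077")
      // In standalone mode this caps the total cores this application may use,
      // independently of how many cores SPARK_WORKER_CORES lets each worker offer.
      .set("spark.cores.max", "2")
    val sc = new SparkContext(conf)

    // ... job code ...

    sc.stop()
  }
}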
Hi, Spark support:
I am working on a research project which uses Spark on Amazon EC2 as the
cluster foundation for distributed computation.
The project basically consumes some image and text files, projects them onto
different features, and indexes them for later querying.
Currently the image
> slaves. If you do this, then you can just
> copy all the Spark configuration from the master by using a command like this
> (assuming you installed from the ec2 scripts):
> ~/spark-ec2/copy-file ~/spark/conf/
> and then everyone should be happy.
>
>
> On Sun, Nov 17, 2013 at 1
Hi, I have a job that runs on Spark on EC2. The cluster currently contains 1
master node and 2 worker nodes.
I am planning to add several more worker nodes to the cluster. How should I do
that so that the master node knows about the new worker nodes?
I couldn't find documentation on this on Spark's site.
Hi, we have tried integrating Spark with our existing code and are seeing some issues.
The issue is that when we use the function below (where func is a function that
processes elem)
rdd.map { elem => func.apply(elem) }
in the log, I see that the apply function is called a few times for the same
element el
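A likely explanation (assuming standard Spark behavior rather than anything visible in the log excerpt): map is lazy, so the mapped RDD is recomputed for every action that touches it, and func then runs again on each element; task retries and speculative execution can also re-run it. The sketch below illustrates caching the mapped RDD so repeated actions reuse it. The data and the stand-in func are made up for the example and are not the poster's code.

import org.apache.spark.{SparkConf, SparkContext}

object MapCacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapCacheSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical stand-in for the poster's func; any object with an apply method works here.
    val func: Int => Int = elem => elem * 2

    val rdd = sc.parallelize(1 to 10)

    // map is lazy: without cache(), each action below would re-run func on every element.
    val mapped = rdd.map { elem => func.apply(elem) }.cache()

    // The first action materializes and caches the partitions; the second reuses them,
    // so func should run only once per element.
    println(mapped.count())
    println(mapped.collect().mkString(", "))

    sc.stop()
  }
}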