Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Shangyu Luo
due to the frame size being too small, you’re fine. Having a bigger frame size will result in wasted space and unneeded memory allocation for buffers. It doesn’t make the communication more efficient. Matei On Dec 8, 2013, at 12:57 PM, Shangyu Luo lsy...@gmail.com wrote: I would like
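For context, a hedged sketch of how the frame size was typically raised in the Spark 0.8 era; the property name and megabyte unit match that era's configuration style, and the value 100 is purely illustrative:

```shell
# conf/spark-env.sh on the driver: raise the Akka frame size.
# spark.akka.frameSize is interpreted in megabytes; per Matei's point,
# only raise it if you actually hit "frame size too small" errors,
# since larger frames just waste buffer memory otherwise.
export SPARK_JAVA_OPTS="-Dspark.akka.frameSize=100"
```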

Re: cluster hangs for no apparent reason

2013-11-07 Thread Shangyu Luo
walrusthe...@gmail.com Shangyu, Thanks for the tip re: the flag! Maybe the broadcast variable is only for complex data structures? On Sun, Nov 3, 2013 at 7:58 PM, Shangyu Luo lsy...@gmail.com wrote: I met the problem of 'Too many open files' before. One solution is adding 'ulimit -n 10
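The 'Too many open files' fix mentioned here is the per-process file-descriptor limit; a minimal sketch (the original message is truncated at 'ulimit -n 10...', so the exact value is unknown and 10240 below is illustrative):

```shell
# Check the current per-process open-file limit, then raise it in the
# shell (or /etc/security/limits.conf) before starting the Spark workers.
ulimit -n          # show current limit
ulimit -n 10240    # illustrative value; the original thread's number is truncated
```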

Re: cluster hangs for no apparent reason

2013-11-03 Thread Shangyu Luo
}.reduce(_ + _) w -= gradient } On Sun, Nov 3, 2013 at 10:47 AM, Shangyu Luo lsy...@gmail.com wrote: Hi Walrus, Thank you for sharing your solution to your problem. I think I have met the similar problem before (i.e., one machine is working while others are idle.) and I
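The fragment above is the tail of a gradient-descent loop: per-point gradients are sum-reduced with `reduce(_ + _)` and subtracted from the weights. A minimal pure-Python sketch of the same pattern, with a toy least-squares problem and plain lists standing in for RDDs (dataset, learning rate, and loss are illustrative, not from the thread):

```python
from functools import reduce

# Toy data satisfying y = 2*x, so the optimum weight is w = 2.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
lr = 0.02

for _ in range(200):
    # Per-point gradient of (w*x - y)^2, then a sum-reduce:
    # this mirrors Spark's map(...).reduce(_ + _) followed by w -= gradient.
    grads = map(lambda p: 2 * (w * p[0] - p[1]) * p[0], points)
    gradient = reduce(lambda a, b: a + b, grads)
    w -= lr * gradient / len(points)

print(round(w, 3))  # converges toward 2.0
```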

Re: Questions about the files that Spark will produce during its running

2013-10-29 Thread Shangyu Luo
The error is from a worker node -- did you check that /data2 is set up properly on the worker nodes too? In general that should be the only directory used. Matei On Oct 28, 2013, at 6:52 PM, Shangyu Luo lsy...@gmail.com wrote: Hello, I have some questions about the files that Spark will create
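Matei's reply is about Spark's scratch-space directory, which every node writes shuffle and spill files into locally. A hedged sketch of pointing it at /data2 in the 0.8 era (the property name matches that era's configuration style):

```shell
# conf/spark-env.sh on EVERY node, not just the master:
# the directory must exist and be writable on each worker,
# because shuffle files are written locally on the node running the task.
export SPARK_JAVA_OPTS="-Dspark.local.dir=/data2"
```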

Questions about the files that Spark will produce during its running

2013-10-28 Thread Shangyu Luo
Hello, I have some questions about the files that Spark will create and use during its running. (1) I am running a python program on Spark with a cluster of EC2. The data comes from hdfs file system. I have met the following error in the console of the master node: *java.io.FileNotFoundException:

Running pi example error with spark 0.8.0 cdh4 version

2013-10-10 Thread Shangyu Luo
set up SCALA_HOME and SPARK_HOME in bashrc and they worked well for spark 0.8.0 source version (I downloaded and compiled the source version before, but I have deleted it now) So what's going wrong here? Any advice will be appreciated. Thanks! -- -- Shangyu, Luo Department of Computer Science Rice

Re: Running pi example error with spark 0.8.0 cdh4 version

2013-10-10 Thread Shangyu Luo
OK, I think I have solved it. I do not need to build the 0.8.0 cdh4 version because it comes prebuilt; the pi example runs now. 2013/10/10 Shangyu Luo lsy...@gmail.com Hello, I downloaded spark 0.8.0 cdh4 version, built and compiled it by using SPARK_HADOOP_VERSION=2.0.0-cdh4.4.0
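For reference, the build invocation the thread refers to; this matches the source-build procedure for Spark 0.8 against CDH4, and is only needed for the source tarball, since the cdh4 download already ships assembled:

```shell
# From the Spark 0.8 source directory: build against CDH4's Hadoop.
SPARK_HADOOP_VERSION=2.0.0-cdh4.4.0 sbt/sbt assembly
```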

Re: The functionality of daemon.py?

2013-10-08 Thread Shangyu Luo
is the actual PySpark worker process, and is launched by the Spark worker when running Python jobs. So, when using PySpark, the real computation is handled by a python process (via daemon.py), not a java process. Hope that helps, -Jey On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo lsy...@gmail.com wrote

Re: The functionality of daemon.py?

2013-10-08 Thread Shangyu Luo
Also, I found that 'daemon.py' continues running on one worker node even after I terminated the Spark job on the master node. A little strange to me. 2013/10/8 Shangyu Luo lsy...@gmail.com Hello Jey, Thank you for answering. I have found that there are about 6 or 7 'daemon.py' processes
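A sketch of how one might inspect and clean up the leftover PySpark worker daemons described here (run on the worker node; assumes standard `ps`/`pkill` are available there):

```shell
# List surviving PySpark daemon processes on this worker.
# The [d] trick keeps the grep process itself out of the results.
ps aux | grep '[d]aemon.py'

# Stop any that outlived the job terminated on the master.
pkill -f daemon.py
```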

The functionality of daemon.py?

2013-10-07 Thread Shangyu Luo
job the daemon.py will work on? Is it normal for it to consume a lot of CPU and memory? Thanks! Best, Shangyu Luo

Re: Some questions about task distribution and execution in Spark

2013-10-06 Thread Shangyu Luo
is the default unless you give it another value. You can view the exact number of tasks on the job monitoring UI in Spark 0.8 ( http://spark.incubator.apache.org/docs/latest/monitoring.html). Matei Any help will be appreciated. Thanks!

Re: How to prevent webUI from coming up

2013-10-04 Thread Shangyu Luo

Re: Wrong result with mapPartitions example

2013-09-28 Thread Shangyu Luo
, took 0.172441 s [625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625] -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Thu, Sep 26, 2013 at 10:08 PM, Shangyu Luo lsy...@gmail.com wrote: I can see the test for ParallelCollectionRDD.slice(). But how
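Reynold's output, 625 repeated 16 times, is consistent with counting the elements in each of 16 even slices of a 10000-element collection. A pure-Python simulation of that mapPartitions behavior (plain lists stand in for RDDs; the element count and slice count are inferred from the output shown, and the common pitfall is that the per-partition function must return an iterable, not a bare value):

```python
def slice_evenly(seq, num_slices):
    """Mimic even slicing of a parallelized collection into partitions."""
    n = len(seq)
    return [seq[n * i // num_slices: n * (i + 1) // num_slices]
            for i in range(num_slices)]

def map_partitions(partitions, f):
    """Mimic RDD.mapPartitions: f takes an iterator and returns an iterable."""
    return [x for part in partitions for x in f(iter(part))]

parts = slice_evenly(list(range(10000)), 16)
# Count the elements in each partition; note the function returns a
# one-element list, not a bare number.
counts = map_partitions(parts, lambda it: [sum(1 for _ in it)])
print(counts)  # [625, 625, ..., 625] (16 entries)
```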

Some questions about task distribution and execution in Spark

2013-09-26 Thread Shangyu Luo
be counted as one task? For example, sc.parallelize([0,1,2,3]).map(lambda x: x) Will there be four tasks? Any help will be appreciated. Thanks!
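Tasks in Spark generally correspond to partitions, so sc.parallelize([0,1,2,3]) yields as many tasks as slices, not one per element (unless the default parallelism happens to be 4). A pure-Python sketch of the slicing that determines the task count (the even-split rule mirrors Spark's slicing of a parallelized collection; the parallelism value of 2 is an illustrative stand-in, since Spark derives it from the cluster):

```python
def slice_evenly(seq, num_slices):
    # Mirror even slicing of a parallelized collection into partitions.
    n = len(seq)
    return [seq[n * i // num_slices: n * (i + 1) // num_slices]
            for i in range(num_slices)]

default_parallelism = 2  # illustrative; real value depends on the cluster
partitions = slice_evenly([0, 1, 2, 3], default_parallelism)
print(len(partitions))  # one task per partition
print(partitions)       # [[0, 1], [2, 3]]
```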

Wrong result with mapPartitions example

2013-09-26 Thread Shangyu Luo
wrong with my code? Thanks!