due to the frame size being too small, you’re fine. Having a bigger
frame size will result in wasted space and unneeded memory allocation for
buffers. It doesn’t make the communication more efficient.
Matei
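For reference, the setting being discussed is presumably spark.akka.frameSize, which is given in MB. A minimal PySpark sketch of setting it, assuming a Spark version that has SparkConf (0.9 or later; in 0.8 the same property would be passed through the JVM options instead); the app name and the value are only illustrative:

from pyspark import SparkConf, SparkContext

# spark.akka.frameSize is specified in MB; the value 10 here is only
# illustrative, not a recommendation from the thread.
conf = (SparkConf()
        .setAppName("frame-size-demo")
        .set("spark.akka.frameSize", "10"))
sc = SparkContext(conf=conf)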
On Dec 8, 2013, at 12:57 PM, Shangyu Luo lsy...@gmail.com wrote:
I would like
walrusthe...@gmail.com
Shangyu,
Thanks for the tip re: the flag! Maybe the broadcast variable is only for
complex data structures?
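Broadcast variables are not limited to complex structures; they work for simple values as well. A minimal PySpark sketch of the usual pattern, with made-up data:

from pyspark import SparkContext

sc = SparkContext("local", "broadcast-demo")

# Broadcast a small lookup table once to every executor instead of
# shipping it inside each task's closure.
lookup = sc.broadcast({"a": 1, "b": 2})

print(sc.parallelize(["a", "b", "a"]).map(lambda k: lookup.value[k]).sum())  # 4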
On Sun, Nov 3, 2013 at 7:58 PM, Shangyu Luo lsy...@gmail.com wrote:
I met the problem of 'Too many open files' before. One solution is
adding 'ulimit -n 10
}.reduce(_ + _)
w -= gradient
}
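The truncated loop above looks like the standard gradient-descent pattern from Spark's logistic regression example: each task computes a partial gradient, reduce sums them, and the driver updates w. A rough PySpark sketch of that pattern, with toy data and a made-up iteration count:

import numpy as np
from pyspark import SparkContext

sc = SparkContext("local", "lr-demo")

# Toy (features, label) pairs; the data and iteration count are made up
# purely for illustration.
points = sc.parallelize([(np.array([1.0, 2.0]), 1.0),
                         (np.array([2.0, 0.5]), -1.0)]).cache()
w = np.zeros(2)

def gradient(p):
    x, y = p
    # Logistic-regression-style partial gradient for a single point.
    return x * (1.0 / (1.0 + np.exp(-y * x.dot(w))) - 1.0) * y

for _ in range(10):
    # Each task computes partial gradients; reduce sums them across the
    # cluster, mirroring the ".reduce(_ + _)" / "w -= gradient" fragment.
    w -= points.map(gradient).reduce(lambda a, b: a + b)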
On Sun, Nov 3, 2013 at 10:47 AM, Shangyu Luo lsy...@gmail.com wrote:
Hi Walrus,
Thank you for sharing your solution to your problem. I think I have met a
similar problem before (i.e., one machine is working while the others are
idle) and I
The error is from a worker node -- did you check that /data2 is set up
properly on the worker nodes too? In general that should be the only
directory used.
Matei
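Presumably /data2 here is the value of spark.local.dir; that is an assumption, since the original setting is not quoted in the thread. A minimal sketch of pointing Spark at that directory, assuming a PySpark version with SparkConf:

from pyspark import SparkConf, SparkContext

# spark.local.dir is where shuffle and spill files are written; the
# directory must exist and be writable on every worker node as well.
conf = SparkConf().set("spark.local.dir", "/data2")
sc = SparkContext(conf=conf)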
On Oct 28, 2013, at 6:52 PM, Shangyu Luo lsy...@gmail.com wrote:
Hello,
I have some questions about the files that Spark will create
Hello,
I have some questions about the files that Spark creates and uses while it
is running.
(1) I am running a Python program on a Spark cluster on EC2. The data
comes from HDFS. I have encountered the following error in the console
of the master node:
java.io.FileNotFoundException:
set up SCALA_HOME and SPARK_HOME in .bashrc and they worked well
for the Spark 0.8.0 source version (I downloaded and compiled the source
version before, but I have deleted it now).
So what's going wrong here? Any advice will be appreciated.
Thanks!
--
--
Shangyu, Luo
Department of Computer Science
Rice
OK. I think I have solved it.
I do not need to build the 0.8.0 CDH4 version because it has been prebuilt.
The pi example can run now.
2013/10/10 Shangyu Luo lsy...@gmail.com
Hello,
I downloaded the Spark 0.8.0 CDH4 version and built it
using SPARK_HADOOP_VERSION=2.0.0-cdh4.4.0
is the actual PySpark worker process, and
is launched by the Spark worker when running Python jobs. So, when
using PySpark, the real computation is handled by a Python process
(via daemon.py), not a Java process.
Hope that helps,
-Jey
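A quick way to see this is to map each element to the PID of the process that handled it; the resulting PIDs belong to the Python workers forked by daemon.py, not to the JVM. A small illustrative sketch:

import os

from pyspark import SparkContext

sc = SparkContext("local[2]", "worker-pid-demo")

# Map each element to the PID of the Python process that handled it; the
# PIDs printed are those of the Python worker processes, not the JVM.
print(sc.parallelize(range(4), 2).map(lambda _: os.getpid()).distinct().collect())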
On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo lsy...@gmail.com wrote:
Also, I found that 'daemon.py' will continue running on one worker node
even after I terminate the Spark job at the master node. That seems a
little strange to me.
2013/10/8 Shangyu Luo lsy...@gmail.com
Hello Jey,
Thank you for answering. I have found that there are about 6 or 7
'daemon.py' processes
job the daemon.py will work on? Is it normal for it
to consume a lot of CPU and memory?
Thanks!
Best,
Shangyu Luo
--
--
Shangyu, Luo
Department of Computer Science
Rice University
is the default
unless you give it another value. You can view the exact number of tasks on
the job monitoring UI in Spark 0.8 (
http://spark.incubator.apache.org/docs/latest/monitoring.html).
Matei
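A sketch of overriding the default by passing numSlices explicitly; note that getNumPartitions() comes from a later PySpark release than 0.8, where the monitoring UI mentioned above is the way to see the task count:

from pyspark import SparkContext

sc = SparkContext("local[4]", "partitions-demo")

# Passing numSlices explicitly instead of relying on the default.
rdd = sc.parallelize(range(10000), 16)

# One task is run per partition for a simple map over this RDD;
# getNumPartitions() is a later addition to PySpark.
print(rdd.getNumPartitions())  # 16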
Any help will be appreciated.
Thanks!
--
--
Shangyu, Luo
Department of Computer Science
and the time he shows up? -Randy Pausch
--
--
Shangyu, Luo
Department of Computer Science
Rice University
--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best
, took 0.172441 s
[625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625,
625, 625]
--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org
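The sixteen 625s quoted above look like per-partition element counts for 10000 elements split into 16 slices. The exact code that produced them is not shown in the thread, but a sketch along these lines would give the same output:

from pyspark import SparkContext

sc = SparkContext("local[4]", "slice-demo")

# 10000 elements split into 16 slices -> 625 elements per partition,
# matching the list of sixteen 625s above.
print(sc.parallelize(range(10000), 16).glom().map(len).collect())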
On Thu, Sep 26, 2013 at 10:08 PM, Shangyu Luo lsy...@gmail.com wrote:
I can see the test for ParallelCollectionRDD.slice().
But how
be counted as one task? For example, with
sc.parallelize([0,1,2,3]).map(lambda x: x), will there be four tasks?
Any help will be appreciated.
Thanks!
--
--
Shangyu, Luo
Department of Computer Science
Rice University
wrong with my code?
Thanks!
--
--
Shangyu, Luo
Department of Computer Science
Rice University