Inlined.
On Wed, Oct 2, 2013 at 1:00 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi Shangyu,
(1) When we read in a local file by SparkContext.textFile and do some
map/reduce job on it, how will spark decide to send data to which worker
node? Will the data be divided/partitioned equally
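As a rough local sketch of the question (plain Scala, not Spark itself; the names are illustrative), textFile-style input splitting amounts to dividing a file's lines into a requested minimum number of roughly equal splits, much as sc.textFile(path, minSplits) asks Spark to do:

```scala
// Local sketch, not Spark: divide a file's lines into roughly equal
// splits, analogous to what sc.textFile(path, minSplits) requests.
val lines = (1 to 10).map(i => s"line $i")
val minSplits = 3
val splitSize = math.ceil(lines.size.toDouble / minSplits).toInt
val splits = lines.grouped(splitSize).toSeq

assert(splits.size == minSplits)           // 3 splits
assert(splits.map(_.size) == Seq(4, 4, 2)) // roughly equal sizes
```

In the real scheduler each split is then handed to whichever worker has a free core, preferring workers with local access to that block of data.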
No, that is not what allowLocal means. For a very few actions, the
DAGScheduler will run the job locally (in a separate thread on the master
node) if the RDD in the action has a single partition and no dependencies
in its lineage. If allowLocal is false, that doesn't mean that
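A minimal sketch of the case described above (assumes a live SparkContext `sc`; not standalone-runnable):

```scala
// Sketch only, assuming an existing SparkContext `sc`.
// A single-partition RDD with no dependencies in its lineage:
val rdd = sc.parallelize(1 to 100, numSlices = 1)
// Actions like take() and first() request local execution, so the
// DAGScheduler may run them in a thread on the master node rather
// than shipping a task to a worker.
rdd.take(1)
rdd.first()
```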
Ok, even if my understanding of allowLocal is incorrect, nevertheless
(1) I'm loading a local file
(2) The tasks appear to be executed on a slave node
(ip-10-129-25-28), which is not my master node
??
On Thu, Oct 3, 2013 at 12:22 PM, Mark Hamstra m...@clearstorydata.com wrote:
No,
The spark code is on my /home directory, which is shared on NFS to all
nodes. So all workers should be able to access the same file.
On Thu, Oct 3, 2013 at 2:34 PM, Mark Hamstra m...@clearstorydata.com wrote:
But the worker has to be on a node that has local access to the file.
On Thu, Oct
Hi,
Trying to figure out what it means when the application (driver
program) logs end with lines like the ones below. This is with the
application running on Spark 0.8.0 on EC2.
Any help will be greatly appreciated.
Thanks!
13/10/03 16:17:33 INFO cluster.ClusterTaskSetManager:
Hi Eduardo,
it seems to me that your second problem is caused by inconsistent
classes, i.e. different class versions in the master and worker JVMs.
Are you sure that you have replaced the changed FlatMapFunction on all
worker nodes and also on the master?
Regards,
Martin
13/10/03 13:27:44 INFO
Hi Martin,
Yes, that is what it seems. However, it is unlikely that this is the case,
because I have all the Spark classes in my home directory, which is mounted on
NFS on all nodes. Unless there is something else I am missing...
Edu
On Thu, Oct 3, 2013 at 3:29 PM, Martin Weindel martin.wein...@gmail.com wrote:
Hi Eduardo,
if you are using Spark 0.7.3, I remember that I additionally had to replace
the class file at
spark-0.7.3/core/target/scala-2.9.3/classes/spark/api/java/function/
Martin
On 03.10.2013 at 22:35, Eduardo Berrocal wrote:
Hi Martin,
Yes, that is what it seems. However, it is unlikely
Ah, ok. Thanks for the clarification.
When I create a file that is only visible on the master I get the following
error...
f.map(l => l.split(" ")).collect
13/10/03 20:38:48 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
13/10/03 20:38:48 WARN snappy.LoadSnappy: Snappy native library not
And by .java, I mean .scala
On Thu, Oct 3, 2013 at 4:03 PM, Eduardo Berrocal eberr...@hawk.iit.edu wrote:
That directory is not present on version 0.8.0 (the one I am using).
However, there are the files for FlatMapFunction in spark 0.8.0:
$ find ./ -name FlatMapFunction*
Ok, I found the mistake! Wow, it really came to me by inspiration; otherwise
I don't know how it would suddenly have come to me.
The problem is that I packed my application in a jar file with the old
spark-assembly-0.8.0-incubating-hadoop1.0.4.jar in it. So that is the
reason why workers and master have
Hi all,
Is the sort order guaranteed if you apply operations like map(), filter() or
distinct() after sort in a distributed setting (run on a cluster of machines
backed by HDFS)? In other words, does rdd.sortByKey().map() have the same
sort order as rdd.sortByKey()? If so, is it documented
Yes, it is preserved for these map-like operations. The only time it isn't is
when you change the RDD's partitioner, e.g. by doing sortByKey or groupByKey. It
would definitely be good to document this more formally.
Matei
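The guarantee described above can be illustrated locally (a plain-Scala analogy, not Spark itself): an element-wise map over a sorted sequence of pairs cannot change their order, so the mapped result keeps the key order of the sorted input.

```scala
// Local analogy, not Spark: an element-wise map over sorted pairs
// preserves the key order, just as rdd.sortByKey().map(...) keeps
// the sort order of rdd.sortByKey().
val pairs  = Seq((3, "c"), (1, "a"), (2, "b"))
val sorted = pairs.sortBy(_._1)                               // like sortByKey()
val mapped = sorted.map { case (k, v) => (k, v.toUpperCase) } // like map()

assert(mapped.map(_._1) == sorted.map(_._1)) // same key order: 1, 2, 3
```

In the distributed case the same reasoning holds per partition, and the range partitioning produced by sortByKey fixes the order across partitions, as long as no later operation changes the partitioner.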
On Oct 3, 2013, at 3:33 PM, Mingyu Kim m...@palantir.com wrote:
Hi all,
Got it. Thanks a lot!
From: Matei Zaharia matei.zaha...@gmail.com
Reply-To: user@spark.incubator.apache.org
Date: Thursday, October 3, 2013 6:00 PM
To: user@spark.incubator.apache.org
Subject: Re: Sort order of RDD rows
Yes, it