Re: Task not running in standalone cluster

2013-12-17 Thread Jie Deng
Hi Andrew, thanks for helping! Sorry, I did not make myself clear; here is the output from iptables (both master and worker): jie@jie-OptiPlex-7010:~/spark$ sudo ufw status Status: inactive jie@jie-OptiPlex-7010:~/spark$ sudo iptables -L Chain INPUT (policy ACCEPT) target prot opt source

Re: OOM, help

2013-12-17 Thread Jie Deng
Hi Leo, I think java.lang.OutOfMemoryError: Java heap space comes from a plain Java memory problem, not from Spark itself. Just pass -Xmx with a larger value when starting the JVM. 2013/12/17 leosand...@gmail.com leosand...@gmail.com hello everyone, I have a problem when I run the wordcount example. I read
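
A minimal sketch of the -Xmx advice applied to a Spark program, assuming a 0.8-era setup: the heap cap is a plain JVM flag (java -Xmx4g ...), and the per-executor equivalent was a system property set before the SparkContext is created. The property name, the 4g figure, and the input path below are illustrative assumptions, not taken from the thread.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD implicits (needed pre-1.0)

    object WordCountWithMoreHeap {
      def main(args: Array[String]): Unit = {
        // Rough equivalent of `java -Xmx4g` for the workers: raise executor
        // memory before the context exists (property name assumed).
        System.setProperty("spark.executor.memory", "4g")
        val sc = new SparkContext("local[4]", "WordCount")
        val counts = sc.textFile("input.txt")          // placeholder path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.take(10).foreach(println)
        sc.stop()
      }
    }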

Re: Task not running in standalone cluster

2013-12-17 Thread Jie Deng
When I start a task on the master, I can see a CoarseGrainedExecutorBackend java process running on the worker; is that saying something?

Re: Task not running in standalone cluster

2013-12-17 Thread Jie Deng
Don't bother... my problem was fixed by using spark-0.9 instead of 0.8, because 0.9 fixed a bug so it can run from Eclipse.

Re: Task not running in standalone cluster

2013-12-17 Thread Andrew Ash
Glad you got it figured out!

Re: spark through vpn, SPARK_LOCAL_IP

2013-12-17 Thread viren kumar
Is that really the only solution? I too am faced with the same problem of running the driver on a machine with two IPs, one internal and one external. I launch the job and the Spark server fails to connect to the client since it tries on the internal IP. I tried setting SPARK_LOCAL_IP, but to no
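
For anyone else on a multi-homed machine, a hedged sketch of one workaround from that era: pin the address the driver advertises before creating the context. The spark.driver.host property and the placeholder IP are assumptions, not something confirmed in this thread.

    import org.apache.spark.SparkContext

    object VpnDriver {
      def main(args: Array[String]): Unit = {
        // Advertise the externally reachable address so executors can call
        // back over the VPN (placeholder IP; property name assumed).
        System.setProperty("spark.driver.host", "203.0.113.7")
        // Exporting SPARK_LOCAL_IP=203.0.113.7 before launch is the
        // environment-variable route the thread already mentions.
        val sc = new SparkContext("spark://master:7077", "VpnJob")
        println(sc.parallelize(1 to 10).count())
        sc.stop()
      }
    }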

Re: Repartitioning an RDD

2013-12-17 Thread Matei Zaharia
I’m not sure if a method called repartition() ever existed in an official release, since we don’t remove methods, but there is a method called coalesce() that does what you want. You just tell it the desired new number of partitions. You can also have it shuffle the data across the cluster to
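
A minimal sketch of the coalesce() call Matei describes, assuming only the standard RDD API and a spark-shell session where sc is predefined:

    // In the spark-shell:
    val rdd = sc.parallelize(1 to 100000, 100)        // 100 partitions to start
    val merged = rdd.coalesce(10)                     // shrink without a shuffle
    val spread = rdd.coalesce(200, shuffle = true)    // grow/rebalance via a shuffle
    println(merged.partitions.length)                 // 10
    println(spread.partitions.length)                 // 200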

Re: Repartitioning an RDD

2013-12-17 Thread Patrick Wendell
Master and 0.8.1 (soon to be released) have `repartition`. It's actually a new feature, not an old one! On Tue, Dec 17, 2013 at 4:31 PM, Mark Hamstra m...@clearstorydata.com wrote: https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L280 On
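
For completeness, a one-line sketch of the method Patrick mentions; repartition(n) is shorthand for coalesce(n, shuffle = true), so it can both grow and shrink the partition count (again assuming a spark-shell session):

    val data = sc.parallelize(1 to 100000, 4)
    val evened = data.repartition(64)    // same effect as coalesce(64, shuffle = true)
    println(evened.partitions.length)    // 64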

Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread Patrick Wendell
Hey Philip, no - those are compiled against the mr1 version. You'll need to build it yourself for YARN. - Patrick On Tue, Dec 17, 2013 at 10:32 AM, Philip Ogren philip.og...@oracle.com wrote: I have a question about the pre-built binary for 0.8.0 for CDH 4 listed here:

Problem when trying to modify data generated with collect() method from RDD

2013-12-17 Thread 杨强
Hi, everyone. I'm using Scala to implement a connected-component algorithm in Spark, and the code in question is as follows: type Graph = ListBuffer[Array[String]]; type CCS = ListBuffer[Graph]; val ccs_array: Array[CCS] = graphs_rdd.map { graph => find_cc(graph) }.collect(); var
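
The message is cut off, but the subject suggests the usual collect() pitfall, so here is a hedged sketch under assumed definitions (find_cc below is a stand-in for the poster's function): collect() materializes a driver-local copy, and on a real cluster in-place edits to that copy never propagate back to the RDD.

    import scala.collection.mutable.ListBuffer
    import org.apache.spark.SparkContext

    object CollectCopyDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[2]", "CollectCopyDemo")
        type Graph = ListBuffer[Array[String]]
        type CCS = ListBuffer[Graph]

        // Stand-in for the poster's find_cc: wrap each graph in a CCS.
        def find_cc(g: Graph): CCS = ListBuffer(g)

        val graphs_rdd = sc.parallelize(Seq[Graph](ListBuffer(Array("a", "b"))))
        val ccs_array: Array[CCS] = graphs_rdd.map(graph => find_cc(graph)).collect()

        // This edits only the driver-local array; to keep the change in
        // Spark, re-parallelize it or express it as a transformation.
        ccs_array(0) += ListBuffer(Array("c"))
        sc.stop()
      }
    }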

FileNotFoundException running spark job

2013-12-17 Thread Nathan Kronenfeld
Hi, Folks. I was wondering if anyone has encountered the following error before; I've been staring at this all day and can't figure out what it means. In my client log, I get: [INFO] 17 Dec 2013 22:31:09 - org.apache.spark.Logging$class - Lost TID 282 (task 3.0:63) [INFO] 17 Dec 2013 22:31:09 -

Spark streaming vs. spark usage

2013-12-17 Thread Nathan Kronenfeld
Hi, Folks. We've just started looking at Spark Streaming, and I find myself a little confused. As I understood it, one of the main points of the system was that one could use the same code when streaming, doing batch processing, or whatnot. Yet when we try to apply a batch processor that
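
The standard way to get that reuse is DStream.transform, which lifts an ordinary RDD-to-RDD function onto every micro-batch. Below is a hedged sketch against the streaming API of that era (processBatch is a made-up name standing in for existing batch code); whether this covers Nathan's particular mismatch is not clear from the truncated message.

    import org.apache.spark.SparkContext._   // pair-RDD implicits (needed pre-1.0)
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingReuse {
      // Existing batch code: an ordinary RDD => RDD function.
      def processBatch(lines: RDD[String]): RDD[(String, Int)] =
        lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)

      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext("local[4]", "StreamingReuse", Seconds(10))
        val lines = ssc.socketTextStream("localhost", 9999)
        // transform applies the batch function to each micro-batch's RDD.
        val counts = lines.transform(processBatch _)
        counts.print()
        ssc.start()   // streaming threads keep the JVM alive
      }
    }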

Re: FileNotFoundException running spark job

2013-12-17 Thread Azuryy Yu
I think you need to increase the ulimit to avoid the 'too many open files' error; then the FileNotFoundException should disappear.

Re: FileNotFoundException running spark job

2013-12-17 Thread Nathan Kronenfeld
That was our initial thought too... but this is happening on even trivial jobs that worked fine a few

Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread phoenix bai
I am compiling against hadoop 2.2.0; it really takes time, especially when the network connection is not that stable. SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true ./sbt/sbt assembly

Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread Azuryy Yu
Hi Phoenix, this is not Spark related; it is your local network that is the bottleneck. Thanks.

Re: spark pre-built binaries for 0.8.0

2013-12-17 Thread phoenix bai
Yeah, I know. I can't do anything to improve my network, so all I manage to do is: if it looks like it's hanging, I kill it and restart. So far so good, but it looks like a long way to go.