how to bind spark-master to the public IP of EC2

2014-01-05 Thread Nan Zhu
Hi, all. How do I bind spark-master to the public IP of EC2? I tried to set it in spark-env.sh, but that failed. Thank you. Best, -- Nan Zhu

Re: ADD_JARS doesn't properly work for spark-shell

2014-01-05 Thread Aureliano Buendia
On Sun, Jan 5, 2014 at 6:01 AM, Aaron Davidson ilike...@gmail.com wrote: That sounds like a different issue. What is the type of myrdd (i.e., if you just type myrdd into the shell)? It's possible it's defined as an RDD[Nothing] and thus all operations try to typecast to Nothing, which always …
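For reference, a minimal sketch of how an RDD[Nothing] can arise in spark-shell and how an explicit type annotation avoids it; the value names here are made up, not from the thread:

    // Parallelizing an empty Seq with no annotation infers Seq[Nothing],
    // so the result is an RDD[Nothing] and later operations can fail to cast.
    val bad = sc.parallelize(Seq())            // RDD[Nothing]

    // Giving the element type explicitly yields a usable RDD.
    val good = sc.parallelize(Seq.empty[Int])  // RDD[Int]
    val doubled = good.map(_ * 2)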

Re: State of spark on scala 2.10

2014-01-05 Thread Aureliano Buendia
On Sun, Jan 5, 2014 at 5:42 AM, Patrick Wendell pwend...@gmail.com wrote: I usually just use the existing launch scripts to create a correctly sized cluster, then just:
1. Copy spark/conf/* to /tmp
2. rm -rf spark/
3. Checkout and build new spark from github (you'll need to rename …

debug standalone Spark jobs?

2014-01-05 Thread Nan Zhu
Hi, all. I’m trying to run a standalone job in a Spark cluster on EC2. There is obviously some bug in my code; after the job runs for several minutes, it fails with an exception: Loading /usr/share/sbt/bin/sbt-launch-lib.bash [info] Set current project to rec_system (in build …

Re: debug standalone Spark jobs?

2014-01-05 Thread Sriram Ramachandrasekaran
Did you get to look at the Spark worker logs? They would be at SPARK_HOME/logs/. Also, you should look at the application logs themselves; they would be under SPARK_HOME/work/APP_ID.

Re: debug standalone Spark jobs?

2014-01-05 Thread Nan Zhu
Ah, yes, I think the application logs will really help. Thank you. -- Nan Zhu

Re: debug standalone Spark jobs?

2014-01-05 Thread Archit Thakur
You can run your Spark application locally by setting SPARK_MASTER=local and then debug the launched JVM in your IDE.
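A minimal sketch of the same idea expressed directly in code, using the 2014-era SparkContext constructor with an in-process "local" master; the application name and job body are made up for illustration:

    import org.apache.spark.SparkContext

    // "local" runs the whole job in one JVM, so an IDE debugger can
    // attach to it and breakpoints in map tasks actually hit.
    val sc = new SparkContext("local", "DebugJob")
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(sum)
    sc.stop()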

Re: Spark context jar confusions

2014-01-05 Thread Eugen Cepoi
Indeed you don't need it, just make sure that it is in your classpath. But anyway, the jar is not so big; compared to what your job will do next, sending a few MB over the network seems OK to me.
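A minimal sketch of shipping an application jar through the SparkContext constructor, as the 0.8/0.9-era API allowed; the master URL, jar path, and app name are placeholders, not from the thread:

    import org.apache.spark.SparkContext

    // The jars listed here are copied to the workers and added to the
    // classpath of the executors that run the job's tasks.
    val jars = Seq("/path/to/my-app-assembly.jar")
    val sc = new SparkContext("spark://master-host:7077", "MyApp", "/opt/spark", jars)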

Re: how to bind spark-master to the public IP of EC2

2014-01-05 Thread Nan Zhu
Yes, I did that. It runs well when I bind to a private IP on EC2, but I cannot bind to the public IP. I checked the EC2 docs; it seems the public IP is not bound to a NIC device of your instance but comes from NAT, so the program cannot create a socket on it. Best, -- Nan Zhu
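The NAT point can be verified outside Spark: a server socket can only bind to an address assigned to a local interface. A minimal sketch in plain Scala, with placeholder addresses (the private IP would be your instance's, the public IP the NAT-mapped one):

    import java.net.{InetAddress, ServerSocket}

    // Binding to the private IP succeeds: that address sits on a local NIC.
    val ok = new ServerSocket(7077, 50, InetAddress.getByName("10.0.0.12"))
    ok.close()

    // Binding to the EC2 public IP throws java.net.BindException
    // ("Cannot assign requested address"): it is only a NAT mapping,
    // not an address of any local interface.
    val fails = new ServerSocket(7077, 50, InetAddress.getByName("54.0.0.99"))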

Re: debug standalone Spark jobs?

2014-01-05 Thread Eugen Cepoi
You can set the log level to INFO; it looks like Spark logs application errors at INFO. When I have errors that I can reproduce only on live data, I run a spark-shell with my job in its classpath, then debug and tweak things to find out what happens.
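One way to do that from spark-shell, using the log4j 1.x API that Spark bundled at the time (a minimal sketch, assuming the default log4j setup):

    import org.apache.log4j.{Level, Logger}

    // Raise the root logger so messages Spark emits at INFO become visible.
    Logger.getRootLogger.setLevel(Level.INFO)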

Re: Will JVM be reused?

2014-01-05 Thread Archit Thakur
I am actually facing a general problem, which seems to be related to how many JVMs get launched. In my map task I read a file and fill a map from it. Now, since the data is static, map tasks are called for every record of the RDD, and I want to read the file only once, I kept the map as static (in …
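The read-once-per-JVM pattern being described is usually written in Scala as a lazy val inside a singleton object: each worker JVM initializes it on first use, and every task run in that JVM reuses it. A minimal sketch with made-up file and field names:

    import scala.io.Source

    // Loaded at most once per JVM, on first access, then shared by all
    // tasks that the same JVM executes.
    object Lookup {
      lazy val table: Map[String, String] =
        Source.fromFile("/data/lookup.tsv").getLines().map { line =>
          val Array(k, v) = line.split("\t", 2)
          k -> v
        }.toMap
    }

    // Inside a map task, every record hits the same in-memory table:
    // rdd.map(record => Lookup.table.getOrElse(record, "unknown"))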