Ah, sorry to hear you had more problems. Some thoughts on them:

> Thanks for that, Matei! I'll look at that once I get a spare moment. :-)
>
> If you like, I'll keep documenting my newbie problems and frustrations...
> perhaps it might make things easier for others.
>
> Another issue I seem to have found (now that I can get small clusters up):
> some of the examples (the streaming.Twitter ones especially) depend on there
> being a "/mnt/spark" and "/mnt2/spark" directory (I think for Java
> tempfiles?) and those don't seem to exist out of the box. I have to create
> those directories and use "copy-dir" to get them to the workers before those
> examples run.

I think this is a side effect of the r3 instances not having those drives mounted. Our setup script would normally create these directories. What was the error?

> Much of the last two days for me has been about failing to get any of my
> own code to work, except in spark-shell (which is very nice, btw).
>
> At first I tried editing the examples, because I took the documentation
> literally when it said "Finally, Spark includes several samples in the
> examples directory (Scala, Java, Python). You can run them as follows:" but
> of course didn't realize editing them is pointless, because while the source
> is there, the code is actually pulled from a .jar elsewhere. Doh. (So obvious
> in hindsight.)
>
> I couldn't even turn down the voluminous INFO messages to WARNs, no matter
> how many conf/log4j.properties files I edited or copy-dir'd. I'm sure there's
> a trick to that I'm not getting.

What did you change log4j.properties to? It should be changed to say

    log4j.rootCategory=WARN, console

but maybe another log4j.properties is somehow arriving on the classpath. This is definitely a common problem, so we need to add some explicit docs on it.
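In the meantime, for reference, here's roughly what a quieter conf/log4j.properties can look like. This is only a sketch based on the conf/log4j.properties.template file that ships with Spark, with the root level turned down to WARN; the exact template contents may differ between versions:

    # Log everything at WARN and above to the console
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

Also make sure the file is actually named log4j.properties (not .template) in the conf directory of the Spark installation that launches the job, and that no other log4j.properties arrives earlier on the classpath - for example, one bundled inside your application JAR.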
> Even trying to build SimpleApp I've run into the problem that all the
> documentation says to use "sbt/sbt assembly", but sbt doesn't seem to be in
> the 1.0.0 pre-built packages that I downloaded.

Are you going through http://spark.apache.org/docs/latest/quick-start.html? You should be able to do just sbt package. Once you do that, you don't need to deploy your application's JAR to the cluster; just pass it to spark-submit and it will automatically be sent over.
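Concretely, the flow looks like the commands below. The class name and JAR path are the ones produced by the quick start's SimpleApp example, so substitute your own, and the master URL is a placeholder for your cluster's:

    # From your application's project root, build the JAR:
    sbt package

    # Submit it with the spark-submit script from the Spark distribution
    # (assuming SPARK_HOME points at your Spark install); the JAR is shipped
    # to the workers automatically:
    $SPARK_HOME/bin/spark-submit \
      --class "SimpleApp" \
      --master spark://<your-master-hostname>:7077 \
      target/scala-2.10/simple-project_2.10-1.0.jar

You can also pass --master local[4] first to test on the launch machine before running against the cluster.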
> Ah... yes... there it is in the source package. I suppose that means that in
> order to deploy any new code to the cluster, I've got to rebuild from source
> on my "cluster controller". OK, I never liked that Amazon Linux AMI anyway.
> I'm going to start from scratch again with an Ubuntu 12.04 instance;
> hopefully that will be more auspicious...
>
> Meanwhile I'm learning Scala... Great Turing's Ghost, it's the dream language
> we've theorized about for years! I hadn't realized!

Indeed, glad you're enjoying it.

Matei

> On Mon, Jun 2, 2014 at 12:05 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> FYI, I opened https://issues.apache.org/jira/browse/SPARK-1990 to track this.
>
> Matei
>
> On Jun 1, 2014, at 6:14 PM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>
>> Sort of... there were two separate issues, but both related to AWS.
>>
>> I've sorted the confusion about the Master/Worker AMI... use the version
>> chosen by the scripts (and use the right instance type so the script can
>> choose wisely).
>>
>> But yes, one also needs a "launch machine" to kick off the cluster, and for
>> that I _also_ was using an Amazon instance (made sense... I have a team
>> that will need to do things as well, not just me), and I was just pointing
>> out that if you use the "most recommended by Amazon" AMI (for your free
>> micro instance, for example) you get Python 2.6 and the ec2 scripts fail.
>>
>> That merely needs a line in the documentation saying "use Ubuntu for your
>> cluster controller, not Amazon Linux" or some such. But yeah, for a newbie,
>> it was hard working out when to use "default" or "custom" AMIs for various
>> parts of the setup.
>>
>> On Mon, Jun 2, 2014 at 4:01 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> Hey, just to clarify this - my understanding is that the poster
>> (Jeremy) was using a custom AMI to *launch* spark-ec2. I normally
>> launch spark-ec2 from my laptop. And he was looking for an AMI that
>> had a high enough version of Python.
>>
>> spark-ec2 itself has a flag "-a" that allows you to give a specific
>> AMI. This flag is just an internal tool that we use for testing when
>> we spin new AMIs. Users can't set that to an arbitrary AMI because we
>> tightly control things like the Java and OS versions, libraries, etc.
>>
>> On Sun, Jun 1, 2014 at 12:51 AM, Jeremy Lee
>> <unorthodox.engine...@gmail.com> wrote:
>> > *sigh* OK, I figured it out. (Thank you, Nick, for the hint.)
>> >
>> > "m1.large" works. (I swear I tested that earlier and had similar issues...)
>> >
>> > It was my obsession with starting "r3.*large" instances. Clearly I hadn't
>> > patched the script in all the places, which I think caused it to default
>> > to the Amazon AMI. I'll have to take a closer look at the code and see if
>> > I can't fix it correctly, because I really, really do want nodes with 2x
>> > the CPU and 4x the memory for the same low spot price. :-)
>> >
>> > I've got a cluster up now, at least. Time for the fun stuff...
>> >
>> > Thanks everyone for the help!
>> >
>> > On Sun, Jun 1, 2014 at 5:19 PM, Nicholas Chammas
>> > <nicholas.cham...@gmail.com> wrote:
>> >>
>> >> If you are explicitly specifying the AMI in your invocation of spark-ec2,
>> >> may I suggest simply removing any explicit mention of the AMI from your
>> >> invocation? spark-ec2 automatically selects an appropriate AMI based on
>> >> the specified instance type.
>> >>
>> >> On Sunday, June 1, 2014, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>> >>
>> >>> Could you post how exactly you are invoking spark-ec2? And are you having
>> >>> trouble just with r3 instances, or with any instance type?
>> >>>
>> >>> On Sunday, June 1, 2014, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>> >>>
>> >>> It's been another day of spinning up dead clusters...
>> >>>
>> >>> I thought I'd finally worked out what everyone else knew - don't use the
>> >>> default AMI - but I've now run through all of the "official" quick-start
>> >>> Linux releases and I'm none the wiser:
>> >>>
>> >>> Amazon Linux AMI 2014.03.1 - ami-7aba833f (64-bit)
>> >>> Provisions servers, connects, installs, but the web server on the master
>> >>> will not start.
>> >>>
>> >>> Red Hat Enterprise Linux 6.5 (HVM) - ami-5cdce419
>> >>> Spot instance requests are not supported for this AMI.
>> >>>
>> >>> SuSE Linux Enterprise Server 11 SP3 (HVM) - ami-1a88bb5f
>> >>> Not tested - costs 10x more for spot instances, not economically viable.
>> >>>
>> >>> Ubuntu Server 14.04 LTS (HVM) - ami-f64f77b3
>> >>> Provisions servers, but "git" is not pre-installed, so the cluster setup
>> >>> fails.
>> >>>
>> >>> Amazon Linux AMI (HVM) 2014.03.1 - ami-5aba831f
>> >>> Provisions servers, but "git" is not pre-installed, so the cluster setup
>> >>> fails.
>> >
>> > --
>> > Jeremy Lee BCompSci(Hons)
>> > The Unorthodox Engineers
>>
>> --
>> Jeremy Lee BCompSci(Hons)
>> The Unorthodox Engineers
>
> --
> Jeremy Lee BCompSci(Hons)
> The Unorthodox Engineers