Thanks for that, Matei! I'll look at that once I get a spare moment. :-)

If you like, I'll keep documenting my newbie problems and frustrations...
perhaps it might make things easier for others.

Another issue I seem to have found (now that I can get small clusters up):
some of the examples (the streaming.Twitter ones especially) depend on
there being a "/mnt/spark" and "/mnt2/spark" directory (I think for java
tempfiles?) and those don't seem to exist out-of-the-box. I have to create
those directories and use "copy-dir" to get them to the workers before
those examples run.
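
A minimal sketch of that workaround in Python, to be run on the master. The
two directory paths are the ones named above. The copy-dir location
(~/spark-ec2/copy-dir) is an assumption about where spark-ec2 places its
helper script, and ensure_dirs and copy_dir_commands are hypothetical helper
names, not part of Spark.

```python
import os

# The two scratch directories named in the message above.
SCRATCH_DIRS = ("/mnt/spark", "/mnt2/spark")
# Assumed location of the spark-ec2 helper script; adjust if yours differs.
COPY_DIR = os.path.expanduser("~/spark-ec2/copy-dir")


def ensure_dirs(paths):
    """Create each directory if it is missing; return the ones created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created


def copy_dir_commands(paths, copy_dir=COPY_DIR):
    """Build one copy-dir command (as an argv list) per directory."""
    return [[copy_dir, path] for path in paths]
```

After ensure_dirs(SCRATCH_DIRS), each command from
copy_dir_commands(SCRATCH_DIRS) can be handed to subprocess.check_call to
push the directory out to the workers.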

Much of the last two days has been spent failing to get any of my own code
to work, except in spark-shell (which is very nice, btw).

At first I tried editing the examples, because I took the documentation
literally when it said "Finally, Spark includes several samples in the
examples directory (Scala, Java, Python). You can run them as follows:"
but of course I didn't realize that editing them is pointless: the source
is there, but the code that actually runs is pulled from a .jar elsewhere.
Doh. (so obvious in hindsight)

I couldn't even turn down the voluminous INFO messages to WARNs, no matter
how many conf/log4j.properties files I edited or copy-dir'd. I'm sure
there's a trick to that I'm not getting.
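
For reference, a rough sketch of the edit I mean, written in Python so it can
be applied to the file's text before copying it around. It assumes
conf/log4j.properties contains a root logger line such as
"log4j.rootCategory=INFO, console" (as I understand the bundled template
does); set_root_level is a hypothetical helper name. This only shows the
change itself and does not explain why the change had no effect for me.

```python
import re

# Matches the root logger line, e.g. "log4j.rootCategory=INFO, console",
# and captures everything up to (but not including) the level name.
_ROOT_LINE = re.compile(
    r"^(log4j\.root(?:Category|Logger)\s*=\s*)\w+", re.MULTILINE
)


def set_root_level(properties_text, level="WARN"):
    """Return the properties text with the root logger level replaced."""
    return _ROOT_LINE.sub(lambda m: m.group(1) + level, properties_text)
```

Reading conf/log4j.properties, passing its contents through set_root_level,
and writing the result back leaves every other line untouched.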

Even trying to build SimpleApp I've run into the problem that all the
documentation says to use "sbt/sbt assembly", but sbt doesn't seem to be in
the 1.0.0 pre-built packages that I downloaded.

Ah... yes.. there it is in the source package. I suppose that means that in
order to deploy any new code to the cluster, I've got to rebuild from
source on my "cluster controller". OK, I never liked that Amazon Linux AMI
anyway. I'm going to start from scratch again with an Ubuntu 12.04
instance; hopefully that will be more auspicious...

Meanwhile I'm learning scala... Great Turing's Ghost, it's the dream
language we've theorized about for years! I hadn't realized!



On Mon, Jun 2, 2014 at 12:05 PM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> FYI, I opened https://issues.apache.org/jira/browse/SPARK-1990 to track
> this.
>
> Matei
>
>
> On Jun 1, 2014, at 6:14 PM, Jeremy Lee <unorthodox.engine...@gmail.com>
> wrote:
>
> Sort of.. there were two separate issues, but both related to AWS..
>
> I've sorted the confusion about the Master/Worker AMI ... use the version
> chosen by the scripts. (and use the right instance type so the script can
> choose wisely)
>
> But yes, one also needs a "launch machine" to kick off the cluster, and
> for that I _also_ was using an Amazon instance... (made sense.. I have a
> team that will need to do things as well, not just me) and I was just
> pointing out that if you use the "most recommended by Amazon" AMI (for your
> free micro instance, for example) you get python 2.6 and the ec2 scripts
> fail.
>
> That merely needs a line in the documentation saying "use Ubuntu for your
> cluster controller, not Amazon Linux" or somesuch. But yeah, for a newbie,
> it was hard working out when to use "default" or "custom" AMIs for various
> parts of the setup.
>
>
> On Mon, Jun 2, 2014 at 4:01 AM, Patrick Wendell <pwend...@gmail.com>
> wrote:
>
>> Hey just to clarify this - my understanding is that the poster
>> (Jeremy) was using a custom AMI to *launch* spark-ec2. I normally
>> launch spark-ec2 from my laptop. And he was looking for an AMI that
>> had a high enough version of python.
>>
>> Spark-ec2 itself has a flag "-a" that allows you to give a specific
>> AMI. This flag is just an internal tool that we use for testing when
>> we spin new AMIs. Users can't set that to an arbitrary AMI because we
>> tightly control things like the Java and OS versions, libraries, etc.
>>
>>
>> On Sun, Jun 1, 2014 at 12:51 AM, Jeremy Lee
>> <unorthodox.engine...@gmail.com> wrote:
>> > *sigh* OK, I figured it out. (Thank you Nick, for the hint)
>> >
>> > "m1.large" works, (I swear I tested that earlier and had similar
>> > issues... )
>> >
>> > It was my obsession with starting "r3.*large" instances. Clearly I hadn't
>> > patched the script in all the places.. which I think caused it to default
>> > to the Amazon AMI. I'll have to take a closer look at the code and see if
>> > I can't fix it correctly, because I really, really do want nodes with 2x
>> > the CPU and 4x the memory for the same low spot price. :-)
>> >
>> > I've got a cluster up now, at least. Time for the fun stuff...
>> >
>> > Thanks everyone for the help!
>> >
>> >
>> >
>> > On Sun, Jun 1, 2014 at 5:19 PM, Nicholas Chammas
>> > <nicholas.cham...@gmail.com> wrote:
>> >>
>> >> If you are explicitly specifying the AMI in your invocation of
>> >> spark-ec2, may I suggest simply removing any explicit mention of AMI
>> >> from your invocation? spark-ec2 automatically selects an appropriate
>> >> AMI based on the specified instance type.
>> >>
>> >> On Sunday, June 1, 2014, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>> >>
>> >>> Could you post how exactly you are invoking spark-ec2? And are you
>> >>> having trouble just with r3 instances, or with any instance type?
>> >>>
>> >>> On Sunday, June 1, 2014, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>> >>>
>> >>> It's been another day of spinning up dead clusters...
>> >>>
>> >>> I thought I'd finally worked out what everyone else knew - don't use
>> >>> the default AMI - but I've now run through all of the "official"
>> >>> quick-start linux releases and I'm none the wiser:
>> >>>
>> >>> Amazon Linux AMI 2014.03.1 - ami-7aba833f (64-bit)
>> >>> Provisions servers, connects, installs, but the webserver on the
>> >>> master will not start
>> >>>
>> >>> Red Hat Enterprise Linux 6.5 (HVM) - ami-5cdce419
>> >>> Spot instance requests are not supported for this AMI.
>> >>>
>> >>> SuSE Linux Enterprise Server 11 sp3 (HVM) - ami-1a88bb5f
>> >>> Not tested - costs 10x more for spot instances, not economically
>> >>> viable.
>> >>>
>> >>> Ubuntu Server 14.04 LTS (HVM) - ami-f64f77b3
>> >>> Provisions servers, but "git" is not pre-installed, so the cluster
>> >>> setup fails.
>> >>>
>> >>> Amazon Linux AMI (HVM) 2014.03.1 - ami-5aba831f
>> >>> Provisions servers, but "git" is not pre-installed, so the cluster
>> >>> setup fails.
>> >
>> >
>> >
>> >
>> > --
>> > Jeremy Lee  BCompSci(Hons)
>> >   The Unorthodox Engineers
>>
>
>
>
> --
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers
>
>
>


-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers
