Hi Tom,
Thanks for the reply, and after posting I found your blogs and followed your
instructions - thanks
There were a couple of gotchas:
1) My secret key had a '/' in it and the escaping does not work
2) I copied to the root directory in the S3 bucket and I could not manage to
get it out again using a dist
Hi Tim,
The steps you outline look about right. Because your file is >5GB you
will need to use the S3 block file system, which uses an s3: URL. (See
http://wiki.apache.org/hadoop/AmazonS3) You shouldn't have to build
your own AMI unless you have dependencies that can't be submitted as a
part of the M
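A sketch of the transfer Tom describes, using the S3 block filesystem (bucket name, credentials, and paths are placeholders; the s3:// scheme is the block filesystem from the wiki page above, which splits files into blocks and so handles objects over S3's 5GB limit):

```shell
# Upload the local file into the S3 block filesystem; ID/SECRET/my-bucket
# are placeholders for your AWS credentials and bucket.
bin/hadoop fs -put /local/data/input.txt s3://ID:SECRET@my-bucket/input/

# Later, pull it from S3 into the running cluster's HDFS with distcp:
bin/hadoop distcp s3://ID:SECRET@my-bucket/input input
```

Note that if your secret key contains a '/', embedding it in the URL tends to break; setting fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey in hadoop-site.xml avoids putting the key in the URL at all.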
Hi all,
I have data in a file (150 million lines at 100GB or so) and have several
MapReduce classes for my processing (custom index generation).
Can someone please confirm the following is the best way to run on EC2 and
S3 (both of which I am new to..)
1) load my 100Gb file into S3
2) create a cla
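The overall workflow being asked about might be sketched like this with the contrib/ec2 scripts (cluster name, size, bucket, credentials, and the job class are all made-up examples, not anything from the thread):

```shell
# 1) Load the 100GB input into S3 (block filesystem handles the >5GB limit)
bin/hadoop fs -put /data/input.txt s3://ID:SECRET@my-bucket/input/

# 2) Boot a cluster with the contrib/ec2 scripts
bin/hadoop-ec2 launch-cluster my-cluster 20

# 3) Log in, copy the data into HDFS, and run the job jar
bin/hadoop-ec2 login my-cluster
hadoop distcp s3://ID:SECRET@my-bucket/input input
hadoop jar my-index-job.jar com.example.IndexDriver input output
```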
These are the FoxyProxy wildcards I use
*compute-1.amazonaws.com*
*.ec2.internal*
*.compute-1.internal*
and w/ hadoop 0.17.0, just type (after booting your cluster)
hadoop-ec2 proxy
to start the tunnel for that cluster
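As a quick illustration of what those wildcards cover, shell globs behave the same way for these patterns (the hostnames below are made-up examples of EC2's naming conventions):

```shell
# Print "yes" if a hostname matches any of the FoxyProxy wildcards,
# i.e. would be sent through the SOCKS tunnel; "no" otherwise.
matches_proxy() {
  case "$1" in
    *compute-1.amazonaws.com*|*.ec2.internal*|*.compute-1.internal*)
      echo yes ;;
    *)
      echo no ;;
  esac
}

matches_proxy ec2-67-202-18-112.compute-1.amazonaws.com   # yes
matches_proxy ip-10-251-27-5.ec2.internal                 # yes
matches_proxy example.com                                 # no
```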
On Jun 3, 2008, at 11:26 PM, James Moore wrote:
On Tue, Jun 3, 2008 at
Andreas Kostyrka wrote:
Well, the basic "trouble" with EC2 is that clusters usually are not networks
in the TCP/IP sense.
This makes it painful to decide which URLs should be resolved where.
Plus to make it even more painful, you cannot easily run it with one simple
SOCKS server, because you
On Tue, Jun 3, 2008 at 5:04 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> Plus to make it even more painful, you cannot easily run it with one simple
> SOCKS server, because you need to defer DNS resolution to the inside of the
> cluster, because VM names do resolve to external IPs, while the webs
Well, the basic "trouble" with EC2 is that clusters usually are not networks
in the TCP/IP sense.
This makes it painful to decide which URLs should be resolved where.
Plus to make it even more painful, you cannot easily run it with one simple
SOCKS server, because you need to defer DNS resolution to the inside of the
cluster, because VM names do resolve to external IPs, while the webs
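One way around the DNS problem Andreas describes (hostnames and port below are placeholders): open a SOCKS tunnel to the master and have the client hand the *name* to the proxy rather than resolving it locally, which is what FoxyProxy's "use SOCKS proxy for DNS lookups" option does. curl can demonstrate the same thing:

```shell
# SOCKS tunnel into the cluster; DNS lookups must happen on the far side
ssh -D 6666 root@ec2-67-202-18-112.compute-1.amazonaws.com

# In another terminal: --socks5-hostname sends the hostname to the proxy,
# so ip-10-251-27-5.ec2.internal resolves inside EC2, not on your machine
curl --socks5-hostname localhost:6666 http://ip-10-251-27-5.ec2.internal:50030/
```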
obviously this isn't the best solution if you need to let many semi-trusted
users browse your cluster.
Actually, it would be much more secure if the tunnel service ran on a
trusted server letting your users connect remotely via SOCKS and then
browse the cluster. These users wouldn't need
if you use the new scripts in 0.17.0, just run
> hadoop-ec2 proxy
this starts a ssh tunnel to your cluster.
installing FoxyProxy in FF gives you whole-cluster visibility..
obviously this isn't the best solution if you need to let many semi-trusted
users browse your cluster.
On May 28, 20
On Wednesday 28 May 2008 23:16:43 Chris Anderson wrote:
> Andreas,
>
> If you can ssh into the nodes, you can always set up port-forwarding
> with ssh -L to bring those ports to your local machine.
Yes, and the missing part is simple too: iptables with DNAT on OUTPUT :)
I even made a small ugly s
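The iptables-with-DNAT idea might look roughly like this (the addresses and the tunnel setup are made-up examples, not Andreas's actual script): rewrite locally generated packets aimed at a cluster-internal address so they hit a local SSH tunnel instead.

```shell
# First bring the jobtracker UI to a local port over ssh:
#   ssh -L 50030:ip-10-251-27-5.ec2.internal:50030 root@<master-public-name>
# Then rewrite outgoing packets for the internal IP to use that tunnel;
# DNAT in the nat table's OUTPUT chain applies to locally generated traffic.
iptables -t nat -A OUTPUT -p tcp -d 10.251.27.5 --dport 50030 \
         -j DNAT --to-destination 127.0.0.1:50030
```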
up cgiproxy[1] on it, and only allowed it to forward to the hadoop nodes.
That way, when you click a link to go to a slave node's logs or whatnot it
still works. ;)
I no longer use Hadoop on ec2, but still use the config described above!
[1]http://www.jmarshall.com/tools/
Recently I spent some time hacking the contrib/ec2 scripts to install
and configure OpenVPN on top of the other installed packages. Our use
case required that all the slaves running mappers would need to
connect back through to our primary mysql database (firewalled as you
can imagine). Simultane
On Wed, May 28, 2008 at 2:23 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> That doesn't work because the various web pages have links or redirects to
> other pages on other machines.
>
> Also, you would need to ssh to ALL of your cluster to get the file browser
> to work.
True. That makes it a li
That doesn't work because the various web pages have links or redirects to
other pages on other machines.
Also, you would need to ssh to ALL of your cluster to get the file browser
to work.
Better to do the proxy thing.
On 5/28/08 2:16 PM, "Chris Anderson" <[EMAIL PROTECTED]> wrote:
> Andreas
Andreas,
If you can ssh into the nodes, you can always set up port-forwarding
with ssh -L to bring those ports to your local machine.
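For example (the hostnames are placeholders for your master's public name and a slave's internal name):

```shell
# Bring the jobtracker (50030) and one tasktracker (50060) UI to localhost.
ssh -L 50030:localhost:50030 \
    -L 50060:ip-10-251-27-5.compute-1.internal:50060 \
    root@ec2-67-202-18-112.compute-1.amazonaws.com
# http://localhost:50030/ now shows the jobtracker, though links that point
# at other slaves will still break, as noted elsewhere in the thread.
```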
On Wed, May 28, 2008 at 1:51 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> What I wonder is what ports do I need to access?
>
> 50060 on all nodes.
> 50030 on
What I wonder is what ports do I need to access?
50060 on all nodes.
50030 on the jobtracker.
Any other ports?
Andreas
On Wednesday, 28.05.2008, at 13:37 -0700, Allen Wittenauer wrote:
>
>
> On 5/28/08 1:22 PM, "Andreas Kostyrka" <[EMAIL PROTECTED]> wrote:
> > I just wondered what other peop
That presumes that you have a static source address. Plus for
nontechnical reasons changing the firewall rules is nontrivial.
(I'm responsible for the inside of the VMs, but somebody else holds the
ec2 keys, don't ask)
Andreas
On Wednesday, 28.05.2008, at 16:27 -0400, Jake Thompson wrote:
> What
On 5/28/08 1:22 PM, "Andreas Kostyrka" <[EMAIL PROTECTED]> wrote:
> I just wondered what other people use to access the hadoop webservers,
> when running on EC2?
While we don't run on EC2 :), we do protect the hadoop web processes by
putting a proxy in front of it. A user connects to the p
What is wrong with opening up the ports only to the hosts that you want to
have access to them? This is what I am currently doing: -s 0.0.0.0/0 is
everyone everywhere, so change it to -s my.ip.add.ress/32.
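With the EC2 API tools that restriction might look like this (the security-group name and workstation address are placeholders):

```shell
# Allow only a single workstation to reach the Hadoop web UIs
ec2-authorize my-hadoop-group -P tcp -p 50030 -s 203.0.113.7/32
ec2-authorize my-hadoop-group -P tcp -p 50060 -s 203.0.113.7/32
# The wide-open equivalent would be -s 0.0.0.0/0, which is what to avoid
```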
On Wed, May 28, 2008 at 4:22 PM, Andreas Kostyrka <[EMAIL PROTECTED]>
wrote:
> Hi!
>
> I j
Hi!
I just wondered what other people use to access the hadoop webservers,
when running on EC2?
Ideas that I had:
1.) opening ports 50030 and so on => not good, data goes unprotected
over the internet. Even if I could enable some form of authentication it
would still be plain HTTP.
2.) Some kind of
ll be better knowing what's going
on.
also, setting up FoxyProxy on firefox lets you browse your whole
cluster if you setup a ssh tunnel (socks).
On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:
Hi All,
I have been trying to configure Hadoop on EC2 for a large cluster
(100-plus instances).
id_rsa"
>sleep 1
> done
>
> Anyway, did you try the hadoop-ec2 script? It works well for the task you
> described.
>
>
> Prasan Ary wrote:
> > Hi All,
> > I have been trying to configure Hadoop on EC2 for large number of
> > clusters ( 10
Yes, this isn't ideal for larger clusters. There's a jira to address
this: https://issues.apache.org/jira/browse/HADOOP-2410.
Tom
On 20/03/2008, Prasan Ary <[EMAIL PROTECTED]> wrote:
> Hi All,
> I have been trying to configure Hadoop on EC2 for a large cluster (100 plus)
I have been trying to configure Hadoop on EC2 for a large cluster
(100-plus instances). It seems that I have to copy the EC2 private key
to all the machines in the cluster so that they can have SSH
connections.
For now it seems I have to run a script to copy the key file to
each of the EC2 instances
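The key-copying script quoted in this thread boils down to a loop along these lines (the key paths, root user, and instances.txt file are assumptions, not the original script):

```shell
# Push the cluster-internal SSH key to every instance, then lock down its
# permissions. instances.txt holds one public hostname per line (assumed).
for host in $(cat instances.txt); do
  scp -i ~/.ssh/ec2-keypair ~/.ssh/id_rsa "root@${host}:/root/.ssh/id_rsa"
  ssh -i ~/.ssh/ec2-keypair "root@${host}" "chmod 600 /root/.ssh/id_rsa"
  sleep 1
done
```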
TECTED]" "chmod 600 /root/.ssh/id_rsa"
sleep 1
done
Anyway, did you try the hadoop-ec2 script? It works well for the task you
described.
Prasan Ary wrote:
Hi All,
I have been trying to configure Hadoop on EC2 for a large cluster (100
plus). It seems that I have to cop
Hi All,
I have been trying to configure Hadoop on EC2 for a large cluster (100
plus). It seems that I have to copy the EC2 private key to all the machines in
the cluster so that they can have SSH connections.
For now it seems I have to run a script to copy the key file to each of the