Re: How to make a lucene Document hadoop Writable?

2008-05-28 Thread David Chung
unsubscribe

About Metrics update

2008-05-28 Thread Ion Badita
Hi, I looked over the class org.apache.hadoop.metrics.spi.AbstractMetricsContext and I have a question: why, in update(MetricsRecordImpl record), is the metricUpdates Map not cleared after the updates are merged into metricMap? Because of this, on every update() "old" increments are merged in metri
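The concern raised in this message can be illustrated with a small, self-contained sketch. It does not use or reproduce the Hadoop metrics classes: `MetricsMergeSketch`, `merge`, `totalAfterTwoUpdates`, and the metric name `bytes_read` are hypothetical names chosen for illustration. It only shows the general effect the poster describes, where pending increments that are not cleared after a merge get added to the running total again on the next update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Hadoop AbstractMetricsContext code.
public class MetricsMergeSketch {

    // Add every pending increment to the running totals.
    // If clearAfterMerge is false, the pending increments stay in place
    // and will be merged a second time on the next call.
    static void merge(Map<String, Integer> totals,
                      Map<String, Integer> pending,
                      boolean clearAfterMerge) {
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        if (clearAfterMerge) {
            pending.clear();
        }
    }

    // Record an increment of 5, merge, record an increment of 3, merge again,
    // and return the resulting total for the (assumed) metric "bytes_read".
    static int totalAfterTwoUpdates(boolean clearAfterMerge) {
        Map<String, Integer> totals = new HashMap<>();
        Map<String, Integer> pending = new HashMap<>();

        pending.merge("bytes_read", 5, Integer::sum);
        merge(totals, pending, clearAfterMerge);

        pending.merge("bytes_read", 3, Integer::sum);
        merge(totals, pending, clearAfterMerge);

        return totals.get("bytes_read");
    }

    public static void main(String[] args) {
        System.out.println(totalAfterTwoUpdates(true));   // prints 8  (5 + 3)
        System.out.println(totalAfterTwoUpdates(false));  // prints 13 (5 + 5 + 3)
    }
}
```

Whether the real class behaves this way depends on how callers reuse the record between updates, which the truncated message does not settle.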

behavior of MapWritable as Key in Map Reduce

2008-05-28 Thread Tarandeep Singh
Hi, I want to understand the behavior of MapWritable if used as an intermediate Key in Mappers and Reducers. Suppose I create a MapWritable object with the following key-value pairs in it: (K1, V1), (K2, V2), (K3, V3). How will the MapReduce framework group and sort the keys (MapWritable objects) em

Re: splitting of big files?

2008-05-28 Thread Erik Paulson
On Tue, May 27, 2008 at 10:49:38AM -0700, Ted Dunning wrote: > > There is a good tutorial on the wiki about this. > > Your problem here is that you have conflated two concepts. The first is the > splitting of files into blocks for storage purposes. This has nothing to do > with what data a prog

hadoop on EC2

2008-05-28 Thread Andreas Kostyrka
Hi! I just wondered what other people use to access the hadoop webservers, when running on EC2? Ideas that I had: 1.) opening ports 50030 and so on => not good, data goes unprotected over the internet. Even if I could enable some form of authentication it would still be plain http. 2.) Some kind of

Re: slow hosts in reduc

2008-05-28 Thread Andreas Kostyrka
Ok, just for the next guy scratching his head when googling, I haven't found a solution, but an upgrade from hadoop 0.16.3 to 0.17.0 seems to have fixed the problems. Andreas On Tuesday, 27.05.2008, at 16:35 +0200, Andreas Kostyrka wrote: > To make it more painful, that's what I get from an n

Re: hadoop on EC2

2008-05-28 Thread Jake Thompson
What is wrong with opening up the ports only to the hosts that you want to have access to them? This is what I am currently doing. -s 0.0.0.0/0 is everyone everywhere, so change it to -s my.ip.add.ress/32 On Wed, May 28, 2008 at 4:22 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote: > Hi! > > I j
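The difference between a `/0` and a `/32` source range mentioned in this reply can be checked with a minimal sketch of IPv4 CIDR matching. This is plain Java, not any firewall or EC2 tool; the class `CidrSketch`, its methods, and the example addresses (taken from the 203.0.113.0/24 documentation range) are assumptions made for illustration.

```java
// Hypothetical illustration of IPv4 CIDR matching; not tied to any firewall tool.
public class CidrSketch {

    // Pack a dotted-quad IPv4 address into a 32-bit int.
    static int toInt(String ip) {
        int value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    // True if the address falls inside the given CIDR range.
    static boolean matches(String cidr, String ip) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // Java masks shift counts to 5 bits, so a /0 prefix needs its own case.
        int mask = (prefix == 0) ? 0 : (-1 << (32 - prefix));
        return (toInt(parts[0]) & mask) == (toInt(ip) & mask);
    }

    public static void main(String[] args) {
        System.out.println(matches("0.0.0.0/0", "203.0.113.7"));      // prints true
        System.out.println(matches("203.0.113.7/32", "203.0.113.7")); // prints true
        System.out.println(matches("203.0.113.7/32", "203.0.113.8")); // prints false
    }
}
```

A `/0` prefix compares zero bits, so every address matches, while a `/32` prefix compares all 32 bits and so admits exactly one host.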

Re: hadoop on EC2

2008-05-28 Thread Allen Wittenauer
On 5/28/08 1:22 PM, "Andreas Kostyrka" <[EMAIL PROTECTED]> wrote: > I just wondered what other people use to access the hadoop webservers, > when running on EC2? While we don't run on EC2 :), we do protect the hadoop web processes by putting a proxy in front of it. A user connects to the p

Re: hadoop on EC2

2008-05-28 Thread Andreas Kostyrka
That presumes that you have a static source address. Plus for nontechnical reasons changing the firewall rules is nontrivial. (I'm responsible for the inside of the VMs, but somebody else holds the ec2 keys, don't ask) Andreas On Wednesday, 28.05.2008, at 16:27 -0400, Jake Thompson wrote: > What

Re: hadoop on EC2

2008-05-28 Thread Andreas Kostyrka
What I wonder is what ports do I need to access? 50060 on all nodes. 50030 on the jobtracker. Any other ports? Andreas On Wednesday, 28.05.2008, at 13:37 -0700, Allen Wittenauer wrote: > > > On 5/28/08 1:22 PM, "Andreas Kostyrka" <[EMAIL PROTECTED]> wrote: > > I just wondered what other peop

Re: hadoop on EC2

2008-05-28 Thread Chris Anderson
Andreas, If you can ssh into the nodes, you can always set up port-forwarding with ssh -L to bring those ports to your local machine. On Wed, May 28, 2008 at 1:51 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote: > What I wonder is what ports do I need to access? > > 50060 on all nodes. > 50030 on

Re: hadoop on EC2

2008-05-28 Thread Ted Dunning
That doesn't work because the various web pages have links or redirects to other pages on other machines. Also, you would need to ssh to ALL of your cluster to get the file browser to work. Better to do the proxy thing. On 5/28/08 2:16 PM, "Chris Anderson" <[EMAIL PROTECTED]> wrote: > Andreas

Re: hadoop on EC2

2008-05-28 Thread Chris Anderson
On Wed, May 28, 2008 at 2:23 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > That doesn't work because the various web pages have links or redirects to > other pages on other machines. > > Also, you would need to ssh to ALL of your cluster to get the file browser > to work. True. That makes it a li

Re: hadoop on EC2

2008-05-28 Thread Jim R. Wilson
Recently I spent some time hacking the contrib/ec2 scripts to install and configure OpenVPN on top of the other installed packages. Our use case required that all the slaves running mappers would need to connect back through to our primary mysql database (firewalled as you can imagine). Simultane

Re: 0.16.4 DataNode problem...

2008-05-28 Thread C G
I've repeated the experiment under more controlled circumstances: by creating a new file system formatted by 0.16.4 and then populating it. In this scenario we see the same problem: during the reduce phase the DataNode instances consume more and more memory until the system fails. Further, o

Need example of MapWritable as Intermediate Key

2008-05-28 Thread Tarandeep Singh
Hi, Can someone point me to example code where MapWritable/SortedMapWritable is used as an intermediate key? I am looking for how to set the comparator for MapWritable/SortedMapWritable so that the framework groups/sorts the intermediate keys in accordance with my requirement - sort the intermedi
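The kind of ordering asked about here can be sketched in plain Java, without any Hadoop classes. The sort requirement in the message is cut off, so the rule below (compare the sorted entries pairwise, key first, then value, with the shorter map ordered first on a tie) is an assumed example, not the poster's actual requirement. `MapKeyOrderSketch`, `BY_ENTRIES`, and `of` are hypothetical names.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of ordering map-valued keys; uses no Hadoop classes.
public class MapKeyOrderSketch {

    // Assumed ordering: walk both maps in key order and compare entry by entry
    // (key first, then value); if one map is a prefix of the other, the shorter
    // map sorts first.
    static final Comparator<SortedMap<String, String>> BY_ENTRIES = (a, b) -> {
        Iterator<Map.Entry<String, String>> ia = a.entrySet().iterator();
        Iterator<Map.Entry<String, String>> ib = b.entrySet().iterator();
        while (ia.hasNext() && ib.hasNext()) {
            Map.Entry<String, String> ea = ia.next();
            Map.Entry<String, String> eb = ib.next();
            int c = ea.getKey().compareTo(eb.getKey());
            if (c != 0) {
                return c;
            }
            c = ea.getValue().compareTo(eb.getValue());
            if (c != 0) {
                return c;
            }
        }
        return Boolean.compare(ia.hasNext(), ib.hasNext());
    };

    // Build a sorted map from alternating key/value arguments.
    static SortedMap<String, String> of(String... keyValues) {
        SortedMap<String, String> map = new TreeMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        // Equal contents compare as equal, so they would fall into one group.
        System.out.println(BY_ENTRIES.compare(of("K1", "V1"), of("K1", "V1")) == 0); // prints true
        // A smaller value for the same key sorts first.
        System.out.println(BY_ENTRIES.compare(of("K1", "V1"), of("K1", "V2")) < 0);  // prints true
    }
}
```

Keys that compare as equal are the ones a sort-then-group pipeline would treat as one group, which is why the comparator has to define equality over the whole map content rather than over object identity.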

Re: hadoop on EC2

2008-05-28 Thread Nate Carlson
On Wed, 28 May 2008, Andreas Kostyrka wrote: 1.) opening ports 50030 and so on => not good, data goes unprotected over the internet. Even if I could enable some form of authentication it would still be plain http. Personally, I set up an Apache server (with https and auth), and then set up cgipr