Re: Starting up a larger cluster

2008-02-09 Thread Owen O'Malley
On Feb 8, 2008, at 9:32 AM, Jeff Eastman wrote: I noticed that phenomenon right off the bat. Is that a designed "feature" or just an unhappy consequence of how blocks are allocated? It was driven by a desire to maximize HDFS write throughput, which has unfortunate effects in the case of a

Re: Speculative execution and output directory

2008-02-09 Thread Arun C Murthy
On Feb 9, 2008, at 3:52 PM, Ashish Thusoo wrote: Hi Hadoop users, We have intermittently hit issues with speculative execution and hadoop streaming where we see a directory of the form _task_200__m_..._. It's an unfortunate side-effect of the current implementation of specul

Re: Hadoop summit / workshop at Yahoo!

2008-02-09 Thread Scott S
Has any announcement gone out about the Hadoop Summit? I am very interested in attending, but have not heard anything about it other than this original post. Scott Ajay Anand wrote: > > Yahoo plans to host a summit / workshop on Apache Hadoop at our > Sunnyvale campus on March 25th. Given

Re: Best Practice?

2008-02-09 Thread Ted Dunning
It's still better to use a combiner! On 2/9/08 4:37 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote: > Well, I tried saving the OutputCollectors in an instance variable and > writing to them during close and it seems to work. > > Jeff > > -----Original Message----- > From: Jeff Eastman [mailto:

Re: Best Practice?

2008-02-09 Thread Ted Dunning
Hmmm, I think that computing centroids in the mapper may not be the best idea. A different structure that would work well is to use the mapper to assign data records to centroids and use the centroid number as the reduce key. Then the reduce itself can compute the centroids. Yo
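
The structure suggested in this message (the mapper assigns each record to a centroid and emits the centroid number as the key, and the reducer computes the new centroid) can be sketched roughly as follows. This is a plain-Java simulation of one iteration, not Hadoop code: the class KMeansStep and its methods nearestCentroid, mean and iterate are hypothetical names introduced for illustration, squared Euclidean distance is an assumption, and the grouping loop in iterate stands in for the shuffle the framework would perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of one k-means iteration in the map/reduce shape
// described in the message. No Hadoop classes are used; every name here
// is hypothetical.
class KMeansStep {

    // "Map" side: return the index of the centroid nearest to the point.
    // That index plays the role of the reduce key.
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[i][d];
                dist += diff * diff; // squared Euclidean distance (assumption)
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // "Reduce" side: the new centroid is the mean of the points in one group.
    static double[] mean(List<double[]> points) {
        double[] sum = new double[points.get(0).length];
        for (double[] p : points) {
            for (int d = 0; d < sum.length; d++) {
                sum[d] += p[d];
            }
        }
        for (int d = 0; d < sum.length; d++) {
            sum[d] /= points.size();
        }
        return sum;
    }

    // Stand-in for the framework: group points by key, then reduce each group.
    static Map<Integer, double[]> iterate(List<double[]> points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (double[] p : points) {
            int key = nearestCentroid(p, centroids);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
        }
        Map<Integer, double[]> updated = new TreeMap<>();
        for (Map.Entry<Integer, List<double[]>> e : groups.entrySet()) {
            updated.put(e.getKey(), mean(e.getValue()));
        }
        return updated;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 2},
            new double[] {10, 10}, new double[] {10, 12});
        double[][] centroids = {{1, 1}, {9, 9}};
        for (Map.Entry<Integer, double[]> e : iterate(points, centroids).entrySet()) {
            // prints "0 -> [0.0, 1.0]" then "1 -> [10.0, 11.0]"
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

In this sketch a centroid that attracts no points simply gets no entry in the result, so a real job would need to decide how to carry such centroids forward. If a combiner is added, as recommended elsewhere in this thread, the values would probably need to be partial sums plus counts rather than finished means, since a mean of means is not in general the overall mean.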

Re: Best Practice?

2008-02-09 Thread Ted Dunning
Put them in the job configuration and override the configure method to get access to them. Then store them in fields in the mapper or reducer until you need them. On 2/9/08 3:39 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote: > What's the best way to get additional configuration arguments to my
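
The pattern described in this message (settings go into the job configuration, the configure method reads them once, and fields hold them until they are needed) might look roughly like the sketch below. To keep it runnable without Hadoop on the classpath, FakeConf is a hypothetical stand-in for the job configuration object, and ThresholdMapper, ConfigDemo and the keys jeff.yourapp.threshold and jeff.yourapp.label are likewise invented for illustration. In a real job the configuration would be the JobConf and the mapper's configure method would receive it.

```java
import java.util.HashMap;
import java.util.Map;

// Demo driver: sets values on the configuration, then hands it to the mapper.
class ConfigDemo {
    public static void main(String[] args) {
        FakeConf conf = new FakeConf();
        conf.set("jeff.yourapp.threshold", "2.0"); // hypothetical key
        conf.set("jeff.yourapp.label", "run1");    // hypothetical key

        ThresholdMapper mapper = new ThresholdMapper();
        mapper.configure(conf); // the framework would make this call in a real job

        System.out.println(mapper.map(3.0)); // prints "run1:keep"
        System.out.println(mapper.map(1.0)); // prints "run1:drop"
    }
}

// Hypothetical stand-in for the job configuration: a string-to-string map.
class FakeConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    // Returns the stored value, or defaultValue when the key was never set.
    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Hypothetical mapper: reads its settings once in configure() and keeps
// them in fields until map() needs them.
class ThresholdMapper {
    private double threshold;
    private String label;

    void configure(FakeConf conf) {
        threshold = Double.parseDouble(conf.get("jeff.yourapp.threshold", "0.5"));
        label = conf.get("jeff.yourapp.label", "unlabeled");
    }

    // Tags each value according to the configured threshold.
    String map(double value) {
        return label + ":" + (value >= threshold ? "keep" : "drop");
    }
}
```

Parsing the values once in configure, rather than on every call to map, keeps the per-record path cheap, and the defaults let the mapper run even when a key is missing.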

RE: Best Practice?

2008-02-09 Thread Jeff Eastman
Well, I tried saving the OutputCollectors in an instance variable and writing to them during close and it seems to work. Jeff -----Original Message----- From: Jeff Eastman [mailto:[EMAIL PROTECTED] Sent: Saturday, February 09, 2008 4:21 PM To: core-user@hadoop.apache.org Subject: RE: Best Pract

RE: Best Practice?

2008-02-09 Thread Jeff Eastman
Thanks Aaron, I missed that one. Now I have my configuration information in my mapper. In the mapper, I'm computing cluster centroids by reading all the input points and assigning them to clusters. I don't actually store the points in the mapper, just the evolving centroids. I'm trying to wait un

Speculative execution and output directory

2008-02-09 Thread Ashish Thusoo
Hi Hadoop users, We have intermittently hit issues with speculative execution and hadoop streaming where we see a directory of the form _task_200__m_..._. formed in the output directory. Has anyone out there hit similar issues, or does anyone know what might be happening here? We did scan th

Re: Best Practice?

2008-02-09 Thread Aaron Kimball
You can set arbitrary key/value pairs in the JobConf. So you can have jeff.yourapp.yoursetting = yourval to your heart's content. - Aaron Jeff Eastman wrote: What's the best way to get additional configuration arguments to my mappers and reducers? Jeff

Best Practice?

2008-02-09 Thread Jeff Eastman
What's the best way to get additional configuration arguments to my mappers and reducers? Jeff

Re: URLs contain non-existant domain names in machines.jsp

2008-02-09 Thread Allen Wittenauer
On 2/9/08 7:41 AM, "Ben Kucinich" <[EMAIL PROTECTED]> wrote: > I don't want it to use > the hostname to form those links. I want it to use the IP address, > 192.168.101.8 to form the links. Is it possible? I'm fairly certain the answer is no. You need to have working hostname resolution.

Re: URLs contain non-existant domain names in machines.jsp

2008-02-09 Thread Ben Kucinich
I made a small mistake describing my problem. There is no 192.168.1.8. There is only one machine, 192.168.101.8. I'll describe my problem again. 1. I have set up a single-node cluster on 192.168.101.8. It is an Ubuntu server. 2. There is no entry for 192.168.101.8 in the DNS server. However, the

Hypertable

2008-02-09 Thread Doug Judd
We're working on an open source, high performance, distributed database modeled after Bigtable. It differs from HBase in that it is a more faithful implementation of the design outlined in the Bigtable paper and is written in C++. You can check out the project and download an "alpha" release here

Re: URLs contain non-existant domain names in machines.jsp

2008-02-09 Thread Amar Kamat
Ben Kucinich wrote: I have a Hadoop running on a master node 192.168.1.8. fs.default.name is 192.168.101.8:9000 and mapred.job.tracker is 192.168.101.8:9001. Actually the masters are the nodes where the JobTracker and the NameNode are running, i.e., 192.168.101.8 in your case. 192.168.1.8 would