Re: distcp port for 0.17.2

2008-10-23 Thread Aaron Kimball
Hi, The dfs.http.address is for human use, not program interoperability. You can visit http://whatever.address.your.namenode.has:50070 in a web browser and see statistics about your filesystem. The address of cluster 2 is in its fs.default.name. This should be set to something like hdfs://cluster…

Re: Passing Constants from One Job to the Next

2008-10-23 Thread Aaron Kimball
See Configuration.setInt() in the API (JobConf inherits from Configuration). You can read it back in the configure() method of your mappers/reducers. - Aaron On Wed, Oct 22, 2008 at 3:03 PM, Yih Sun Khoo <[EMAIL PROTECTED]> wrote: > Are you saying that I can pass, say, a single integer constant …
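
A minimal sketch of the pattern Aaron describes, assuming the old mapred API of the 0.17/0.18 era; the key name "my.job.constant" and the class name are illustrative, not from the thread:

    // Driver side, before submitting the job:
    //   JobConf conf = new JobConf(MyJob.class);
    //   conf.setInt("my.job.constant", 42);

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class ConstantReadingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

      private int myConstant;

      // Called once per task before any map() calls: read the value
      // back out of the job configuration.
      public void configure(JobConf job) {
        myConstant = job.getInt("my.job.constant", 0);
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        // Use the constant however the job needs it.
        output.collect(value, new IntWritable(myConstant));
      }
    }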

Re: Is it possible to change parameters using org.apache.hadoop.conf.Configuration API?

2008-10-23 Thread Steve Loughran
Alex Loddengaard wrote: Just to be clear, you want to persist a configuration change to your entire cluster without bringing it down, and you're hoping to use the Configuration API to do so. Did I get your question right? I don't know of a way to do this without restarting the cluster, because …

Auto-shutdown for EC2 clusters

2008-10-23 Thread Stuart Sierra
Hi folks, Anybody tried scripting Hadoop on EC2 to...
1. Launch a cluster
2. Pull data from S3
3. Run a job
4. Copy results to S3
5. Terminate the cluster
... without any user interaction? -Stuart
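
A rough sketch of those five steps as an unattended script, assuming the contrib/ec2 scripts that ship with Hadoop (exact script names and flags vary by version); the cluster name, size, bucket, and jar are made up:

    #!/bin/sh
    set -e

    # 1. Launch a 10-slave cluster.
    bin/hadoop-ec2 launch-cluster my-cluster 10

    # 2-4. Run everything on the master in one ssh session:
    # pull input from S3, run the job, copy results back to S3.
    bin/hadoop-ec2 login my-cluster <<'EOF'
    hadoop distcp s3://my-bucket/input hdfs:///input
    hadoop jar my-job.jar /input /output
    hadoop distcp hdfs:///output s3://my-bucket/output
    EOF

    # 5. Terminate the cluster.
    bin/hadoop-ec2 terminate-cluster my-cluster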

Re: Auto-shutdown for EC2 clusters

2008-10-23 Thread Chris K Wensel
Hey Stuart, I did that for a client using Cascading events and SQS. When jobs completed, they dropped a message on SQS, where a listener picked up new jobs and ran with them, or decided to kill off the cluster. The currently shipping EC2 scripts are suitable for having multiple simultaneous …

Re: Auto-shutdown for EC2 clusters

2008-10-23 Thread Per Jacobsson
We're doing the same thing, but doing the scheduling just with shell scripts running on a machine outside of the Hadoop cluster. It works, but we're getting into a bit of scripting hell as things get more complex. We're using distcp to first copy the files the jobs need from S3 to HDFS, and it works …

Re: Auto-shutdown for EC2 clusters

2008-10-23 Thread Paco NATHAN
Hi Stuart, Yes, we do that. Ditto on most of what Chris described. We use an AMI which pulls tarballs for Ant, Java, Hadoop, etc., from S3 when it launches. That controls the versions for tools/frameworks, instead of redoing an AMI each time a tool has an update. A remote server -- in our data …

[Help needed] Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Steve Gao
Sorry for the email. Thanks for any help or hint. I am using Hadoop Streaming. The input consists of multiple files. Is there a way to get the current filename in the mapper? For example:

    $HADOOP_HOME/bin/hadoop \
      jar $HADOOP_HOME/hadoop-streaming.jar \
      -input file1 \
      -in…

RE: Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Steve Gao
Thanks, Amogh. But my case is slightly different. The command line inputs are 2 files: file1 and file2. I need to tell in the mapper which line is from which file:

    # In the mapper:
    while (<STDIN>) {
        # how to tell whether the current line is from file1 or file2?
    }

The map.input.file jobconf param does not help in this …

Re: [Help needed] Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Zhengguo 'Mike' SUN
I guess one trick you can do without the help of Hadoop is to encode the file identifier inside the file itself. For example, each line of file1 could start with "1<space><content of the original line>". - Original Message - From: Steve Gao <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org …
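
A sketch of that preprocessing, done before the files are uploaded (file names are from the thread; the tagged output names are made up):

    # Prefix every line with a file identifier so the mapper can
    # tell the two sources apart.
    sed 's/^/1 /' file1 > file1.tagged
    sed 's/^/2 /' file2 > file2.tagged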

Re: Is there a way to know the input filename at Hadoop Streaming?

2008-10-23 Thread Rick Cox
On Wed, Oct 22, 2008 at 18:55, Steve Gao <[EMAIL PROTECTED]> wrote: > I am using Hadoop Streaming. The input consists of multiple files. > Is there a way to get the current filename in the mapper? Streaming map tasks should have a "map_input_file" environment variable like the following: map_input_file=hdf…
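
A minimal streaming mapper built on that variable (bash here, but any executable works; the tab-separated output format is just a choice). Streaming exposes jobconf properties as environment variables with dots replaced by underscores, which is why map.input.file appears as map_input_file:

    #!/bin/bash
    # Tag each input line with the file it came from.
    while read -r line; do
        printf '%s\t%s\n' "$map_input_file" "$line"
    done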

Re: distcp port for 0.17.2

2008-10-23 Thread Tsz Wo (Nicholas), Sze
> The dfs.http.address is for human use, not program interoperability. You can > visit http://whatever.address.your.namenode.has:50070 in a web browser and > see statistics about your filesystem. This is not entirely true. The dfs.http.address is also used programmatically in some components like …
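
One place this shows up is cross-version distcp, which can read the source cluster through the namenode's HTTP port via the hftp filesystem; a hedged illustration with made-up hostnames and paths:

    # The hftp source goes through dfs.http.address (50070 by default),
    # not the RPC port in fs.default.name.
    hadoop distcp hftp://src-namenode:50070/user/data \
                  hdfs://dst-namenode:9000/user/data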

Re: distcp port for 0.17.2

2008-10-23 Thread bzheng
It's working for me now. Turns out the cluster has multiple network interfaces and I was using the wrong one. Thanks. Aaron Kimball-3 wrote: > Hi, > The dfs.http.address is for human use, not program interoperability. You can > visit http://whatever.address.your.namenode.has:50070 in …

LHadoop Server simple Hadoop input and output

2008-10-23 Thread Edward Capriolo
One of my first questions about Hadoop was, "How do systems outside the cluster interact with the file system?" I read several documents that described streaming data into Hadoop for processing, but I had trouble finding examples. The goal of LHadoop Server (L stands for Lightweight) is to produce …

Re: LHadoop Server simple Hadoop input and output

2008-10-23 Thread Jeff Hammerbacher
Hey Edward, The Thrift interface to HDFS allows clients to be developed in any Thrift-supported language: http://wiki.apache.org/hadoop/HDFS-APIs. Regards, Jeff On Thu, Oct 23, 2008 at 1:04 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote: > One of my first questions about Hadoop was, "How do syste…

Re: LHadoop Server simple Hadoop input and output

2008-10-23 Thread Edward Capriolo
I downloaded Thrift and ran the example applications after the Hive meetup. It is very cool stuff. The thriftfs interface is more elegant than what I was trying to do, and that implementation is more complete. Still, someone might be interested in what I did if they want a super-light API :)

Seeking Someone to Review Hadoop Article

2008-10-23 Thread Tom Wheeler
Each month the developers at my company write a short article about a Java technology we find exciting. I've just finished one about Hadoop for November and am seeking a volunteer knowledgeable about Hadoop to look it over to help ensure it's both clear and technically accurate. If you're interested …

Re: Is it possible to change parameters using org.apache.hadoop.conf.Configuration API?

2008-10-23 Thread Jinyeon Lee
Thank you!!!

Re: Seeking Someone to Review Hadoop Article

2008-10-23 Thread Mafish Liu
I'm interested in it. On Fri, Oct 24, 2008 at 6:31 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote: > Each month the developers at my company write a short article about a > Java technology we find exciting. I've just finished one about Hadoop > for November and am seeking a volunteer knowledgeable ab…

HELP: Namenode Startup Failed with an OutofMemoryError

2008-10-23 Thread woody zhou
Hi everyone, I have a problem with Hadoop startup. I failed to start up the namenode, and I got the following exception in the namenode log file:

    2008-10-23 21:54:51,223 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
    2008-10-23 21:54:51,224 INFO org.mortbay.util.Cont…
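
The thread is cut off before any reply, but one common remedy for a namenode OutOfMemoryError at startup, sketched here as an assumption rather than the thread's actual resolution, is raising the daemon heap in conf/hadoop-env.sh:

    # conf/hadoop-env.sh -- maximum heap in MB for the Hadoop daemons
    # (the default is 1000). The value below is illustrative.
    export HADOOP_HEAPSIZE=2000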

Article on Apache Hadoop

2008-10-23 Thread Amit k. Saha
Hello, I have just started exploring Hadoop, purely for hobbyist reasons. I love writing, and hence have published a "dummies"-style article on Apache Hadoop titled "Hands-on Hadoop for cluster computing". It's available at http://www.linux.com/feature/150395. I am very thankful to all the folks on …