Re: Few Queries..!!!

2009-06-08 Thread Sugandha Naolekar
Hello! I have a 7 node cluster, but there is one remote node (an 8th machine) within the same LAN which holds some data. Now I need to place this data into HDFS. This 8th machine is not part of the hadoop cluster (master/slave) config file. So, what I have thought is: -> Will get the Files

Re: Placing data into HDFS..!

2009-06-08 Thread Usman Waheed
If you are going to be using this 8th machine as a client only, then ensure that it is running the same version of Hadoop as your cluster. In the config file hadoop-site.xml, point fs.default.name to the namenode. -Usman Hello! I have a 7 node cluster. But there is one remote node(8th machine) w
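Usman's suggestion amounts to a hadoop-site.xml on the client machine along these lines (the namenode hostname and port below are placeholders, not values from the thread):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- point at your cluster's namenode; host and port are placeholders -->
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```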

"java.io.IOException: Could not read from stream" & "java.io.IOException: Bad connect ack with firstBadLink 130.207.114.164:50010"

2009-06-08 Thread Kang, Seunghwa
Hello, I am using Hadoop 0.19.1 and my job (this will run for 10 hours to finish) failed in the middle of execution (once 21% and second 81% finished) with the error messages like java.io.IOException: Could not read from stream ... java.io.IOException: Bad connect ack with firstBadLink 130.207.

Re: Hadoop scheduling question

2009-06-08 Thread Steve Loughran
Aaron Kimball wrote: Finally, there's a third scheduler called the Capacity scheduler. It's similar to the fair scheduler, in that it allows guarantees of minimum availability for different pools. I don't know how it apportions additional extra resources though -- this is the one I'm least famil

Re: Every time the mapping phase finishes I see this

2009-06-08 Thread Steve Loughran
Mayuran Yogarajah wrote: There are always a few 'Failed/Killed Task Attempts' and when I view the logs for these I see: - some that are empty, ie stdout/stderr/syslog logs are all blank - several that say: 2009-06-06 20:47:15,309 WARN org.apache.hadoop.mapred.TaskTracker: Error running child

Map-Reduce!

2009-06-08 Thread Sugandha Naolekar
Hello! As far as I have read in the forums, Map-Reduce is basically used to process large amounts of data speedily, right? But can you please give me some instances or examples wherein I can use map-reduce? -- Regards! Sugandha

Chaining Pipes Tasks

2009-06-08 Thread Roshan James
Hi, I am trying to get started with Hadoop Pipes. Is there an example of chaining tasks (with Pipes) somewhere? If not, can someone tell me how I can specify the input and output directories for the second task. I was expecting to be able to set these values in JobConf, but Pipes seems to provide
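One common workaround when Pipes does not expose the directories through JobConf is to chain the jobs from the shell, feeding the first job's output directory to the second as its input. The flags below are from the 0.19-era `hadoop pipes` submitter; the paths and program names are placeholders:

```
# job 1: read from /input, write to /intermediate
bin/hadoop pipes -conf job1.xml -input /input -output /intermediate \
  -program /bin/task1
# job 2: consume job 1's output as its input
bin/hadoop pipes -conf job2.xml -input /intermediate -output /final \
  -program /bin/task2
```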

Re: Map-Reduce!

2009-06-08 Thread jason hadoop
A very common one is processing large quantities of log files and producing summary data. Another use is simply as a way of distributing large jobs across multiple computers. In a previous job, we used Map/Reduce for distributed bulk web crawling, and for distributed media file processing. On Mon,
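The log-summarization use case Jason mentions can be sketched outside Hadoop entirely; the map/shuffle/reduce model is just functions over key-value pairs. Below is a minimal, self-contained simulation counting log lines per date (the log format and field positions are hypothetical, not from the thread):

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    # emit (date, 1) for a log line like "2009-06-08 GET /index.html 200"
    date = line.split()[0]
    yield (date, 1)

def reduce_fn(key, values):
    # sum all counts seen for one date
    yield (key, sum(values))

def run_mapreduce(lines):
    # map every input line, then "shuffle": sort by key so equal keys
    # are adjacent, group them, and reduce each group
    mapped = [kv for line in lines for kv in map_fn(line)]
    mapped.sort(key=itemgetter(0))
    results = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        results.extend(reduce_fn(key, (count for _, count in group)))
    return results
```

Hadoop runs the same three phases, but distributes the map calls across tasktrackers and performs the sort/shuffle over the network.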

Re: Remote connection to HDFS!

2009-06-08 Thread Todd Lipcon
Hi Sugandha, Usman has already answered your question. Please stop reposting the same question over and over. Thanks -Todd On Mon, Jun 8, 2009 at 7:05 AM, Sugandha Naolekar wrote: > Hello! > > I have a 7 node cluster. Now there is an 8th machine (called as remote) which > will be acting just as a

Re: Remote connection to HDFS!

2009-06-08 Thread Sugandha Naolekar
Hi Todd! I am facing many issues in transferring the data and making it work. That's why I reposted the question. My intention is not to trouble you guys! Sorry for the inconvenience. On Mon, Jun 8, 2009 at 7:40 PM, Todd Lipcon wrote: > Hi Sugandha, > > Usman has already answered your questi

Re: Remote connection to HDFS!

2009-06-08 Thread Todd Lipcon
On Mon, Jun 8, 2009 at 7:14 AM, Sugandha Naolekar wrote: > Hi Todd! > > I am facing many issues in transferring the data and making it work. That's > why, I reposted the question. My intention is not to trouble you guys! > It's no trouble at all! We're glad to help, but it's much easier for us to

Re: Few Queries..!!!

2009-06-08 Thread Alex Loddengaard
If you're going to be doing ad-hoc HDFS puts and gets, then you should just use the Hadoop command line tool, bin/hadoop. Otherwise, you can use the Java API to read and write files, etc. As for contributing to Hadoop and its ecosystem, everything is open source and open for contributions. You s
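For the ad-hoc case Alex describes, the command line tool usage looks like the following (all paths are placeholders; run from the Hadoop installation directory on a machine configured as a client):

```
# copy a local file into HDFS, list it, then fetch it back
bin/hadoop fs -put /local/data/file.txt /user/sugandha/file.txt
bin/hadoop fs -ls /user/sugandha
bin/hadoop fs -get /user/sugandha/file.txt /local/copy.txt
```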

Re: Implementing CLient-Server architecture using MapReduce

2009-06-08 Thread akhil1988
Can anyone help me on this issue? I have an account on the cluster, but I cannot go and start a server process on each tasktracker. Akhil akhil1988 wrote: > > Hi All, > > I am porting a machine learning application on Hadoop using MapReduce. The > architecture of the application g

Re: Every time the mapping phase finishes I see this

2009-06-08 Thread Mayuran Yogarajah
I should mention these are Hadoop streaming jobs, Hadoop version hadoop-0.18.3. Any idea about the empty stdout/stderr/syslog logs? I have no way to really track down what's causing them. thanks Steve Loughran wrote: Mayuran Yogarajah wrote: There are always a few 'Failed/Killed Task

Multiple NIC Cards

2009-06-08 Thread John Martyniak
Hi, I am creating a small Hadoop (0.19.1) cluster (2 nodes to start), each of the machines has 2 NIC cards (1 external facing, 1 internal facing). It is important that Hadoop run and communicate on the internal facing NIC (because the external facing NIC costs me money), also the interna
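In the 0.19-era configuration, the daemons can be told which interface to resolve their hostname from. A hypothetical hadoop-site.xml fragment, assuming eth1 is the internal-facing NIC:

```xml
<!-- report the address of the internal NIC (eth1 is an assumption) -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>eth1</value>
</property>
```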

Re: "java.io.IOException: Could not read from stream" & "java.io.IOException: Bad connect ack with firstBadLink 130.207.114.164:50010"

2009-06-08 Thread Taeho Kang
If "The program is very simple and just adds time stamp to the every line of input data." is what your job actually does, then you may have to think about changing your jobs. Having said your job takes 10 hours to finish, I guess you have tons of data to process (maybe hundreds of gigabytes?). The
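A timestamp-prefixing job of the kind described maps naturally onto a one-pass streaming mapper with no reduce phase at all, which is worth considering before a 10-hour full map-reduce run. A sketch (the tab-separated output format is an assumption):

```python
#!/usr/bin/env python
# Streaming-style mapper: prefix each input line with a UTC timestamp.
# The clock is injectable so the behavior can be checked deterministically.
import sys
import time

def add_timestamps(lines, now=time.time):
    for line in lines:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now()))
        yield "%s\t%s" % (stamp, line.rstrip("\n"))

if __name__ == "__main__":
    for out_line in add_timestamps(sys.stdin):
        print(out_line)
```

Run as a map-only job (zero reducers) the cluster only has to read and write the data once, with no sort/shuffle in between.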