Hello!
I have a 7-node cluster, but there is one remote node (an 8th machine) within
the same LAN which holds some data. Now, I need to place this data
into HDFS. This 8th machine is not part of the Hadoop
cluster's (master/slave) config files.
So, what I have thought is:
-> Will get the files
If you are going to be using this 8th machine as a client only, then
ensure that it is running the same version of Hadoop as your cluster.
In the config file hadoop-site.xml, point fs.default.name to the namenode.
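For example, a minimal sketch (the host name and port below are placeholders;
substitute your actual namenode address):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:9000</value>
    </property>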
-Usman
Hello,
I am using Hadoop 0.19.1, and my job (which takes 10 hours to finish)
failed in the middle of execution (once at 21% and once at 81% complete) with
error messages like
java.io.IOException: Could not read from stream
...
java.io.IOException: Bad connect ack with firstBadLink 130.207.
Aaron Kimball wrote:
Finally, there's a third scheduler called the Capacity scheduler. It's
similar to the fair scheduler, in that it allows guarantees of minimum
availability for different pools. I don't know how it apportions additional
resources, though -- this is the one I'm least familiar with.
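If you want to try it out, my understanding is that you select a scheduler
through the mapred.jobtracker.taskScheduler property in the jobtracker's
config; the class name below is the one from the contrib capacity-scheduler
package, so check it against your version:

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    </property>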
Mayuran Yogarajah wrote:
There are always a few 'Failed/Killed Task Attempts', and when I view the
logs for these I see:
- some that are empty, i.e. the stdout/stderr/syslog logs are all blank
- several that say:
2009-06-06 20:47:15,309 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
Hello!
As far as I have read on the forums, MapReduce is basically used to
process large amounts of data quickly, right?
But can you please give me some instances or examples of where I can use
MapReduce?
--
Regards!
Sugandha
Hi,
I am trying to get started with Hadoop Pipes. Is there an example of
chaining tasks (with Pipes) somewhere?
If not, can someone tell me how I can specify the input and output
directories for the second task? I was expecting to be able to set these
values in JobConf, but Pipes seems to provide
A very common one is processing large quantities of log files and producing
summary data.
Another use is simply as a way of distributing large jobs across multiple
computers.
In a previous job, we used Map/Reduce for distributed bulk web crawling, and
for distributed media file processing.
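To make the log-processing case concrete, here is a minimal sketch against
the old (0.18/0.19-era) Java API. Everything specific in it is made up for
illustration: the class name LogSummary, and the assumption that the date is
the first whitespace-separated token of each log line.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class LogSummary {
      // Emits (date, 1) for each log line, assuming the date is the
      // first whitespace-separated token of the line.
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        public void map(LongWritable key, Text line,
            OutputCollector<Text, LongWritable> out, Reporter reporter)
            throws IOException {
          String[] fields = line.toString().split("\\s+", 2);
          if (fields.length > 0 && fields[0].length() > 0)
            out.collect(new Text(fields[0]), ONE);
        }
      }
      // Sums the counts, producing one (date, line-count) pair per date.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, LongWritable, Text, LongWritable> {
        public void reduce(Text date, Iterator<LongWritable> counts,
            OutputCollector<Text, LongWritable> out, Reporter reporter)
            throws IOException {
          long sum = 0;
          while (counts.hasNext()) sum += counts.next().get();
          out.collect(date, new LongWritable(sum));
        }
      }
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(LogSummary.class);
        conf.setJobName("log-summary");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }

You would run it with bin/hadoop jar, giving an input and an output
directory; the reducer writes one (date, line-count) pair per date.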
On Mon,
Hi Sugandha,
Usman has already answered your question. Please stop reposting the same
question over and over.
Thanks
-Todd
On Mon, Jun 8, 2009 at 7:05 AM, Sugandha Naolekar wrote:
> Hello!
>
> I have a 7-node cluster. Now there is an 8th machine (called the remote) which
> will be acting just as a
Hi Todd!
I am facing many issues in transferring the data and making it work. That's
why I reposted the question. My intention is not to trouble you guys!
Sorry for the inconvenience.
On Mon, Jun 8, 2009 at 7:40 PM, Todd Lipcon wrote:
> Hi Sugandha,
>
> Usman has already answered your questi
On Mon, Jun 8, 2009 at 7:14 AM, Sugandha Naolekar wrote:
> Hi Todd!
>
> I am facing many issues in transferring the data and making it work. That's
> why I reposted the question. My intention is not to trouble you guys!
>
It's no trouble at all! We're glad to help, but it's much easier for us to
If you're going to be doing ad-hoc HDFS puts and gets, then you should just
use the Hadoop command line tool, bin/hadoop. Otherwise, you can use the
Java API to read and write files, etc.
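For example, from the shell (the paths here are just placeholders):

    bin/hadoop fs -put /local/logs/day1.log /user/me/logs/day1.log
    bin/hadoop fs -get /user/me/logs/day1.log /tmp/day1.log

The Java equivalent is FileSystem.get(conf) followed by
copyFromLocalFile()/copyToLocalFile().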
As for contributing to Hadoop and its ecosystem, everything is open source
and open for contributions. You s
Can anyone help me with this issue? I have an account on the cluster, but I
cannot go and start a server process on each tasktracker myself.
Akhil
akhil1988 wrote:
>
> Hi All,
>
> I am porting a machine learning application on Hadoop using MapReduce. The
> architecture of the application g
I should mention: these are Hadoop streaming jobs, on Hadoop version
hadoop-0.18.3.
Any idea about the empty stdout/stderr/syslog logs? I have no way to
really track down what's causing them.
thanks
Steve Loughran wrote:
Mayuran Yogarajah wrote:
There are always a few 'Failed/Killed Task
Hi,
I am creating a small Hadoop (0.19.1) cluster (2 nodes to start); each
of the machines has 2 NIC cards (1 external-facing, 1 internal-facing).
It is important that Hadoop run and communicate on the
internal-facing NIC (because the external-facing NIC costs me money);
also the interna
If "The program is very simple and just adds time stamp to the every line of
input data." is what your job actually does, then you may have to think
about changing your jobs.
Having said your job takes 10 hours to finish, I guess you have tons of data
to process (maybe hundreds of gigabytes?). The