Hi,
I tried to install HDFS federation with the help of document given by you.
I have a small issue. I used 2 slaves in the setup; both act as namenode and
datanode. Now the issue is that when I look at the home pages of both namenodes,
only one datanode is appearing. As per my understanding, 2 datanodes
Hi Omkar,
Thanks for the quick reply, and sorry for not being able to get the
required logs that you asked for.
But in the meantime, I just wanted to check whether you can get a clue from
the information I have now. I am seeing the following kind of error message
in AppMaster.stderr whenever
Hi,
Is there a way for the reducer to get the total number of input records to
the map phase?
For example, I want the reducer to normalize a sum by dividing it by the
number of records. I tried getting the value of that counter by using the
line:
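One way to get at that value, sketched here against the Hadoop 2.x mapreduce API (the job setup and the example.total.map.input.records key are illustrative assumptions, not from the thread), is to read the built-in MAP_INPUT_RECORDS counter in the driver once the first job completes and hand it to a follow-up job through the configuration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.TaskCounter;

    public class CounterHandoff {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job first = Job.getInstance(conf, "sum-job");
            // ... set mapper, reducer, input/output paths for the first job ...
            first.waitForCompletion(true);

            // Aggregate number of records read by all map tasks.
            long mapInputRecords = first.getCounters()
                    .findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();

            // Hand the total to a second job through its configuration; the
            // reducer can read it in setup() via context.getConfiguration().
            Configuration conf2 = new Configuration();
            conf2.setLong("example.total.map.input.records", mapInputRecords);
            Job second = Job.getInstance(conf2, "normalize-job");
            // ... configure and submit the second job ...
        }
    }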
Hi,
I have a scenario in which I want to trigger a Hive uploading script every
day. I have a set of folders created for a set of customer IDs every day. My
Hive script will read the customer ID from the path, check whether the
table for the customer ID exists and, if not, create a table and will
Hi,
I have a Hadoop reduce task attempt that will never fail or complete
unless I manually fail/kill it.
The problem surfaces when the task tracker node (due to network issues that
I am still investigating) loses connectivity with other task trackers/data
nodes, but not with the job
Thanks Bryan. This is great stuff!
On Thu, Sep 12, 2013 at 8:49 PM, Bryan Beaudreault bbeaudrea...@hubspot.com
wrote:
Hey Adrian,
To clarify, the replication happens on *write*. So as you write output
from the reducer of Job A, you are writing into hdfs. Part of that write
path is
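As an aside to that point: since replication happens on write, per-job output replication can be dialed down for intermediate data that does not need the full factor. A minimal sketch, assuming the standard dfs.replication client property (the job name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobADriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Replication happens on write, so lowering it here affects the
            // files that Job A's reducers write into HDFS.
            conf.setInt("dfs.replication", 2); // cluster default is usually 3
            Job jobA = Job.getInstance(conf, "job-a");
            // ... set formats, mapper, reducer, paths, then submit ...
        }
    }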
I've just seen your email, Vinod. This is the behaviour that I'd expect, and
it is similar to other data integration tools; I will keep an eye out for it
as a long-term option.
On Fri, Sep 13, 2013 at 5:26 AM, Vinod Kumar Vavilapalli vino...@apache.org
wrote:
Other than the short term solutions
In the normal configuration, the issue here is that Reducers can start
before all the Maps have finished so it is not possible to get the number
(or make sense of it even if you are able to).
Having said that, you can specifically make sure that Reducers don't start
until all your maps have
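The knob that controls this is the reduce slow-start fraction. A minimal sketch, using the Hadoop 1.x property name (in 2.x it is mapreduce.job.reduce.slowstart.completedmaps):

    import org.apache.hadoop.mapred.JobConf;

    public class SlowStartExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Fraction of maps that must finish before reducers are
            // scheduled; 1.0f makes reducers wait for all maps (default 0.05).
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 1.0f);
        }
    }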
Hi,
I remember hearing a while ago that Facebook had an OutputFormat that wrote
the underlying MySQL database files directly from a MapReduce job.
For my purposes, an SQLite data file would be good enough too.
I've been unable to find either of those two solutions by simply
Hello Experts,
I know that HDFS federation helps in adding multiple namenodes. In this case,
how does YARN work? Where will the ResourceManager be if I have multiple
namenodes? (Normally on the master node, I guess.) Can we execute YARN
applications in a Hadoop cluster without having this HDFS
If you want to see a simple example of what you are looking for:
https://github.com/cloudera/cdh-twitter-example
It is part of this article:
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/
On Tue, Sep 17, 2013 at 4:20 AM, praveenesh kumar praveen...@gmail.com wrote:
Hi all,
I'm using Hadoop 1.0.4 and using gzip to keep the logs processed by Hadoop
(logs are gzipped into block-size files).
I read that bzip2 is splittable. Is that so in Hadoop 1.0.4? Does that mean
that any input file bigger than the block size will be split across maps?
What are the tradeoffs
Amareshwari ported this in MAPREDUCE-355:
https://issues.apache.org/jira/browse/MAPREDUCE-355
It has not been backported to the 1.x line, but it is in the 2.x branch. -C
On Mon, Sep 16, 2013 at 4:34 PM, Ivan Balashov ibalas...@iponweb.net wrote:
Hi,
Just wondering if there is any particular
Or you could do the calculation in the reducer's close() method, though I am
not sure you can get the mapper's count in the reducer.
But even if you can't, here is what you can do:
1) Save the JobConf reference in your Mapper's configure() method.
2) Store the MAP_INPUT_RECORDS counter in the configuration object.
Hi, I have a question about sequence files: why should I use one, and under
what kind of circumstances?
Let's say I have a CSV file; I can store that directly in HDFS. But if I do
know that the first 2 fields are some kind of key, and most MR jobs will
query on that key, will it
hi guys,
I am using webhdfs, and I noticed that when I execute this:
curl -i -L
'http://192.168.1.217:50070/webhdfs/v1/user/hadoop/sample.txt?op=GETFILECHECKSUM'
it is redirected to
http://hadoop2:50075/webhdfs/v1/user/hadoop/sample.txt?op=GETFILECHECKSUM
I am wondering why
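The redirect itself is expected: the NameNode serves the metadata side of WebHDFS and redirects data-bound operations such as GETFILECHECKSUM to a DataNode holding the blocks, since the checksum must be computed from the block data. A minimal sketch of letting the client API follow that redirect for you (host and path copied from the curl call above):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChecksumExample {
        public static void main(String[] args) throws Exception {
            // The webhdfs:// scheme talks to the same REST endpoint and
            // follows the NameNode -> DataNode redirect internally.
            FileSystem fs = FileSystem.get(
                    URI.create("webhdfs://192.168.1.217:50070"), new Configuration());
            FileChecksum checksum =
                    fs.getFileChecksum(new Path("/user/hadoop/sample.txt"));
            System.out.println(checksum);
        }
    }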
Shahab,
One question - you mentioned: In the normal configuration, the issue here
is that Reducers can start before all the Maps have finished so it is not
possible to get the number (or make sense of it even if you are able to).
I think reducers would start copying the data from the
Yes, bzip2 is splittable.
Tradeoffs - I have not done much experimentation with codecs.
Thanks,
Rahul
On Wed, Sep 18, 2013 at 2:07 AM, Amit Sela am...@infolinks.com wrote:
Hi all,
I'm using Hadoop 1.0.4 and using gzip to keep the logs processed by Hadoop
(logs are gzipped into block-size
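For the writing side, a minimal sketch of producing bzip2-compressed job output (Hadoop 2.x mapreduce API; the class and method names here are illustrative). Note that, per Chris's note above, split support for reading such files back landed via MAPREDUCE-355 in the 2.x line, not 1.x:

    import org.apache.hadoop.io.compress.BZip2Codec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Bzip2Output {
        public static void enableBzip2(Job job) {
            // Write the job's output as bzip2; on branches carrying
            // MAPREDUCE-355 (the 2.x line), a later job can then split
            // those files across maps when reading them back.
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
        }
    }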
The sequence file is a solution offered to avoid the small-files problem. If
you have too many small files, Hadoop won't scale very well; they also eat up
your NameNode's memory if you aren't able to combine them somehow.
If you have a million 10 KB files, it is often useful to combine them into
larger
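As an illustration of that pattern, a hedged sketch that packs a directory of small files into one SequenceFile keyed by file name (the paths and the Text/BytesWritable choice are assumptions, not from the thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilePacker {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("/data/packed.seq"); // illustrative path
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, out, Text.class, BytesWritable.class);
            try {
                for (FileStatus stat : fs.listStatus(new Path("/data/small-files"))) {
                    byte[] buf = new byte[(int) stat.getLen()];
                    FSDataInputStream in = fs.open(stat.getPath());
                    try {
                        in.readFully(0, buf); // a small file fits in memory
                    } finally {
                        in.close();
                    }
                    // File name becomes the key, raw bytes the value.
                    writer.append(new Text(stat.getPath().getName()),
                            new BytesWritable(buf));
                }
            } finally {
                writer.close();
            }
        }
    }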
Hi,
I resolved the issue. There was a problem with the /etc/hosts file.
One more question I would like to ask:
I created a directory in the HDFS of NameNode1 and copied a file into it. My
question is: should it be visible when I run hadoop fs -ls PathToDirectory from
the NameNode2 machine? For me it's not
Chris,
I think that the error occurs when NN tries to download the fsimage from
SNN.
You can check the NN logs to make sure whether this is true.
There could be different reasons for this.
1. NN fails to do SPNEGO with SNN.
2. NN's TGT expired. Unlikely in your test cluster.
Please post with