RE: Can you help me to install HDFS Federation and test?

2013-09-17 Thread Sandeep L
Hi, I tried to install HDFS federation with the help of the document you gave. I have a small issue. I used 2 slaves in the setup; both act as namenode and datanode. Now the issue is that when I look at the home pages of both namenodes, only one datanode is appearing. As per my understanding 2 datanodes

Re: Container allocation fails randomly

2013-09-17 Thread Krishna Kishore Bonagiri
Hi Omkar, Thanks for the quick reply, and sorry for not being able to get the required logs that you asked for. But in the meantime I just wanted to check whether you can get a clue from the information I have now. I am seeing the following kind of error message in AppMaster.stderr whenever

MAP_INPUT_RECORDS counter in the reducer

2013-09-17 Thread Yaron Gonen
Hi, Is there a way for the reducer to get the total number of input records to the map phase? For example, I want the reducer to normalize a sum by dividing it by the number of records. I tried getting the value of that counter by using the line:

Oozie dynamic action

2013-09-17 Thread praveenesh kumar
Hi, I have a scenario in which I want to trigger a hive uploading script every day. I have a set of folders created for a set of customer ids every day. My hive script will read the customer id from the path, check whether the table for the customer id exists and, if not, create a table and will

How does one make a hadoop task attempt to fail after too many data fetch failures?

2013-09-17 Thread Francisco Rodera
Hi, I have a hadoop reduce task attempt that will never fail or get completed unless I manually fail/kill it. The problem surfaces when the task tracker node (due to network issues that I am still investigating) loses connectivity with other task trackers/data nodes, but not with the job

Re: chaining (the output of) jobs/ reducers

2013-09-17 Thread Adrian CAPDEFIER
Thanks Bryan. This is great stuff! On Thu, Sep 12, 2013 at 8:49 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: Hey Adrian, To clarify, the replication happens on *write*. So as you write output from the reducer of Job A, you are writing into hdfs. Part of that write path is
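
A minimal driver sketch of this pattern (2.x-style API; paths and job names are invented for illustration): Job A's reducer output lands in HDFS, replicated on write like any other file, and Job B simply points its input at that same path.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChainedJobsDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job A's output is replicated as it is written, like any HDFS file.
        Path intermediate = new Path("/tmp/job-a-output");

        Job jobA = Job.getInstance(conf, "job-a");
        jobA.setJarByClass(ChainedJobsDriver.class);
        // jobA.setMapperClass(...); jobA.setReducerClass(...); key/value classes omitted
        FileInputFormat.addInputPath(jobA, new Path("/data/input"));
        FileOutputFormat.setOutputPath(jobA, intermediate);
        if (!jobA.waitForCompletion(true)) System.exit(1);

        Job jobB = Job.getInstance(conf, "job-b");
        jobB.setJarByClass(ChainedJobsDriver.class);
        // Job B reads Job A's already-replicated output straight back from HDFS.
        FileInputFormat.addInputPath(jobB, intermediate);
        FileOutputFormat.setOutputPath(jobB, new Path("/data/output"));
        System.exit(jobB.waitForCompletion(true) ? 0 : 1);
      }
    }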

Re: chaining (the output of) jobs/ reducers

2013-09-17 Thread Adrian CAPDEFIER
I've just seen your email, Vinod. This is the behaviour that I'd expect, similar to other data integration tools; I will keep an eye out for it as a long-term option. On Fri, Sep 13, 2013 at 5:26 AM, Vinod Kumar Vavilapalli vino...@apache.org wrote: Other than the short term solutions

Re: MAP_INPUT_RECORDS counter in the reducer

2013-09-17 Thread Shahab Yunus
In the normal configuration, the issue here is that Reducers can start before all the Maps have finished, so it is not possible to get the number (or to make sense of it even if you are able to). Having said that, you can specifically make sure that Reducers don't start until all your maps have
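
The cut-off sentence presumably refers to the slowstart setting. A driver-side snippet, assuming the MR1 property name (2.x renames it mapreduce.job.reduce.slowstart.completedmaps):

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Do not launch any reducer until 100% of maps have finished
    // (the default of 0.05 lets reducers start almost immediately).
    conf.set("mapred.reduce.slowstart.completed.maps", "1.0");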

Generating mysql or sqlite datafiles from Hadoop (Java)?

2013-09-17 Thread Niels Basjes
Hi, I remember hearing a while ago that (if I remember correctly) Facebook had an outputformat that wrote the underlying MySQL database files directly from a MapReduce job. For my purpose an sqlite datafile would be good enough too. I've been unable to find either of those two solutions by simply
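
Not the direct-datafile writer being asked about, but for comparison Hadoop ships a JDBC-based org.apache.hadoop.mapreduce.lib.db.DBOutputFormat, which inserts reducer output into a MySQL table through a driver rather than generating the files themselves. A rough sketch; the connection settings, table and column names here are invented:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

    public class MySqlExportDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical connection settings: adjust driver, URL and credentials.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
            "jdbc:mysql://dbhost/analytics", "user", "password");

        Job job = Job.getInstance(conf, "export-to-mysql");
        job.setJarByClass(MySqlExportDriver.class);
        job.setOutputFormatClass(DBOutputFormat.class);
        // The reduce output key must implement DBWritable;
        // "pageviews" and its columns are illustrative.
        DBOutputFormat.setOutput(job, "pageviews", "url", "count");
        // job.setMapperClass(...); job.setReducerClass(...); input setup omitted.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }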

Yarn and HDFS federation

2013-09-17 Thread Manickam P
Hello Experts, I know that hdfs federation helps in adding multiple namenodes. In this case, how does Yarn work? Where will the Resource Manager be if I have multiple namenodes? (Normally on the master node, I guess.) Can we execute yarn applications in a hadoop cluster without having this hdfs

Re: Oozie dynamic action

2013-09-17 Thread Peyman Mohajerian
If you want to see a simple example of what you are looking for: https://github.com/cloudera/cdh-twitter-example It is part of this article: http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/ On Tue, Sep 17, 2013 at 4:20 AM, praveenesh kumar praveen...@gmail.com wrote:
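
For the daily-trigger half of the question, a bare-bones Oozie coordinator sketch (names, paths and dates are illustrative only): it materializes one workflow run per day and passes the dated customer directory to the workflow, which can forward it to the hive action.

    <coordinator-app name="daily-hive-upload" frequency="${coord:days(1)}"
                     start="2013-09-17T00:00Z" end="2014-09-17T00:00Z" timezone="UTC"
                     xmlns="uri:oozie:coordinator:0.2">
      <action>
        <workflow>
          <app-path>hdfs:///apps/hive-upload-wf</app-path>
          <configuration>
            <property>
              <name>inputDir</name>
              <!-- dated customer directory for this run; the layout is illustrative -->
              <value>/data/customers/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}</value>
            </property>
          </configuration>
        </workflow>
      </action>
    </coordinator-app>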

Bzip2 vs Gzip

2013-09-17 Thread Amit Sela
Hi all, I'm using hadoop 1.0.4 and using gzip to keep the logs processed by hadoop (logs are gzipped into block-size files). I read that bzip2 is splittable. Is it so in hadoop 1.0.4? Does that mean that any input file bigger than block size will be split between maps? What are the tradeoffs
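
On the writing side, switching job output from gzip to bzip2 is a two-line driver change; a sketch with the new-API classes (and the usual tradeoff to weigh: bzip2 compresses and decompresses considerably slower than gzip):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.BZip2Codec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    Job job = new Job(new Configuration(), "bzip2-output");
    // Compress job output with bzip2 (.bz2) instead of gzip;
    // readers pick the codec automatically from the file extension.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);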

Re: mapred.join package not migrated to mapreduce

2013-09-17 Thread Chris Douglas
Amareshwari ported this in MAPREDUCE-355: https://issues.apache.org/jira/browse/MAPREDUCE-355 It has not been backported to the 1.x line, but it is in the 2.x branch. -C On Mon, Sep 16, 2013 at 4:34 PM, Ivan Balashov ibalas...@iponweb.net wrote: Hi, Just wondering if there is any particular
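
For archive readers, usage of the ported (new-API, 2.x) package looks roughly like the sketch below; treat the details as something to verify against MAPREDUCE-355 rather than a tested recipe, and note that both inputs must be sorted and identically partitioned on the join key:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat;

    public class MapSideJoinDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Inner join of two sorted, identically partitioned inputs on their keys.
        conf.set(CompositeInputFormat.JOIN_EXPR, CompositeInputFormat.compose(
            "inner", KeyValueTextInputFormat.class,
            new Path("/data/a"), new Path("/data/b")));
        Job job = Job.getInstance(conf, "map-side-join");
        job.setJarByClass(MapSideJoinDriver.class);
        job.setInputFormatClass(CompositeInputFormat.class);
        // The mapper receives (key, TupleWritable) pairs; remaining setup omitted.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }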

RE: MAP_INPUT_RECORDS counter in the reducer

2013-09-17 Thread java8964 java8964
Or you could do the calculation in the reducer's close() method, even though I am not sure you can get the Mapper's count in the reducer. But even if you can't, here is what you can do: 1) Save the JobConf reference in your Mapper's configure() method. 2) Store the MAP_INPUT_RECORDS counter in the configuration object
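
A task-side stash is unlikely to propagate (each task works on its own copy of the configuration), but a two-job variant of the same idea does work: the driver reads the finished job's counter and hands it to the next job through its configuration. A sketch, where "total.map.input.records" is an arbitrary key invented here, and TaskCounter is the 2.x enum (1.x exposes the same counter under a different class):

    // Driver, after the first job completes:
    long mapRecords = jobOne.getCounters()
        .findCounter(org.apache.hadoop.mapreduce.TaskCounter.MAP_INPUT_RECORDS)
        .getValue();
    jobTwoConf.setLong("total.map.input.records", mapRecords);

    // Reducer of the second job:
    @Override
    protected void setup(Context context) {
      total = context.getConfiguration().getLong("total.map.input.records", 1L);
    }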

Hadoop sequence file's benefits

2013-09-17 Thread java8964 java8964
Hi, I have a question related to sequence files. I wonder why I should use one, and under what kind of circumstances. Let's say I have a csv file; I can store that directly in HDFS. But if I do know that the first 2 fields are some kind of key, and most MR jobs will query on that key, will it

a question about webhdfs

2013-09-17 Thread douxin
hi guys, I am using webhdfs, and I noticed that when I execute this: curl -i -L 'http://192.168.1.217:50070/webhdfs/v1/user/hadoop/sample.txt?op=GETFILECHECKSUM' it is redirected to http://hadoop2:50075/webhdfs/v1/user/hadoop/sample.txt?op=GETFILECHECKSUM. I am wondering why
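
That redirect is WebHDFS working as designed: the namenode (port 50070) answers metadata calls itself but sends any operation that must touch block data, GETFILECHECKSUM included, to a datanode (port 50075) holding a replica. A small sketch that makes the two steps visible, reusing the host and path from the message above:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class WebHdfsRedirectDemo {
      public static void main(String[] args) throws Exception {
        URL nn = new URL("http://192.168.1.217:50070/webhdfs/v1/user/hadoop/sample.txt"
            + "?op=GETFILECHECKSUM");
        HttpURLConnection c = (HttpURLConnection) nn.openConnection();
        c.setInstanceFollowRedirects(false);                  // stop at the namenode's answer
        System.out.println("NN status:   " + c.getResponseCode()); // expect 307
        String dnUrl = c.getHeaderField("Location");          // datanode URL, port 50075
        System.out.println("Redirect to: " + dnUrl);

        HttpURLConnection dn = (HttpURLConnection) new URL(dnUrl).openConnection();
        try (BufferedReader r = new BufferedReader(
            new InputStreamReader(dn.getInputStream()))) {
          System.out.println("Checksum:    " + r.readLine()); // JSON FileChecksum body
        }
      }
    }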

Re: MAP_INPUT_RECORDS counter in the reducer

2013-09-17 Thread Rahul Bhattacharjee
Shahab, One question - You mentioned - In the normal configuration, the issue here is that Reducers can start before all the Maps have finished, so it is not possible to get the number (or make sense of it even if you are able to). I think reducers would start copying the data from the

Re: Bzip2 vs Gzip

2013-09-17 Thread Rahul Bhattacharjee
Yes, bzip2 is splittable. Tradeoffs - I have not done much experimentation with codecs. Thanks, Rahul On Wed, Sep 18, 2013 at 2:07 AM, Amit Sela am...@infolinks.com wrote: Hi all, I'm using hadoop 1.0.4 and using gzip to keep the logs processed by hadoop (logs are gzipped into block size

Re: Hadoop sequence file's benefits

2013-09-17 Thread Preethi Vinayak Ponangi
A sequence file is a solution offered to avoid the small-files problem. If you have too many small files, Hadoop doesn't scale very well. They also eat up your Namenode memory if you aren't able to combine them somehow. If you have a million 10 KB files, it is often useful to combine them into larger
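
A sketch of that packing step with the 1.x-era writer API (the path and record contents are invented): each small file becomes one record, keyed by its name, so a single HDFS file replaces thousands of namenode entries.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilePacker {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/data/packed.seq");
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
        try {
          // One record per small file: key = file name, value = raw contents.
          byte[] contents = "example payload".getBytes("UTF-8");
          writer.append(new Text("smallfile-0001"), new BytesWritable(contents));
        } finally {
          writer.close();
        }
      }
    }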

RE: Can you help me to install HDFS Federation and test?

2013-09-17 Thread Sandeep L
Hi, I resolved the issue. There was a problem with the /etc/hosts file. One more question I would like to ask: I created a directory in HDFS of NameNode1 and copied a file into it. My question is, will it be visible when I run hadoop fs -ls PathToDirectory from the NameNode2 machine? For me it's not

Re: Securing the Secondary Name Node

2013-09-17 Thread Benoy Antony
Chris, I think that the error occurs when the NN tries to download the fsimage from the SNN. You can check the NN logs to make sure whether this is true. There could be different reasons for this: 1. NN fails to do SPNEGO with SNN. 2. NN's TGT expired. Unlikely in your test cluster. Please post with