RE: Will different files in HDFS trigger different mapper

2013-10-02 Thread Sourygna Luangsay
Hi, If you have a lot of small files, by default Hive will group several of them into a single mapper. Check this property: hive.input.format (org.apache.hadoop.hive.ql.io.CombineHiveInputFormat is the default, if empty) => if you set it to “org.apache.hadoop.hive.ql.io.HiveInputFormat”, you'll get
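
As a rough illustration of the grouping described in this reply, the sketch below compares how many map tasks a set of small files would produce when every file gets its own split versus when small files are packed together. It is a simplified model, not Hive's or Hadoop's actual split logic: the function names, the greedy packing, and the 128 MB limit are assumptions chosen for illustration, and real input formats also weigh block locality and other settings.

```python
import math

MB = 1024 * 1024


def per_file_splits(file_sizes, split_size):
    """Count splits when files are never combined.

    Each non-empty file yields at least one split, and a split never
    spans two files (simplified model of a per-file input format).
    """
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


def combined_splits(file_sizes, max_split_size):
    """Count splits when small files are packed greedily into one split.

    Assumes every file is smaller than max_split_size; a new split is
    started whenever adding the next file would exceed the limit.
    """
    splits, current = 0, 0
    for size in file_sizes:
        if current and current + size > max_split_size:
            splits += 1
            current = 0
        current += size
    return splits + (1 if current else 0)


# Hypothetical workload: 100 files of 1 MB each, 128 MB split limit.
small_files = [1 * MB] * 100
print("per-file mappers:", per_file_splits(small_files, 128 * MB))
print("combined mappers:", combined_splits(small_files, 128 * MB))
```

Under these assumptions, 100 files of 1 MB each give 100 mappers with per-file splits and 1 mapper with combined splits, which is the difference the hive.input.format setting is meant to control.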

HDFS block replication issue

2013-10-02 Thread Ionut Ignatescu
Hi, I have a Hadoop and HBase cluster that runs Hadoop 1.1.2 and HBase 0.94.7. I have noticed an issue that stops the cluster from running normally. My use case: I have several MR jobs that read data from one HBase table in the map phase and write data to 3 different tables during the reduce phase. I create table handle

Re: Accessing only particular folder using hadoop streaming

2013-10-02 Thread Harsh J
You need to use globs when passing your input path, like below perhaps:
data/shard*/d1*
On Thu, Oct 3, 2013 at 1:28 AM, jamal sasha wrote:
> Hi,
> I have data in this one folder like following:
>
> data---shard1---d1_1
>    |         |_d2_1
>    Lshard2---d1_1
>
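
To make the effect of the suggested glob concrete, here is a minimal sketch that checks which paths a pattern such as data/shard*/d1* would select. It is plain Python, not Hadoop's own glob implementation: the helper matches_glob and the sample listing are hypothetical, and the matching is done one path component at a time so that "*" does not cross a "/" boundary. Hadoop-specific glob features such as {a,b} alternation are not covered.

```python
from fnmatch import fnmatchcase


def matches_glob(path: str, pattern: str) -> bool:
    """Return True if path matches pattern, component by component.

    Splitting on "/" keeps "*" from matching across directory levels.
    """
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, glob) for part, glob in zip(path_parts, pattern_parts))


# Hypothetical listing modeled on the folder layout in the question.
paths = [
    "data/shard1/d1_1",
    "data/shard1/d2_1",
    "data/shard2/d1_1",
    "data/shard2/d2_2",
]

selected = [p for p in paths if matches_glob(p, "data/shard*/d1*")]
print(selected)
```

With this sample listing, only the d1_1 entries under each shard are selected; the d2_* files are skipped.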

Re: Hadoop Solaris OS compatibility

2013-10-02 Thread Roman Shaposhnik
On Fri, Sep 27, 2013 at 2:42 AM, Jitendra Yadav wrote: > Hi All, > > For a few years I have been working as a Hadoop admin on the Linux platform, though we > have the majority of our servers on Solaris (Sun SPARC hardware). Many times I have > seen that Hadoop is compatible with Linux. Is that right? If yes then what

Re: HDFS / Federated HDFS - Doubts

2013-10-02 Thread Krishna Kumaar Natarajan
Thanks Chris. I hope someone can answer or give a pointer to get a clear idea about question 4. Regards, Krishna On Wed, Oct 2, 2013 at 1:41 PM, Chris Mawata wrote: > Don't know about question 4 but for the first three -- the metadata is > in the memory of the namenode at runtime but is also persisted to

Will different files in HDFS trigger different mapper

2013-10-02 Thread java8964 java8964
Hi, I have a question related to how mappers are generated for input files from HDFS. I understand the split and block concepts in HDFS, but my original understanding is that one mapper will only process data from one file in HDFS, no matter how small the file is. Is that correct? T

Accessing only particular folder using hadoop streaming

2013-10-02 Thread jamal sasha
Hi, I have data in this one folder like following:
data---shard1---d1_1
   |         |_d2_1
   Lshard2---d1_1
   |         |_d2_2
   Lshard3---d1_1
   |         |_d2_3
   Lshard4---d1_1
             |_d2_4
Now, I want to

Re: modify HDFS

2013-10-02 Thread Ravi Prakash
Karim! Hadoop 3.0 corresponds to trunk currently. I would recommend that you use branch-2. It's fairly stable. hadoop-1.x is rather old and is in maintenance mode now. You can get all the branches from https://wiki.apache.org/hadoop/GitAndHadoop git clone git://git.apache.org/hadoop-common.git Pl

Re: HDFS / Federated HDFS - Doubts

2013-10-02 Thread Chris Mawata
One more thing, Krishna, when using JournalNodes as opposed to the native file system for the metadata storage you do get replication. Chris On 10/2/2013 12:52 AM, Krishna Kumaar Natarajan wrote: Hi All, While trying to understand federated HDFS in detail I had a few doubts and am listing them d

Re: HDFS / Federated HDFS - Doubts

2013-10-02 Thread Chris Mawata
Don't know about question 4 but for the first three -- the metadata is in the memory of the namenode at runtime but is also persisted to disk (otherwise it would be lost if you shut down and re-start the namenode). The copy persisted to disk is on the native file system (not HDFS) and no is not

Re: modify HDFS

2013-10-02 Thread Pradeep Gollakota
Since hadoop 3.0 is 2 major versions higher, it will be significantly different than working with hadoop 1.1.2. The hadoop-1.1 branch is available on SVN at http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1/ On Tue, Oct 1, 2013 at 11:30 PM, Karim Awara wrote: > Hi all, > > My p