Re: Question about DataNode

2014-02-27 Thread Juan Carlos
Hi Edward, maybe you are sending your request to the master from the slave. I'm not sure, but I think that the secondary never answers any request, not even read requests, and you have to modify your config files by hand to turn your slave into the master. I haven't tested much with master/slave

Re:

2014-02-27 Thread Banty Sharma
Yes bro :) On Thu, Feb 27, 2014 at 12:34 PM, Avinash Kujur avin...@gmail.com wrote: Hi, can I solve the Hadoop issues in https://koding.com/ ?

What if file format is dependent upon first few lines?

2014-02-27 Thread Fengyun RAO
Below is a fake sample of Microsoft IIS log:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-07-04 20:00:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken

Re: Question about DataNode

2014-02-27 Thread Bertrand Dechoux
I am not sure what your question is. You might want to be more explicit and read more about Hadoop architecture and the roles of the various daemons. A directory is only metadata so you can mess around with DataNodes if you want but they are not involved. Regards Bertrand Bertrand Dechoux On

Multiple inputs for different avro inputs

2014-02-27 Thread AnilKumar B
Hi, I am using MultipleInputs to read two different Avro inputs with different schemas. But in the run method we need to call AvroJob.setInputKeySchema(job, schema); which schema do I need to set? I tried as below: List<Schema> schemas = new ArrayList<Schema>();

RE: What if file format is dependent upon first few lines?

2014-02-27 Thread java8964
If the file is big enough and you want to split it for parallel processing, then one option could be that in your mapper you always get the full file path from the InputSplit, then open it (the file path, which means you can read from the beginning), read the first 4 lines,

Re: What if file format is dependent upon first few lines?

2014-02-27 Thread Harsh J
A mapper's record reader implementation need not be restricted to strictly only the input split boundary. It is a loose relationship - you can always seek(0), read the lines you need to prepare, then seek(offset) and continue reading. Apache Avro (http://avro.apache.org) has a similar format -
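The seek(0) / seek(offset) idea described above can be sketched outside Hadoop with plain java.io. This is only an illustration under stated assumptions: RandomAccessFile stands in for the HDFS input stream, and the class and method names (HeaderAwareReader, readHeader, readSplit) are hypothetical, not part of any Hadoop API. The split-boundary rule used here (every split except the first discards its first line, and a split keeps reading while the line starts at or before its end offset) is one common convention for line-oriented splits.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the '#' header once, then the lines of one split.
final class HeaderAwareReader {

    // Jump to the start of the file and collect the leading '#' directive lines.
    static List<String> readHeader(RandomAccessFile in) throws IOException {
        List<String> header = new ArrayList<>();
        in.seek(0);
        String line;
        while ((line = in.readLine()) != null && line.startsWith("#")) {
            header.add(line);
        }
        return header;
    }

    // Read the data lines owned by the split that covers bytes start..end.
    static List<String> readSplit(RandomAccessFile in, long start, long end)
            throws IOException {
        List<String> records = new ArrayList<>();
        in.seek(start);
        if (start != 0) {
            // Discard the first (possibly partial) line; the previous split reads it.
            in.readLine();
        }
        // Keep reading while the next line starts at or before the split end.
        while (in.getFilePointer() <= end) {
            String line = in.readLine();
            if (line == null) {
                break; // end of file
            }
            if (!line.startsWith("#")) {
                records.add(line); // skip header directives, keep data lines
            }
        }
        return records;
    }
}
```

A reader for any split would call readHeader first to learn the field layout, then readSplit with its own offsets. RandomAccessFile.readLine treats each byte as one character, so this sketch is only suitable for ASCII logs.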

RE: Multiple inputs for different avro inputs

2014-02-27 Thread java8964
Using the Union schema is correct; it should be able to support multi-schema input. One question: why do you call setInputKeySchema? Does your job load the Avro data as the key to the following Mapper? Yong Date: Thu, 27 Feb 2014 16:13:34 +0530 Subject: Multiple inputs for different avro

Re: CapacityScheduler and FairScheduler

2014-02-27 Thread Harsh J
Hi, While they do have similarities, they are wholly different implementations:

Re: Multiple inputs for different avro inputs

2014-02-27 Thread Harsh J
One more doubt: Why don't we have AvroMultipleInputs just like AvroMultipleOutputs? Any reason? This (and other) questions belong on Apache Avro's list (u...@avro.apache.org). Moving user@hadoop to bcc:. For AvroMultipleInputs, see https://issues.apache.org/jira/browse/AVRO-1439 On Thu, Feb

Re: CapacityScheduler and FairScheduler

2014-02-27 Thread Juan Carlos
Yes, I know they have different implementations; what I wanted to ask about was features. Is there any feature in CapacityScheduler missing in FairScheduler? AFAIK it's possible to configure a FairScheduler to do exactly the same as capacity and more, in this case I would see CapacityScheduler as

Re: Multiple inputs for different avro inputs

2014-02-27 Thread AnilKumar B
Hi Yong, One question is that why you setInputKeySchema? Does your job load the Avro data as the key to the following Mapper? Yes, I am loading data as a key. Thanks Regards, B Anil Kumar. On Thu, Feb 27, 2014 at 7:57 PM, java8964 java8...@hotmail.com wrote: Using the Union schema is

Hadoop FileCrush

2014-02-27 Thread Devin Suiter RDX
Hi, Has anyone used Hadoop Filecrush? http://www.jointhegrid.com/hadoop_filecrush/ I was just curious about the reliability and integrity of it. It seems like a nice concept. But, if it is a nice concept and trustworthy, it should be looked at for incorporating under BigTop, one would think.

RM AM_RESYNC signal to AM

2014-02-27 Thread Gaurav Gupta
Hi, I killed the node manager on the node where the AM was running, and the AM got the AM_RESYNC command signal from the RM. I have the following questions: 1. In which scenarios does the RM send the AM_RESYNC signal to the AM? 2. Should the RM not send the AM_SHUTDOWN signal to AM when node

how to feed sampled data into each mapper

2014-02-27 Thread qiaoresearcher
Assume there is one large data set with size 100G on HDFS. How can we control that the data sent to each mapper is around 10% of the original data (or 10G), and that each 10% is randomly sampled from the 100G data set? Is there any example code doing this? Regards,

Re: how to feed sampled data into each mapper

2014-02-27 Thread Hadoop User
Try changing the split size in the driver code (see the MapReduce split size properties). On Feb 27, 2014, at 11:20 AM, qiaoresearcher qiaoresearc...@gmail.com wrote: Assume there is one large data set with size 100G on hdfs, how can we control that every data set sent to each
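As a rough sketch of what the split size settings control: FileInputFormat derives the split size from the block size and the configured minimum and maximum (the properties mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize in Hadoop 2), using max(minSize, min(maxSize, blockSize)). The class below is a hypothetical stand-alone illustration of that arithmetic, not Hadoop code, and the split count ignores the small slack Hadoop allows on the last split.

```java
// Hypothetical illustration of how the split size is derived from the settings.
final class SplitSizeSketch {

    // Same formula FileInputFormat applies: clamp the block size to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (and therefore mappers) for one file.
    static long splitCount(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize; // ceiling division
    }
}
```

With a 128 MB block size, raising the minimum split size to 10 GB would give about ten splits for a 100 GB file. Each split is still a contiguous byte range of the file, so this controls how much data a mapper sees, not which records it sees.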

Re: Re: hadoop Exception: java.io.IOException: Couldn't set up IO streams

2014-02-27 Thread leiwang...@gmail.com
Thanks, it works after increasing the ulimit number. leiwang...@gmail.com From: shashwat shriparv Date: 2014-02-27 12:43 To: user CC: leiwangouc Subject: Re: hadoop Exception: java.io.IOException: Couldn't set up IO streams Try to increase ulimit for the machine and the user under which

Re: What if file format is dependent upon first few lines?

2014-02-27 Thread Fengyun RAO
Thanks, Harsh. Could you give more detail, or some links or an example where I can start? 2014-02-27 22:17 GMT+08:00 Harsh J ha...@cloudera.com: A mapper's record reader implementation need not be restricted to strictly only the input split boundary. It is a loose relationship -

Re: how to feed sampled data into each mapper

2014-02-27 Thread qiaoresearcher
Thanks. I think what you suggest is to just divide the large file into splits of about 10G each, but how do I make each 10G 'randomly sampled' from the original large data set? On Thu, Feb 27, 2014 at 7:40 PM, Hadoop User hadoopus...@gmail.com wrote: Try changing split size
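One possible way to get a random 10% rather than a contiguous 10G chunk, offered here as an assumption and not something proposed in the thread, is Bernoulli sampling: scan the records and keep each one independently with probability 0.1. The sketch below uses only java.util; the class name BernoulliSampler and the seed parameter are hypothetical. The result is roughly, not exactly, 10% of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: keep each record independently with a fixed probability.
final class BernoulliSampler {

    // Returns the records that pass a coin flip with success probability 'rate'.
    static <T> List<T> sample(Iterable<T> records, double rate, long seed) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        Random rng = new Random(seed); // a different seed gives a different sample
        List<T> kept = new ArrayList<>();
        for (T record : records) {
            if (rng.nextDouble() < rate) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

In a MapReduce setting the same coin flip could run inside each map call, with a distinct seed per sample, so that every sample draws from the whole data set. The cost is that producing each sample requires a full scan of the input.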

Re: What if file format is dependent upon first few lines?

2014-02-27 Thread Jay Vyas
-- method 1 -- You could, I think, just extend FileInputFormat and have isSplitable return false. Then each file won't be broken up into separate blocks, and will be processed as a whole by one mapper. This is probably the easiest thing to do, but if you have huge files, it won't perform very well. -- method 2 --

[no subject]

2014-02-27 Thread Avinash Kujur
I am new to Hadoop. Which issues should I start working on? I need some proper guidance. It would be helpful if someone could share his/her experience with me. I need to go through the code that fixed some issue. Please help me.

Re:

2014-02-27 Thread Ted Yu
You can start from here: http://wiki.apache.org/hadoop/HowToContribute See this prior response: http://search-hadoop.com/m/FZpRqM7Jsc Cheers On Thu, Feb 27, 2014 at 9:05 PM, Avinash Kujur avin...@gmail.com wrote: i am new for hadoop. what are the issues i should start working with. i need

Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread Alex Lee
Hello, I am quite a newbie here, and I want to set up Hadoop 2.3 on 4 new PCs. Later I may add more PCs to it. Is there any tutorial I can learn from, covering which Linux version I should use, how to set up Linux, and how to install Hadoop step by step? I am trying to setup

HBase Exception: org.apache.hadoop.hbase.UnknownRowLockException

2014-02-27 Thread Shailesh Samudrala
I'm running some sample code I wrote to test the HBase lockRow() and unlockRow() methods. The sample code is below: HTable table = new HTable(config, "test"); RowLock rowLock = table.lockRow(Bytes.toBytes(row)); System.out.println("Obtained rowlock on " + row + "\nRowLock: " + rowLock); Put p = new

RE: RM AM_RESYNC signal to AM

2014-02-27 Thread Rohith Sharma K S
Hi Gaurav, If the NodeManager is killed, the containers running on that NM won't be killed immediately. The RM holds node information for 10 minutes (the default node expiry). There are two possible scenarios: 1. After 10 minutes, the container is killed. 2. The NM is killed and restarted within 10 minutes.

Re: HBase Exception: org.apache.hadoop.hbase.UnknownRowLockException

2014-02-27 Thread Ted Yu
You're using 0.94, right? RowLock has been dropped since 0.96.0. Can you tell us more about your use case? On Thu, Feb 27, 2014 at 9:56 PM, Shailesh Samudrala shailesh2...@gmail.com wrote: I'm running a sample code I wrote to test HBase lockRow() and unlockRow() methods. The sample code

Re: Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread Zhijie Shen
This is the link about cluster setup: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html - Zhijie On Thu, Feb 27, 2014 at 9:41 PM, Alex Lee eliy...@hotmail.com wrote: Hello, I am quite a newbie here. And want to setup hadoop 2.3 on 4 new PCs. Later

Adding a New Host to the cluster using CM explained

2014-02-27 Thread VJ Shalish
Hi, Adding a New Host to the cluster using CM, explained with screenshots: http://shalishvj.wordpress.com/2014/02/27/a-hadoop-cluster-home-adding-a-new-host-to-the-cluster/ Thanks, Shalish.

RE: Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread Alex Lee
Hi Zhi Jie, Thanks, I am going through it. But I may need to select a Linux OS first. Any suggestion? Alex Date: Thu, 27 Feb 2014 22:29:56 -0800 Subject: Re: Newbie, any tutorial for install hadoop 2.3 with proper linux version From: zs...@hortonworks.com To: user@hadoop.apache.org This is

Re: Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread shashwat shriparv
If you want to go license-free, you can use either Ubuntu or CentOS. Warm Regards, Shashwat Shriparv

Re: Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread shashwat shriparv
You can follow these links: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_installing_manually_book/content/rpm_chap3.html