Re: How to compress MapFile programmatically

2013-08-11 Thread Harsh J
A MapFile isn't a file. It is a directory _containing_ two files. You cannot "open" a directory for reading. The MapFile API is documented at http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.html and that's what you should use for reading and writing them. Compression is

DefaultResourceCalculator class not found, ResourceManager fails to start.

2013-08-11 Thread Rob Blah
Hi, I have a strange problem regarding a missing class, DefaultResourceCalculator. I have a single-node sandbox cluster running in pseudo-distributed mode. The cluster was working fine yesterday; however, today it stopped working. I was able to fix all issues except the following problem in Res

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-11 Thread Sathwik B P
Hi Harsh, Does it make any sense to keep the method in LRW synchronized? Isn't it creating unnecessary overhead for non-multithreaded implementations? regards, sathwik On Fri, Aug 9, 2013 at 7:16 AM, Harsh J wrote: > I suppose I should have been clearer. There's no problem out of box if

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-11 Thread Harsh J
Yes, I feel we could discuss this over a JIRA to remove it if it hurts perf. too much, but it would have to be a marked incompatible change, and we have to add a note about the lack of thread safety in the javadoc of base Mapper/Reducer classes. On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P wrote:

RE: How to compress MapFile programmatically

2013-08-11 Thread Abhijit Sarkar
Thanks Harsh. However, if I compress the MapFile using the MapFile.Writer constructor option and then put it in a DistributedCache, how do I uncompress it in the Map/Reduce? There isn't any API method to do that, apparently. Regards, Abhijit > From: ha...@cloudera.com > Date: Sun, 11 Aug 2013 12:5

Re: How to compress MapFile programmatically

2013-08-11 Thread Harsh J
A MapFile.Reader will automatically detect and decompress without needing to be told anything special. You needn't worry about decompressing files yourself in Apache Hadoop generally - the framework handles it for you transparently if you're using the proper APIs. On Sun, Aug 11, 2013 a

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-11 Thread Niels Basjes
I expect the impact on the IO speed to be almost 0 because waiting for a single disk seek is longer than many thousands of calls to a synchronized method. Niels On Aug 11, 2013 3:00 PM, "Harsh J" wrote: > Yes, I feel we could discuss this over a JIRA to remove it if it hurts > perf. too much, bu
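
The comparison above can be put into rough numbers. This is only an illustrative sketch: the 10 ms seek time and the 25 ns cost of an uncontended synchronized call are assumed round figures, not measurements from this thread, and the function name is hypothetical.

```python
# Assumed, illustrative figures (not measured in this thread).
DISK_SEEK_NS = 10_000_000   # about 10 ms for one seek on a spinning disk (assumption)
SYNCHRONIZED_CALL_NS = 25   # uncontended lock acquire and release (assumption)


def calls_per_seek(seek_ns: int, call_ns: int) -> int:
    """Return how many synchronized calls fit into the time of one disk seek."""
    return seek_ns // call_ns


if __name__ == "__main__":
    print(calls_per_seek(DISK_SEEK_NS, SYNCHRONIZED_CALL_NS))
```

Under these assumptions a single seek costs as much as several hundred thousand uncontended calls, which is consistent with the "many thousands" estimate; contended locks would change the picture.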

Discrepancy in the values of consumed disk space by hadoop

2013-08-11 Thread Yogini Gulkotwar
Hi All, I have a CDH4 Hadoop cluster setup with 3 datanodes and a data replication factor of 2. When I try to check the consumed dfs space, I get different values using the "hdfs dfsadmin -report" and "hdfs fsck" commands. Could anyone please help me understand the reason behind the discrepancy in

Re: Discrepancy in the values of consumed disk space by hadoop

2013-08-11 Thread Jitendra Yadav
Hi, I think you are referring to the DFS Used (from the NameNode report) and Total size (from fsck) values, right? *DFS Used:* This is the total HDFS space used on all the connected data nodes, in your case 230296610816 (214.48 GB). *Total Size:* Fsck utility looks for the blocks in namespace, it
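
A small sketch of the arithmetic behind the two figures. The byte count and the replication factor of 2 come from the thread. The relation used here (fsck's Total size counts each block once, while DFS Used counts every replica plus some per-block overhead) is an assumption about where the truncated explanation is heading, and the function names are hypothetical.

```python
GIB = 1024 ** 3  # the report's "GB" figure matches binary gigabytes


def to_gib(num_bytes: int) -> float:
    """Convert a raw byte count to binary gigabytes."""
    return num_bytes / GIB


def expected_dfs_used(logical_bytes: int, replication: int) -> int:
    """Rough lower bound on DFS Used: logical size times the replication factor.

    Assumption: ignores checksum files and other per-block overhead.
    """
    return logical_bytes * replication


if __name__ == "__main__":
    dfs_used = 230296610816  # value quoted in the thread
    print(f"DFS Used: {to_gib(dfs_used):.2f} GB")
    # With replication 2, the logical (fsck-style) size would be roughly half.
    print(f"Approx. logical size: {to_gib(dfs_used // 2):.2f} GB")
```

So a DFS Used value of roughly twice the fsck Total size is what a replication factor of 2 would lead one to expect.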

Re: DefaultResourceCalculator class not found, ResourceManager fails to start.

2013-08-11 Thread Rob Blah
Hi again. From a little investigation I have performed, I have observed the following. I assume the module responsible for this class is hadoop-yarn-common. During RM init it crashes since it is looking for a class DefaultResourceCalculator in org.apache.hadoop.yarn.server.resourcemanager.resource

Re: DefaultResourceCalculator class not found, ResourceManager fails to start.

2013-08-11 Thread Ted Yu
Can you check the config entry for yarn.scheduler.capacity.resource-calculator? It should point to org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator. You wrote: "I was able to fix all issues". What other issues came up? Thanks On Sun, Aug 11, 2013 at 2:07 PM, Rob Blah wrote: > Hi again >
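
One way to perform the suggested check is sketched below. The property name and expected class come from the message above. The embedded XML is a made-up sample in the usual Hadoop configuration layout (the setting typically lives in capacity-scheduler.xml, which is an assumption about this cluster), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

PROPERTY = "yarn.scheduler.capacity.resource-calculator"
EXPECTED = "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator"

# Made-up sample standing in for the cluster's real configuration file.
SAMPLE_XML = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the value of a Hadoop-style property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def points_to_expected_class(xml_text):
    """True when the resource calculator entry names the expected class."""
    return read_property(xml_text, PROPERTY) == EXPECTED


if __name__ == "__main__":
    print(points_to_expected_class(SAMPLE_XML))
```

To check a real cluster, the sample string would be replaced with the contents of the actual configuration file.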

In 10-nodes cluster ,how many zookeeper quorum should i need?

2013-08-11 Thread ch huang
ATT

Re: In 10-nodes cluster ,how many zookeeper quorum should i need?

2013-08-11 Thread Bryan Beaudreault
In what context are you asking? You'll probably be OK with 3 at that size. On Sun, Aug 11, 2013 at 9:53 PM, ch huang wrote: > ATT
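
For context on the suggested figure of 3: a ZooKeeper ensemble stays available as long as a strict majority of its servers is up. The sketch below works through that arithmetic; the function names are hypothetical and the code is not tied to any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

An ensemble of 3 survives one server failure, and going to 4 does not improve on that, which is why odd ensemble sizes are the usual choice. The ensemble size depends on the desired failure tolerance rather than on the number of Hadoop nodes.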

the options that used to tuning mapreduceV1 is still useful for YARN?

2013-08-11 Thread ch huang
Or does the YARN framework have its own tuning & optimization options?

Re: the options that used to tuning mapreduceV1 is still useful for YARN?

2013-08-11 Thread Harsh J
Yes, the deprecation is graceful and older MR1 properties still work, in 2.x at least, although you are advised to switch over to the new parameters where the Configuration class warns about them. On Mon, Aug 12, 2013 at 8:11 AM, ch huang wrote: > or ,YARN framework has it's own tuning & optimiz

Re: the options that used to tuning mapreduceV1 is still useful for YARN?

2013-08-11 Thread ch huang
And where can I find the list of deprecated parameters and their new equivalents? On Mon, Aug 12, 2013 at 10:43 AM, Harsh J wrote: > Yes, the deprecation is graceful and older MR1 properties would still > work in 2.x at least, although you're recommended to switch over to > the new parameters where war

Re: the options that used to tuning mapreduceV1 is still useful for YARN?

2013-08-11 Thread Chris Embree
Steps to Hadoop 2.x documentation. 1. Realize reality, 2. Smoke 2-3 long joints, depending on tolerance levels 3. Review the code... 4. Allow the THC to take effect and view the code in a new light 5. Understand what the developers have said 6. Code mind beautiful patches to base code 7. crash 8.

RE: FileNotFoundException trying to uncompress local cache archive

2013-08-11 Thread Abhijit Sarkar
Can someone please advise? > From: abhijit.sar...@gmail.com > To: user@hadoop.apache.org > Subject: FileNotFoundException trying to uncompress local cache archive > Date: Sun, 11 Aug 2013 11:43:02 -0400 > > Hi, > As a learning exercise for myself, I'm receiving a simple text file URI as an > arg

Re: the options that used to tuning mapreduceV1 is still useful for YARN?

2013-08-11 Thread Ted Yu
In the Configuration class, you should be able to find addDeprecation() methods. Below is the result of a quick search for where addDeprecation() is called. Configuration.addDeprecation("topology.script.file.name", Configuration.addDeprecation("topology.script.number.args", Configuration.addDep
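
The graceful deprecation described in this thread amounts to a lookup from an old key to its replacement, with a warning on use. The sketch below models that idea in plain Python and is not Hadoop's implementation. The old-to-new pairs are given to the best of my knowledge and should be checked against the DeprecatedProperties page, and the table and function names are hypothetical.

```python
import warnings

# Old key -> new key. Illustrative subset; verify against the official list.
DEPRECATED_KEYS = {
    "topology.script.file.name": "net.topology.script.file.name",
    "topology.script.number.args": "net.topology.script.number.args",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
}


def resolve_key(key: str) -> str:
    """Return the current name for a key, warning if a deprecated one was used."""
    new_key = DEPRECATED_KEYS.get(key)
    if new_key is None:
        return key  # not deprecated (or not in this illustrative table)
    warnings.warn(f"{key} is deprecated. Instead, use {new_key}", DeprecationWarning)
    return new_key


if __name__ == "__main__":
    print(resolve_key("mapred.reduce.tasks"))
```

Because the old name still resolves, existing MR1 tuning settings keep working while the warning points to the name to migrate to.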

Re: the options that used to tuning mapreduceV1 is still useful for YARN?

2013-08-11 Thread Harsh J
Hi, You can view them at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html for 2.x. On Mon, Aug 12, 2013 at 8:28 AM, ch huang wrote: > and where i can find the list of deprecation parameter and new parameter ? > > > On Mon, Aug 12, 2013 at 10:43 AM

Re: Discrepancy in the values of consumed disk space by hadoop

2013-08-11 Thread Yogini Gulkotwar
Thanks Jitendra. Thanks & Regards, *Yogini Gulkotwar* *Flutura Decision Sciences & Analytics, Bangalore* *Email*: yogini.gulkot...@flutura.com *Website*: www.fluturasolutions.com On Mon, Aug 12, 2013 at 1:31 AM, Jitendra Yadav wrote: > Hi, > > I think you are referring DFS Used (from NameNode r

Unable to load native-hadoop library for your platform

2013-08-11 Thread ??????
I use hadoop-2.0.5-alpha on Red Hat Enterprise Linux Server release 5.4 and get the following warning: 13/08/12 13:58:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable How can I resolve it?

How to import custom Python module in MapReduce job?

2013-08-11 Thread Andrei
(cross-posted from StackOverflow) I have a MapReduce job defined in file *main.py*, which imports module lib from file *lib.py*. I use Hadoop Streaming to submit this jo