Re: How to reduce total shuffle time

2012-08-28 Thread Minh Duc Nguyen
Without knowing your exact workload, using a Combiner (if possible) as Tsuyoshi recommended should decrease your total shuffle time. You can also try compressing the map output so that there's less disk and network IO. Here's an example configuration using Snappy:
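The configuration from the original message is not preserved in this listing. As a minimal sketch only, assuming the Hadoop 1.x property names `mapred.compress.map.output` and `mapred.map.output.compression.codec` (MRv2 renames them to `mapreduce.map.output.compress` and `mapreduce.map.output.compress.codec`), the settings for Snappy-compressed map output might be assembled as below. The class `MapOutputCompression` and its helper methods are hypothetical names used for illustration; the code uses only the Java standard library and prints the `<property>` entries you would place in mapred-site.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the map-output compression settings and
// renders them as <property> entries for mapred-site.xml.
class MapOutputCompression {

    // Assumption: Hadoop 1.x property names. The Snappy native library
    // must also be installed on every node for the codec to load.
    static Map<String, String> snappyMapOutputProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.compress.map.output", "true");
        props.put("mapred.map.output.compression.codec",
                  "org.apache.hadoop.io.compress.SnappyCodec");
        return props;
    }

    // Render each entry in the <property><name/><value/></property> form
    // used by Hadoop's *-site.xml files.
    static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml(snappyMapOutputProperties()));
    }
}
```

The same two key/value pairs could instead be set per job on a Hadoop `Configuration` object; which approach fits depends on whether compression should apply cluster-wide or to a single job.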

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Minh Duc Nguyen
Marc, see my inline comments. On Fri, Aug 24, 2012 at 4:09 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I have a doubt about reduce tasks and block writes. Does a reduce task always write first to HDFS on the node where it is placed? (and then these blocks would be

Re: Error:Hdfs Client for hadoop using native java api

2012-07-22 Thread Minh Duc Nguyen
As Shaswat mentioned previously, your problem may be related to your configuration. Is core-site.xml on your classpath? For example, what is the value for conf.get("fs.default.name")? Alternatively, you can set this property directly in your code: conf.set("fs.default.name",
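One way to answer the classpath question is to ask the class loader directly, since Hadoop's `Configuration` loads core-site.xml as a classpath resource. The following is a minimal sketch using only the Java standard library; the class name `CoreSiteCheck` and the method `findOnClasspath` are hypothetical names chosen for illustration.

```java
import java.net.URL;

// Hypothetical diagnostic: reports whether a resource such as
// core-site.xml is visible on the classpath of the running JVM.
class CoreSiteCheck {

    // Returns the resource's URL, or null if it cannot be found.
    static URL findOnClasspath(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the system class loader if no context loader is set.
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        URL url = findOnClasspath("core-site.xml");
        if (url == null) {
            System.out.println("core-site.xml is NOT on the classpath");
        } else {
            System.out.println("core-site.xml found at " + url);
        }
    }
}
```

Run it with the same classpath as the failing client. If the file is not found, the client falls back to the built-in default for fs.default.name (the local file system) instead of your cluster's value.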

Re: Sqoop Issue

2012-06-26 Thread Minh Duc Nguyen
Akash, instead of adding the connector jar to $HADOOP_HOME/lib, you can pass it with the -libjars flag when you run your MapReduce job using hadoop jar. For example: hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -libjars mylib.jar input output ~ Minh On

Re: Single disk failure (with HDFS-457 applied) causes data node to die

2012-06-21 Thread Minh Duc Nguyen
Peter, I believe that this will help you: https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster DataNode Configuration By default, the failure of a single dfs.data.dir will cause the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas

Re: Error: Too Many Fetch Failures

2012-06-19 Thread Minh Duc Nguyen
Take a look at slide 25: http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera It describes a similar error, so hopefully this will help you. ~ Minh On Tue, Jun 19, 2012 at 10:27 AM, Ellis H. Wilson III el...@cse.psu.edu wrote: Hi all, This is my first email to