Re: Replace a block with a new one

2014-07-18 Thread Shumin Guo
That will break the consistency of the file system, but it doesn't hurt to try. On Jul 17, 2014 8:48 PM, Zesheng Wu wuzeshen...@gmail.com wrote: How about writing a new block with a new checksum file, and replacing both the old block file and the checksum file? 2014-07-17 19:34 GMT+08:00 Wellington
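For orientation, here is a rough sketch of where the two files in question live on a datanode; the directory path is an assumption and depends on dfs.data.dir (dfs.datanode.data.dir on Hadoop 2.x), where the exact layout also varies by release:

  # list a datanode's block directory (example path, adjust to your config)
  ls /data/dfs/dn/current/
  blk_1073741825            # the raw block data
  blk_1073741825_1001.meta  # the checksum (metadata) file for that block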

Re: how to save dead node when its disk is full?

2014-05-28 Thread Shumin Guo
If a datanode is dead, its blocks will be re-replicated on other datanodes automatically. So, after a while, when those blocks have been re-replicated, you should be able to delete the data on the dead node and restart it. Here, I am assuming your cluster has enough space on other live
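Before clearing the disk, it may be worth confirming that nothing is still under-replicated; a quick check with standard commands (output will vary):

  hadoop fsck / -blocks       # look for under-replicated or missing blocks
  hadoop dfsadmin -report     # per-datanode capacity and remaining space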

Re: Strange error in Hadoop 2.2.0: FileNotFoundException: file:/tmp/hadoop-hadoop/mapred/

2014-04-22 Thread Shumin Guo
Can you list the file using Hadoop commands, for example hadoop fs -ls ...? On Tue, Apr 22, 2014 at 10:32 AM, Natalia Connolly natalia.v.conno...@gmail.com wrote: Hi Jay, I am really not sure how to answer this question. Here is the full error: 14/04/22 11:31:02 INFO

Re: analyzing s3 data

2014-04-22 Thread Shumin Guo
You can configure your Hadoop cluster to use S3 as the file system. Everything else should be the same as for HDFS. On Mon, Apr 21, 2014 at 7:21 AM, kishore alajangi alajangikish...@gmail.com wrote: Hi Experts, We are running a four-node cluster installed with CDH 4.5 and CM 4.8, We have
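A minimal core-site.xml sketch for pointing Hadoop at S3 via the s3n connector; the bucket name and credentials are placeholders (on older releases the first property is named fs.default.name):

  <property>
    <name>fs.defaultFS</name>
    <value>s3n://your-bucket-name</value>
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>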

Re: All datanodes are bad. Aborting ...

2014-04-22 Thread Shumin Guo
Did you run fsck? What was the result? On Sun, Apr 20, 2014 at 12:14 PM, Amit Kabra amitkabrai...@gmail.com wrote: 1) ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f)
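In case it helps, a typical invocation that also reports block locations (the path is an example):

  hadoop fsck / -files -blocks -locations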

Re: Stuck Job - how should I troubleshoot?

2014-04-22 Thread Shumin Guo
Since the last map task is in the pending state, it is possible that some issue is occurring within your cluster, for example insufficient memory, a deadlock, or a data problem. You can kill this map task manually and see if the problem goes away. On Sun, Apr 20, 2014 at 9:46 AM, Serge Blazhievsky
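A sketch of killing a single task attempt from the command line (Hadoop 1.x style; the attempt id below is a placeholder):

  hadoop job -list                                            # find the job id
  hadoop job -kill-task attempt_201404200945_0001_m_000042_0  # kill just that attempt; the framework reschedules it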

Re: Problem with jobtracker hadoop 1.2

2014-04-22 Thread Shumin Guo
It seems you are using the local FS rather than HDFS. You need to make sure your HDFS cluster is up and running. On Thu, Apr 17, 2014 at 6:42 PM, Shengjun Xin s...@gopivotal.com wrote: Did you start the datanode service? On Thu, Apr 17, 2014 at 9:23 PM, Karim Awara

Re: Task or job tracker seems not working?

2014-04-22 Thread Shumin Guo
The error message indicates that you are using the local FS rather than HDFS. So you need to make sure your HDFS cluster is up and running before running any MapReduce jobs. For example, you can use fsck or other HDFS commands to check whether the HDFS cluster is running correctly. On Thu, Apr 17, 2014 at 8:51 AM,
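A quick health check along those lines (standard commands; output will vary with your setup):

  jps                # NameNode, DataNode, etc. should be listed on the right hosts
  hadoop fs -ls /    # should list HDFS contents, not the local file system
  hadoop fsck /      # overall file system health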

Re: Hadoop 2.3.0 API error

2014-04-02 Thread Shumin Guo
What is your input data like? On Apr 2, 2014 10:16 AM, ei09072 ei09...@fe.up.pt wrote: After installing Hadoop 2.3.0 on Windows 8, I tried to run the wordcount example given. However, I get the following error: c:\hadoopbin\yarn jar share/hadoop/mapreduce/hadoop- mapreduce-examples-2.3.0.ja

Call for big data proposals

2014-03-26 Thread Shumin Guo
Dear subscribers, My name is Shumin Guo. I am the author of the Hadoop management book Hadoop Operations and Cluster Management Cookbook, http://www.amazon.com/Hadoop-Operations-Cluster-Management-Cookbook/dp/1782165169/ref=sr_1_1?ie=UTF8qid=1395873783sr=8-1keywords=hadoop+operations+and+cluster

Re: Performance

2014-02-25 Thread Shumin Guo
If you want to do profiling on your Hadoop cluster, the Starfish project might be interesting. You can find more info at http://www.cs.duke.edu/starfish/ Thanks, Shumin On Feb 25, 2014 3:31 PM, Thomas Bentsen t...@bentzn.com wrote: Thanks a lot guys! From Dieter's original reply I got TeraSort

Re: Wrong FS hdfs:/localhost:9000; expected file:///

2014-02-25 Thread Shumin Guo
The value should be hdfs://localhost:port. On Feb 24, 2014 6:37 AM, Chirag Dewan chirag.de...@ericsson.com wrote: Hi All, I am new to Hadoop. I am using Hadoop 2.2.0. I have a simple client code which reads a file from HDFS on a single-node cluster. Now when I run my code using java -jar
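For a 2.2.0 single-node setup, the corresponding core-site.xml entry usually looks like the sketch below; the host and port are examples, and on 1.x the property is named fs.default.name:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>

The client code then needs this configuration on its classpath (or set explicitly on its Configuration object); otherwise it falls back to the local file system and produces the "Wrong FS ... expected: file:///" error.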

RE: Reading a file in a customized way

2014-02-25 Thread Shumin Guo
You can extend FileInputFormat and override isSplitable to return false so the file is not split. More info is in the Javadoc: https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapred/FileInputFormat.html Shumin On Feb 25, 2014 10:56 AM, java8964 java8...@hotmail.com wrote: See my reply for another email today for
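A minimal sketch of that idea with the newer mapreduce API (the class name is made up for illustration):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

  public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
      return false; // never split: each file is read by exactly one mapper
    }
  }

Set it on the job with job.setInputFormatClass(WholeFileTextInputFormat.class).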

Re: Hadoop property precedence

2013-07-13 Thread Shumin Guo
I think the client-side configuration will take effect. Shumin On Jul 12, 2013 11:50 AM, Shalish VJ shalis...@yahoo.com wrote: Hi, Suppose the block size set in the configuration file on the client side is 64MB, the block size set in the configuration file on the name node side is 128MB, and the block size set in
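A quick way to see this in practice is to override the value per command and then check what the namenode recorded; the file names below are placeholders, and dfs.block.size is the 1.x property name (dfs.blocksize on 2.x):

  hadoop fs -D dfs.block.size=67108864 -put bigfile.dat /data/bigfile.dat
  hadoop fsck /data/bigfile.dat -blocks   # the block sizes reflect the client-side setting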

Re: Splitting input file - increasing number of mappers

2013-07-06 Thread Shumin Guo
You also need to pay attention to the split boundary, because you don't want one line split across different mappers. Maybe you can think about a multi-line input format. Simon. On Jul 6, 2013 10:18 AM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: More mappers will make it
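One built-in option along those lines is NLineInputFormat, which hands each mapper a fixed number of whole lines; a minimal driver sketch (the job name and line count are arbitrary):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

  public class MoreMappersDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "more-mappers");          // Job.getInstance(conf, ...) on newer releases
      job.setInputFormatClass(NLineInputFormat.class);
      NLineInputFormat.setNumLinesPerSplit(job, 10000); // each split (mapper) gets 10,000 whole lines
      // ... set mapper, reducer, input/output paths as usual, then job.waitForCompletion(true);
    }
  }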

Re: basic question about rack awareness and computation migration

2013-03-07 Thread Shumin Guo
Yes, I agree with Bertrand. Hadoop can take a whole file as input; you just put your compression code into the map method and use the identity reduce function, which simply writes your compressed data onto HDFS using the file output format. Thanks, On Thu, Mar 7, 2013 at 7:35 AM, Bertrand
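As an alternative to hand-written compression code, the same effect can often be had by running a map-only job with output compression turned on; a sketch of the relevant lines inside a standard job driver (the codec choice is just an example):

  import org.apache.hadoop.io.compress.GzipCodec;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  job.setNumReduceTasks(0);                                // map-only: mapper output goes straight to HDFS
  FileOutputFormat.setCompressOutput(job, true);           // compress the job output
  FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);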

Re: each stage's time in hadoop

2013-03-06 Thread Shumin Guo
You can also try the following two commands: 1. hadoop job -status <job-id>. For example, hadoop job -status job_201303021057_0004 gives the following output: Job: job_201303021057_0004 file: hdfs://master:54310/user/ec2-user/.staging/job_201303021057_0004/job.xml tracking URL:

Re: store file gives exception

2013-03-06 Thread Shumin Guo
Nitin is right. The Hadoop JobTracker will schedule a job based on data block locations and the computing power of the nodes. Based on the number of data blocks, the JobTracker splits a job into map tasks. Optimally, map tasks should be scheduled on nodes with local data. And also because

Re: S3N copy creating recursive folders

2013-03-06 Thread Shumin Guo
I used to have a similar problem. It looks like there is a recursive folder creation bug. How about trying to remove the srcData part from the destination, for example using the following command: *hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData /test/* Or with distcp: *hadoop distcp
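For reference, one plausible complete form of that distcp command, since the original message is truncated here; the credentials, bucket, and namenode address are all placeholders:

  # placeholders: accessKey, secretKey, bucket.name, namenode
  hadoop distcp s3n://accessKey:secretKey@bucket.name/srcData hdfs://namenode:9000/test/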

Re: Issue: Namenode is in safe mode

2013-03-06 Thread Shumin Guo
To decommission a live datanode from the cluster, you can take the following steps: 1. Edit the configuration file $HADOOP_HOME/conf/hdfs-site.xml and add the following property: <property> <name>dfs.hosts.exclude</name> <value>$HADOOP_HOME/conf/dfs-exclude.txt</value> </property> 2. Put the host name of the
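For completeness, a sketch of the usual remaining steps; the exclude file name matches the property above and the dfsadmin syntax is the 1.x form:

  echo "datanode-to-remove.example.com" >> $HADOOP_HOME/conf/dfs-exclude.txt
  hadoop dfsadmin -refreshNodes    # the namenode re-reads the exclude list and starts decommissioning

Progress can be watched on the namenode web UI or with hadoop dfsadmin -report until the node shows as Decommissioned.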

Re: For Hadoop 2.0.3; setting CLASSPATH=$(hadoop classpath) does not work, as opposed to 1.x versions

2013-03-06 Thread Shumin Guo
You can always print out the Hadoop classpath before running the hadoop command, for example by editing the $HADOOP_HOME/bin/hadoop file. HTH. On Wed, Mar 6, 2013 at 5:01 AM, shubhangi shubhangi.g...@oracle.com wrote: Hi All, I am writing an application in C++, which uses the API provided by
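A minimal sketch of that debugging edit; the exact insertion point in bin/hadoop varies by release, this just echoes the variable the script builds before it launches the JVM:

  # add near the end of $HADOOP_HOME/bin/hadoop, before the java command is executed
  echo "Effective CLASSPATH: $CLASSPATH" 1>&2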

Re: Execution handover in map/reduce pipeline

2013-03-06 Thread Shumin Guo
Oozie can be a good choice for MapReduce job flow management, though it may be too heavyweight for your problem. Based on your description, I am simply assuming that you are processing some static data files, for example, the files will not change during processing, and there are no