Re: Child error

2013-03-12 Thread George Datskos
Leo, That JIRA says "fix version=1.0.4", but that is not correct. The real JIRA is MAPREDUCE-2374. The actual fix version for this bug is 1.1.2. George or https://issues.apache.org/jira/browse/MAPREDUCE-4857 Which is fixed in 1.0.4 *From:* Amit Sela [mailto:am...@infolinks.com] *Sent:* Tuesday,

Re: access hadoop cluster from ubuntu on laptop

2013-03-12 Thread George Datskos
Dan, If you aren't using Kerberos security, you can use the HADOOP_USER_NAME environment variable: $ HADOOP_USER_NAME=hdfs hadoop fs -touchz /abc George I'd like to access hadoop cluster from my laptop (through ubuntu). I put configuration files under /etc/hadoop/cluster/conf and set up env varia

Re: S3N copy creating recursive folders

2013-03-06 Thread George Datskos
Subroto and Shumin, Try adding a slash to the s3n source: - hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData" /test/srcData + hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData/" /test/srcData Without the slash, it will keep listing "srcData" each time it is listed,

Re: some ideas for QJM and NFS

2013-02-17 Thread George Datskos
Hi Azuryy, So you have measurements for hadoop-1.0.4 and hadoop-2.0.3+QJM, but I think you should also measure hadoop-2.0.3 _without_ QJM so you can know for sure whether the performance degradation is actually related to QJM or not. George Hi, HarshJ is a good guy, I've seen this JIRA: https://i

Re: Question related to Decompressor interface

2013-02-12 Thread George Datskos
Hello Can someone share some idea what the Hadoop source code of class org.apache.hadoop.io.compress.BlockDecompressorStream, method rawReadInt() is trying to do here? The BlockDecompressorStream class is used for block-based decompression (e.g. snappy). Each chunk has a header indicating h
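The header read mentioned in this reply can be illustrated with a minimal sketch. It assumes the header is a 4-byte big-endian integer, as the method name rawReadInt() suggests; the class and method names below are hypothetical and this is not the Hadoop source.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not taken from BlockDecompressorStream.
final class ChunkHeaderSketch {

    // Reads four bytes and combines them, most significant byte first.
    static int readBigEndianInt(InputStream in) throws IOException {
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        int b4 = in.read();
        // read() returns -1 at end of stream, so a negative OR means a short read.
        if ((b1 | b2 | b3 | b4) < 0) {
            throw new EOFException("stream ended inside the 4-byte header");
        }
        return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
    }

    // Convenience wrapper for in-memory data; rethrows I/O errors unchecked.
    static int decode(byte[] header) {
        try {
            return readBigEndianInt(new ByteArrayInputStream(header));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, ChunkHeaderSketch.decode(new byte[] {0, 0, 1, 2}) yields 258.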

Re: xcievers

2013-02-07 Thread George Datskos
Patai, I am still curious, how do we monitor the consumption of this value in each datanode. You can use the getDataNodeStats() method of your DistributedFileSystem instance. It returns an array of DatanodeInfo which contains, among other things, the xceiver count that you are looking

Re: How to submit Tool jobs programatically in parallel?

2012-12-13 Thread George Datskos
Dave, DistCp needs to be blocking (it intentionally uses runJob instead of the asynchronous submitJob). After the job completes it needs to "finalize" permissions and other attributes (see the tools.DistCp.finalize method). If you need to run multiple distcp's in parallel, I'd go with your
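The reply above is cut off, so the approach it recommends is not known. One common way to run several blocking calls concurrently is a thread pool, sketched below with the Java standard library only. The class and method names are hypothetical; each Callable stands in for a blocking call that returns an exit code (for instance a wrapped ToolRunner.run invocation, which is an assumption and not shown here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; no Hadoop classes are used.
final class ParallelBlockingJobs {

    // Runs every blocking job on a fixed-size pool and returns the exit codes
    // in the same order as the input list.
    static List<Integer> runAll(List<Callable<Integer>> jobs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every job has finished.
            List<Future<Integer>> futures = pool.invokeAll(jobs);
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                exitCodes.add(f.get());
            }
            return exitCodes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("a job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each job still blocks on its own thread, any work the job does after completion runs unchanged; only the waiting happens in parallel.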

Re: How to get the number of node in cluster?

2012-09-25 Thread George Datskos
Hey Jason, This should return the number of active tasktrackers in the cluster: int numNodes = new JobClient(conf).getClusterStatus().getTaskTrackers(); Hi, all As I have to run my MapReduce program on clusters of different size, and I would like the reducer number adapt to (0.95 * NodeNo. *
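Once the node count is known, the reducer count can be derived from it. The quoted formula is truncated, so the second factor below (reduce slots per node) is an assumption, and the class and method names are hypothetical. Integer arithmetic is used to avoid floating-point rounding surprises.

```java
// Hypothetical illustration only; the per-node slot count is an assumed input.
final class ReducerCountSketch {

    // Returns roughly 95% of the total reduce slots, rounded down, and at least 1.
    static int reducersFor(int numNodes, int reduceSlotsPerNode) {
        long totalSlots = (long) numNodes * reduceSlotsPerNode;
        return (int) Math.max(1, totalSlots * 95 / 100);
    }
}
```

For example, ReducerCountSketch.reducersFor(10, 2) yields 19. The result could then be passed to the job's reducer setting.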

Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos
Mehul, Let me make an addition. Some of the blocks it was managing are deleted/modified? Blocks that are deleted in the interim will be deleted on the rejoining node as well, after it rejoins. Regarding the "modified," I'd advise against modifying blocks after they have been fully written.

Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos
Hi Mehul Some of the blocks it was managing are deleted/modified? The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes. The size of the blocks are now modified say from