Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread Azuryy Yu
It is defined in hadoop-config.sh. On Fri, Mar 28, 2014 at 1:19 PM, divye sheth divs.sh...@gmail.com wrote: Which version of hadoop are u using? AFAIK the hadoop mapred home is the directory where hadoop is installed or in other words untarred. Thanks Divye Sheth On Mar 28, 2014 10:43

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread Rahul Singh
Try adding the Hadoop bin path to the system PATH. -Rahul Singh On Fri, Mar 28, 2014 at 11:32 AM, Azuryy Yu azury...@gmail.com wrote: it was defined at hadoop-config.sh On Fri, Mar 28, 2014 at 1:19 PM, divye sheth divs.sh...@gmail.com wrote: Which version of hadoop are u using? AFAIK the

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread Avinash Kujur
Can we execute the above command anywhere, or do I need to execute it in a particular directory? Thanks. On Thu, Mar 27, 2014 at 11:41 PM, divye sheth divs.sh...@gmail.com wrote: I believe you are using Hadoop 2. In order to get the mapred working you need to set the HADOOP_MAPRED_HOME path

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread divye sheth
You can execute this command on any machine where you have set HADOOP_MAPRED_HOME. Thanks Divye Sheth On Fri, Mar 28, 2014 at 12:31 PM, Avinash Kujur avin...@gmail.com wrote: we can execute the above command anywhere or do i need to execute it in any particular directory? thanks On

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread Avinash Kujur
I am not able to figure out where to set HADOOP_MAPRED_HOME, or how to set it. Thanks. On Fri, Mar 28, 2014 at 12:06 AM, divye sheth divs.sh...@gmail.com wrote: You can execute this command on any machine where you have set the HADOOP_MAPRED_HOME Thanks Divye Sheth On Fri, Mar 28, 2014 at 12:31 PM,

Re: How to get locations of blocks programmatically?

2014-03-28 Thread Harsh J
Yes, use http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path, long, long) On Fri, Mar 28, 2014 at 7:33 AM, Libo Yu yu_l...@hotmail.com wrote: Hi all, hadoop path fsck -files -block -locations can list locations for all
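A minimal sketch of the API Harsh links to, run against a live cluster whose configuration is on the classpath; the HDFS path and class name are hypothetical:

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/example/data.txt"); // hypothetical HDFS path
        FileStatus stat = fs.getFileStatus(file);
        // The (start, len) range 0..file-length covers the whole file,
        // so locations for every block are returned.
        BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, stat.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " len=" + b.getLength()
                    + " hosts=" + Arrays.toString(b.getHosts()));
        }
    }
}
```

Each BlockLocation carries the block's byte offset and length within the file plus the hostnames of the data nodes holding a replica.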

Re: mapred job -list error

2014-03-28 Thread Harsh J
Please also indicate your exact Hadoop version in use. On Fri, Mar 28, 2014 at 9:04 AM, haihong lu ung3...@gmail.com wrote: dear all: I had a problem today. When I executed the command mapred job -list on a slave, an error came out, with the message shown below: 14/03/28 11:18:47

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread divye sheth
Hi Avinash, as of now you can execute the export command on any one machine in the cluster. Once you have executed it, i.e. export HADOOP_MAPRED_HOME=/path/to/your/hadoop/installation, you can then run the mapred job -list command from that very same machine. Thanks Divye Sheth
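A sketch of the steps Divye describes; the install prefix below is hypothetical, substitute wherever your Hadoop tarball was untarred:

```shell
# Hypothetical install prefix -- point this at the directory where your
# Hadoop distribution was untarred.
export HADOOP_MAPRED_HOME=/opt/hadoop-2.2.0
export PATH="$HADOOP_MAPRED_HOME/bin:$PATH"

# With the variable exported in this shell, run from the same machine:
#   mapred job -list
echo "$HADOOP_MAPRED_HOME"
```

Adding the export line to ~/.bashrc (or hadoop-env.sh) makes it survive new shells.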

Re: Maps stuck on Pending

2014-03-28 Thread Dieter De Witte
There is a good chance that your map output is being copied to your reducers. This can take quite some time if you have a lot of data, and can be addressed by: 1) having more reducers, or 2) adjusting the slowstart parameter so that the copying starts while the map tasks are still running. Regards,
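On Hadoop 2, the slowstart knob Dieter refers to is mapreduce.job.reduce.slowstart.completedmaps: the fraction of map tasks that must finish before reducers are scheduled (default 0.05). A sketch for mapred-site.xml; the value is illustrative:

```xml
<!-- mapred-site.xml: do not launch reducers until 50% of the maps have
     completed. Illustrative value; the Hadoop 2 default is 0.05. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.50</value>
</property>
```

Lowering the value starts the copy phase earlier; raising it stops early reducers from occupying slots while many maps are still pending.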

when it's safe to read map-reduce result?

2014-03-28 Thread Li Li
I have a program that runs a map-reduce job and then reads the result of the job. I learned that HDFS is not strongly consistent. When is it safe to read the result? As soon as output/_SUCCESS exists?

Re: when it's safe to read map-reduce result?

2014-03-28 Thread Dieter De Witte
_SUCCESS implies that the job has successfully terminated, so this seems like a reasonable criterion. Regards, Dieter 2014-03-28 9:33 GMT+01:00 Li Li fancye...@gmail.com: I have a program that do some map-reduce job and then read the result of the job. I learned that hdfs is not strong

Re: when it's safe to read map-reduce result?

2014-03-28 Thread Li Li
Thanks. Is the following code safe? int exitCode = ToolRunner.run(); if (exitCode == 0) { // safe to read result } On Fri, Mar 28, 2014 at 4:36 PM, Dieter De Witte drdwi...@gmail.com wrote: _SUCCES implies that the job has succesfully terminated, so this seems like a reasonable criterion.
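As a belt-and-braces check, the exit code can be combined with a test for the _SUCCESS marker that FileOutputCommitter writes on successful job commit. A minimal sketch; the output path is hypothetical and the empty run() stands in for your real job submission:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RunAndRead extends Configured implements Tool {
    private static final Path OUTPUT = new Path("/user/example/output"); // hypothetical

    @Override
    public int run(String[] args) throws Exception {
        // Configure and submit the real job here, e.g.
        // return job.waitForCompletion(true) ? 0 : 1;
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new RunAndRead(), args);
        FileSystem fs = FileSystem.get(new Configuration());
        // Zero exit code AND the _SUCCESS marker left by FileOutputCommitter.
        if (exitCode == 0 && fs.exists(new Path(OUTPUT, "_SUCCESS"))) {
            // safe to read the files under OUTPUT here
        }
    }
}
```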

How to run data node block scanner on data node in a cluster from a remote machine?

2014-03-28 Thread reena upadhyay
How to run the data node block scanner on a data node in a cluster from a remote machine? By default a data node executes the block scanner every 504 hours; this is the default value of dfs.datanode.scan.period. If I want to run the data node block scanner then one way is to configure the property of
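The full property name is dfs.datanode.scan.period.hours, set in hdfs-site.xml on each data node; a sketch with an illustrative value:

```xml
<!-- hdfs-site.xml on each data node: shorten the block scanner period
     from the 504-hour (three week) default. Illustrative value. -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>168</value>
</property>
```

As Harsh notes below, the setting is DN-local, so a rolling restart of the data nodes applies it without HDFS downtime.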

Does hadoop depends on ecc memory to generate checksum for data stored in HDFS

2014-03-28 Thread reena upadhyay
To ensure data I/O integrity, Hadoop uses a CRC-32 mechanism to generate checksums for the data stored on HDFS. But suppose I have a data node machine that does not have ECC (error-correcting code) memory. Will Hadoop HDFS be able to generate checksums for data blocks when
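The CRC-32 checksums are computed entirely in software, one per io.bytes.per.checksum-sized chunk (512 bytes by default), so no ECC hardware is involved in generating them. A stdlib sketch of the idea:

```java
import java.util.zip.CRC32;

public class ChunkChecksum {
    static final int BYTES_PER_CHECKSUM = 512; // HDFS default chunk size

    // One CRC-32 per 512-byte chunk, mirroring how HDFS checksums block data.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] crcs = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int from = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - from);
            CRC32 crc = new CRC32();
            crc.update(data, from, len);
            crcs[i] = crc.getValue();
        }
        return crcs;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1300]; // covers chunks of 512 + 512 + 276 bytes
        System.out.println(checksums(block).length); // prints 3
    }
}
```

ECC memory matters separately: it protects the bytes in RAM before and after the CRC is computed, which is why Harsh's reply below recommends it to avoid frequent checksum failures.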

Re: How to run data node block scanner on data node in a cluster from a remote machine?

2014-03-28 Thread Harsh J
Hello Reena, No there isn't a programmatic way to invoke the block scanner. Note though that the property to control its period is DN-local, so you can change it on DNs and do a DN rolling restart to make it take effect without requiring a HDFS downtime. On Fri, Mar 28, 2014 at 3:07 PM, reena

Re: Does hadoop depends on ecc memory to generate checksum for data stored in HDFS

2014-03-28 Thread Harsh J
While the HDFS functionality of computing, storing and validating checksums for block files does not specifically _require_ ECC, you do _want_ ECC to avoid frequent checksum failures. This is noted in Tom's book as well, in the chapter that discusses setting up your own cluster: ECC memory is

How check sum are generated for blocks in data node

2014-03-28 Thread reena upadhyay
I was going through this link http://stackoverflow.com/questions/9406477/data-integrity-in-hdfs-which-data-nodes-verifies-the-checksum . It's written that in recent versions of Hadoop only the last data node verifies the checksum, as the write happens in a pipeline fashion. Now I have a question:

how to be assignee ?

2014-03-28 Thread Avinash Kujur
hi, how can I be the assignee for a particular issue? I can't see any option for becoming the assignee on the page. Thanks.

Re: YarnException: Unauthorized request to start container. This token is expired.

2014-03-28 Thread Leibnitz
no doubt Sent from my iPhone 6 On Mar 23, 2014, at 17:37, Fengyun RAO raofeng...@gmail.com wrote: What does this exception mean? I googled a lot, all the results tell me it's because the time is not synchronized between datanode and namenode. However, I checked all the servers, that the

Replication HDFS

2014-03-28 Thread Victor Belizário
Hey, I looked at replication in HDFS, which is master x slave at the filesystem level. Is there any way to do master x master? I have 1 TB of files on one server and I want to replicate them to another server with real-time sync. Thanks!

Hadoop documentation: control flow and FSM diagrams

2014-03-28 Thread Emilio Coppa
Hi All, I have created a wiki on github: https://github.com/ercoppa/HadoopDiagrams/wiki This is an effort to provide updated documentation of how the internals of Hadoop work. The main idea is to help the user understand the big picture without omitting too many internal details. You can

RE: R on hadoop

2014-03-28 Thread Martin, Nick
If you’re spitballing options, you might also look at Pattern http://www.cascading.org/projects/pattern/ It has some nuances, so be sure to spend the time to vet your specific use case (i.e. what you’re actually doing in R and what you want to accomplish leveraging data in Hadoop). From: Sri

Re: Replication HDFS

2014-03-28 Thread Serge Blazhievsky
Do you mean replication between two different Hadoop clusters, or do you just need data to be replicated between two different nodes? Sent from my iPhone On Mar 28, 2014, at 8:10 AM, Victor Belizário victor_beliza...@hotmail.com wrote: Hey, I did look in HDFS for replication in filesystem

Re: Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-28 Thread Hardik Pandya
What is your compression format: gzip, lzo or snappy? For lzo, the final output is set with FileOutputFormat.setCompressOutput(conf, true); FileOutputFormat.setOutputCompressorClass(conf, LzoCodec.class); In addition, to make LZO splittable, you need to create an LZO index file. On Thu, Mar 27, 2014 at 8:57 PM,

Re: How to get locations of blocks programmatically?

2014-03-28 Thread Hardik Pandya
Have you looked into the FileSystem API? This is Hadoop v2.2.0: http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fs/FileSystem.html It does not exist in http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/fs/FileSystem.html

Re: reducing HDFS FS connection timeouts

2014-03-28 Thread Hardik Pandya
How about setting ipc.client.connect.max.retries.on.timeouts to 2 (the default is 45)? It indicates the number of retries a client will make on socket timeout to establish a server connection. Does that help? On Thu, Mar 27, 2014 at 4:23 PM, John Lilley john.lil...@redpoint.net wrote: It seems to take a
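A sketch of the override for core-site.xml; the value is illustrative, trading retry resilience for faster failure on a dead server:

```xml
<!-- core-site.xml: retry only twice on socket timeout when establishing
     a server connection (default is 45). Illustrative value. -->
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>2</value>
</property>
```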

Re: how to be assignee ?

2014-03-28 Thread Azuryy Yu
Hi Avin, you first need to be added as a contributor on a sub-project; then you can be an assignee. You can find how to become a contributor on the wiki. On Fri, Mar 28, 2014 at 6:50 PM, Avinash Kujur avin...@gmail.com wrote: hi, how can i be assignee fro a particular issue? i can't see any option

Re: Hadoop documentation: control flow and FSM diagrams

2014-03-28 Thread Hardik Pandya
Very helpful indeed Emilio, thanks! On Fri, Mar 28, 2014 at 12:58 PM, Emilio Coppa erco...@gmail.com wrote: Hi All, I have created a wiki on github: https://github.com/ercoppa/HadoopDiagrams/wiki This is an effort to provide an updated documentation of how the internals of Hadoop work.

Re: when it's safe to read map-reduce result?

2014-03-28 Thread Hardik Pandya
If the job completes without any failures, exitCode should be 0 and it is safe to read the result. public class MyApp extends Configured implements Tool { public int run(String[] args) throws Exception { // Configuration processed by ToolRunner Configuration conf = getConf();

Re: Replication HDFS

2014-03-28 Thread Wellington Chevreuil
Hi Victor, if by replication you mean copying from one cluster to another, you can use the distcp command. Cheers. On 28 Mar 2014, at 16:30, Serge Blazhievsky hadoop...@gmail.com wrote: You mean replication between two different hadoop cluster or you just need data to be replicated between two
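A distcp sketch; the namenode hosts, port, and path are hypothetical, substitute your own:

```shell
# Copy /data from the source cluster to the destination cluster.
# Hostnames, port and path are hypothetical.
hadoop distcp hdfs://nn-source:8020/data hdfs://nn-dest:8020/data

# -update skips files whose size/checksum already match; re-running it
# periodically (e.g. from cron) approximates ongoing sync. Note that
# distcp is a batch MapReduce job, not real-time replication.
hadoop distcp -update hdfs://nn-source:8020/data hdfs://nn-dest:8020/data
```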

Re: How check sum are generated for blocks in data node

2014-03-28 Thread Wellington Chevreuil
Hi Reena, the pipeline is per block. If you have half of your file on data node A only, that means the pipeline had only one node (node A, in this case, probably because the replication factor is set to 1), and so data node A has the checksums for its block. The same applies to data node B.

How to find generated mapreduce code for pig/hive query

2014-03-28 Thread Spark Storm
hello experts, I am really new to Hadoop. Is it possible to find out, for a given Pig or Hive query, the map-reduce plan that runs under the hood? thanks

Re: How to find generated mapreduce code for pig/hive query

2014-03-28 Thread Shahab Yunus
You can use the ILLUSTRATE and EXPLAIN commands to see the execution plan, if that is what you mean by 'under the hood algorithm': http://pig.apache.org/docs/r0.11.1/test.html Regards, Shahab On Fri, Mar 28, 2014 at 5:51 PM, Spark Storm using.had...@gmail.com wrote: hello experts, am really new to
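A sketch of EXPLAIN from the Grunt shell; the relation, filter, and script name are hypothetical (Hive has a similar EXPLAIN statement for its queries):

```shell
# Inside the Grunt shell (started with: pig -x local), EXPLAIN prints the
# logical, physical and map-reduce plans for an alias:
#   grunt> a = LOAD 'input.txt' AS (line:chararray);
#   grunt> b = FILTER a BY line MATCHES '.*error.*';
#   grunt> EXPLAIN b;

# Or for a whole script, writing the plan files to a directory:
pig -x local -e 'explain -script myscript.pig -out plans/'
```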

Re: Need help get the hadoop cluster started in EC2

2014-03-28 Thread Yusaku Sako
Hi Max, Not sure if you have already, but you might also want to look into Apache Ambari [1] for provisioning, managing, and monitoring Hadoop clusters. Many have successfully deployed Hadoop clusters on EC2 using Ambari. [1] http://ambari.apache.org/ Yusaku On Fri, Mar 28, 2014 at 7:07 PM,