Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread Avinash Kujur
Can we execute the above command anywhere, or do I need to execute it in any particular directory? Thanks. On Thu, Mar 27, 2014 at 11:41 PM, divye sheth wrote: > I believe you are using Hadoop 2. In order to get the mapred working you > need to set the HADOOP_MAPRED_HOME path in either your /etc

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread divye sheth
You can execute this command on any machine where you have set HADOOP_MAPRED_HOME. Thanks, Divye Sheth. On Fri, Mar 28, 2014 at 12:31 PM, Avinash Kujur wrote: > we can execute the above command anywhere or do i need to execute it in > any particular directory? > > thanks > > > On Thu, Mar 27,

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread Avinash Kujur
I am not getting where to set HADOOP_MAPRED_HOME or how to set it. Thanks. On Fri, Mar 28, 2014 at 12:06 AM, divye sheth wrote: > You can execute this command on any machine where you have set the > HADOOP_MAPRED_HOME > > Thanks > Divye Sheth > > > On Fri, Mar 28, 2014 at 12:31 PM, Avinash Kujur

Re: How to get locations of blocks programmatically?

2014-03-28 Thread Harsh J
Yes, use http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path, long, long) On Fri, Mar 28, 2014 at 7:33 AM, Libo Yu wrote: > Hi all, > > "hadoop path fsck -files -block -locations" can list locations for all > blocks in th
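
A minimal sketch of that call against the Hadoop 2.x FileSystem API (the input path here is a placeholder command-line argument):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path(args[0]); // an HDFS file path
            FileStatus status = fs.getFileStatus(file);
            // One BlockLocation per block overlapping the byte range [0, len)
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
            }
        }
    }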

Re: mapred job -list error

2014-03-28 Thread Harsh J
Please also indicate your exact Hadoop version in use. On Fri, Mar 28, 2014 at 9:04 AM, haihong lu wrote: > Dear all: > > I had a problem today. When I executed the command "mapred job > -list" on a slave, an error came out, with the message shown below: > > 14/03/28 11:18:47 INFO Config

Re: HADOOP_MAPRED_HOME not found!

2014-03-28 Thread divye sheth
Hi Avinash, you can execute the export command on any one machine in the cluster for now. Once you have executed the export command, i.e. export HADOOP_MAPRED_HOME=/path/to/your/hadoop/installation, you can then execute the mapred job -list command from that very same machine. Thanks, Divye Sheth

Re: Maps stuck on Pending

2014-03-28 Thread Dieter De Witte
There is a big chance that your map output is being copied to your reducers; this can take quite some time if you have a lot of data. It could be resolved by: 1) having more reducers, 2) adjusting the slowstart parameter so that the copying can start while the map tasks are still running. Regards,

when is it safe to read the map-reduce result?

2014-03-28 Thread Li Li
I have a program that runs a map-reduce job and then reads the result of the job. I learned that HDFS is not strongly consistent. When is it safe to read the result? As soon as output/_SUCCESS exists?

Re: when is it safe to read the map-reduce result?

2014-03-28 Thread Dieter De Witte
_SUCCESS implies that the job has successfully terminated, so this seems like a reasonable criterion. Regards, Dieter 2014-03-28 9:33 GMT+01:00 Li Li : > I have a program that do some map-reduce job and then read the result > of the job. > I learned that hdfs is not strong consistent. when it's s
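
For reference, a minimal sketch of that check, assuming the job's output directory is known (the path argument is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OutputReady {
        // The framework writes _SUCCESS into the output directory when the
        // job commits, so its presence marks the output as complete.
        public static boolean isOutputReady(Configuration conf, Path outputDir)
                throws Exception {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(outputDir, "_SUCCESS"));
        }

        public static void main(String[] args) throws Exception {
            Path outputDir = new Path(args[0]); // e.g. /user/me/output
            if (isOutputReady(new Configuration(), outputDir)) {
                // safe to read the part-* files under outputDir
            }
        }
    }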

Re: when is it safe to read the map-reduce result?

2014-03-28 Thread Li Li
Thanks. Is the following code safe? int exitCode = ToolRunner.run(...); if (exitCode == 0) { // safe to read result } On Fri, Mar 28, 2014 at 4:36 PM, Dieter De Witte wrote: > _SUCCESS implies that the job has successfully terminated, so this seems like > a reasonable criterion. > > Regards, Dieter > > >

How to run data node block scanner on data node in a cluster from a remote machine?

2014-03-28 Thread reena upadhyay
How can I run the data node block scanner on a data node in a cluster from a remote machine? By default the data node runs the block scanner every 504 hours; this is the default value of dfs.datanode.scan.period.hours. If I want to run the data node block scanner on demand, then one way is to configure the property of dfs.

Does Hadoop depend on ECC memory to generate checksums for data stored in HDFS?

2014-03-28 Thread reena upadhyay
To ensure data I/O integrity, Hadoop uses a CRC-32 mechanism to generate checksums for the data stored on HDFS. But suppose I have a data node machine that does not have ECC (error-correcting code) memory. Will Hadoop HDFS be able to generate checksums for data blocks when read/wri

Re: How to run data node block scanner on data node in a cluster from a remote machine?

2014-03-28 Thread Harsh J
Hello Reena, No, there isn't a programmatic way to invoke the block scanner. Note though that the property controlling its period is DN-local, so you can change it on the DNs and do a rolling DN restart to make it take effect without requiring HDFS downtime. On Fri, Mar 28, 2014 at 3:07 PM, reena upa

Re: Does Hadoop depend on ECC memory to generate checksums for data stored in HDFS?

2014-03-28 Thread Harsh J
While the HDFS functionality of computing, storing and validating checksums for block files does not specifically _require_ ECC, you do _want_ ECC to avoid frequent checksum failures. This is noted in Tom's book as well, in the chapter that discusses setting up your own cluster: "ECC memory is str
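
To illustrate why ECC is not required for checksum generation: the checksums are computed in software, per chunk of io.bytes.per.checksum bytes (512 by default). A simplified sketch of the idea follows; HDFS actually does this through its internal DataChecksum class, and plain java.util.zip.CRC32 is used here only for illustration:

    import java.util.zip.CRC32;

    public class ChunkChecksum {
        // Compute a CRC-32 over one 512-byte chunk, the way HDFS
        // conceptually checksums each chunk of a block in software.
        static long checksumChunk(byte[] chunk, int length) {
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, length);
            return crc.getValue();
        }

        public static void main(String[] args) {
            byte[] chunk = new byte[512]; // pretend this holds file data
            System.out.println(checksumChunk(chunk, chunk.length));
        }
    }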

How checksums are generated for blocks in the data node

2014-03-28 Thread reena upadhyay
I was going through this link: http://stackoverflow.com/questions/9406477/data-integrity-in-hdfs-which-data-nodes-verifies-the-checksum . It's written there that in recent versions of Hadoop only the last data node verifies the checksum, as the write happens in a pipeline fashion. Now I have a question:

how to be an assignee?

2014-03-28 Thread Avinash Kujur
Hi, how can I become the assignee for a particular issue? I can't see any option for becoming the assignee on the page. Thanks.

Re: YarnException: Unauthorized request to start container. This token is expired.

2014-03-28 Thread Leibnitz
No doubt. Sent from my iPhone 6 > On Mar 23, 2014, at 17:37, Fengyun RAO wrote: > > What does this exception mean? I googled a lot, all the results tell me it's > because the time is not synchronized between datanode and namenode. > However, I checked all the servers, that the ntpd service is o

Replication HDFS

2014-03-28 Thread Victor Belizário
Hey, I looked into HDFS replication of the filesystem, which is master x slave. Is there any way to do master x master? I have 1 TB of files on one server and I want to replicate them to another server, with real-time sync. Thanks!

Hadoop documentation: control flow and FSM diagrams

2014-03-28 Thread Emilio Coppa
Hi All, I have created a wiki on github: https://github.com/ercoppa/HadoopDiagrams/wiki This is an effort to provide updated documentation of how the internals of Hadoop work. The main idea is to help the user understand the "big picture" without removing too many internal details. You can f

RE: R on hadoop

2014-03-28 Thread Martin, Nick
If you're spitballing options, you might also look at Pattern: http://www.cascading.org/projects/pattern/ It has some nuances, so be sure to spend the time to vet your specific use case (i.e. what you're actually doing in R and what you want to accomplish leveraging data in Hadoop). From: Sri [mailto:ha

Re: Replication HDFS

2014-03-28 Thread Serge Blazhievsky
Do you mean replication between two different Hadoop clusters, or do you just need data to be replicated between two different nodes? Sent from my iPhone > On Mar 28, 2014, at 8:10 AM, Victor Belizário > wrote: > > Hey, > > I did look in HDFS for replication in filesystem master x slave. > > Have

Re: Why is HDFS_BYTES_WRITTEN much larger than HDFS_BYTES_READ in this case?

2014-03-28 Thread Hardik Pandya
What is your compression format: gzip, lzo, or snappy? For lzo final output: FileOutputFormat.setCompressOutput(conf, true); FileOutputFormat.setOutputCompressorClass(conf, LzoCodec.class); In addition, to make LZO splittable, you need to make an LZO index file. On Thu, Mar 27, 2014 at 8:57 PM, Kim

Re: How to get locations of blocks programmatically?

2014-03-28 Thread Hardik Pandya
Have you looked into the FileSystem API? This is Hadoop v2.2.0: http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fs/FileSystem.html It does not exist in http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/fs/FileSystem.html org.apache.hadoop.fs.RemoteIterator http://hadoop.apache.org/do
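
A sketch of a recursive listing with that 2.2.0 API, where each LocatedFileStatus already carries its block locations (the path argument is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class ListBlocks {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // listFiles returns each file together with its block locations
            RemoteIterator<LocatedFileStatus> it =
                fs.listFiles(new Path(args[0]), true);
            while (it.hasNext()) {
                LocatedFileStatus status = it.next();
                for (BlockLocation block : status.getBlockLocations()) {
                    System.out.println(status.getPath() + " -> " + block);
                }
            }
        }
    }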

Re: reducing HDFS FS connection timeouts

2014-03-28 Thread Hardik Pandya
How about adding ipc.client.connect.max.retries.on.timeouts (default is 45)? It indicates the number of retries a client will make on socket timeout to establish a server connection. Does that help? On Thu, Mar 27, 2014 at 4:23 PM, John Lilley wrote: > It seems to take a very long time to timeo
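
A sketch of overriding that property client-side (the value 5 is only an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FastFailClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Default is 45 retries on socket timeout; lower it to fail faster
            conf.setInt("ipc.client.connect.max.retries.on.timeouts", 5);
            FileSystem fs = FileSystem.get(conf);
            // ... use fs; connection failures now surface much sooner ...
        }
    }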

Re: how to be an assignee?

2014-03-28 Thread Azuryy Yu
Hi Avinash, you need to be added as a sub-project contributor; then you can be an assignee. You can find how to become a contributor on the wiki. On Fri, Mar 28, 2014 at 6:50 PM, Avinash Kujur wrote: > hi, > > how can i be assignee fro a particular issue? > i can't see any option for being assi

Re: Hadoop documentation: control flow and FSM diagrams

2014-03-28 Thread Hardik Pandya
Very helpful indeed, Emilio, thanks! On Fri, Mar 28, 2014 at 12:58 PM, Emilio Coppa wrote: > Hi All, > > I have created a wiki on github: > > https://github.com/ercoppa/HadoopDiagrams/wiki > > This is an effort to provide an updated documentation of how the internals > of Hadoop work. The main

Re: when is it safe to read the map-reduce result?

2014-03-28 Thread Hardik Pandya
If the job completes without any failures, exitCode should be 0 and it is safe to read the result. public class MyApp extends Configured implements Tool { public int run(String[] args) throws Exception { // Configuration processed by ToolRunner Configuration conf = getConf();
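
A runnable completion of that skeleton, with the actual job setup left as a stub:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyApp extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf(); // configuration processed by ToolRunner
            // ... build and submit the job here, e.g. Job.waitForCompletion ...
            // return 0 only if the job succeeded
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MyApp(), args);
            if (exitCode == 0) {
                // job succeeded; safe to read the result
            }
            System.exit(exitCode);
        }
    }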

Re: Replication HDFS

2014-03-28 Thread Wellington Chevreuil
Hi Victor, if by replication you mean copying from one cluster to another, you can use the distcp command. Cheers. On 28 Mar 2014, at 16:30, Serge Blazhievsky wrote: > You mean replication between two different hadoop cluster or you just need > data to be replicated between two different nodes?

Re: Why is HDFS_BYTES_WRITTEN much larger than HDFS_BYTES_READ in this case?

2014-03-28 Thread Kim Chew
None of that. I checked the input file's SequenceFile header and it says "org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater". Kim On Fri, Mar 28, 2014 at 10:34 AM, Hardik Pandya wrote: > what is your compression format gzip, lzo or snappy > > for lzo final output > > FileOutputFormat.s
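
One way to confirm the codec from code rather than from raw header bytes is via SequenceFile.Reader; a sketch (the path argument is a placeholder, and BuiltInZlibDeflater is what the default zlib codec uses under the hood):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;

    public class ShowCodec {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Reader reader =
                new SequenceFile.Reader(fs, new Path(args[0]), conf);
            try {
                if (reader.isCompressed()) {
                    // Prints the codec class, e.g.
                    // org.apache.hadoop.io.compress.DefaultCodec (zlib-based)
                    System.out.println(
                        reader.getCompressionCodec().getClass().getName());
                }
            } finally {
                reader.close();
            }
        }
    }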

Re: How checksums are generated for blocks in the data node

2014-03-28 Thread Wellington Chevreuil
Hi Reena, the pipeline is per block. If you have half of your file on data node A only, that means the pipeline had only one node (node A, in this case, probably because the replication factor is set to 1), and then data node A has the checksums for its block. The same applies to data node B. Al

How to find generated mapreduce code for pig/hive query

2014-03-28 Thread Spark Storm
Hello experts, I am really new to Hadoop. Is it possible, given a Pig or Hive query, to find out the under-the-hood map reduce algorithm? Thanks

Re: How to find generated mapreduce code for pig/hive query

2014-03-28 Thread Shahab Yunus
You can use the ILLUSTRATE and EXPLAIN commands to see the execution plan, if that is what you mean by 'under the hood algorithm': http://pig.apache.org/docs/r0.11.1/test.html Regards, Shahab On Fri, Mar 28, 2014 at 5:51 PM, Spark Storm wrote: > hello experts, > > am really new to hadoop - Is it possible

Need help getting the Hadoop cluster started in EC2

2014-03-28 Thread Max Zhao
Hi Everybody, I am trying to get my first Hadoop cluster started using Amazon EC2. I have tried quite a few times and searched the web for solutions, yet I still cannot get it up. I hope somebody can help out here. Here is what I did, based on the Apache Whirr Quick Guide ( http://whirr.apache.

Re: Need help getting the Hadoop cluster started in EC2

2014-03-28 Thread Yusaku Sako
Hi Max, Not sure if you have already, but you might also want to look into Apache Ambari [1] for provisioning, managing, and monitoring Hadoop clusters. Many have successfully deployed Hadoop clusters on EC2 using Ambari. [1] http://ambari.apache.org/ Yusaku On Fri, Mar 28, 2014 at 7:07 PM, Max