Re: hadoop inconsistent behaviour

2014-04-14 Thread Gordon Wang
Hi Rahul, What is the log of the reduce container? Please paste the log and we can see the reason. On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh wrote: > Hi, > I am running a job (wordcount example) on a 3-node cluster (1 master and 2 > slaves); sometimes the job passes but sometimes it fails (as red

Re: HDFS file system size issue

2014-04-14 Thread Sandeep Nemuri
Please check your logs directory usage. On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak wrote: > What's the replication factor you have? I believe it should be 3. hadoop > dus shows the disk usage without replication, while the namenode UI page > gives it with replication. > > 38gb * 3 = 114gb ~ 1TB

Re: Resetting dead datanodes list

2014-04-14 Thread Sandeep Nemuri
You can add the hostname/IP in the exclude file and run this command: hadoop dfsadmin -refreshNodes. On Mon, Apr 14, 2014 at 11:34 AM, Stanley Shi wrote: > I believe there's some command to show the list of datanodes from the CLI; > parsing HTML is not a good idea. The HTML page is intended to be read by

Re: hadoop inconsistent behaviour

2014-04-14 Thread Rahul Singh
I cleaned up the log directory before running the job. Now there are no nodemanager logs. When I look in the userlogs directory I see some syslog files with the following error: 2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: poc-hadoop06/127.0.1.1:40
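
The retry target 127.0.1.1 often points to a hosts-file issue on Debian/Ubuntu-style installs rather than a Hadoop problem; a quick check, as a sketch (the hostname comes from the log above, the replacement address is only illustrative):

    # Hadoop daemons should not resolve the node's own hostname to 127.0.1.1
    grep 127.0.1.1 /etc/hosts
    # if present, map the hostname to the node's real network address instead, e.g. (illustrative)
    # 192.168.1.16  poc-hadoop06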

Re: hadoop inconsistent behaviour

2014-04-14 Thread Rahul Singh
How do I identify a reduce container? There are multiple container dirs in my application id folder in userlogs. On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang wrote: > Hi Rahul, > > What is the log of the reduce container? Please paste the log and we can see > the reason. > > > On Mon, Apr 14, 20

Re: hadoop inconsistent behaviour

2014-04-14 Thread Gordon Wang
You can find the reduce container from the RM's web page. BTW: from the above log, you can check whether the application master crashed. On Mon, Apr 14, 2014 at 3:12 PM, Rahul Singh wrote: > how do i identify an reduce container? there are multiple container dirs > in my application id folder in userlogs. > > >

Hive install under hadoop

2014-04-14 Thread EdwardKing
I want to use Hive on hadoop-2.2.0, so I execute the following steps: [hadoop@master /]$ tar -xzf hive-0.11.0.tar.gz [hadoop@master /]$ export HIVE_HOME=/home/software/hive [hadoop@master /]$ export PATH=${HIVE_HOME}/bin:${PATH} [hadoop@master /]$ hadoop fs -mkdir /tmp [hadoop@master /]$ hadoop fs -m
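
For reference, the HDFS directories a default Hive configuration expects are usually created along these lines (a sketch of the common setup, not the poster's exact truncated steps; adjust paths to your install):

    # typical HDFS directories for a stock Hive setup
    hadoop fs -mkdir /tmp
    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -chmod g+w /tmp
    hadoop fs -chmod g+w /user/hive/warehouse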

unsubscribe

2014-04-14 Thread Hansi Klose

heterogeneous storages in HDFS

2014-04-14 Thread lei liu
hadoop-2.4 was released on April 11, but it does not include the heterogeneous storage function. When will Hadoop include this function? Thanks, LiuLei

Re: heterogeneous storages in HDFS

2014-04-14 Thread ascot.m...@gmail.com
Hi, from the 2.3.0 release notes: "20 February, 2014: Release 2.3.0 available. Apache Hadoop 2.3.0 contains a number of significant enhancements such as: Support for Heterogeneous Storage hierarchy in HDFS." Is it already there? Ascot On 14 Apr, 2014, at 4:34 pm, lei liu wrote: > On April 11 hadoop-2.4 is rele

Re: heterogeneous storages in HDFS

2014-04-14 Thread Stanley Shi
Please find it on this page: https://wiki.apache.org/hadoop/Roadmap hadoop 2.3.0 only includes "phase 1" of the heterogeneous storage; "phase 2" will be included in 2.5.0. Regards, *Stanley Shi,* On Mon, Apr 14, 2014 at 4:38 PM, ascot.m...@gmail.com wrote: > hi, > > From 2.3.0 > 20 February,

Re: heterogeneous storages in HDFS

2014-04-14 Thread lei liu
When will that Hadoop release be available? 2014-04-14 17:04 GMT+08:00 Stanley Shi : > Please find it in this page: https://wiki.apache.org/hadoop/Roadmap > > hadoop 2.3.0 only include "phase 1" of the heterogeneous storage; "phase > 2" will be included in 2.5.0; > > Regards, > *Stanley Shi,* > > > > On Mon, Ap

Re: heterogeneous storages in HDFS

2014-04-14 Thread Azuryy
Hadoop 2.5 should be released in mid-May. Sent from my iPhone5s > On 14 Apr 2014, at 17:47, lei liu wrote: > > When is hadoop released? > > > > > 2014-04-14 17:04 GMT+08:00 Stanley Shi : >> Please find it in this page: https://wiki.apache.org/hadoop/Roadmap >> >> hadoop 2.3.0 only include

Re: Hive install under hadoop

2014-04-14 Thread Thomas Bentsen
Seems like an ordinary Linux file permission thing. Are you logged in as user 'software'? Does your user have permission to create dirs in /home/software? /th On Mon, 2014-04-14 at 16:12 +0800, EdwardKing wrote: > I want to use hive in hadoop2.2.0, so I execute following steps: > > [hadoop@mast
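
A quick way to check this, assuming the hadoop user is unpacking under /home/software (directory and user names taken from the thread; the chown line is only illustrative):

    ls -ld /home/software      # who owns the directory, and is it writable by your group?
    id                         # which user and groups am I actually running as?
    # if the ownership is wrong, something like the following would fix it (run as root; illustrative)
    # chown -R hadoop:hadoop /home/software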

Re: Exception in Jobtracker (java.lang.OutOfMemoryError: Java heap space)

2014-04-14 Thread Wellington Chevreuil
Hi Viswanathan, this looks like your job history is full, and is filling up your jobtracker heap: > 2014-04-12 02:25:47,963 ERROR org.apache.hadoop.mapred.JobHistory: Unable to > move history file to DONE canonical subfolder. > java.lang.OutOfMemoryError: Java heap space Have you set any value
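
If the JobTracker heap is indeed the bottleneck, two settings commonly looked at on a Hadoop 1.x cluster are the daemon heap size and how many completed jobs the JobTracker keeps in memory; a sketch only, with illustrative values and the stock 1.x property name assumed:

    # hadoop-env.sh: raise the daemon heap (applies to the JobTracker when set in its environment)
    export HADOOP_HEAPSIZE=2048

    # mapred-site.xml: keep fewer completed jobs per user in JobTracker memory
    #   <property>
    #     <name>mapred.jobtracker.completeuserjobs.maximum</name>
    #     <value>25</value>
    #   </property>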

Re: Resetting dead datanodes list

2014-04-14 Thread divye sheth
If by resetting the list of dead datanodes you mean that the web console or the report command should no longer show the datanode you removed, then you will have to do the following: 1. Remove the entry from the slaves file corresponding to the dead datanode. 2. Remove the entry from the exclude file. 3. Run
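
The usual step 3 plus a verification step, as a sketch (the exclude-file path depends on your dfs.hosts.exclude setting and is not shown here):

    hadoop dfsadmin -refreshNodes    # make the NameNode re-read the include/exclude lists
    hadoop dfsadmin -report          # confirm the removed node no longer appears as a dead datanode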

Re: Exception in hadoop jobtracker OOM

2014-04-14 Thread dingweiqiong
unsubscribe 2014-04-12 23:02 GMT+08:00 Viswanathan J : > Hi, > > I'm using Hadoop v1.2.1 and it is running fine so long(3 months) without > any issues. > > Suddenly I got the below error in Jobtracker and jobs are failed to run. > > Is this issue in JT or TT or Jetty issue? > > 2014-04-12 02:13:

Re: can't copy between hdfs

2014-04-14 Thread divye sheth
Hi, Try setting this property in hdfs-site.xml on the destination cluster: dfs.datanode.max.xcievers = 4096. 4096 would work; if needed you may increase this to a higher number. A restart would be required in this case. Also make sure you have the ulimit set to a high number as had
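
As a sketch, the property as it would appear in hdfs-site.xml on the destination cluster's datanodes (value taken from the message; restart the datanodes afterwards):

    <!-- hdfs-site.xml: cap on concurrent datanode transfer threads -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>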

Re: Exception in hadoop jobtracker OOM

2014-04-14 Thread divye sheth
This usually occurs when the task takes more memory and exceeds its heap space. You can increase the memory of the tasks by setting a property in mapred-site.xml, Mapred.child.opts, specifying -Xmx with a higher value. P.S. the property name might not be correct; you may look up mapred-default.xm

Re: Exception in hadoop jobtracker OOM

2014-04-14 Thread divye sheth
Sorry for the error. Did not have a proper look at the logs. This seems to be a JT issue. Ignore the previous email. Thanks Divye Sheth On Apr 14, 2014 6:06 PM, "divye sheth" wrote: > This usually occurs when the task takes more memory and exceeds its heap > space. You can increase the memory of

Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread Radhe Radhe
Hello People, As per the Apache site http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html: "Binary Compatibility: First, we ensure binary compatibility to the applications that use old mapred APIs."

unsubscribe

2014-04-14 Thread Shawn

How multiple input files are processed by mappers

2014-04-14 Thread Shashidhar Rao
Hi, please can somebody clarify my doubts. Say I have a cluster of 30 nodes and I want to put the files in HDFS. All the files combined are 10 TB in size, but each file is only roughly 1 GB, and the total number of files is 10. 1. In a real production environment do we copy these 1

Re: How multiple input files are processed by mappers

2014-04-14 Thread Nitin Pawar
1. In a real production environment do we copy these 10 files into HDFS under a folder one by one? If this is the case, then how many mappers do we specify, 10 mappers? And do we use the put command of hadoop to transfer these files? Ans: This will depend on what you want to do with the files. There is no rule wh

Re: unsubscribe

2014-04-14 Thread Levin ding
unsubscribe

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Source Compatibility = you need to recompile and use the new version as part of the compilation Binary Compatibility = you can take something compiled against the old version and run it on the new version On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe wrote: > Hello People, > > As per the Apache s

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Also, "Source Compatibility" also means ONLY a recompile is needed. No code changes should be needed. On Mon, Apr 14, 2014 at 10:37 AM, John Meagher wrote: > Source Compatibility = you need to recompile and use the new version > as part of the compilation > > Binary Compatibility = you can take s

Fwd: Changing default scheduler in hadoop

2014-04-14 Thread Mahesh Khandewal
-- Forwarded message -- From: Mahesh Khandewal Date: Mon, 14 Apr 2014 08:42:16 +0530 Subject: Re: Changing default scheduler in hadoop To: user@hadoop.apache.org Cc: Ekta Agrawal , "common-u...@hadoop.apache.org" , "hdfs-u...@hadoop.apache.org" Hi, I have a patch file of Resource Aw

RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-14 Thread Roger Whitcomb
Thank you Dave, I got it. Needed a few other .jars as well (commons-cli and protobuf-java). But most importantly, the port was wrong. 50070 is for HTTP access, but using 8020 is correct for direct HDFS access. Thanks again, ~Roger From: dlmarion Sent: Friday,
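
For reference, a sketch of the corrected form of the connection URL when going through Commons VFS to HDFS (the host name and path are illustrative; 8020 is the default NameNode RPC port, while 50070 is only the HTTP/web UI port):

    # illustrative only: point the hdfs:// URL at the NameNode RPC port, not the web UI port
    hdfs://namenode.example.com:8020/user/roger/data.csv
    # hdfs://namenode.example.com:50070/...   <- wrong: that is the HTTP UI port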

Map and reduce slots

2014-04-14 Thread Shashidhar Rao
Hi, can somebody clarify what map and reduce slots are and how Hadoop calculates these slots? Are these slots calculated based on the number of splits? I am getting different answers, please help. Regards Shashidhar

Re: Map and reduce slots

2014-04-14 Thread João Paulo Forny
You need to differentiate slots from tasks. Tasks are spawned by the TT and assigned to a free slot in the cluster. The number of map tasks for a Hadoop job is typically controlled by the input data size and the split size. The number of reduce tasks for a Hadoop job is controlled by the *mapreduce.job
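
A sketch of how the reduce count is typically set per job, assuming the property the message starts to name is mapreduce.job.reduces (the MRv1 equivalent is mapred.reduce.tasks); the jar and class names below are illustrative:

    # set the number of reduce tasks for one job run
    # (the -D form works when the job driver uses ToolRunner/GenericOptionsParser)
    hadoop jar my-job.jar com.example.WordCount -D mapreduce.job.reduces=10 /input /output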

Time taken to do a word count on 10 TB data.

2014-04-14 Thread Shashidhar Rao
Hi, Can somebody provide me a rough estimate of the time taken in hours/mins for a cluster of say 30 nodes to run a map reduce job to perform a word count on say 10 TB of data, assuming that the hardware and the map reduce program are tuned optimally. Just a rough estimate; it could be 5 TB, 10 TB o

unsubscribe

2014-04-14 Thread dinesh dakshan

Re: Confirm that restoring a copy of snapshot involves copying the data

2014-04-14 Thread Manoj Samel
Any thoughts? On Wed, Apr 9, 2014 at 10:08 AM, Manoj Samel wrote: > Hi, > > If I take an HDFS snapshot and then restore it to some other directory using > > hdfs dfs -cp /xxx/.snapshot/nnn /aaa/bbb > > I want to confirm that there is a copy of data from the files under the snapshot to > the target directory.

Re: How multiple input files are processed by mappers

2014-04-14 Thread Alok Kumar
Hi, You can just use the put command to load files into HDFS: https://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html#put Copying files into HDFS does not require a mapper or a map-reduce job; it depends on your processing logic (map-reduce code) whether you really require a single merged file. Also, you ca
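
A minimal sketch of both operations mentioned above (all paths are illustrative):

    # copy local files into a directory in HDFS (no MapReduce involved)
    hadoop fs -mkdir -p /data/input
    hadoop fs -put /local/path/file*.txt /data/input/

    # if a single merged local copy is ever needed, getmerge concatenates the files back out
    hadoop fs -getmerge /data/input /local/path/merged.txt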

hdfs - get file block path for a datanode

2014-04-14 Thread Alexandros Papadopoulos
Hi all, in some cases, as an HDFS client, I would like to know the file block path on a datanode. Is there a way to get a file's block path for a datanode? Thanks in advance, alexpap

Re: hdfs - get file block path for a datanode

2014-04-14 Thread Peyman Mohajerian
hadoop fsck -files -blocks -locations On Mon, Apr 14, 2014 at 4:43 PM, Alexandros Papadopoulos < alex.pap...@gmail.com> wrote: > hi all, > > in some cases as hdfs-client, i would like to know the file block path > in a datanode. > Is there a way to get a file block path for a datanode ?? >
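
Spelled out with a target path (illustrative), which fsck expects as its first argument; the -locations flag adds the datanode addresses holding each block:

    # report files, block IDs and the datanodes that hold them, for everything under /user/alex
    hadoop fsck /user/alex -files -blocks -locations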

Re: hdfs - get file block path for a datanode

2014-04-14 Thread Alexandros Papadopoulos
Thanks for the response, yes you are right! Sorry I didn't make it clear: I need this feature through the Java API. On 04/15/2014 12:04 AM, Peyman Mohajerian wrote: hadoop fsck -files -blocks -locations On Mon, Apr 14, 2014 at 4:43 PM, Alexandros Papadopoulos <alex.pap...@gmail.com> wr
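
From the Java API, the closest public call is FileSystem.getFileBlockLocations, which returns, per block, the offset, length and the datanodes hosting replicas; it does not expose the datanode's local on-disk path for a block (that layout is internal to the datanode). A minimal sketch, assuming an illustrative file path:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/alex/data.txt");    // illustrative path
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
          // offset and length of the block within the file, plus the datanodes holding replicas
          System.out.println(block.getOffset() + " " + block.getLength()
              + " " + Arrays.toString(block.getHosts()));
        }
        fs.close();
      }
    }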

Applying delta between two HDFS snapshot using hdfs command

2014-04-14 Thread Manoj Samel
Hi, It seems the only way to restore from an HDFS snapshot using the hdfs command line is to copy snapshot files to a target path. If the use case is: 0. stuff ... 1. Take snapshot s_N 2. Add some files, delete other files 3. Take snapshot s_N+1, then copying s_N+1 to the target just copies the newly ad
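
For reference, the snapshot commands this workflow relies on, as a sketch (the directory name /data and the snapshot names are illustrative, and the directory must first be made snapshottable by an admin):

    hdfs dfsadmin -allowSnapshot /data     # admin: mark the directory snapshottable
    hdfs dfs -createSnapshot /data s_1     # step 1: take snapshot s_1
    # ... add/delete files ...
    hdfs dfs -createSnapshot /data s_2     # step 3: take snapshot s_2
    hdfs snapshotDiff /data s_1 s_2        # list created (+), deleted (-), modified (M), renamed (R) paths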

Re: HDFS file system size issue

2014-04-14 Thread Abdelrahman Shettia
Hi Biswa, are you sure that the replication factor of the files is three? Please run ‘hadoop fsck / -blocks -files -locations’ and see the replication factor for each file. Also, post the configuration of dfs.datanode.du.reserved and please check the real space presented by a DataNode by

Re: Applying delta between two HDFS snapshot using hdfs command

2014-04-14 Thread Jing Zhao
Hi Manoj, You're right, right now we do not have a complete snapshot rollback/restore functionality in HDFS. Thus users have to manually copy/delete files according to the snapshot diff report. There's an open jira HDFS-4167 for it. We plan to provide this support soon. Thanks, -Jing On Mon

Re: Applying delta between two HDFS snapshot using hdfs command

2014-04-14 Thread Manoj Samel
Thanks Jing, the JIRA has been open since Nov 12 but it seems a design doc was added just a few days back ... Would you have any ETA on this? Thanks again! Manoj On Mon, Apr 14, 2014 at 2:47 PM, Jing Zhao wrote: > Hi Manoj, > > You're right, right now we do not have a complete snapshot > rollbac

Setting debug log level for individual daemons

2014-04-14 Thread Ashwin Shankar
Hi, How do we set the log level to debug for, let's say, only the ResourceManager and not the other Hadoop daemons? -- Thanks, Ashwin

Re: heterogeneous storages in HDFS

2014-04-14 Thread Stanley Shi
That quick? 2.4 was released only a few weeks ago. On Monday, April 14, 2014, Azuryy wrote: > Hadoop 2.5 would be released in mid May. > > > Sent from my iPhone5s > > On 14 Apr 2014, at 17:47, lei liu > > > wrote: > > When is hadoop released? > > > > > 2014-04-14 17:04 GMT+08:00 Stanley Shi

Re: Setting debug log level for individual daemons

2014-04-14 Thread Stanley Shi
Add -Dhadoop.root.logger=DEBUG to something like HADOOP_resourcemanager_opts in yarn-env.sh. On Tuesday, April 15, 2014, Ashwin Shankar wrote: > Hi, > How do we set log level to debug for lets say only Resource manager > and not the other hadoop daemons ? > > -- > Thanks, > Ashwin > > > -- Rega
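
A sketch of what that looks like in yarn-env.sh; the variable the ResourceManager reads in stock Hadoop 2.x is YARN_RESOURCEMANAGER_OPTS (the name in the reply above appears to be from memory), and hadoop.root.logger takes a level,appender pair:

    # yarn-env.sh: raise logging to DEBUG for the ResourceManager only
    export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Dhadoop.root.logger=DEBUG,RFA"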

Re: Time taken to do a word count on 10 TB data.

2014-04-14 Thread Stanley Shi
Rough estimation: since word count requires very little computation, it is IO-centric, so we can estimate based on disk speed. Assume 10 disks of 100 MBps each per node, which is about 1 GBps per node; assuming 70% utilization in the mapper, we have 700 MBps per node. For 30 nodes, it is total
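
Carrying those same assumptions through to a number (map-side read only, ignoring shuffle, reduce and scheduling overhead): 30 nodes x 700 MBps is roughly 21 GBps of aggregate read bandwidth, and 10 TB / 21 GBps is about 480 seconds, so on the order of 8-10 minutes just to stream the input once.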

Re: how can i archive old data in HDFS?

2014-04-14 Thread ch huang
It just combines several files into one file; no compression happens. On Fri, Apr 11, 2014 at 9:10 PM, Peyman Mohajerian wrote: > There is: http://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html > But not sure if it compresses the data or not. > > > On Thu, Apr 10, 2014 at 9:57 PM, Stanley Shi wrote: >
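
For reference, a sketch of creating such an archive (names and paths are illustrative); the resulting .har holds the original files uncompressed:

    # pack everything under /user/hadoop/old-logs into one archive stored in /user/hadoop/archives
    hadoop archive -archiveName old-logs.har -p /user/hadoop old-logs /user/hadoop/archives
    # the archived files can still be listed through the har:// scheme
    hadoop fs -ls har:///user/hadoop/archives/old-logs.har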

What constitutes as "Rename" in snapshot Diff report ?

2014-04-14 Thread Manoj Samel
Hi, In the SnapshotDiffReport class there is: public enum DiffType { CREATE("+"), MODIFY("M"), DELETE("-"), RENAME("R"); ... If I do a "mv" on a file, the snapshot diff shows it as a delete of the old name and a creation of the new name. What constitutes a "RENAME"? Thanks, Manoj

Offline image viewer - account for edits ?

2014-04-14 Thread Manoj Samel
Hi, Is it correct to say that the offline image viewer does not account for any edits that are not yet merged into the fsimage? Thanks,

Re: Time taken to do a word count on 10 TB data.

2014-04-14 Thread Shashidhar Rao
Thanks Stanley Shi On Tue, Apr 15, 2014 at 6:25 AM, Stanley Shi wrote: > Rough estimation: since word count requires very little computation, it is > io centric, we can do estimation based on disk speed. > > Assume 10 disk with each 100MBps for each node, that is about 1GBps per > node; assume

Re: FileChannel.map and directbyteBuffer analysis

2014-04-14 Thread Ted Yu
On Mac, there is no default program to open a .msg file. Can you send it in text? Cheers On Mon, Apr 14, 2014 at 8:48 PM, lei liu wrote: > >

SSH in production environment

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please clarify: in a real production environment with multiple nodes in the cluster, is SSH implemented with or without it? I have seen examples where keys are generated and those keys are copied into the authorized_keys files on other nodes in order to log in to those nodes. Is this the same way done

Hadoop and Hbase

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please clarify how Hadoop and HBase are both used in a real production environment. Can the Region Servers of HBase be installed on Hadoop Datanodes, or are Region Servers separated from the Hadoop data nodes in multiple clusters? I know it's HBase-centric, but still, if someone has exper

2-node cluster

2014-04-14 Thread Mohan Radhakrishnan
Hi, I have 2 nodes, one is OS X and the other is Linux. How is a distributed cluster installed in this case? What other networking equipment do I need? Can I ask for pointers to instructions? I am new. Thanks, Mohan

Re: Hadoop and Hbase

2014-04-14 Thread Ted Yu
When Region Servers are co-located with Datanodes, you can utilize the short-circuit read feature. See 12.11.2 of http://hbase.apache.org/book.html#perf.hdfs Factors to consider for co-location include the allocation of the server's memory, so that the region server and Data node can have ample memory to fulfil
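
As a sketch, the properties that typically enable short-circuit reads on a co-located Hadoop 2.x setup (the socket path is illustrative and must exist on each datanode; both the datanode and the client/RegionServer side need the settings):

    <!-- hdfs-site.xml on datanodes and on the client (RegionServer) side -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/lib/hadoop-hdfs/dn_socket</value>  <!-- illustrative path -->
    </property>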

Hadoop cluster monitoring

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please help me in clarifying how a Hadoop cluster is monitored and profiled in a real production environment. What are the tools, and links if any? I have heard of Ganglia and HPROF. For HPROF, can somebody share some experience of how to configure HPROF for use with Hadoop? Regards S
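
On the HPROF side, MRv1 exposes task-profiling properties that wrap HPROF for a few sample tasks of a job; a sketch of the usual knobs (MRv1 property names assumed, values are only examples, and the -D form requires a ToolRunner-based driver):

    hadoop jar my-job.jar com.example.WordCount \
      -D mapred.task.profile=true \
      -D mapred.task.profile.maps=0-1 \
      -D mapred.task.profile.reduces=0 \
      /input /output
    # the profile.out files end up alongside the task attempt logs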

Re: Hadoop and Hbase

2014-04-14 Thread Shashidhar Rao
Thanks Ted Yu On Tue, Apr 15, 2014 at 9:37 AM, Ted Yu wrote: > When Region Servers are co-located with Datanodes, you can utilize short > circuit read feature. > See 12.11.2 of http://hbase.apache.org/book.html#perf.hdfs > > Factors to consider co-location include the allocation of memory of se

Re: 2-node cluster

2014-04-14 Thread Stanley Shi
You can just follow any instructions on deploying a distributed cluster; just put several different services on the same host. Regards, *Stanley Shi,* On Tue, Apr 15, 2014 at 12:02 PM, Mohan Radhakrishnan < radhakrishnan.mo...@gmail.com> wrote: > Hi, > I have 2 nodes, one is OSX and the o

Pig: java.lang.String cannot be cast to org.apache.pig.data.DataBag in specified map task

2014-04-14 Thread leiwang...@gmail.com
Hi, I am using Cloudera and running a MapReduce job written in Pig Latin; I hit the following exception in a map task: 2014-04-15 11:30:39,532 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag

Hadoop cluster Rack

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please explain how to decide whether to put nodes in racks or not. Say I have 30 nodes; is there a rule that if the number of nodes reaches a certain number then it is better to put those nodes in racks? How do I decide whether to use racks or not? Regards Shashi

Re: SSH in production environment

2014-04-14 Thread Shengjun Xin
If you want to use start-all.sh, you need to configure SSH keys, or you can log in to each target machine to start the services. On Tue, Apr 15, 2014 at 11:56 AM, Shashidhar Rao wrote: > Hi , > > Can somebody please clarify in real production environment with multiple > nodes in cluster does ssh is impleme
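
The key setup the start-all.sh path relies on is the usual passphrase-less key exchange from the node running the scripts to every worker; a sketch (the user and host names are illustrative):

    # on the node that runs start-all.sh / start-dfs.sh
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # generate a passphrase-less key pair
    ssh-copy-id hadoop@worker-node-01            # append the public key to that node's authorized_keys
    ssh hadoop@worker-node-01 true               # verify the login no longer asks for a password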

Re: Hadoop cluster monitoring

2014-04-14 Thread Arun Murthy
Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and monitor their Hadoop cluster. Ambari uses Ganglia/Nagios as underlying technology and has much better UI etc. hth, Arun On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao wrote: > Hi, > > Can somebody please help me in clari

Re: Hadoop cluster monitoring

2014-04-14 Thread Shashidhar Rao
Thanks Arun Murthy On Tue, Apr 15, 2014 at 11:32 AM, Arun Murthy wrote: > Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and > monitor their Hadoop cluster. Ambari uses Ganglia/Nagios as underlying > technology and has much better UI etc. > > hth, > Arun > > > On Mon, Apr