Re: hadoop inconsistent behaviour

2014-04-14 Thread Gordon Wang
Hi Rahul, What is the log of the reduce container? Please paste the log and we can see the reason. On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh wrote: > Hi, > I am running a job (wordcount example) on a 3-node cluster (1 master and 2 > slaves); sometimes the job passes but sometimes it fails (as red

Re: HDFS file system size issue

2014-04-14 Thread Sandeep Nemuri
Please check your logs directory usage. On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak wrote: > What's the replication factor you have? I believe it should be 3. hadoop > dus shows the disk usage without replication, while the namenode UI page > gives it with replication. > > 38gb * 3 = 114gb ~ 1TB

Re: Resetting dead datanodes list

2014-04-14 Thread Sandeep Nemuri
You can add the hostname/IP in the exclude file and run this command: hadoop dfsadmin -refreshNodes. On Mon, Apr 14, 2014 at 11:34 AM, Stanley Shi wrote: > I believe there's some command to show the list of datanodes from the CLI; > parsing HTML is not a good idea. The HTML page is intended to be read by

Re: hadoop inconsistent behaviour

2014-04-14 Thread Rahul Singh
I cleaned up the log directory before running the job. Now there are no nodemanager logs. When I look in the userlogs directory I see some syslog files with the following error: 2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: poc-hadoop06/127.0.1.1:40
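
The retry target 127.0.1.1 often points to a hosts-file issue on Debian/Ubuntu-style installs rather than a Hadoop problem; a quick check, as a sketch (the hostname comes from the log above, the replacement address is only illustrative):

    # Hadoop daemons should not resolve the node's own hostname to 127.0.1.1
    grep 127.0.1.1 /etc/hosts
    # if present, map the hostname to the node's real network address instead, e.g. (illustrative)
    # 192.168.1.16  poc-hadoop06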

Re: hadoop inconsistent behaviour

2014-04-14 Thread Rahul Singh
How do I identify a reduce container? There are multiple container dirs in my application id folder in userlogs. On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang wrote: > Hi Rahul, > > What is the log of the reduce container? Please paste the log and we can see > the reason. > > > On Mon, Apr 14, 20

Re: hadoop inconsistent behaviour

2014-04-14 Thread Gordon Wang
You can find the reduce container from the RM's web page. BTW: from the above log, you can check whether the application master crashed. On Mon, Apr 14, 2014 at 3:12 PM, Rahul Singh wrote: > how do i identify an reduce container? there are multiple container dirs > in my application id folder in userlogs. > > >

Hive install under hadoop

2014-04-14 Thread EdwardKing
I want to use Hive on hadoop-2.2.0, so I execute the following steps: [hadoop@master /]$ tar -xzf hive-0.11.0.tar.gz [hadoop@master /]$ export HIVE_HOME=/home/software/hive [hadoop@master /]$ export PATH=${HIVE_HOME}/bin:${PATH} [hadoop@master /]$ hadoop fs -mkdir /tmp [hadoop@master /]$ hadoop fs -m
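
For reference, the HDFS directories a default Hive configuration expects are usually created along these lines (a sketch of the common setup, not the poster's exact truncated steps; adjust paths to your install):

    # typical HDFS directories for a stock Hive setup
    hadoop fs -mkdir /tmp
    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -chmod g+w /tmp
    hadoop fs -chmod g+w /user/hive/warehouse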

unsubscribe

2014-04-14 Thread Hansi Klose

heterogeneous storages in HDFS

2014-04-14 Thread lei liu
hadoop-2.4 was released on April 11, but it does not include the heterogeneous storage function. When will Hadoop include this function? Thanks, LiuLei

Re: heterogeneous storages in HDFS

2014-04-14 Thread ascot.m...@gmail.com
Hi, from the 2.3.0 release notes: "20 February, 2014: Release 2.3.0 available. Apache Hadoop 2.3.0 contains a number of significant enhancements such as: Support for Heterogeneous Storage hierarchy in HDFS." Is it already there? Ascot On 14 Apr, 2014, at 4:34 pm, lei liu wrote: > On April 11 hadoop-2.4 is rele

Re: heterogeneous storages in HDFS

2014-04-14 Thread Stanley Shi
Please find it on this page: https://wiki.apache.org/hadoop/Roadmap hadoop 2.3.0 only includes "phase 1" of the heterogeneous storage; "phase 2" will be included in 2.5.0. Regards, *Stanley Shi,* On Mon, Apr 14, 2014 at 4:38 PM, ascot.m...@gmail.com wrote: > hi, > > From 2.3.0 > 20 February,

Re: heterogeneous storages in HDFS

2014-04-14 Thread lei liu
When will that Hadoop release be available? 2014-04-14 17:04 GMT+08:00 Stanley Shi : > Please find it in this page: https://wiki.apache.org/hadoop/Roadmap > > hadoop 2.3.0 only include "phase 1" of the heterogeneous storage; "phase > 2" will be included in 2.5.0; > > Regards, > *Stanley Shi,* > > > > On Mon, Ap

Re: heterogeneous storages in HDFS

2014-04-14 Thread Azuryy
Hadoop 2.5 should be released in mid-May. Sent from my iPhone5s > On 14 Apr 2014, at 17:47, lei liu wrote: > > When is hadoop released? > > > > > 2014-04-14 17:04 GMT+08:00 Stanley Shi : >> Please find it in this page: https://wiki.apache.org/hadoop/Roadmap >> >> hadoop 2.3.0 only include

Re: Hive install under hadoop

2014-04-14 Thread Thomas Bentsen
Seems like an ordinary Linux file permission thing. Are you logged in as user 'software'? Does your user have permission to create dirs in /home/software? /th On Mon, 2014-04-14 at 16:12 +0800, EdwardKing wrote: > I want to use hive in hadoop2.2.0, so I execute following steps: > > [hadoop@mast
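
A quick way to check this, assuming the hadoop user is unpacking under /home/software (directory and user names taken from the thread; the chown line is only illustrative):

    ls -ld /home/software      # who owns the directory, and is it writable by your group?
    id                         # which user and groups am I actually running as?
    # if the ownership is wrong, something like the following would fix it (run as root; illustrative)
    # chown -R hadoop:hadoop /home/software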

Re: Exception in Jobtracker (java.lang.OutOfMemoryError: Java heap space)

2014-04-14 Thread Wellington Chevreuil
Hi Viswanathan, this looks like your job history is full, and is filling up your jobtracker heap: > 2014-04-12 02:25:47,963 ERROR org.apache.hadoop.mapred.JobHistory: Unable to > move history file to DONE canonical subfolder. > java.lang.OutOfMemoryError: Java heap space Have you set any value
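
If the JobTracker heap is indeed the bottleneck, two settings commonly looked at on a Hadoop 1.x cluster are the daemon heap size and how many completed jobs the JobTracker keeps in memory; a sketch only, with illustrative values and the stock 1.x property name assumed:

    # hadoop-env.sh: raise the daemon heap (applies to the JobTracker when set in its environment)
    export HADOOP_HEAPSIZE=2048

    # mapred-site.xml: keep fewer completed jobs per user in JobTracker memory
    #   <property>
    #     <name>mapred.jobtracker.completeuserjobs.maximum</name>
    #     <value>25</value>
    #   </property>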

Re: Resetting dead datanodes list

2014-04-14 Thread divye sheth
If by resetting the list of dead datanodes you mean that the web console or the report command should no longer show the datanode you removed, then you will have to do the following: 1. Remove the entry from the slaves file corresponding to the dead datanode. 2. Remove the entry from the exclude file. 3. Run
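
The usual step 3 plus a verification step, as a sketch (the exclude-file path depends on your dfs.hosts.exclude setting and is not shown here):

    hadoop dfsadmin -refreshNodes    # make the NameNode re-read the include/exclude lists
    hadoop dfsadmin -report          # confirm the removed node no longer appears as a dead datanode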

Re: Exception in hadoop jobtracker OOM

2014-04-14 Thread dingweiqiong
unsubscribe 2014-04-12 23:02 GMT+08:00 Viswanathan J : > Hi, > > I'm using Hadoop v1.2.1 and it is running fine so long(3 months) without > any issues. > > Suddenly I got the below error in Jobtracker and jobs are failed to run. > > Is this issue in JT or TT or Jetty issue? > > 2014-04-12 02:13:

Re: can't copy between hdfs

2014-04-14 Thread divye sheth
Hi, Try setting this property in hdfs-site.xml on the destination cluster: dfs.datanode.max.xcievers = 4096. 4096 would work; if needed you may increase this to a higher number. A restart would be required in this case. Also make sure you have the ulimit set to a high number as had
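
As a sketch, the property as it would appear in hdfs-site.xml on the destination cluster's datanodes (value taken from the message; restart the datanodes afterwards):

    <!-- hdfs-site.xml: cap on concurrent datanode transfer threads -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>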

Re: Exception in hadoop jobtracker OOM

2014-04-14 Thread divye sheth
This usually occurs when the task takes more memory and exceeds its heap space. You can increase the memory of the tasks by setting a property in mapred-site.xml, Mapred.child.opts, specifying -Xmx with a higher value. P.S. the property name might not be correct; you may look up mapred-default.xm

Re: Exception in hadoop jobtracker OOM

2014-04-14 Thread divye sheth
Sorry for the error. Did not have a proper look at the logs. This seems to be a JT issue. Ignore the previous email. Thanks Divye Sheth On Apr 14, 2014 6:06 PM, "divye sheth" wrote: > This usually occurs when the task takes more memory and exceeds its heap > space. You can increase the memory of

Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread Radhe Radhe
Hello People, As per the Apache site http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html: "Binary Compatibility: First, we ensure binary compatibility to the applications that use old mapred APIs."

unsubscribe

2014-04-14 Thread Shawn

How multiple input files are processed by mappers

2014-04-14 Thread Shashidhar Rao
Hi, please can somebody clarify my doubts. Say I have a cluster of 30 nodes and I want to put the files in HDFS. All the files combined are 10 TB in size, but each file is only roughly 1 GB, and the total number of files is 10. 1. In a real production environment do we copy these 1

Re: How multiple input files are processed by mappers

2014-04-14 Thread Nitin Pawar
1. In a real production environment do we copy these 10 files into HDFS under a folder one by one? If this is the case, then how many mappers do we specify, 10 mappers? And do we use the put command of hadoop to transfer these files? Ans: This will depend on what you want to do with the files. There is no rule wh

Re: unsubscribe

2014-04-14 Thread Levin ding
unsubscribe

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Source Compatibility = you need to recompile and use the new version as part of the compilation Binary Compatibility = you can take something compiled against the old version and run it on the new version On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe wrote: > Hello People, > > As per the Apache s

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Also, "Source Compatibility" also means ONLY a recompile is needed. No code changes should be needed. On Mon, Apr 14, 2014 at 10:37 AM, John Meagher wrote: > Source Compatibility = you need to recompile and use the new version > as part of the compilation > > Binary Compatibility = you can take s

Fwd: Changing default scheduler in hadoop

2014-04-14 Thread Mahesh Khandewal
-- Forwarded message -- From: Mahesh Khandewal Date: Mon, 14 Apr 2014 08:42:16 +0530 Subject: Re: Changing default scheduler in hadoop To: user@hadoop.apache.org Cc: Ekta Agrawal , "common-u...@hadoop.apache.org" , "hdfs-u...@hadoop.apache.org" Hi, I have a patch file of Resource Aw

RE: Which Hadoop 2.x .jars are necessary for Apache Commons VFS HDFS access?

2014-04-14 Thread Roger Whitcomb
Thank you Dave, I got it. Needed a few other .jars as well (commons-cli and protobuf-java). But most importantly, the port was wrong. 50070 is for HTTP access, but using 8020 is correct for direct HDFS access. Thanks again, ~Roger From: dlmarion Sent: Friday,
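
For reference, a sketch of the corrected form of the connection URL when going through Commons VFS to HDFS (the host name and path are illustrative; 8020 is the default NameNode RPC port, while 50070 is only the HTTP/web UI port):

    # illustrative only: point the hdfs:// URL at the NameNode RPC port, not the web UI port
    hdfs://namenode.example.com:8020/user/roger/data.csv
    # hdfs://namenode.example.com:50070/...   <- wrong: that is the HTTP UI port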

Map and reduce slots

2014-04-14 Thread Shashidhar Rao
Hi, can somebody clarify what map and reduce slots are and how Hadoop calculates these slots? Are these slots calculated based on the number of splits? I am getting different answers, please help. Regards Shashidhar

Re: Map and reduce slots

2014-04-14 Thread João Paulo Forny
You need to differentiate slots from tasks. Tasks are spawned by the TT and assigned to a free slot in the cluster. The number of map tasks for a Hadoop job is typically controlled by the input data size and the split size. The number of reduce tasks for a Hadoop job is controlled by the *mapreduce.job
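
A sketch of how the reduce count is typically set per job, assuming the property the message starts to name is mapreduce.job.reduces (the MRv1 equivalent is mapred.reduce.tasks); the jar and class names below are illustrative:

    # set the number of reduce tasks for one job run
    # (the -D form works when the job driver uses ToolRunner/GenericOptionsParser)
    hadoop jar my-job.jar com.example.WordCount -D mapreduce.job.reduces=10 /input /output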

Time taken to do a word count on 10 TB data.

2014-04-14 Thread Shashidhar Rao
Hi, Can somebody provide me a rough estimate of the time taken in hours/mins for a cluster of say 30 nodes to run a map reduce job to perform a word count on say 10 TB of data, assuming that the hardware and the map reduce program are tuned optimally. Just a rough estimate; it could be 5 TB, 10 TB o

unsubscribe

2014-04-14 Thread dinesh dakshan

Re: Confirm that restoring a copy of snapshot involves copying the data

2014-04-14 Thread Manoj Samel
Any thoughts? On Wed, Apr 9, 2014 at 10:08 AM, Manoj Samel wrote: > Hi, > > If I take an HDFS snapshot and then restore it to some other directory using > > hdfs dfs -cp /xxx/.snapshot/nnn /aaa/bbb > > I want to confirm that there is a copy of data from the files under the snapshot to > the target directory.

Re: How multiple input files are processed by mappers

2014-04-14 Thread Alok Kumar
Hi, You can just use the put command to load files into HDFS: https://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html#put Copying files into HDFS does not require a mapper or a map-reduce job; it depends on your processing logic (map-reduce code) whether you really require a single merged file. Also, you ca
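
A minimal sketch of both operations mentioned above (all paths are illustrative):

    # copy local files into a directory in HDFS (no MapReduce involved)
    hadoop fs -mkdir -p /data/input
    hadoop fs -put /local/path/file*.txt /data/input/

    # if a single merged local copy is ever needed, getmerge concatenates the files back out
    hadoop fs -getmerge /data/input /local/path/merged.txt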

hdfs - get file block path for a datanode

2014-04-14 Thread Alexandros Papadopoulos
Hi all, in some cases, as an HDFS client, I would like to know the file block path on a datanode. Is there a way to get a file's block path for a datanode? Thanks in advance, alexpap

Re: hdfs - get file block path for a datanode

2014-04-14 Thread Peyman Mohajerian
hadoop fsck -files -blocks -locations On Mon, Apr 14, 2014 at 4:43 PM, Alexandros Papadopoulos < alex.pap...@gmail.com> wrote: > hi all, > > in some cases as hdfs-client, i would like to know the file block path > in a datanode. > Is there a way to get a file block path for a datanode ?? >
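
Spelled out with a target path (illustrative), which fsck expects as its first argument; the -locations flag adds the datanode addresses holding each block:

    # report files, block IDs and the datanodes that hold them, for everything under /user/alex
    hadoop fsck /user/alex -files -blocks -locations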

Re: hdfs - get file block path for a datanode

2014-04-14 Thread Alexandros Papadopoulos
Thanks for the response, yes you are right! Sorry I didn't make it clear: I need this feature through the Java API. On 04/15/2014 12:04 AM, Peyman Mohajerian wrote: hadoop fsck -files -blocks -locations On Mon, Apr 14, 2014 at 4:43 PM, Alexandros Papadopoulos <alex.pap...@gmail.com> wr
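
From the Java API, the closest public call is FileSystem.getFileBlockLocations, which returns, per block, the offset, length and the datanodes hosting replicas; it does not expose the datanode's local on-disk path for a block (that layout is internal to the datanode). A minimal sketch, assuming an illustrative file path:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/alex/data.txt");    // illustrative path
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
          // offset and length of the block within the file, plus the datanodes holding replicas
          System.out.println(block.getOffset() + " " + block.getLength()
              + " " + Arrays.toString(block.getHosts()));
        }
        fs.close();
      }
    }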

Applying delta between two HDFS snapshot using hdfs command

2014-04-14 Thread Manoj Samel
Hi, It seems the only way to restore from an HDFS snapshot using the hdfs command line is to copy snapshot files to a target path. If the use case is: 0. stuff ... 1. Take snapshot s_N 2. Add some files, delete other files 3. Take snapshot s_N+1, then copying s_N+1 to the target just copies the newly ad
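
For reference, the snapshot commands this workflow relies on, as a sketch (the directory name /data and the snapshot names are illustrative, and the directory must first be made snapshottable by an admin):

    hdfs dfsadmin -allowSnapshot /data     # admin: mark the directory snapshottable
    hdfs dfs -createSnapshot /data s_1     # step 1: take snapshot s_1
    # ... add/delete files ...
    hdfs dfs -createSnapshot /data s_2     # step 3: take snapshot s_2
    hdfs snapshotDiff /data s_1 s_2        # list created (+), deleted (-), modified (M), renamed (R) paths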

Re: HDFS file system size issue

2014-04-14 Thread Abdelrahman Shettia
Hi Biswa, are you sure that the replication factor of the files is three? Please run ‘hadoop fsck / -blocks -files -locations’ and see the replication factor for each file. Also, post the configuration of dfs.datanode.du.reserved and please check the real space presented by a DataNode by

Re: Applying delta between two HDFS snapshot using hdfs command

2014-04-14 Thread Jing Zhao
Hi Manoj, You're right, right now we do not have a complete snapshot rollback/restore functionality in HDFS. Thus users have to manually copy/delete files according to the snapshot diff report. There's an open jira HDFS-4167 for it. We plan to provide this support soon. Thanks, -Jing On Mon

Re: Applying delta between two HDFS snapshot using hdfs command

2014-04-14 Thread Manoj Samel
Thanks Jing, the JIRA has been open since Nov 12 but it seems a design doc was added just a few days back ... Would you have any ETA on this? Thanks again! Manoj On Mon, Apr 14, 2014 at 2:47 PM, Jing Zhao wrote: > Hi Manoj, > > You're right, right now we do not have a complete snapshot > rollbac

Setting debug log level for individual daemons

2014-04-14 Thread Ashwin Shankar
Hi, How do we set the log level to debug for, let's say, only the ResourceManager and not the other Hadoop daemons? -- Thanks, Ashwin

Re: heterogeneous storages in HDFS

2014-04-14 Thread Stanley Shi
That quick? 2.4 was released only a few weeks ago. On Monday, April 14, 2014, Azuryy wrote: > Hadoop 2.5 would be released in mid May. > > > Sent from my iPhone5s > > On 14 Apr 2014, at 17:47, lei liu > > > wrote: > > When is hadoop released? > > > > > 2014-04-14 17:04 GMT+08:00 Stanley Shi

Re: Setting debug log level for individual daemons

2014-04-14 Thread Stanley Shi
Add -Dhadoop.root.logger=DEBUG to something like HADOOP_resourcemanager_opts in yarn-env.sh. On Tuesday, April 15, 2014, Ashwin Shankar wrote: > Hi, > How do we set log level to debug for lets say only Resource manager > and not the other hadoop daemons ? > > -- > Thanks, > Ashwin > > > -- Rega
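
A sketch of what that looks like in yarn-env.sh; the variable the ResourceManager reads in stock Hadoop 2.x is YARN_RESOURCEMANAGER_OPTS (the name in the reply above appears to be from memory), and hadoop.root.logger takes a level,appender pair:

    # yarn-env.sh: raise logging to DEBUG for the ResourceManager only
    export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Dhadoop.root.logger=DEBUG,RFA"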

Re: Time taken to do a word count on 10 TB data.

2014-04-14 Thread Stanley Shi
Rough estimation: since word count requires very little computation, it is IO-centric, so we can estimate based on disk speed. Assume 10 disks of 100 MBps each per node, which is about 1 GBps per node; assuming 70% utilization in the mapper, we have 700 MBps per node. For 30 nodes, it is total
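
Carrying those same assumptions through to a number (map-side read only, ignoring shuffle, reduce and scheduling overhead): 30 nodes x 700 MBps is roughly 21 GBps of aggregate read bandwidth, and 10 TB / 21 GBps is about 480 seconds, so on the order of 8-10 minutes just to stream the input once.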

Re: how can i archive old data in HDFS?

2014-04-14 Thread ch huang
It just combines several files into one file; no compression happens. On Fri, Apr 11, 2014 at 9:10 PM, Peyman Mohajerian wrote: > There is: http://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html > But not sure if it compresses the data or not. > > > On Thu, Apr 10, 2014 at 9:57 PM, Stanley Shi wrote: >
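
For reference, a sketch of creating such an archive (names and paths are illustrative); the resulting .har holds the original files uncompressed:

    # pack everything under /user/hadoop/old-logs into one archive stored in /user/hadoop/archives
    hadoop archive -archiveName old-logs.har -p /user/hadoop old-logs /user/hadoop/archives
    # the archived files can still be listed through the har:// scheme
    hadoop fs -ls har:///user/hadoop/archives/old-logs.har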

What constitutes as "Rename" in snapshot Diff report ?

2014-04-14 Thread Manoj Samel
Hi, In the SnapshotDiffReport class there is: public enum DiffType { CREATE("+"), MODIFY("M"), DELETE("-"), RENAME("R"); ... If I do a "mv" on a file, the snapshot diff shows it as a delete of the old name and a creation of the new name. What constitutes a "RENAME"? Thanks, Manoj

Offline image viewer - account for edits ?

2014-04-14 Thread Manoj Samel
Hi, Is it correct to say that the offline image viewer does not account for any edits that are not yet merged into the fsimage? Thanks,

Re: Time taken to do a word count on 10 TB data.

2014-04-14 Thread Shashidhar Rao
Thanks Stanley Shi On Tue, Apr 15, 2014 at 6:25 AM, Stanley Shi wrote: > Rough estimation: since word count requires very little computation, it is > io centric, we can do estimation based on disk speed. > > Assume 10 disk with each 100MBps for each node, that is about 1GBps per > node; assume

Re: FileChannel.map and directbyteBuffer analysis

2014-04-14 Thread Ted Yu
On Mac, there is no default program to open a .msg file. Can you send it in text? Cheers On Mon, Apr 14, 2014 at 8:48 PM, lei liu wrote: > >

SSH in production environment

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please clarify: in a real production environment with multiple nodes in the cluster, is SSH implemented with or without it? I have seen examples where keys are generated and those keys are copied into the authorized_keys files on other nodes in order to log in to those nodes. Is this the same way done

Hadoop and Hbase

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please clarify how Hadoop and HBase are both used in a real production environment. Can the Region Servers of HBase be installed on Hadoop Datanodes, or are Region Servers separated from the Hadoop data nodes in multiple clusters? I know it's HBase-centric, but still, if someone has exper

2-node cluster

2014-04-14 Thread Mohan Radhakrishnan
Hi, I have 2 nodes, one is OS X and the other is Linux. How is a distributed cluster installed in this case? What other networking equipment do I need? Can I ask for pointers to instructions? I am new. Thanks, Mohan

Re: Hadoop and Hbase

2014-04-14 Thread Ted Yu
When Region Servers are co-located with Datanodes, you can utilize the short-circuit read feature. See 12.11.2 of http://hbase.apache.org/book.html#perf.hdfs Factors to consider for co-location include the allocation of the server's memory, so that the region server and Data node can have ample memory to fulfil
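
As a sketch, the properties that typically enable short-circuit reads on a co-located Hadoop 2.x setup (the socket path is illustrative and must exist on each datanode; both the datanode and the client/RegionServer side need the settings):

    <!-- hdfs-site.xml on datanodes and on the client (RegionServer) side -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/lib/hadoop-hdfs/dn_socket</value>  <!-- illustrative path -->
    </property>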

Hadoop cluster monitoring

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please help me in clarifying how a Hadoop cluster is monitored and profiled in a real production environment. What are the tools, and links if any? I have heard of Ganglia and HPROF. For HPROF, can somebody share some experience of how to configure HPROF for use with Hadoop? Regards S
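
On the HPROF side, MRv1 exposes task-profiling properties that wrap HPROF for a few sample tasks of a job; a sketch of the usual knobs (MRv1 property names assumed, values are only examples, and the -D form requires a ToolRunner-based driver):

    hadoop jar my-job.jar com.example.WordCount \
      -D mapred.task.profile=true \
      -D mapred.task.profile.maps=0-1 \
      -D mapred.task.profile.reduces=0 \
      /input /output
    # the profile.out files end up alongside the task attempt logs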

Re: Hadoop and Hbase

2014-04-14 Thread Shashidhar Rao
Thanks Ted Yu On Tue, Apr 15, 2014 at 9:37 AM, Ted Yu wrote: > When Region Servers are co-located with Datanodes, you can utilize short > circuit read feature. > See 12.11.2 of http://hbase.apache.org/book.html#perf.hdfs > > Factors to consider co-location include the allocation of memory of se

Re: 2-node cluster

2014-04-14 Thread Stanley Shi
You can just follow any instructions on deploying a distributed cluster; just put several different services on the same host. Regards, *Stanley Shi,* On Tue, Apr 15, 2014 at 12:02 PM, Mohan Radhakrishnan < radhakrishnan.mo...@gmail.com> wrote: > Hi, > I have 2 nodes, one is OSX and the o

Pig: java.lang.String cannot be cast to org.apache.pig.data.DataBag in specified map task

2014-04-14 Thread leiwang...@gmail.com
Hi, I am using Cloudera and running a MapReduce job written in Pig Latin; I hit the following exception in a map task: 2014-04-15 11:30:39,532 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag

Hadoop cluster Rack

2014-04-14 Thread Shashidhar Rao
Hi, can somebody please explain how to decide whether to put nodes in racks or not. Say I have 30 nodes; is there a rule that if the number of nodes reaches a certain number then it is better to put those nodes in racks? How do I decide whether to use racks or not? Regards Shashi

Re: SSH in production environment

2014-04-14 Thread Shengjun Xin
If you want to use start-all.sh, you need to configure SSH keys, or you can log in to each target machine to start the services. On Tue, Apr 15, 2014 at 11:56 AM, Shashidhar Rao wrote: > Hi , > > Can somebody please clarify in real production environment with multiple > nodes in cluster does ssh is impleme
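
The key setup the start-all.sh path relies on is the usual passphrase-less key exchange from the node running the scripts to every worker; a sketch (the user and host names are illustrative):

    # on the node that runs start-all.sh / start-dfs.sh
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # generate a passphrase-less key pair
    ssh-copy-id hadoop@worker-node-01            # append the public key to that node's authorized_keys
    ssh hadoop@worker-node-01 true               # verify the login no longer asks for a password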

Re: Hadoop cluster monitoring

2014-04-14 Thread Arun Murthy
Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and monitor their Hadoop cluster. Ambari uses Ganglia/Nagios as underlying technology and has much better UI etc. hth, Arun On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao wrote: > Hi, > > Can somebody please help me in clari

Re: Hadoop cluster monitoring

2014-04-14 Thread Shashidhar Rao
Thanks Arun Murthy On Tue, Apr 15, 2014 at 11:32 AM, Arun Murthy wrote: > Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and > monitor their Hadoop cluster. Ambari uses Ganglia/Nagios as underlying > technology and has much better UI etc. > > hth, > Arun > > > On Mon, Apr