JournalNode desynchronized

2013-01-30 Thread Marcin Cylke
Hi, I had a failure of one of the machines my JournalNode is running on. I've restored that machine's setup and would like to attach it to the existing JournalNode quorum. When I try to run it I get the following error: ERROR org.apache.hadoop.security.UserGroupInformation:

Maximum Storage size in a Single datanode

2013-01-30 Thread jeba earnest
Hi, Is it possible to keep 1 petabyte in a single data node? If not, what is the maximum storage for a particular data node?  Regards, M. Jeba

Re: Maximum Storage size in a Single datanode

2013-01-30 Thread Bertrand Dechoux
I would say the hard limit is due to the OS local file system (and your budget). So the short answer for ext3: it doesn't seem so. http://en.wikipedia.org/wiki/Ext3 And I am not sure that answer is the most interesting one. Even if you could put 1 petabyte on one node, what is usually interesting is the

RE: Maximum Storage size in a Single datanode

2013-01-30 Thread Vijay Thakorlal
Hi Jeba, There are other considerations too. For example, if a single node holds 1 PB of data and it were to die, this would cause a significant amount of traffic as the NameNode arranges for new replicas to be created. Vijay From: Bertrand Dechoux [mailto:decho...@gmail.com] Sent: 30

Re: Maximum Storage size in a Single datanode

2013-01-30 Thread Pamecha, Abhishek
What would be the reason you would do that? You would want to leverage a distributed dataset for higher availability and better response times. The maximum storage depends completely on the disk capacity of your nodes and what your OS supports. Typically I have heard of about 1-2 TB/node to

Re: Maximum Storage size in a Single datanode

2013-01-30 Thread jeba earnest
I want to use either Ubuntu or Red Hat. I just want to know how much storage space we can allocate in a single data node. Are there any limitations in Hadoop for storage on a single node?   Regards, Jeba From: Pamecha, Abhishek apame...@ebay.com To:
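
A datanode's capacity is essentially the sum of the directories listed in its data-directory setting, so the ceiling comes from your disks and local filesystem rather than from Hadoop itself. As a minimal sketch (using the Hadoop 1.x property name; the mount points are hypothetical), hdfs-site.xml can spread storage across several disks:

    <property>
      <name>dfs.data.dir</name>
      <!-- comma-separated list; each entry usually maps to one physical disk -->
      <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
    </property>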

Re: Maximum Storage size in a Single datanode

2013-01-30 Thread Chris Embree
You should probably think about this in a more cluster-oriented fashion. A single node with a PB of data is probably not a good CPU-to-disk ratio. In addition, you need enough RAM on your NameNode to keep track of all of your blocks. A few nodes with a PB each would quickly drive up NN RAM

Re: Maximum Storage size in a Single datanode

2013-01-30 Thread Mohammad Tariq
I completely agree with everyone in the thread. Perhaps you are not concerned much about the processing part, but it is still not a good idea. Remember the power of Hadoop lies in the principle of divide and rule and you are trying to go against that. On Wednesday, January 30, 2013, Chris Embree

Re: Maximum Storage size in a Single datanode

2013-01-30 Thread Michel Segel
Can you say CentOS? :-) Sent from a remote device. Please excuse any typos... Mike Segel On Jan 30, 2013, at 4:21 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, Also, think about the memory you will need in your DataNode to serve all this data... I'm not sure there is any

negative map input bytes counter?

2013-01-30 Thread Jim Donofrio
I have seen the map input bytes counter go negative temporarily on hadoop 1.x at the beginning of a job. It then corrects itself later in the job and seems to be accurate. Any ideas? http://terrier.org/docs/v2.2.1/hadoop_indexing.html I also saw this behavior in a job output listed at the

RE: Oozie workflow error - renewing token issue

2013-01-30 Thread Corbett Martin
Thanks for the tip. The sqoop command listed in the stdout log file is: sqoop import --driver org.apache.derby.jdbc.ClientDriver --connect jdbc:derby://test-server:1527/mondb

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Jean-Marc Spaggiari
Hi Nan, When the NameNode exits safemode, you can assume that all blocks are fully replicated. If the NameNode is still in safemode, that means not all blocks are fully replicated yet. JM 2013/1/29, Nan Zhu zhunans...@gmail.com: So, we can assume that all blocks are fully
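
To see where the NameNode stands, the dfsadmin safemode subcommand can be queried (a minimal sketch using the Hadoop 1.x CLI):

    # print the current state: "Safe mode is ON" or "Safe mode is OFF"
    hadoop dfsadmin -safemode get

    # block until the NameNode leaves safemode on its own
    hadoop dfsadmin -safemode wait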

Re: Tricks to upgrading Sequence Files?

2013-01-30 Thread Terry Healy
Avro's versioning capability might help, if it could replace SequenceFile in your workflow. Just a thought. -Terry On 1/29/13 9:17 PM, David Parks wrote: I'll consider a patch to the SequenceFile, if we could manually override the sequence file input Key and Value that's read from the

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Chen He
That is correct if you do not manually exit NN safemode. Regards Chen On Jan 30, 2013 8:59 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Nan, When the NameNode exits safemode, you can assume that all blocks are fully replicated. If the NameNode is still in safemode

Re: How to find Blacklisted Nodes via cli.

2013-01-30 Thread Nitin Pawar
bin/hadoop dfsadmin -report should give you what you are looking for. A node is blacklisted only if there are too many failures on a particular node. You can clear it by restarting the particular datanode or tasktracker service. This is for the better performance of your hadoop cluster to
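
A minimal sketch of that workflow on a Hadoop 1.x cluster (the daemon scripts are run locally on the affected machine and are assumed to live in bin/):

    # cluster-wide report, including dead and decommissioned datanodes
    bin/hadoop dfsadmin -report

    # on the affected machine, bounce the offending daemon
    bin/hadoop-daemon.sh stop tasktracker
    bin/hadoop-daemon.sh start tasktracker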

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Harsh J
The NN does recalculate new replication work to do for unavailable replicas (under-replication) when it starts and receives all block reports, but it executes this only after it is out of safemode. While in safemode, across the HDFS services, no mutations are allowed. On Wed, Jan 30, 2013 at 8:34 AM, Nan

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Chen He
Hi Harsh, I have a question. How does the NameNode get out of safemode when data blocks have been lost? Only via the administrator? According to my experience, the NN (0.21) stayed in safemode for several days before I manually turned safemode off. There were 2 blocks lost. Chen On Wed, Jan 30, 2013 at 10:27

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Nitin Pawar
The following are the configs it looks for. Unless the admin forces it to come out of safemode, it respects the values below: dfs.namenode.safemode.threshold-pct (default 0.999f) specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. Values
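
For reference, those two properties would be set in hdfs-site.xml roughly like this (values shown are the defaults quoted in this thread):

    <property>
      <name>dfs.namenode.safemode.threshold-pct</name>
      <!-- fraction of blocks that must meet the minimal replication
           before the NameNode may leave safemode on its own -->
      <value>0.999f</value>
    </property>
    <property>
      <name>dfs.namenode.replication.min</name>
      <value>1</value>
    </property>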

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Harsh J
Yes, if there are missing blocks (i.e. all replicas lost), and the block availability threshold is set to its default of 0.999f (99.9% availability required), then the NN will not come out of safemode automatically. You can control this behavior by configuring dfs.namenode.safemode.threshold-pct. On Wed,
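
When blocks are genuinely gone and the threshold can never be met, the usual way out is to inspect the damage and then drop out of safemode by hand (a sketch using the Hadoop 1.x CLI):

    # report missing/corrupt blocks so you know what was lost
    hadoop fsck /

    # force the NameNode out of safemode
    hadoop dfsadmin -safemode leave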

Data migration from one cluster to another running diff. versions

2013-01-30 Thread Siddharth Tiwari
Hi Team, What is the best way to migrate data residing on one cluster to another cluster? Are there better methods available than distcp? What if both clusters are running different RPC protocol versions? Cheers !!! Siddharth Tiwari Have a refreshing day !!!

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Nan Zhu
I think Chen is asking about lost replicas. So, according to Harsh's reply, in safemode the NN will know of all blocks which have fewer replicas than 3 (the default setup) but no fewer than 1, and after getting out of safemode it will schedule the actual replication work? Hope I understand it

Re: Data migration from one cluster to another running diff. versions

2013-01-30 Thread Harsh J
DistCp is the fastest option, letting you copy data in parallel. For incompatible RPC versions between different HDFS clusters, the HFTP solution can work (documented in the DistCp manual). On Wed, Jan 30, 2013 at 10:13 PM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Team, What is the
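
A minimal sketch of such a cross-version copy (host names and paths are hypothetical; 50070 is the usual NameNode HTTP port, and the job runs on the destination cluster because HFTP is read-only):

    hadoop distcp hftp://source-nn:50070/user/data \
                  hdfs://dest-nn:8020/user/data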

Re: what will happen when HDFS restarts but with some dead nodes

2013-01-30 Thread Bertrand Dechoux
Well, the documentation is more explicit: Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. Which happens to be 1 by default, but doesn't need to stay that way. Regards Bertrand On Wed, Jan 30, 2013 at 5:45

Re: Same Map/Reduce works on the command line BUT it hangs through Oozie without any error msg

2013-01-30 Thread Alejandro Abdelnur
Yaotian, * Oozie version? * More details on what exactly your workflow action is (mapred, java, shell, etc.)? * What is in the task log of the oozie launcher job for that action? Thx On Fri, Jan 25, 2013 at 10:43 PM, yaotian yaot...@gmail.com wrote: I manually run it in Hadoop. It works. But

Re: Filesystem closed exception

2013-01-30 Thread Alejandro Abdelnur
Hemanth, Is FS caching enabled or not in your cluster? A simple solution would be to modify your mapper code not to close the FS. It will go away when the task ends anyway. Thx On Thu, Jan 24, 2013 at 5:26 PM, Hemanth Yamijala yhema...@thoughtworks.com wrote: Hi, We are noticing a

Re: Oozie workflow error - renewing token issue

2013-01-30 Thread Alejandro Abdelnur
Corbett, [Moving thread to user@oozie.a.o, BCCing common-user@hadoop.a.o] * What version of Oozie are you using? * Is the cluster a secure setup (Kerberos enabled)? * Would you mind posting the complete launcher logs? Thx On Wed, Jan 30, 2013 at 6:14 AM, Corbett Martin comar...@nhin.com

How to Integrate MicroStrategy with Hadoop

2013-01-30 Thread samir das mohapatra
Hi All, I wanted to know how to connect Hadoop with MicroStrategy. Any help is very helpful; waiting for your response. Note: Any URL and example will be really helpful for me. Thanks, samir

How to Integrate SAP HANA with Hadoop

2013-01-30 Thread samir das mohapatra
Hi all, we need connectivity between SAP HANA and Hadoop. If you have any experience with that, can you please share some documents and examples with me? It will be really helpful. Thanks, samir

Re: How to Integrate MicroStrategy with Hadoop

2013-01-30 Thread samir das mohapatra
We are using Cloudera Hadoop. On Thu, Jan 31, 2013 at 2:12 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I wanted to know how to connect Hadoop with MicroStrategy. Any help is very helpful; waiting for your response. Note: Any URL and example will be really help

Re: Oozie workflow error - renewing token issue

2013-01-30 Thread Daryn Sharp
The token renewer needs to be the JobTracker principal. I think Oozie had the MR token hardcoded at one point, but later changed it to use a conf setting. The rest of the log looks very odd - i.e. it looks like security is off, but it can't be. It's trying to renew hdfs tokens issued for the hdfs

Problem in reading Map Output file via RecordReader&lt;ImmutableBytesWritable, Put&gt;

2013-01-30 Thread anil gupta
Hi All, I am using HBase 0.92.1. I am trying to break the HBase bulk loading into multiple MR jobs, since I want to populate more than one HBase table from a single csv file. I have looked into the MultiTableOutputFormat class, but it doesn't solve my purpose because it does not generate HFiles. I
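
One way to approach this, sketched below under stated assumptions (CsvToPutMapper and the table/path names are hypothetical), is to run one MR job per target table, each wired for HFile output via HFileOutputFormat.configureIncrementalLoad, which is present in HBase 0.92:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadTable1 {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulkload-table1");
        job.setJarByClass(BulkLoadTable1.class);
        job.setMapperClass(CsvToPutMapper.class);  // hypothetical: parses the csv, emits <ImmutableBytesWritable, Put>
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path("/in/data.csv"));
        FileOutputFormat.setOutputPath(job, new Path("/out/hfiles-table1"));
        HTable table = new HTable(conf, "table1");
        // wires in HFileOutputFormat, a total-order partitioner and the matching
        // reducer; the resulting HFiles can then be moved in with completebulkload
        HFileOutputFormat.configureIncrementalLoad(job, table);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Running the same driver once per table, with the mapper filtering the csv for that table's columns, sidesteps the MultiTableOutputFormat limitation at the cost of re-reading the input.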

Re: How to find Blacklisted Nodes via cli.

2013-01-30 Thread Hemanth Yamijala
Hi, Part answer: you can get the blacklisted tasktrackers using the command line: mapred job -list-blacklisted-trackers. Also, I think that a blacklisted tasktracker becomes 'unblacklisted' if it works fine after some time. Though I am not very sure about this. Thanks hemanth On Wed, Jan 30,

Fwd: YARN NM containers were killed

2013-01-30 Thread YouPeng Yang
Hi, I posted my question a day ago; can somebody please help me figure out what the problem is? Thank you. regards YouPeng Yang -- Forwarded message -- From: YouPeng Yang yypvsxf19870...@gmail.com Date: 2013/1/30 Subject: YARN NM containers were killed To:

Re: Filesystem closed exception

2013-01-30 Thread Hemanth Yamijala
FS Caching is enabled on the cluster (i.e. the default is not changed). Our code isn't actually mapper code, but a standalone java program being run as part of Oozie. It just seemed confusing and not a very clear strategy to leave unclosed resources. Hence my suggestion to get an uncached FS
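
A minimal sketch of that suggestion (the path is hypothetical, and FileSystem.newInstance is assumed to be available on your Hadoop release): newInstance bypasses the JVM-wide cache, so closing the handle cannot invalidate a shared instance that other code obtained via FileSystem.get:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UncachedFsExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // uncached, private instance; FileSystem.get(conf) would return the shared one
        FileSystem fs = FileSystem.newInstance(conf);
        try {
          fs.mkdirs(new Path("/tmp/example"));  // hypothetical path
        } finally {
          fs.close();  // safe: only this private instance is closed
        }
      }
    }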

Re: What is the best way to load data from one cluster to another cluster (Urgent requirement)

2013-01-30 Thread Satbeer Lamba
I might be wrong but have you considered distcp? On Jan 31, 2013 11:15 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, Anyone know how to load data from one hadoop cluster (CDH4) to another cluster (CDH4)? The way our project needs are: 1) It should be delta load or

Recommendation required for Right Hadoop Distribution (CDH or Hortonworks)

2013-01-30 Thread samir das mohapatra
Hi All, My company wants to implement the right distribution of Apache Hadoop for its production as well as dev environments. Can anyone suggest which one will be good for the future? Hint: they want to know both pros and cons. Regards, samir.

Re: What is the best way to load data from one cluster to another cluster (Urgent requirement)

2013-01-30 Thread samir das mohapatra
thanks all. On Thu, Jan 31, 2013 at 11:19 AM, Satbeer Lamba satbeer.la...@gmail.com wrote: I might be wrong but have you considered distcp? On Jan 31, 2013 11:15 AM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, Anyone know how to load data from one hadoop

Re: Issue with running hadoop program using eclipse

2013-01-30 Thread Mohammad Tariq
Hello Vikas, It clearly shows that the class cannot be found. For debugging, you can write your MR job as a standalone java program and debug it. It works. And if you want to just debug your mapper / reducer logic, you should look into using MRUnit. There is a good