NameNode HA from a client perspective

2016-05-03 Thread Cecile, Adam
Hello All, I'd like some advice on how my HDFS clients should handle the NameNode high availability feature. I have a complete setup running with ZKFC and I can see one active and one standby NameNode. When I kill the active one, the standby becomes active and when the origina
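
For readers following this thread, the client side of NameNode HA is usually handled by pointing clients at a logical nameservice instead of a single host. The following is a minimal sketch, not the poster's configuration: the nameservice name `mycluster`, the NameNode IDs, and the host names are hypothetical, while the property keys and the `ConfiguredFailoverProxyProvider` class are the standard HDFS HA client settings.

```python
# Sketch: assemble the client-side hdfs-site.xml properties for an HA nameservice.
# The nameservice name, NameNode IDs and addresses passed in are hypothetical.

FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def ha_client_properties(nameservice, namenodes):
    """Return HA client properties as a dict.

    namenodes maps a NameNode ID (e.g. "nn1") to its RPC address "host:port".
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        f"dfs.client.failover.proxy.provider.{nameservice}": FAILOVER_PROVIDER,
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props


if __name__ == "__main__":
    example = ha_client_properties(
        "mycluster",
        {"nn1": "nn1.example.com:8020", "nn2": "nn2.example.com:8020"},
    )
    for name, value in example.items():
        print(f"{name} = {value}")
```

With these properties in the client's hdfs-site.xml and `fs.defaultFS` set to `hdfs://mycluster` in core-site.xml, the client addresses the nameservice by its logical name and the proxy provider tries the configured NameNodes until it reaches the active one.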

Re: Reconfigured Namenode stuck in safemode

2016-05-03 Thread Sumit Kumar
Hello All, We've been experimenting with NameNode reconfiguration in a ZooKeeper-based HA configuration. We've been able to automate the setup part using Bigtop scripts. We're trying to use the same setup scripts for reconfiguration should one of the NameNodes die. I found that these scripts

AD with Hadoop....

2016-05-03 Thread Kumar Jayapal
Hello All, When we configure Active Directory with Hadoop, are there any limitations on user or group naming conventions? How long can names be, and can they contain characters like . or _ #@$%^&*? Can anyone point me to a link where these conventions are defined? Thanks, Jay

Re: S3 Hadoop FileSystems

2016-05-03 Thread Chris Nauroth
Hello Elliot, You're welcome, and the time was not wasted at all. This is exactly the kind of valuable discussion that we like to share on the user@ list. As an outcome, we now have a more definitive answer about how MD5 verification works in s3a. Thank you for starting the discussion. --Ch

Re: Securing secrets for S3 FileSystems in DistCp

2016-05-03 Thread Elliot West
Hi Larry, Thank you for the JIRA link and description. This appears to be very relevant to what we're trying to achieve. I'll have a read and try it out. Elliot. On 3 May 2016 at 14:09, Larry McCay wrote: > Hi Elliot - > > You may find the following patch interesting: > https://issues.apac

Re: Securing secrets for S3 FileSystems in DistCp

2016-05-03 Thread Larry McCay
Hi Elliot - You may find the following patch interesting: https://issues.apache.org/jira/browse/HADOOP-12548 This enables the use of the Credential Provider API to protect secrets for the s3a filesystem. The design document attached to it describes how to use it. If you are not using s3a, ther
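
As a rough illustration of the Credential Provider approach mentioned above, the sketch below builds the two command lines typically involved: storing a secret with `hadoop credential create`, then pointing DistCp at the keystore through `hadoop.security.credential.provider.path`. The keystore URL, source path, and bucket name are hypothetical, and the sketch assumes a Hadoop release that includes HADOOP-12548; the design document attached to that JIRA remains the authoritative description.

```python
import shlex

# Sketch only: the provider URL, source path and bucket below are hypothetical.
PROVIDER = "jceks://hdfs@nn1.example.com/user/etl/s3.jceks"


def credential_create_cmd(alias, provider):
    """Build a `hadoop credential create` command.

    The secret value is not passed here; the CLI prompts for it interactively.
    """
    return ["hadoop", "credential", "create", alias, "-provider", provider]


def distcp_cmd(provider, source, target):
    """Build a DistCp command that resolves s3a secrets from the keystore."""
    return [
        "hadoop",
        "distcp",
        f"-Dhadoop.security.credential.provider.path={provider}",
        source,
        target,
    ]


if __name__ == "__main__":
    commands = [
        credential_create_cmd("fs.s3a.access.key", PROVIDER),
        credential_create_cmd("fs.s3a.secret.key", PROVIDER),
        distcp_cmd(PROVIDER, "hdfs:///data/events", "s3a://example-bucket/events"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

The aliases match the s3a property names, so the secrets no longer need to appear in core-site.xml or on the command line.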

Re: Guideline on setting Namenode RPC Handler count (client and service)

2016-05-03 Thread Chackravarthy Esakkimuthu
Thanks, Brahma, for the reply. I will look into the issue you mentioned. (Yes, we are using 2.6.0 (hdp-2.2).) On Tue, May 3, 2016 at 6:04 PM, Brahma Reddy Battula < brahmareddy.batt...@huawei.com> wrote: > Hope you are using hadoop-2.6 release. > > > > As you are targeting to amount of time it’s getti

Re: Securing secrets for S3 FileSystems in DistCp

2016-05-03 Thread Elliot West
Thanks for your reply. We have IAM users, each with their own sets of keys. Could you explain how I can use roles in this situation? Elliot. On 3 May 2016 at 13:46, Shekhar Sharma wrote: > Have u used IAM (identity access management ) roles ? > On 3 May 2016 18:11, "Elliot West" wrote: > >>

Re: Securing secrets for S3 FileSystems in DistCp

2016-05-03 Thread Shekhar Sharma
Have you used IAM (Identity and Access Management) roles? On 3 May 2016 18:11, "Elliot West" wrote: > Hello, > > We're currently using DistCp and S3 FileSystems to move data from a > vanilla Apache Hadoop cluster to S3. We've been concerned about exposing > our AWS secrets on our shared, on-premise

Securing secrets for S3 FileSystems in DistCp

2016-05-03 Thread Elliot West
Hello, We're currently using DistCp and S3 FileSystems to move data from a vanilla Apache Hadoop cluster to S3. We've been concerned about exposing our AWS secrets on our shared, on-premise cluster. As a work-around we've patched DistCp to load these secrets from a JCEKS keystore. This seems to w

RE: Guideline on setting Namenode RPC Handler count (client and service)

2016-05-03 Thread Brahma Reddy Battula
I hope you are using the hadoop-2.6 release. Since you are targeting the amount of time processing takes, your proposed config options (changing ipc.ping.interval and the split threshold) should be fine; that is, the 2nd and 3rd options. Give it a try and let us know. Had seen related iss

Re: Guideline on setting Namenode RPC Handler count (client and service)

2016-05-03 Thread Chackravarthy Esakkimuthu
To add more detail on why NN startup is delayed when the handler count is set to 600: we are seeing many duplicate full block reports (FBRs) from most of the DNs for a long time (around 3 hours after NN startup), even though the NN comes out of safe mode in 10 or 15 minutes. Since NN comes out of safe mode, dup

Re: S3 Hadoop FileSystems

2016-05-03 Thread Elliot West
Thank you, I had a look at HADOOP-13076 and the associated code snippets in the AWS SDK. I agree that the MD5 check does appear to be taking place after all. I appreciate your efforts in looking into that matter and raising the ticket. Apologies for any time I may have wasted. Cheers -

Any way to merge two large hdfs cluster without copy data?

2016-05-03 Thread raymond
Hey, It seems we are doing a lot of cluster migration and merge work. We now have two large HDFS clusters (each holding PBs of data), and we need to merge them into a single one. I know we can do it by using distcp to copy data from one cluster to the other, and decommission nodes from one cluster and join another

Re: Best way to migrate PB scale data between live cluster?

2016-05-03 Thread raymond
Just to follow up with our actual choice, FYI. We finally chose to use distcp ver2 for the migration work (the edit-log approach we developed ourselves is not verified, and we needed to do it quickly, so…), to minimize possible issues during the migration period. We also utilize the snaps
