Re: Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

2016-08-20 Thread max scalf
Just out of curiosity, have you enabled an S3 endpoint for this? Hopefully you are running this cluster inside a VPC; if so, an endpoint would help, as the S3 traffic will not go out to the Internet... Any new policies put in place for your S3 bucket as others have mentioned something about throttling

Re: HDFS backup to S3

2016-06-15 Thread max scalf
ver, that would be a code change in DistCp, and not as easy as a script. But that would address the scalability issue that you are worried about. Thanks Anu *From: *max scalf <oracle.bl...@gmail.com> *Date

HDFS backup to S3

2016-06-15 Thread max scalf
Hello Hadoop community, we are running Hadoop in AWS (not EMR, but the Hortonworks distro on EC2 instances). Everything is set up and working as expected. Our design calls for running HDFS/data nodes on local/ephemeral storage and we have 3X replication enabled by default, all of the metastore

Re: HDFS how to specify the exact datanode to put data on?

2015-07-20 Thread max scalf
May I ask why you need to do that? Why not let Hadoop handle that for you? On Sunday, July 19, 2015, Shiyao Ma i...@introo.me wrote: Hi, I'd like to put my data selectively on some datanodes. Currently I can do that by shutting down unneeded datanodes. But this is a little laborious. Is

Re: copy data from one hadoop cluster to another hadoop cluster + cant use distcp

2015-06-19 Thread max scalf
Not to hijack this post, but how would you deal with data that is maintained by Hive (ORC format files, Hive-created tables, etc.)... Would we copy the Hive metastore (MySQL) and move that over to the new cluster? On Friday, June 19, 2015, Joep Rottinghuis jrottingh...@gmail.com wrote: You can't set up a

Re: Swap requirements

2015-03-25 Thread max scalf
Thank you Harsh. Can you please explain what you mean when you said "Just simple virtual memory used by the process"? Doesn't virtual memory mean swap? On Wednesday, March 25, 2015, Harsh J ha...@cloudera.com wrote: The suggestion (regarding swappiness) is not for disabling swap as much as it

Re: AWS Private and Public Ip

2015-03-13 Thread max scalf
you will get the private IP to work until and unless you are in your VPC connected to a VPN or a Direct Connect. For what you are doing, I would use the public IP; that should work just fine. On Fri, Mar 13, 2015 at 3:00 PM, Krish Donald gotomyp...@gmail.com wrote: Hi, I am using Elastic

Re: Not able to ping AWS host

2015-03-10 Thread max scalf
On Mon, Mar 9, 2015 at 5:15 PM, max scalf oracle.bl...@gmail.com wrote: When you say the security group has all ports open, is that open to the public (0.0.0.0) or to your specific IP (if so, is your IP correct)? Also, are the instances inside of a VPC? On Mon, Mar 9, 2015 at 5:05 PM, Krish Donald gotomyp

Re: Not able to ping AWS host

2015-03-10 Thread max scalf
Destination / Target: 172.31.0.0/16 / local; 0.0.0.0/0 / igw-6d16cxxx On Tue, Mar 10, 2015 at 6:47 AM, max scalf oracle.bl...@gmail.com wrote: inside your VPC -- subnet -- does the route table have an internet gateway attached (that should have a gateway of 0.0.0.0/0 as well)... On Mon, Mar 9, 2015

Re: What skills to Learn to become Hadoop Admin

2015-03-09 Thread max scalf
at 9:32 AM, max scalf oracle.bl...@gmail.com wrote: Krish, I don't mean to hijack your mail here, but I wanted to find out how/what you did for the below portion, as I am trying to go down your path as well. I was able to get a 4-5 node cluster using Ambari and CDH and now wanted to take it to next

Re: sorting in hive -- general

2015-03-08 Thread max scalf
by in Hive works differently from TeraSort. In the case of TeraSort you can merge output files and get one file with globally sorted data. On Sun, Mar 8, 2015 at 7:55 AM, max scalf oracle.bl...@gmail.com wrote: Thank you Alexander. So is it fair to assume when sort by is used and multiple files

Re: sorting in hive -- general

2015-03-08 Thread max scalf
(PARTITION BY A ORDER BY B) On Sat, Mar 7, 2015 at 3:02 PM, max scalf oracle.bl...@gmail.com wrote: Hello all, I am new to Hadoop and Hive in general and I am reading Hadoop: The Definitive Guide by Tom White, and on page 504, in the Hive chapter, Tom says the below with regard to sorting

Re: What skills to Learn to become Hadoop Admin

2015-03-07 Thread max scalf
Krish, I don't mean to hijack your mail here, but I wanted to find out how/what you did for the below portion, as I am trying to go down your path as well. I was able to get a 4-5 node cluster using Ambari and CDH and now wanted to take it to the next level. What have you done for the below? I have done a

sorting in hive -- general

2015-03-07 Thread max scalf
Hello all, I am new to Hadoop and Hive in general and I am reading Hadoop: The Definitive Guide by Tom White, and on page 504, in the Hive chapter, Tom says the below with regard to sorting *Sorting and Aggregating* *Sorting data in Hive can be achieved by using a standard ORDER BY clause. ORDER BY

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-06 Thread max scalf
@jonathan, I totally agree that this is reinventing the wheel, but think about the folks who want to do this setup from scratch to better understand Hadoop, or maybe those folks who are going to do admin-related work... and hence the need to set it up from scratch... @alexander, Yes you are right,

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread max scalf
Here is an easy way to go about assigning a static name to your EC2 instance. When you launch an EC2 instance from the AWS console and get to the point of selecting the VPC and IP address, there is a screen that says USER DATA... put the below in with the appropriate host name (change

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread max scalf
, Alexander Pivovarov apivova...@gmail.com wrote: What about DNS? If you have 2 computers (nn and dn), how does nn know dn's IP? The script puts only this computer's IP in /etc/hosts On Thu, Mar 5, 2015 at 6:39 PM, max scalf oracle.bl...@gmail.com wrote: Here is an easy way to go about assigning static