Re: What skills to Learn to become Hadoop Admin

2015-03-07 Thread jay vyas
Setting up vendor distros is a great first step. 1) Running TeraSort and benchmarking is a good step. You can also run larger, full-stack Hadoop applications like BigPetStore, which we curate here: https://github.com/apache/bigtop/tree/master/bigtop-bigpetstore/. 2) Write some MapReduce or

Re: What skills to Learn to become Hadoop Admin

2015-03-07 Thread max scalf
Krish, I don't mean to hijack your mail here, but I wanted to find out how/what you did for the portion below, as I am trying to go down your path as well. I was able to get a 4-5 node cluster running using Ambari and CDH, and now want to take it to the next level. What have you done for the part below? I have done a

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-07 Thread tesm...@gmail.com
Dear Jonathan, Would you please describe the process of running EMR-based Hadoop for $15.00? I tried, and my costs rocketed to around $60 for one hour. Regards On 05/03/2015 23:57, Jonathan Aquilina wrote: krish EMR won't cost you much with all the testing and data we ran through the test

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-07 Thread Jonathan Aquilina
When I was testing, I used the default setup: 1 master node, 2 core nodes, and no task nodes. I would spin up the cluster and then terminate it. The term for that is a transient cluster. When the big data needed to be crunched, I changed the setup a bit. An important note: there is a limitation of 20

Snappy Configuration in Hadoop2.5.2

2015-03-07 Thread donhoff_h
Hi, experts. I ran into the following problem when configuring the Snappy lib in Hadoop 2.5.2. My Snappy installation home is /opt/snappy and my Hadoop installation home is /opt/hadoop/hadoophome. To configure the Snappy path, I tried to add the following environment variables in /etc/profile and

sorting in hive -- general

2015-03-07 Thread max scalf
Hello all, I am new to Hadoop and Hive in general, and I am reading Hadoop: The Definitive Guide by Tom White. On page 504, in the Hive chapter, Tom says the following with regard to sorting: *Sorting and Aggregating* *Sorting data in Hive can be achieved by using a standard ORDER BY clause. ORDER BY

Re: sorting in hive -- general

2015-03-07 Thread Alexander Pivovarov
A SORT BY query produces multiple independent files; ORDER BY produces just one file. Usually SORT BY is used with DISTRIBUTE BY. In older Hive versions (0.7) they might be used to implement a local sort within a partition, similar to RANK() OVER (PARTITION BY A ORDER BY B). On Sat, Mar 7, 2015 at 3:02 PM,