Re: How to stop a mapreduce job from terminal running on Hadoop Cluster?

2015-04-12 Thread Pradeep Gollakota
Also, mapred job -kill. On Sun, Apr 12, 2015 at 11:07 AM, Shahab Yunus wrote: > You can kill it by using the following yarn command > > yarn application -kill > > https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html > > Or use the old hadoop job command > http://stack
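The two commands mentioned above look like this in practice (the application and job IDs below are made-up examples; list your own with `yarn application -list` or `mapred job -list`):

```shell
# Kill at the YARN level (works for any YARN application):
yarn application -kill application_1428800000000_0001

# Or kill at the MapReduce level:
mapred job -kill job_1428800000000_0001
```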

Re: Custom FileInputFormat.class

2014-12-01 Thread Pradeep Gollakota
Can you expand on your use case a little bit, please? It may be that you're duplicating functionality. You can take a look at the CombineFileInputFormat for inspiration. If this is indeed taking a long time, one cheap-to-implement thing you can do is to parallelize the calls to get block locations.
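One way to sketch that parallelization from the shell, assuming a file of HDFS paths and 8-way parallelism (path file and parallelism level are illustrative, not from the original message):

```shell
# Fan out block-location lookups across 8 parallel hdfs fsck invocations.
xargs -P 8 -I{} hdfs fsck {} -files -blocks -locations < input-paths.txt
```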

Re: [Blog] Doubts On CCD-410 Sample Dumps on Ecosystem Projects

2014-10-06 Thread Pradeep Gollakota
are log files, NOT database files with a specific > schema. So > I think > Pig is the best way to access and process this data. > > On Tue, Oct 7, 2014 at 4:10 AM, Pradeep Gollakota > wrote: > >> I agree with the answers suggested above. >> >> 3.

Re: [Blog] Doubts On CCD-410 Sample Dumps on Ecosystem Projects

2014-10-06 Thread Pradeep Gollakota
I agree with the answers suggested above. 3. B 4. D 5. C On Mon, Oct 6, 2014 at 2:58 PM, Ulul wrote: > Hi > > No, Pig is a data manipulation language for data already in Hadoop. > The question is about importing data from OLTP DB (eg Oracle, MySQL...) to > Hadoop, this is what Sqoop is for (SQ

Re: datanode down, disk replaced , /etc/fstab changed. Can't bring it back up. Missing lock file?

2014-10-03 Thread Pradeep Gollakota
Looks like you're facing the same problem as this SO. http://stackoverflow.com/questions/10705140/hadoop-datanode-fails-to-start-throwing-org-apache-hadoop-hdfs-server-common-sto Try the suggested fix. On Fri, Oct 3, 2014 at 6:57 PM, Colin Kincaid Williams wrote: > We had a datanode go down, an
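A sketch of the kind of fix the linked SO answer describes, assuming the DataNode data directory is /data/dfs/dn (substitute your own dfs.datanode.data.dir, and only do this with the DataNode fully stopped):

```shell
# Remove a stale lock file left behind by the old disk/mount:
rm /data/dfs/dn/in_use.lock

# Inspect the storage metadata; a replaced/reformatted disk can leave the
# namespaceID here mismatched with the NameNode's:
cat /data/dfs/dn/current/VERSION
```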

Re: Block placement without rack aware

2014-10-02 Thread Pradeep Gollakota
It appears to be randomly chosen. I just came across this blog post from Lars George about HBase file locality in HDFS http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html On Thu, Oct 2, 2014 at 4:12 PM, SF Hadoop wrote: > What is the block placement policy hadoop follows when rack

Rolling upgrades

2014-08-01 Thread Pradeep Gollakota
Hi All, Is it possible to do a rolling upgrade from Hadoop 2.2 to 2.4? Thanks, Pradeep

Re: YARN creates only 1 container

2014-05-27 Thread Pradeep Gollakota
I believe it's behaving as expected. It will spawn 64 containers because that's how much memory you have available. The vcores aren't strictly enforced since CPUs can be elastic. This blog from Cloudera explains how to enforce CPU limits using CGroups. http://blog.cloudera.com/blog/2013/12/managing-mu
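For reference, the CGroups-based CPU enforcement the Cloudera post covers boils down to yarn-site.xml settings along these lines (a sketch for Hadoop 2.x; verify the exact class names against your distribution):

```xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
```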

Re: change the Yarn application container memory size when it is running

2014-02-10 Thread Pradeep Gollakota
I'm not sure I understand the use case for something like that. I'm pretty sure the YARN API doesn't support it though. What you might be able to do is to tear down your existing container and request a new one. On Mon, Feb 10, 2014 at 10:28 AM, Thomas Bentsen wrote: > I am no Yarn expert at al

Re: issue about datanode memory usage

2013-12-09 Thread Pradeep Gollakota
Good question... the answer is: it depends. I *think* the Oracle JDK applies the rightmost setting. See this SO for more info. http://stackoverflow.com/questions/2740725/duplicated-java-runtime-options-what-is-the-order-of-preference On Mon, Dec 9, 2013 at 11:14 PM, ch huang wrote: > hi,maillist: >
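You can check this empirically on your own JVM (requires a JDK on the path); with duplicate -Xmx flags, HotSpot should report the heap size from the rightmost one:

```shell
# If the rightmost flag wins, MaxHeapSize should reflect 256m, not 128m.
java -Xmx128m -Xmx256m -XX:+PrintFlagsFinal -version | grep MaxHeapSize
```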

Re: even possible?

2013-10-16 Thread Pradeep Gollakota
Don't fix it if it ain't broken =P There shouldn't be any reason why you couldn't change it (back) to the standard way that Cloudera distributions are set up. Off the top of my head, I can't think of anything that you're missing. But at the same time, if your cluster is working as is, why change i

Re: Yarn killing my Application Master

2013-10-14 Thread Pradeep Gollakota
sfully registered with RM > > > On Fri, Oct 11, 2013 at 3:53 PM, Pradeep Gollakota > wrote: > >> All, >> >> I have a Yarn application that is launching a single container. The >> container completes successfully but the application fails because the node >>

Re: State of Art in Hadoop Log aggregation

2013-10-10 Thread Pradeep Gollakota
There are plenty of log aggregation tools, both open source and commercial off the shelf. Here are some: http://devopsangle.com/2012/04/19/8-splunk-alternatives/ My personal recommendation is LogStash. On Thu, Oct 10, 2013 at 10:38 PM, Raymond Tay wrote: > You can try Chukwa which is part of the in

Re: Improving MR job disk IO

2013-10-10 Thread Pradeep Gollakota
ing job > provides (1) Much better disk throughput and, (2) CPU load is almost evenly > spread across all cores/threads (no CPU gets pegged to 100%). > > > > > On Thu, Oct 10, 2013 at 11:15 AM, Pradeep Gollakota > wrote: > >> Actually... I believe that is expected behav

Re: Improving MR job disk IO

2013-10-10 Thread Pradeep Gollakota
Actually... I believe that is expected behavior. Since your CPU is pegged at 100% you're not going to be IO bound. Typically jobs tend to be CPU bound or IO bound. If you're CPU bound you expect to see low IO throughput. If you're IO bound, you expect to see low CPU usage. On Thu, Oct 10, 2013 at
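A quick way to confirm which side of that trade-off a running job is on, using standard Linux tools (device names and intervals below are just examples):

```shell
# Per-core CPU utilization snapshot; pegged cores => CPU bound.
top -b -n 1 | head -20

# Per-device utilization and wait times; high %util/await => IO bound.
iostat -x 1 3
```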

Re: modify HDFS

2013-10-02 Thread Pradeep Gollakota
Since hadoop 3.0 is two major versions higher, it will be significantly different from working with hadoop 1.1.2. The hadoop-1.1 branch is available on SVN at http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1/ On Tue, Oct 1, 2013 at 11:30 PM, Karim Awara wrote: > Hi all, > > My p

Re: IncompatibleClassChangeError

2013-09-29 Thread Pradeep Gollakota
29 PM, lei liu wrote: > Yes, My job is compiled in CHD3u3, and I run the job on CDH4.3.1, but I > use the mr1 of CHD4.3.1 to run the job. > > What are the different mr1 of cdh4 and mr of cdh3? > > Thanks, > > LiuLei > > > 2013/9/30 Pradeep Gollakota > >>

Re: IncompatibleClassChangeError

2013-09-29 Thread Pradeep Gollakota
I believe it's a difference between the version that your code was compiled against vs. the version that you're running against. Make sure that you're not packaging Hadoop jars into your jar, and make sure you're compiling against the correct version as well. On Sun, Sep 29, 2013 at 7:27 PM, lei l
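With Maven, the usual way to avoid bundling Hadoop into the job jar is the `provided` scope, along these lines (artifact and version shown are examples; match them to your cluster's distribution):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.0.0-cdh4.3.1</version>
  <scope>provided</scope>
</dependency>
```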

Re: Help writing a YARN application

2013-09-23 Thread Pradeep Gollakota
> > thanks, > Arun > > On Sep 20, 2013, at 11:24 AM, Pradeep Gollakota > wrote: > > Hi All, > > I've been trying to write a Yarn application and I'm completely lost. I'm > using Hadoop 2.0.0-cdh4.4.0 (Cloudera distribution). I've uploa

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-22 Thread Pradeep Gollakota
ow do I make my >> map output compressed? Yes, the Tables in HBase are compressed. >> >> Although, there's no real bottleneck, the time it takes to process the >> entire table is huge. I have to constantly check if i can optimize it >> somehow.. >> >&

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread Pradeep Gollakota
> > I have looked at both Pig/Hive to do the job but i'm supposed to do this > via a MR job.. So, cannot use either of that.. Do you recommend me to try > something if i have the data in that format? > > > On Sat, Sep 21, 2013 at 12:26 PM, Pradeep Gollakota > wrot

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-20 Thread Pradeep Gollakota
I'm sorry but I don't understand your question. Is the output of the mapper you're describing the key portion? If it is the key, then your data should already be sorted by HouseHoldId since it occurs first in your key. The SortComparator will tell Hadoop how to sort your data. So you use this if y

Help writing a YARN application

2013-09-20 Thread Pradeep Gollakota
Hi All, I've been trying to write a Yarn application and I'm completely lost. I'm using Hadoop 2.0.0-cdh4.4.0 (Cloudera distribution). I've uploaded my sample code to github at https://github.com/pradeepg26/sample-yarn The problem is that my application master is exiting with a status of 1 (I'm e