Job keeps running in LocalJobRunner under Cloudera 5.1

2014-08-22 Thread Something Something
Need some quick help. Our job runs fine under MapR, but when we start the same job on Cloudera 5.1, it keeps running in Local mode. I am sure this is some kind of configuration issue. Any quick tips? 14/08/22 12:16:58 INFO mapreduce.Job: map 0% reduce 0% 14/08/22 12:17:03 INFO mapred.LocalJobRun
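
A common cause of this is a client-side configuration that leaves mapreduce.framework.name unset (its default is "local") or missing cluster configs on the driver's classpath, so the job falls back to LocalJobRunner instead of being submitted to YARN. A minimal sketch of forcing the setting from the driver, assuming a CDH 5.1 cluster running YARN; the host names below are placeholders, and in practice these values normally come from the cluster's core-site.xml/yarn-site.xml/mapred-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitToYarn {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // "local" (the default without cluster configs) keeps the job in LocalJobRunner.
            conf.set("mapreduce.framework.name", "yarn");
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");        // placeholder
            conf.set("yarn.resourcemanager.address", "rm-host:8032");     // placeholder
            Job job = Job.getInstance(conf, "my-job");
            // ... set mapper/reducer/input/output as usual, then job.waitForCompletion(true)
        }
    }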

Is Hadoop's ToolRunner thread-safe?

2014-03-17 Thread Something Something
I would like to trigger a few Hadoop jobs simultaneously. I've created a pool of threads using Executors.newFixedThreadPool. Idea is that if the pool size is 2, my code will trigger 2 Hadoop jobs at the same exact time using 'ToolRunner.run'. In my testing, I noticed that these 2 threads keep st
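
ToolRunner.run itself keeps no static state, but sharing a single Configuration or Tool instance across threads is a common way for parallel jobs to step on each other. A minimal sketch in which each submission gets its own Configuration and Tool; MyTool is a hypothetical class implementing Tool, and the paths are placeholders:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    public class ParallelJobs {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            for (final String input : new String[] { "/data/in1", "/data/in2" }) {  // placeholders
                pool.submit(new Callable<Integer>() {
                    public Integer call() throws Exception {
                        // Fresh Configuration and Tool per submission, so no mutable state is shared.
                        Configuration conf = new Configuration();
                        // MyTool is a placeholder for your own Tool implementation (the job driver).
                        return ToolRunner.run(conf, new MyTool(), new String[] { input, input + "-out" });
                    }
                });
            }
            pool.shutdown();
        }
    }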

Re: Merging files

2013-07-31 Thread Something Something
234 Does this calculation look right? On Wed, Jul 31, 2013 at 10:28 AM, John Meagher wrote: > It is file size based, not file count based. For fewer files up the > max-file-blocks setting. > > On Wed, Jul 31, 2013 at 12:21 PM, Something Something > wrote: > > Thanks, John. But

Re: Merging files

2013-07-31 Thread Something Something
https://github.com/edwardcapriolo/filecrush > > On Wed, Jul 31, 2013 at 2:40 AM, Something Something > wrote: > > Each bz2 file after merging is about 50Megs. The reducers take about 9 > > minutes. > > > > Note: 'getmerge' is not an option. There isn't enou

Re: Merging files

2013-07-30 Thread Something Something
Jul 30, 2013 at 10:34 PM, Ben Juhn wrote: > How big are your 50 files? How long are the reducers taking? > > On Jul 30, 2013, at 10:26 PM, Something Something < > mailinglist...@gmail.com> wrote: > > > Hello, > > > > One of our pig scripts creates over 500 s

Merging files

2013-07-30 Thread Something Something
Hello, One of our pig scripts creates over 500 small part files. To save on namespace, we need to cut down the # of files, so instead of saving 500 small files we need to merge them into 50. We tried the following: 1) When we set parallel number to 50, the Pig script takes a long time - for ob
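
For readers landing on this thread: besides the filecrush tool linked in the reply above, one option that comes up is a separate identity-style MapReduce pass that rewrites the 500 part files through 50 reducers. A minimal sketch with placeholder paths; note that this version sorts and groups lines, so it only suits data where line order does not matter:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MergeDriver {

        public static class LineMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(line, NullWritable.get());   // pass each line through unchanged
            }
        }

        public static class LineReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
            protected void reduce(Text line, Iterable<NullWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                for (NullWritable v : values) {
                    ctx.write(line, NullWritable.get());   // emit once per occurrence to keep duplicates
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "merge-part-files");
            job.setJarByClass(MergeDriver.class);
            job.setMapperClass(LineMapper.class);
            job.setReducerClass(LineReducer.class);
            job.setNumReduceTasks(50);                              // 500 small inputs -> 50 output files
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path("/data/part-files"));   // placeholder
            FileOutputFormat.setOutputPath(job, new Path("/data/merged"));     // placeholder
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }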

How many machines did my MR job use?

2013-06-20 Thread Something Something
Hello, I am running a Pig script which internally starts several jobs. For one of the jobs that uses maximum no. of mappers & reducers, I need to find out how many machines it's running on & which machines are those. I looked around the JobTracker UI, but couldn't find this information. Is it t
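
One way to get at this programmatically, rather than through the JobTracker UI, is via the task completion events, which record the task tracker that ran each attempt. A minimal sketch using the old mapred client API; the job id is a placeholder, and for a Pig run you would repeat this for each job id Pig prints:

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.mapred.TaskCompletionEvent;

    public class JobHosts {
        public static void main(String[] args) throws Exception {
            JobClient client = new JobClient(new JobConf(new Configuration()));
            RunningJob job = client.getJob(JobID.forName("job_201306200000_0001"));  // placeholder id

            Set<String> trackers = new HashSet<String>();
            int from = 0;
            TaskCompletionEvent[] events;
            while ((events = job.getTaskCompletionEvents(from)).length > 0) {
                for (TaskCompletionEvent e : events) {
                    trackers.add(e.getTaskTrackerHttp());   // e.g. http://worker-node:50060
                }
                from += events.length;
            }
            System.out.println(trackers.size() + " distinct task trackers: " + trackers);
        }
    }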

Re: MR job for creating splits

2012-05-13 Thread Something Something
g to bulkload anyway (which requires Put or KeyValue values, > both of which you can get the size from). > > On Sun, May 13, 2012 at 2:11 AM, Something Something < > mailinglist...@gmail.com> wrote: > > > Is there no way to find out inside a single redu

Re: MR job for creating splits

2012-05-12 Thread Something Something
alue in a row until the size reached a > certain limit. > > On Sat, May 12, 2012 at 7:21 PM, Something Something < > mailinglist...@gmail.com> wrote: > > > Hello, > > > > This is really a MapReduce question, but the output from this will be > used
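
To make the quoted suggestion concrete: the accumulation can live entirely in the reducer, adding up value sizes for a row and starting a new split key whenever a limit is crossed. A minimal sketch with Text values, a hypothetical 256 MB limit, and an illustrative "row_splitNo" key naming scheme:

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SplitReducer extends Reducer<Text, Text, Text, Text> {
        private static final long MAX_SPLIT_BYTES = 256L * 1024 * 1024;  // hypothetical limit

        @Override
        protected void reduce(Text row, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            long accumulated = 0;
            int splitNo = 0;
            for (Text value : values) {
                accumulated += value.getLength();          // bytes in this value
                if (accumulated > MAX_SPLIT_BYTES) {       // limit crossed: start a new split
                    splitNo++;
                    accumulated = value.getLength();
                }
                ctx.write(new Text(row.toString() + "_" + splitNo), value);
            }
        }
    }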

Re: Starting Map Reduce Job on EC2

2012-01-15 Thread Something Something
All monitoring browser ports.. such as On Sun, Jan 15, 2012 at 5:00 PM, Lance Norskog wrote: > Can you open all of the monitoring browser ports? > > On Sun, Jan 15, 2012 at 3:03 PM, Something Something > wrote: > > Good point. Those ports may not be open. So next que

Re: Starting Map Reduce Job on EC2

2012-01-15 Thread Something Something
's why we are using EC2 -:) On Sun, Jan 15, 2012 at 12:03 PM, Ronald Petty wrote: > Something Something, > > Have you confirmed you can connect to the port from your remote machine? > > telnet ec2-xx 9000 > > Kindest regards. > > Ron > > On Sun, Ja

Re: Business logic in cleanup?

2011-11-18 Thread Something Something
source and you'll > understand what I mean by it not being a stage or even an event, but > just a tail call after all map()s are called. > > On Fri, Nov 18, 2011 at 8:58 PM, Something Something > wrote: > > Thanks again for the clarification. Not sure what you mean

Re: Business logic in cleanup?

2011-11-18 Thread Something Something
the 'cleanup', our job completes in 18 minutes. When we don't write in 'cleanup' it takes 3 hours!!! Knowing this if you were to decide, would you use 'cleanup' for this purpose? Thanks once again for your advice. On Thu, Nov 17, 2011 at 9:35 PM, Harsh J wro

Re: Business logic in cleanup?

2011-11-17 Thread Something Something
. > > On Thu, Nov 17, 2011 at 9:53 AM, Something Something > wrote: > > Is the idea of writing business logic in cleanup method of a Mapper good > or > > bad? We think we can make our Mapper run faster if we keep accumulating > > data in a HashMap in a Mapper, and

Business logic in cleanup?

2011-11-16 Thread Something Something
Is the idea of writing business logic in cleanup method of a Mapper good or bad? We think we can make our Mapper run faster if we keep accumulating data in a HashMap in a Mapper, and later in the cleanup() method write it. 1) Does Map/Reduce paradigm guarantee that cleanup will always be called
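
For context, the pattern being asked about (sometimes called in-mapper combining) looks roughly like the sketch below; the trade-off raised later in the thread is that the HashMap has to fit in the mapper's memory. The key and value types here are placeholders:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class AggregatingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private final Map<String, Long> counts = new HashMap<String, Long>();

        @Override
        protected void map(LongWritable offset, Text line, Context ctx) {
            // Accumulate in memory instead of emitting one record per input line.
            String key = line.toString();
            Long current = counts.get(key);
            counts.put(key, current == null ? 1L : current + 1L);
        }

        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            // cleanup() runs once after the last map() call for this task,
            // so everything accumulated above is flushed here.
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                ctx.write(new Text(e.getKey()), new LongWritable(e.getValue()));
            }
        }
    }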

Re: Distributing our jars to all machines in a cluster

2011-11-16 Thread Something Something
s in them, > but if it does, it could be the cause of your problem. > > Try checking your jar for a duplicate license dir in the META-INF > (something like: unzip -l .jar | awk '{print $4}' | sort | > uniq -d) > > > Friso > > > On 16 nov. 2011, at 17:

Re: Distributing our jars to all machines in a cluster

2011-11-16 Thread Something Something
machines by > default). Pre-distributing sounds tedious and error prone to me. What if > you have different jobs that require different versions of the same > dependency? > > > HTH, > Friso > > > > > > On 16 nov. 2011, at 15:42, Something Something wro

Re: Distributing our jars to all machines in a cluster

2011-11-16 Thread Something Something
r patience & help with our questions. On Wed, Nov 16, 2011 at 6:29 AM, Something Something < mailinglist...@gmail.com> wrote: > Hmm... there must be a different way 'cause we don't need to do that to > run Pig jobs. > > > On Tue, Nov 15, 2011 at 10:58 PM, Daa

Re: Distributing our jars to all machines in a cluster

2011-11-16 Thread Something Something
the machine once > the job starts. Is that an option? > > Daan. > > On 16 Nov 2011, at 07:24, Something Something wrote: > > > Until now we were manually copying our Jars to all machines in a Hadoop > > cluster. This used to work until our cluster size was small. Now our

Re: Distributing our jars to all machines in a cluster

2011-11-15 Thread Something Something
Until now we were manually copying our Jars to all machines in a Hadoop cluster. This used to work until our cluster size was small. Now our cluster is getting bigger. What's the best way to start a Hadoop Job that automatically distributes the Jar to all machines in a cluster? I read the doc a
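
The direction this thread points in is to let Hadoop distribute the code itself (which is also what Pig does under the covers): tell the job which jar holds your classes, and stage any extra dependency jars on HDFS so they are added to each task's classpath, instead of pre-copying them to every node. A minimal sketch; paths and class names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class MyDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "my-job");
            job.setJarByClass(MyDriver.class);   // the jar containing this class is shipped to the tasks

            // Extra dependency jars, already copied to HDFS, added to each task's classpath:
            job.addFileToClassPath(new Path("/libs/some-dependency.jar"));       // placeholder
            job.addArchiveToClassPath(new Path("/libs/more-dependencies.zip"));  // placeholder

            // ... mapper/reducer/input/output as usual, then job.waitForCompletion(true)
        }
    }

If the driver runs through ToolRunner, the same effect is available from the command line via the generic -libjars option.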