Need some quick help. Our job runs fine under MapR, but when we start the
same job on Cloudera 5.1, it keeps running in Local mode.
I am sure this is some kind of configuration issue. Any quick tips?
14/08/22 12:16:58 INFO mapreduce.Job: map 0% reduce 0%
14/08/22 12:17:03 INFO mapred.LocalJobRunner
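A common cause of the LocalJobRunner fallback is the client configuration not pointing at the cluster: `mapreduce.framework.name` defaults to `local`, so unless the submitting machine's config says otherwise, the job runs locally. A minimal sketch of the fix, assuming YARN/MR2 on CDH 5.1:

```xml
<!-- mapred-site.xml on the submitting client; if this property is absent
     (or set to "local"), Hadoop uses the LocalJobRunner -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```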
I would like to trigger a few Hadoop jobs simultaneously. I've created a
pool of threads using Executors.newFixedThreadPool. The idea is that if the
pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time
using 'ToolRunner.run'. In my testing, I noticed that these 2 threads keep
st
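A Hadoop-free sketch of the launcher pattern described above (names are illustrative; each placeholder Callable stands in for a `ToolRunner.run(...)` call, and in the real code each thread should build its own Configuration, since sharing one across threads is a common cause of jobs interfering):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelJobLauncher {
    // Submit n independent tasks and block until all finish.
    // In the real code each Callable would wrap
    // ToolRunner.run(new Configuration(), new MyTool(), args).
    public static int runAll(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            futures.add(pool.submit(() -> 0)); // placeholder for ToolRunner.run(...)
        }
        int completed = 0;
        for (Future<Integer> f : futures) {
            f.get();      // blocks; rethrows any failure from the task
            completed++;
        }
        pool.shutdown();
        return completed;
    }
}
```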
Does this calculation look right?
On Wed, Jul 31, 2013 at 10:28 AM, John Meagher wrote:
> It is file size based, not file count based. For fewer files up the
> max-file-blocks setting.
>
> On Wed, Jul 31, 2013 at 12:21 PM, Something Something
> wrote:
> > Thanks, John. But
https://github.com/edwardcapriolo/filecrush
>
> On Wed, Jul 31, 2013 at 2:40 AM, Something Something
> wrote:
> > Each bz2 file after merging is about 50Megs. The reducers take about 9
> > minutes.
> >
> > Note: 'getmerge' is not an option. There isn't enou
Jul 30, 2013 at 10:34 PM, Ben Juhn wrote:
> How big are your 50 files? How long are the reducers taking?
>
> On Jul 30, 2013, at 10:26 PM, Something Something <
> mailinglist...@gmail.com> wrote:
>
> > Hello,
> >
> > One of our pig scripts creates over 500 s
Hello,
One of our pig scripts creates over 500 small part files. To save on
namespace, we need to cut down the # of files, so instead of saving 500
small files we need to merge them into 50. We tried the following:
1) When we set parallel number to 50, the Pig script takes a long time -
for ob
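For the 500-to-50 requirement above, the reducer count is what sets the number of part files; a minimal Pig sketch (the `SET` applies script-wide, or `PARALLEL` can be put on a single reduce-side operator):

```pig
-- Force reduce-side jobs to use 50 reducers, so the output
-- is written as 50 part files instead of 500.
SET default_parallel 50;
```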
Hello,
I am running a Pig script which internally starts several jobs. For one of
the jobs that uses maximum no. of mappers & reducers, I need to find out
how many machines it's running on & which machines are those.
I looked around the JobTracker UI, but couldn't find this information. Is
it t
g to bulkload anyway (which requires Put or KeyValue values,
> both of which you can get the size from).
>
> On Sun, May 13, 2012 at 2:11 AM, Something Something <
> mailinglist...@gmail.com> wrote:
>
> > Is there no way to find out inside a single redu
alue in a row until the size reached a
> certain limit.
>
> On Sat, May 12, 2012 at 7:21 PM, Something Something <
> mailinglist...@gmail.com> wrote:
>
> > Hello,
> >
> > This is really a MapReduce question, but the output from this will be
> used
>
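The "accumulate values in a row until the size reached a certain limit" suggestion above can be sketched generically (no HBase types here; in the real code the per-value length would come from `Put`/`KeyValue` sizes, as the quoted reply notes):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SizeBoundedBatcher {
    // Group values into batches whose total byte size stays at or
    // under maxBytes, starting a new batch whenever adding the next
    // value would exceed the limit.
    public static List<List<String>> batch(List<String> values, int maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (String v : values) {
            int len = v.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && size + len > maxBytes) {
                batches.add(current);     // flush the full batch
                current = new ArrayList<>();
                size = 0;
            }
            current.add(v);
            size += len;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }
}
```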
All monitoring browser ports.. such as
On Sun, Jan 15, 2012 at 5:00 PM, Lance Norskog wrote:
> Can you open all of the monitoring browser ports?
>
> On Sun, Jan 15, 2012 at 3:03 PM, Something Something
> wrote:
> > Good point. Those ports may not be open. So next que
's why we are using EC2 -:)
On Sun, Jan 15, 2012 at 12:03 PM, Ronald Petty wrote:
> Something Something,
>
> Have you confirmed you can connect to the port from your remote machine?
>
> telnet ec2-xx 9000
>
> Kindest regards.
>
> Ron
>
> On Sun, Ja
source and you'll
> understand what I mean by it not being a stage or even an event, but
> just a tail call after all map()s are called.
>
> On Fri, Nov 18, 2011 at 8:58 PM, Something Something
> wrote:
> > Thanks again for the clarification. Not sure what you mean
the 'cleanup', our job completes in 18 minutes. When we don't write in
'cleanup', it takes 3 hours!!! Knowing this, if you were to decide, would
you use 'cleanup' for this purpose?
Thanks once again for your advice.
On Thu, Nov 17, 2011 at 9:35 PM, Harsh J wro
.
>
> On Thu, Nov 17, 2011 at 9:53 AM, Something Something
> wrote:
> > Is the idea of writing business logic in cleanup method of a Mapper good
> or
> > bad? We think we can make our Mapper run faster if we keep accumulating
> > data in a HashMap in a Mapper, and
Is the idea of writing business logic in cleanup method of a Mapper good or
bad? We think we can make our Mapper run faster if we keep accumulating
data in a HashMap in a Mapper, and later in the cleanup() method write it.
1) Does Map/Reduce paradigm guarantee that cleanup will always be called
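The accumulate-in-map, write-in-cleanup pattern asked about here is usually called in-mapper combining. A minimal, Hadoop-free sketch of the idea (class and method names are illustrative; in a real Mapper, `map()` and `cleanup()` would take a `Context`, and the HashMap must fit in the task's heap):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMapperCombiner {
    private final Map<String, Integer> counts = new HashMap<>();

    // Called once per input record, like Mapper.map():
    // aggregate in memory instead of emitting immediately.
    public void map(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Called once after all map() calls, like Mapper.cleanup():
    // emit the aggregated pairs in one pass.
    public List<String> cleanup() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }
}
```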
s in them,
> but if it does, it could be the cause of your problem.
>
> Try checking your jar for a duplicate license dir in the META-INF
> (something like: unzip -l .jar | awk '{print $4}' | sort |
> uniq -d)
>
>
> Friso
>
>
> On 16 nov. 2011, at 17:
machines by
> default). Pre-distributing sounds tedious and error prone to me. What if
> you have different jobs that require different versions of the same
> dependency?
>
>
> HTH,
> Friso
>
>
>
>
>
> On 16 nov. 2011, at 15:42, Something Something wro
r patience & help with our questions.
On Wed, Nov 16, 2011 at 6:29 AM, Something Something <
mailinglist...@gmail.com> wrote:
> Hmm... there must be a different way 'cause we don't need to do that to
> run Pig jobs.
>
>
> On Tue, Nov 15, 2011 at 10:58 PM, Daa
the machine once
> the job starts. Is that an option?
>
> Daan.
>
> On 16 Nov 2011, at 07:24, Something Something wrote:
>
> > Until now we were manually copying our Jars to all machines in a Hadoop
> > cluster. This worked while our cluster was small. Now our
&
Until now we were manually copying our Jars to all machines in a Hadoop
cluster. This worked while our cluster was small. Now our
cluster is getting bigger. What's the best way to start a Hadoop Job that
automatically distributes the Jar to all machines in a cluster?
I read the doc a
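One standard answer to the question above (a sketch; the jar, class, and path names are made up) is to let Hadoop ship the code itself: `hadoop jar` uploads the job jar to the cluster for the tasks, and extra dependency jars can ride along via the distributed cache with `-libjars`:

```shell
# Submit the job jar; Hadoop distributes it to the task nodes.
# Dependency jars go through the distributed cache with -libjars
# (the driver must use ToolRunner/GenericOptionsParser for -libjars to apply).
hadoop jar myjob.jar com.example.MyDriver -libjars dep1.jar,dep2.jar /input /output
```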