Niels,
I am not sure I can help with that unless I know better what "a special
distribution" means. Unless you are doing a massive amount of processing in
your reducer, having a partitioner that only comes close to balancing the
distribution is a big win over all of the other options that put the d
Hi Robert,
On Tue, Feb 28, 2012 at 21:41, Robert Evans wrote:
> I would recommend that you do what terrasort does and use a different
> partitioner, to ensure that all keys within a given range will go to a
> single reducer. If your partitioner is set up correctly then all you have
> to do is
I would recommend that you do what terrasort does and use a different
partitioner, to ensure that all keys within a given range will go to a single
reducer. If your partitioner is set up correctly then all you have to do is to
concatenate the files together, if you even need to do that.
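The range-partitioning idea above can be illustrated outside of Hadoop. A minimal Python sketch follows; note that the boundary values here are made up for illustration, whereas TeraSort derives them by sampling the input:

```python
import bisect

def make_range_partitioner(boundaries):
    """Return a partitioner that sends each key to the reducer owning
    the range the key falls into. `boundaries` are the sorted cut
    points between adjacent reducers (n boundaries => n+1 reducers)."""
    def partition(key):
        return bisect.bisect_right(boundaries, key)
    return partition

# Three reducers split the key space at "g" and "p" (made-up cut points):
# reducer 0 gets keys below "g", reducer 1 gets "g".."o...", reducer 2 the rest.
partition = make_range_partitioner(["g", "p"])
assignments = [partition(k) for k in ["apple", "grape", "zebra"]]
```

Because reducer i only ever sees keys smaller than those of reducer i+1, the per-reducer sorted outputs concatenate directly into one globally sorted file.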
Look a
On 02/27/2012 11:33 PM, Stuti Awasthi wrote:
Hi Marcos,
Thanks for the pointers. I am also thinking along similar lines.
I have a doubt on one point:
I will have separate data files for every interval. Let's take an example: if
I have a 5-minute interval file which contains data for 2 hours and 10
Hi,
We have a job that outputs a set of files that are several hundred MB of
text each.
Using the comparators and such we can produce output files that are each
sorted internally.
What we want is one giant output file (outside of the cluster) that
is sorted.
Now we see the following op
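The question above is cut off in the archive; one standard way to turn several already-sorted output files into a single sorted file is a k-way merge. A small illustrative Python sketch (not Hadoop code) using heapq.merge over line-oriented runs:

```python
import heapq

def merge_sorted_runs(runs):
    """Lazily merge already-sorted sequences into one sorted stream.
    Each run can be any iterable of sorted lines, e.g. an open file,
    so the merge never needs all data in memory at once."""
    return heapq.merge(*runs)

# Two per-reducer outputs, each sorted internally:
part0 = ["apple\n", "grape\n"]
part1 = ["banana\n", "zebra\n"]
merged = list(merge_sorted_runs([part0, part1]))
```

If the partitioner already assigns disjoint key ranges to reducers (as in the TeraSort approach discussed elsewhere in this thread), plain concatenation suffices and no merge is needed.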
OK I'm not sure if there's a better way, but at least you can write a shell
script to combine "job -history" and "job -list", like:
for i in $(hadoop job -list | awk '/^job_/ {print $1}'); do
  hadoop job -history "$i"
done
Jie
On Tue, Feb 28, 2012 at 10:47 AM, Pedro Costa wrote:
> hadoop job -list" will only list the JobId Stat
Hi,
Some time ago I had an idea and implemented it.
Normally you can only run a single gzipped input file through a single
mapper and thus only on a single CPU core.
What I created makes it possible to process a Gzipped file in such a way
that it can run on several mappers in parallel.
I've put
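The mail is cut off before the link, so the actual implementation is not shown here. A plausible reading of the trick (an assumption, not Niels's code) is that every mapper decompresses the stream from its beginning but keeps only its own slice of the uncompressed data, trading redundant decompression for parallelism. A toy Python sketch:

```python
import gzip
import io

def process_slice(gz_bytes, start, end):
    """Decompress from the start of the gzip stream, but only return
    the uncompressed bytes in [start, end). Each parallel worker pays
    the decompression cost up to `end`, yet the workers need no
    coordination and can run on different slices independently."""
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as f:
        f.read(start)              # discard everything before our slice
        return f.read(end - start)

data = b"0123456789" * 10          # 100 bytes of uncompressed data
gz = gzip.compress(data)
# Four "mappers", each handling a 25-byte slice of the uncompressed stream:
slices = [process_slice(gz, i, i + 25) for i in range(0, 100, 25)]
```

Reassembling the slices in order reproduces the original data, which is why the per-slice work can be distributed across mappers.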
hadoop job -list" will only list the JobId, State, StartTime,
UserName, Priority and SchedulingInfo columns.
The job history will list in detail the time spent on each phase of the
Job. The problem is that, if I have a list of jobs that completed, the job
history only prints the details of the first j
Try "hadoop job -list" :)
Jie
On Tue, Feb 28, 2012 at 8:37 AM, Pedro Costa wrote:
> Hi,
>
> In MapReduce the command bin/hadoop job -history only lists
> the first job. How can I list the history of all jobs?
>
> --
> Best regards,
>
>
Hi,
I am a committer on MRUnit. I'd love to help you use it. We have a
user-list which you can subscribe to here:
http://incubator.apache.org/mrunit/mail-lists.html
Cheers,
Brock
On Tue, Feb 28, 2012 at 1:02 PM, Akhtar Muhammad Din
wrote:
> Yes, I have checked it before, there is only a single