Re: Number of Clustering MR-Jobs

Dan Filimon Wed, 27 Mar 2013 23:26:55 -0700

Yes, it does depend on the number of mappers, and what Ted suggested
(splitting the input file) worked for me.


Here's [1] the code I used to split a SequenceFile (I wrote it so that it
re-splits m files into n files, hence the name).

[1] 
https://github.com/dfilimon/mahout/blob/skm/examples/src/main/java/org/apache/mahout/clustering/streaming/tools/ResplitSequenceFiles.java
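For readers who can't follow the link, here is a minimal, Hadoop-free sketch of the re-splitting idea: records from m inputs are dealt round-robin into n outputs, so that each output file can be handed to its own mapper. The names ResplitSketch and resplit are hypothetical, the in-memory lists stand in for SequenceFiles, and round-robin is an assumed strategy; this does not claim to reproduce what ResplitSequenceFiles.java actually does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free sketch of "re-split m files into n files".
// Assumption: each input file is modeled as an in-memory List of records;
// a real tool would read and write SequenceFiles instead.
final class ResplitSketch {

    // Deals every record from the m inputs across n outputs in round-robin
    // order, so output sizes differ by at most one record.
    static <T> List<List<T>> resplit(List<List<T>> inputs, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<T>> outputs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            outputs.add(new ArrayList<>());
        }
        int next = 0; // index of the output that receives the next record
        for (List<T> input : inputs) {
            for (T record : input) {
                outputs.get(next).add(record);
                next = (next + 1) % n;
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        // One input "file" with 10 records, re-split into 4 "files".
        List<Integer> single = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            single.add(i);
        }
        List<List<Integer>> inputs = new ArrayList<>();
        inputs.add(single);
        for (List<Integer> out : resplit(inputs, 4)) {
            System.out.println(out.size() + " records: " + out);
        }
    }
}
```

In a real job each output list would be written with its own SequenceFile.Writer into the clustering input directory; round-robin keeps the resulting files within one record of each other in size, so the mappers get roughly equal work.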

On Thu, Mar 28, 2013 at 2:26 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Your idea that this is related to your single input file is the most likely
> cause.
>
> If your input file is relatively small then splitting it up to force
> multiple mappers is the easiest solution.
>
> If your input file is larger, then you might be able to convince the
> map-reduce framework to use more mappers.
>
> On Wed, Mar 27, 2013 at 6:09 PM, Sebastian Briesemeister <
> sebastian.briesemeis...@unister.de> wrote:
>
>> Yes, correct. It currently starts a single Map task.
>>
>>
>>
>> Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>> >Do you mean that it starts a single map task?
>> >
>> >On Wed, Mar 27, 2013 at 5:10 PM, Sebastian Briesemeister <
>> >sebastian.briesemeis...@unister-gmbh.de> wrote:
>> >
>> >> Dear all,
>> >>
>> >> I am trying to run the FuzzyKMeansDriver on a Hadoop cluster so that
>> >> it starts multiple MapReduce jobs. However, it always starts just a
>> >> single MR job?!
>> >>
>> >> I figured it might be caused by the fact that I generated my input data
>> >> into a single file using SequenceFile.Writer???
>> >> Or is there another way to influence the number of mapper tasks?
>> >>
>> >> Thanks in advance
>> >> Sebastian
>> >>
>>
>> --
>> This message was sent from my Android phone using K-9 Mail.
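For the larger-file case Ted mentions, the number of map tasks for a splittable input such as a SequenceFile comes from the number of input splits. The sketch below is a simplified re-implementation of the split arithmetic in Hadoop's new-API FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with a 10% slop allowed on the last split), not the Hadoop class itself. The class name SplitCountSketch and the 100 MB file / 128 MB block figures are assumptions for illustration. Lowering the maximum split size (mapred.max.split.size in older releases, mapreduce.input.fileinputformat.split.maxsize in newer ones) is one way to get more mappers, assuming the driver passes -D options through to the job configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how new-API FileInputFormat cuts one splittable file
// into input splits (one map task per split). Not the Hadoop class itself.
final class SplitCountSketch {

    // A trailing chunk up to 10% larger than splitSize stays in the last split.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the lengths of the splits a file of fileLength bytes would get.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining); // last, possibly shorter (or slightly longer) split
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 100 * mb;  // assumed input size
        long block = 128 * mb; // assumed HDFS block size
        // Unbounded max split size: the file fits in one block, so 1 split.
        System.out.println(splitLengths(file, block, 1, Long.MAX_VALUE).size()); // 1
        // Capping splits at 16 MB: 6 full splits plus a 4 MB tail.
        System.out.println(splitLengths(file, block, 1, 16 * mb).size()); // 7
    }
}
```

This also shows why a small single file produces a single mapper under default settings: as long as the file fits within one split, the loop never runs and exactly one split is created.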
