I faced this problem too. Split the sequence file that contains your data into multiple files, then run the matrix multiplication with the folder as input. If the folder contains N sequence files, N mappers will be created.
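A rough sketch of that splitting step (this is not Mahout's API — for illustration it round-robins lines of a plain text file into N in-memory buckets; for real Mahout input you would read and write Hadoop SequenceFiles with SequenceFile.Reader/Writer instead, and the class and method names here are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: distribute the records of one input file across N parts,
// round-robin, so that a folder of N files yields N mappers.
public class SplitInput {

    // Returns n buckets of records; bucket i would become output file i.
    static List<List<String>> splitRoundRobin(List<String> records, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            parts.get(i % n).add(records.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("row-" + i);
        }
        // 4 parts -> in HDFS these would land under e.g. a /test/smallfiles dir
        List<List<String>> parts = splitRoundRobin(rows, 4);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("file" + i + ": " + parts.get(i));
        }
    }
}
```

Round-robin keeps the part sizes balanced, which matters because each part file becomes one mapper's entire input.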
On Monday, 28 January 2013, Sean Owen wrote:

> These are settings for Hadoop, not Mahout. You may need to set them in
> your cluster config. They are still only suggestions.
>
> The question still remains why you think you need several mappers. Why?
>
> On Mon, Jan 28, 2013 at 1:28 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> > Hi,
> > I would like to again consolidate all the steps which I performed.
> >
> > Issue: The MatrixMultiplication example is executed with only 1 map task.
> >
> > Steps:
> > 1. I created a file of size 104 MB, which is divided into 11 blocks of
> >    10 MB each. The file contains a matrix of size 200x100000.
> > 2. I exported $MAHOUT_OPTS as follows:
> >    $ echo $MAHOUT_OPTS
> >    -Dmapred.min.split.size=10485760 -Dmapred.map.tasks=7
> > 3. I tried to execute the matrix multiplication example from the command line:
> >    mahout matrixmult --inputPathA /test/points/matrixA --numRowsA 200
> >    --numColsA 100000 --inputPathB /test/points/matrixA --numRowsB 200
> >    --numColsB 100000 --tempDir /test/temp
> >
> > When I check the JobTracker UI, it shows the following for the running job:
> >    Running Map Tasks: 1
> >    Occupied Map Slots: 1
> >
> > How can I distribute the map task across different mappers for the
> > MatrixMultiplication job dynamically?
> > Is it even possible for MatrixMultiplication to run distributed across
> > multiple mappers, given that it internally uses CompositeInputFormat?
> >
> > Please suggest.
> >
> > Thanks
> > Stuti
> >
> > -----Original Message-----
> > From: Sean Owen [mailto:sro...@gmail.com]
> > Sent: Wednesday, January 23, 2013 6:42 PM
> > To: Mahout User List
> > Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
> >
> > Mappers are usually extremely fast, since they start themselves on top
> > of the data and their job is usually just parsing and emitting key-value
> > pairs. Hadoop's choices are usually fine.
> >
> > If not, it is usually because the mapper is emitting far more data than
> > it ingests.
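A back-of-the-envelope check on the numbers in the thread (a sketch, assuming the classic FileInputFormat split-size rule max(minSplitSize, min(maxSplitSize, blockSize)); pure arithmetic, no Hadoop classes involved):

```java
// Why a splittable 104 MB input would normally get ~11 mappers with these
// settings, and why CompositeInputFormat still yields only 1.
public class SplitMath {

    // Classic FileInputFormat rule: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long fileSize  = 104L * 1024 * 1024; // 104 MB file from the thread
        long blockSize = 10L * 1024 * 1024;  // 10 MB HDFS blocks
        long minSplit  = 10485760L;          // -Dmapred.min.split.size=10485760

        long size = splitSize(minSplit, Long.MAX_VALUE, blockSize);
        long numSplits = (fileSize + size - 1) / size; // ceiling division

        System.out.println("split size = " + size
                + " bytes, expected map tasks ~ " + numSplits);
        // For a splittable input this suggests ~11 map tasks. But
        // CompositeInputFormat does not split files at all -- it feeds each
        // whole file to one mapper -- so the job runs with 1 mapper regardless
        // of mapred.min.split.size or mapred.map.tasks.
    }
}
```

This is why the suggestion at the top of the thread works: splitting the input into N files sidesteps the no-split behavior, since the mapper count then follows the file count instead of the split size.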
> Are you computing some kind of Cartesian product of the input?
>
> That's slow no matter what. More mappers may increase parallelism, but
> it's still a lot of I/O. Avoid it if you can by sampling or pruning
> unimportant values. Otherwise, try to implement a Combiner.
>
> On Jan 23, 2013 12:04 PM, "Jonas Grote" <jfgr...@gmail.com> wrote:
>
>> I'd play with the mapred.map.tasks option. Setting it to something
>> bigger than 1 gave me performance improvements for various Hadoop jobs
>> on my cluster.
>>
>> 2013/1/16 Ashish <paliwalash...@gmail.com>
>>
>> > I am afraid I don't know the answer. I need to experiment a bit more.
>> > I have not used CompositeInputFormat, so I cannot comment.
>> >
>> > Probably someone else on the ML (mailing list) would be able to guide
>> > here.
>> >
>> > On Wed, Jan 16, 2013 at 6:01 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
>> >
>> > > Thanks Ashish,
>> > >
>> > > So according to the link, if one is using CompositeInputFormat, then
>> > > it will take the entire file as input to a mapper without considering
>> > > InputSplits/block size.
>> > > If I am understanding it correctly, it is asking to break
>> > > [Original Input File] -> [file1, file2, ...].
>> > >
>> > > So if my file is [/test/MatrixA] --> [/test/smallfiles/file1,
>> > > /test/smallfiles/file2, /test/smallfiles/file3, ...]
>> > >
>> > > Now, will the input path in MatrixMultiplicationJob be the directory
>> > > path /test/smallfiles?
>> > >
>> > > Will breaking the file in such a manner cause problems in the
>> > > algorithmic execution of the MR job? I'm not sure if the output will
>> > > be correct.
>> > >
>> > > -----Original Message-----
>> > > From: Ashish [mailto:paliwalash...@gmail.com]
>> > > Sent: Wednesday, Januar