I faced this problem too. Split the sequence file that contains your data into multiple files, then run the matrix multiplication with the folder as input. If the folder contains N sequence files, N mappers will be created.
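A rough sketch of that splitting step (this is not Mahout's API — for illustration it round-robins lines of a plain text file into N in-memory buckets; for real Mahout input you would read and write Hadoop SequenceFiles with SequenceFile.Reader/Writer instead, and the class and method names here are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: distribute the records of one input file across N parts,
// round-robin, so that a folder of N files yields N mappers.
public class SplitInput {

    // Returns n buckets of records; bucket i would become output file i.
    static List<List<String>> splitRoundRobin(List<String> records, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            parts.get(i % n).add(records.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("row-" + i);
        }
        // 4 parts -> in HDFS these would land under e.g. a /test/smallfiles dir
        List<List<String>> parts = splitRoundRobin(rows, 4);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("file" + i + ": " + parts.get(i));
        }
    }
}
```

Round-robin keeps the part sizes balanced, which matters because each part file becomes one mapper's entire input.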
On Monday, 28 January 2013, Sean Owen wrote:

> These are settings for Hadoop, not Mahout. You may need to set them in
> your cluster config. They are still only suggestions.
>
> The question still remains why you think you need several mappers. Why?
>
> On Mon, Jan 28, 2013 at 1:28 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> > Hi,
> > I would like to again consolidate all the steps which I performed.
> >
> > Issue: The MatrixMultiplication example is executed with only 1 map task.
> >
> > Steps:
> > 1. I created a file of size 104 MB, which is divided into 11 blocks of
> >    10 MB each. The file contains a matrix of size 200x100000.
> > 2. I exported $MAHOUT_OPTS as follows:
> >    $ echo $MAHOUT_OPTS
> >    -Dmapred.min.split.size=10485760 -Dmapred.map.tasks=7
> > 3. I tried to execute the matrix multiplication example from the command line:
> >    mahout matrixmult --inputPathA /test/points/matrixA --numRowsA 200
> >    --numColsA 100000 --inputPathB /test/points/matrixA --numRowsB 200
> >    --numColsB 100000 --tempDir /test/temp
> >
> > When I check the JobTracker UI, it shows the following for the running job:
> >    Running Map Tasks: 1
> >    Occupied Map Slots: 1
> >
> > How can I distribute the map task across different mappers for the
> > MatrixMultiplication job dynamically?
> > Is it even possible for MatrixMultiplication to run distributed across
> > multiple mappers, given that it internally uses CompositeInputFormat?
> >
> > Please suggest.
> >
> > Thanks
> > Stuti
> >
> > -----Original Message-----
> > From: Sean Owen [mailto:sro...@gmail.com]
> > Sent: Wednesday, January 23, 2013 6:42 PM
> > To: Mahout User List
> > Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
> >
> > Mappers are usually extremely fast, since they start themselves on top
> > of the data and their job is usually just parsing and emitting key-value
> > pairs. Hadoop's choices are usually fine.
> >
> > If not, it is usually because the mapper is emitting far more data than
> > it ingests.
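A back-of-the-envelope check on the numbers in the thread (a sketch, assuming the classic FileInputFormat split-size rule max(minSplitSize, min(maxSplitSize, blockSize)); pure arithmetic, no Hadoop classes involved):

```java
// Why a splittable 104 MB input would normally get ~11 mappers with these
// settings, and why CompositeInputFormat still yields only 1.
public class SplitMath {

    // Classic FileInputFormat rule: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long fileSize  = 104L * 1024 * 1024; // 104 MB file from the thread
        long blockSize = 10L * 1024 * 1024;  // 10 MB HDFS blocks
        long minSplit  = 10485760L;          // -Dmapred.min.split.size=10485760

        long size = splitSize(minSplit, Long.MAX_VALUE, blockSize);
        long numSplits = (fileSize + size - 1) / size; // ceiling division

        System.out.println("split size = " + size
                + " bytes, expected map tasks ~ " + numSplits);
        // For a splittable input this suggests ~11 map tasks. But
        // CompositeInputFormat does not split files at all -- it feeds each
        // whole file to one mapper -- so the job runs with 1 mapper regardless
        // of mapred.min.split.size or mapred.map.tasks.
    }
}
```

This is why the suggestion at the top of the thread works: splitting the input into N files sidesteps the no-split behavior, since the mapper count then follows the file count instead of the split size.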
> Are you computing some kind of Cartesian product of the input?
>
> That's slow no matter what. More mappers may increase parallelism, but
> it's still a lot of I/O. Avoid it if you can by sampling or pruning
> unimportant values. Otherwise, try to implement a Combiner.
>
> On Jan 23, 2013 12:04 PM, "Jonas Grote" <jfgr...@gmail.com> wrote:
>
>> I'd play with the mapred.map.tasks option. Setting it to something
>> bigger than 1 gave me performance improvements for various Hadoop jobs
>> on my cluster.
>>
>> 2013/1/16 Ashish <paliwalash...@gmail.com>
>>
>> > I am afraid I don't know the answer. I need to experiment a bit more.
>> > I have not used CompositeInputFormat, so I cannot comment.
>> >
>> > Probably someone else on the ML (mailing list) would be able to guide
>> > here.
>> >
>> > On Wed, Jan 16, 2013 at 6:01 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
>> >
>> > > Thanks Ashish,
>> > >
>> > > So according to the link, if one is using CompositeInputFormat, then
>> > > it will take the entire file as input to a mapper without considering
>> > > InputSplits/block size.
>> > > If I am understanding it correctly, it is asking to break
>> > > [Original Input File] -> [file1, file2, ...].
>> > >
>> > > So if my file is [/test/MatrixA] --> [/test/smallfiles/file1,
>> > > /test/smallfiles/file2, /test/smallfiles/file3, ...]
>> > >
>> > > Now, will the input path in MatrixMultiplicationJob be the directory
>> > > path /test/smallfiles?
>> > >
>> > > Will breaking the file in such a manner cause problems in the
>> > > algorithmic execution of the MR job? I'm not sure if the output will
>> > > be correct.
>> > >
>> > > -----Original Message-----
>> > > From: Ashish [mailto:paliwalash...@gmail.com]
>> > > Sent: Wednesday, Januar