RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
Hey Sean, thanks for the response. The MatrixMultiplicationJob help shows the usage as: usage: command [Generic Options] [Job-Specific Options]. Here a generic option can be provided with -D property=value, so I tried command-line -D options, but they do not seem to have any effect. It

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
Hi, I also tried calling it programmatically but face the same issue: only a single MapTask runs, and it spills the map output continuously. Hence I'm not able to generate the output for a large matrix multiplication. Code snippet: DistributedRowMatrix a = new DistributedRowMatrix(new

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Sean Owen
Why do you need multiple mappers? Is one too slow? Many mappers are not necessarily faster for small input. On Jan 16, 2013 10:46 AM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi, I tried to call programmatically also but facing same issue : Only single MapTask is running and that too spilling the map

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
The issue is that my matrix currently has dimensions (100 x 100k); later it could be (1M x 10M) or bigger. Even now, my job runs with a single mapper for (100 x 100k) and is not able to complete. As I mentioned, the map task just proceeds to 0.99% and then starts spilling the map output.

Re: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Ashish
MatrixMultiplicationJob internally sets the InputFormat to CompositeInputFormat: JobConf conf = new JobConf(initialConf, MatrixMultiplicationJob.class); conf.setInputFormat(CompositeInputFormat.class); and, AFAIK, CompositeInputFormat ignores the splits. See this

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Stuti Awasthi
Thanks Ashish. So according to the link, if one uses CompositeInputFormat, it takes the entire file as input to a mapper without considering InputSplits/block size. If I understand it correctly, it is asking to break [Original Input File]-[file1,file2,]. So if my file is

Re: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-16 Thread Ashish
I am afraid I don't know the answer; I need to experiment a bit more. I have not used CompositeInputFormat, so I cannot comment. Probably someone else on the ML (mailing list) will be able to guide you here. On Wed, Jan 16, 2013 at 6:01 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Thanks Ashish,

Re: Test multiple similarities using the same data

2013-01-16 Thread Sean Owen
You can try resetting all the random seeds with RandomUtils.useTestSeed(). On Jan 16, 2013 4:01 PM, Zia mel ziad.kame...@gmail.com wrote: Hi How to evaluate a recommender using different similarities ? Once we call evaluator.evaluate(recommenderBuilder,..) it will decide the training and test
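
To illustrate why a fixed seed helps here, the following is a plain-Java sketch, not Mahout code. The class SeededSplit and the method trainingIndices are hypothetical names introduced only for this example. It draws a random training/test split from a generator with a fixed seed, so the split is the same on every run and different similarity measures can be scored on identical data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; it does not exist in Mahout.
class SeededSplit {

    // Returns the indices in [0, n) selected for training.
    // Each index is kept with probability trainingFraction.
    static List<Integer> trainingIndices(int n, double trainingFraction, long seed) {
        Random random = new Random(seed); // fixed seed => same sequence every run
        List<Integer> training = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (random.nextDouble() < trainingFraction) {
                training.add(i);
            }
        }
        return training;
    }

    public static void main(String[] args) {
        List<Integer> first = trainingIndices(20, 0.7, 42L);
        List<Integer> second = trainingIndices(20, 0.7, 42L);
        System.out.println(first.equals(second)); // prints true
    }
}
```

With Mahout itself, the equivalent step would presumably be to call RandomUtils.useTestSeed() before constructing the evaluator, so that each evaluation run starts from the same seed.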

Recommend to a group of users

2013-01-16 Thread Zia mel
Hi, can we use Mahout to recommend to a group of users who share similar interests? Maybe with some clustering or the like. Thanks

Re: Recommend to a group of users

2013-01-16 Thread Sean Owen
Not really directly, no. You can make N individual recommendations and combine them, and there are many ways to do that. You can blindly rank them on their absolute scores. You can interleave rankings so each gets every Nth slot in the recommendation. A popular metric is to rank by least-aversion
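
As a rough sketch of two of the combination strategies mentioned above, here is some plain Java that does not use Mahout. The class GroupRecommendations and its methods are hypothetical names. The interleave method gives each user a slot in turn. The leastMisery method is one possible reading of "least aversion", assumed here to mean ranking each item by the lowest score any group member gives it; the original message is cut off before it explains the term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it does not exist in Mahout.
class GroupRecommendations {

    // Walks positions 0, 1, 2, ... and takes each user's item at that position,
    // skipping items that were already added.
    static List<String> interleave(List<List<String>> rankedPerUser, int howMany) {
        Set<String> merged = new LinkedHashSet<>();
        int longest = 0;
        for (List<String> ranked : rankedPerUser) {
            longest = Math.max(longest, ranked.size());
        }
        for (int pos = 0; pos < longest && merged.size() < howMany; pos++) {
            for (List<String> ranked : rankedPerUser) {
                if (pos < ranked.size() && merged.size() < howMany) {
                    merged.add(ranked.get(pos));
                }
            }
        }
        return new ArrayList<>(merged);
    }

    // Assumed interpretation: score each item by its minimum score across users,
    // then rank by that minimum, highest first. Items not scored by every user are dropped.
    static List<String> leastMisery(List<Map<String, Double>> scoresPerUser, int howMany) {
        if (scoresPerUser.isEmpty()) {
            return new ArrayList<>();
        }
        Map<String, Double> worst = new LinkedHashMap<>();
        for (String item : scoresPerUser.get(0).keySet()) {
            double min = Double.POSITIVE_INFINITY;
            boolean scoredByAll = true;
            for (Map<String, Double> scores : scoresPerUser) {
                Double score = scores.get(item);
                if (score == null) {
                    scoredByAll = false;
                    break;
                }
                min = Math.min(min, score);
            }
            if (scoredByAll) {
                worst.put(item, min);
            }
        }
        List<String> items = new ArrayList<>(worst.keySet());
        items.sort(Comparator.comparingDouble((String item) -> worst.get(item)).reversed());
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }

    public static void main(String[] args) {
        Map<String, Double> user1 = new LinkedHashMap<>();
        user1.put("A", 5.0);
        user1.put("B", 3.0);
        user1.put("C", 1.0);
        Map<String, Double> user2 = new LinkedHashMap<>();
        user2.put("C", 4.5);
        user2.put("B", 4.0);
        user2.put("A", 2.0);

        List<List<String>> ranked = Arrays.asList(
                Arrays.asList("A", "B", "C"),   // user1, best first
                Arrays.asList("C", "B", "A"));  // user2, best first
        System.out.println(interleave(ranked, 3));                          // prints [A, C, B]
        System.out.println(leastMisery(Arrays.asList(user1, user2), 2));    // prints [B, A]
    }
}
```

Interleaving favors fairness of slots, while the minimum-score ranking favors items nobody in the group strongly dislikes; which is preferable depends on the application.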

Re: Which ML Algorithms i can run without hadoop..

2013-01-16 Thread Ted Dunning
Logistic regression is a good place to start. The Mahout implementation stands alone without Hadoop. Look for OnlineLogisticRegression. On Mon, Jan 14, 2013 at 10:23 PM, VIGNESH S vigneshkln...@gmail.com wrote: Hi, I am looking for a light weight library for email classification.. can
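
To give a feel for what an online (one example at a time) logistic regression does, here is a minimal plain-Java sketch. It is not Mahout's OnlineLogisticRegression and does not reproduce its API; the class TinyOnlineLogistic, its fields, and its learning rate are assumptions made only for illustration. It runs in a single JVM with no Hadoop involved.

```java
// Hypothetical illustration class; this is not Mahout's implementation.
class TinyOnlineLogistic {
    private final double[] weights;      // one weight per feature; last slot is the bias
    private final double learningRate;

    TinyOnlineLogistic(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures + 1];
        this.learningRate = learningRate;
    }

    // Estimated probability that the example belongs to class 1.
    double predict(double[] features) {
        int bias = weights.length - 1;
        double z = weights[bias];
        for (int i = 0; i < bias; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One stochastic gradient step on a single example; label must be 0 or 1.
    void train(int label, double[] features) {
        int bias = weights.length - 1;
        double error = label - predict(features);
        for (int i = 0; i < bias; i++) {
            weights[i] += learningRate * error * features[i];
        }
        weights[bias] += learningRate * error;
    }

    public static void main(String[] args) {
        TinyOnlineLogistic model = new TinyOnlineLogistic(1, 0.5);
        // Toy data: feature +1 means class 1, feature -1 means class 0.
        for (int pass = 0; pass < 200; pass++) {
            model.train(1, new double[] {1.0});
            model.train(0, new double[] {-1.0});
        }
        System.out.println(model.predict(new double[] {1.0}));
        System.out.println(model.predict(new double[] {-1.0}));
    }
}
```

For email classification, the features would be a numeric encoding of each message (for example word counts), which this sketch leaves out.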

Re: Which ML Algorithms i can run without hadoop..

2013-01-16 Thread VIGNESH S
Hi Ted, Thanks .. On Thu, Jan 17, 2013 at 12:41 AM, Ted Dunning ted.dunn...@gmail.com wrote: Logistic regression is a good place to start. The Mahout implementation stands alone without Hadoop. Look for OnlineLogisticRegression. On Mon, Jan 14, 2013 at 10:23 PM, VIGNESH S