Re: QRDecomposition performance

2013-01-28 Thread Sebastian Schelter
This is great news and will automatically boost the performance of all our ALS-based recommenders which are all using QRDecomposition internally. On 28.01.2013 04:02, Ted Dunning wrote: Did that. You are right. The QRD in mahout is abysmally slow. I wrote a new version on the airplane that
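For readers wondering what the ALS recommenders need from QRDecomposition: each ALS step solves small least-squares problems, and a QR-based solve computes A = QR and then back-substitutes R x = Qᵀb. The following is a minimal sketch on plain `double[][]` arrays using modified Gram-Schmidt, assuming a full-column-rank input with at least as many rows as columns. It is not Mahout's QRDecomposition or the new version mentioned in this thread, and the class name `ThinQr` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration: thin QR (A = Q R) via modified Gram-Schmidt, plus a least-squares solve. */
class ThinQr {
    final double[][] q; // m x n, orthonormal columns
    final double[][] r; // n x n, upper triangular

    ThinQr(double[][] a) {
        int m = a.length;
        int n = a[0].length;
        if (m < n) throw new IllegalArgumentException("need at least as many rows as columns");
        q = new double[m][];
        for (int i = 0; i < m; i++) q[i] = Arrays.copyOf(a[i], n); // work on a copy of A
        r = new double[n][n];
        for (int k = 0; k < n; k++) {
            // Normalize column k (it already has earlier directions removed).
            double norm = 0;
            for (int i = 0; i < m; i++) norm += q[i][k] * q[i][k];
            norm = Math.sqrt(norm);
            if (norm == 0) throw new IllegalArgumentException("matrix is rank deficient");
            r[k][k] = norm;
            for (int i = 0; i < m; i++) q[i][k] /= norm;
            // Remove the component along column k from every later column.
            for (int j = k + 1; j < n; j++) {
                double dot = 0;
                for (int i = 0; i < m; i++) dot += q[i][k] * q[i][j];
                r[k][j] = dot;
                for (int i = 0; i < m; i++) q[i][j] -= dot * q[i][k];
            }
        }
    }

    /** Minimizes ||A x - b|| by solving R x = Q^T b with back substitution. */
    double[] solveLeastSquares(double[] b) {
        int m = q.length;
        int n = r.length;
        double[] y = new double[n];
        for (int k = 0; k < n; k++) {
            double s = 0;
            for (int i = 0; i < m; i++) s += q[i][k] * b[i];
            y[k] = s;
        }
        double[] x = new double[n];
        for (int k = n - 1; k >= 0; k--) {
            double s = y[k];
            for (int j = k + 1; j < n; j++) s -= r[k][j] * x[j];
            x[k] = s / r[k][k];
        }
        return x;
    }
}
```

For example, fitting b = x₀ + x₁·t to the points (0, 1), (1, 3), (2, 5) with rows `{1, t}` recovers x = [1, 2]. Note that this sketch walks columns of a row-major `double[][]`, which is cache-unfriendly; layout choices like this are one place where QR implementations differ in speed.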

Re: QRDecomposition performance

2013-01-28 Thread Sean Owen
Is it worth simply using the Commons Math implementation? On Mon, Jan 28, 2013 at 8:04 AM, Sebastian Schelter s...@apache.org wrote: This is great news and will automatically boost the performance of all our ALS-based recommenders which are all using QRDecomposition internally. On 28.01.2013

RE: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-28 Thread Stuti Awasthi
Hi, I would like to consolidate again all the steps I performed. Issue: the MatrixMultiplication example runs with only 1 map task. Steps: 1. I created a 104MB file, which is divided into 11 blocks of 10MB each. The file contains a 200x10 matrix.

Re: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-28 Thread Sean Owen
These are settings to Hadoop, not Mahout. You may need to set them in your cluster config. They are still only suggestions. The question still remains why you think you need several mappers. Why? On Mon, Jan 28, 2013 at 1:28 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi, I would like to

Re: MatrixMultiplicationJob runs with 1 mapper only ?

2013-01-28 Thread satish verma
I faced this problem too. Split the sequence file containing your data into multiple files, then run the matrix multiplication with the folder as input. If the folder contains N sequence files, N mappers will be created. On Monday, 28 January 2013, Sean Owen wrote: These are settings to
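The splitting step can be sketched as deciding which rows go into which of the N part files. This is a minimal, hypothetical helper (`RowPartitioner` is not a Mahout or Hadoop class) that assigns row indices round-robin, so each part keeps its rows in ascending key order. Writing each list out as its own sequence file is left out. If the job pairs rows of its two input matrices by key, it is an assumption here that both inputs would need to be split the same way.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decide which rows go into which of numParts output files. */
final class RowPartitioner {
    private RowPartitioner() {}

    // Row i goes to part (i % numParts); within each part, row indices stay in ascending order.
    static List<List<Integer>> partitionRows(int numRows, int numParts) {
        if (numParts <= 0) throw new IllegalArgumentException("numParts must be positive");
        List<List<Integer>> parts = new ArrayList<>();
        for (int p = 0; p < numParts; p++) parts.add(new ArrayList<>());
        for (int row = 0; row < numRows; row++) parts.get(row % numParts).add(row);
        return parts;
    }
}
```

For the 200-row matrix described earlier in the thread, `partitionRows(200, 4)` yields four parts of 50 rows each.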

Re: QRDecomposition performance

2013-01-28 Thread Ying Liao
A wrapper is needed then, because Commons Math takes input and produces output in different data structures. On Mon, Jan 28, 2013 at 3:14 AM, Sean Owen sro...@gmail.com wrote: Is it worth simply using the Commons Math implementation? On Mon, Jan 28, 2013 at 8:04 AM, Sebastian Schelter s...@apache.org

Re: QRDecomposition performance

2013-01-28 Thread Ted Dunning
Yeah... having to copy the matrix is a pain in the butt. On Mon, Jan 28, 2013 at 8:13 AM, Ying Liao yliao...@gmail.com wrote: A wrapper is needed then, because Commons Math takes input and produces output in different data structures. On Mon, Jan 28, 2013 at 3:14 AM, Sean Owen sro...@gmail.com wrote:
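To make the copying cost concrete, here is a minimal sketch of such a wrapper. `RowColMatrix` is a hypothetical stand-in for the caller's matrix type (not Mahout's actual `Matrix` interface), and `double[][]` stands in for what a library like Commons Math accepts (its `Array2DRowRealMatrix` has a `double[][]` constructor). Each direction costs a full O(rows × cols) copy.

```java
/** Hypothetical minimal view of the caller's matrix type. */
interface RowColMatrix {
    int rows();
    int cols();
    double get(int row, int col);
}

/** Copies between the hypothetical RowColMatrix and a plain double[][] layout. */
final class MatrixCopies {
    private MatrixCopies() {}

    // Copy on the way in: every element is read once and written once.
    static double[][] toArray(RowColMatrix m) {
        double[][] out = new double[m.rows()][m.cols()];
        for (int i = 0; i < m.rows(); i++) {
            for (int j = 0; j < m.cols(); j++) {
                out[i][j] = m.get(i, j);
            }
        }
        return out;
    }

    // Copy on the way back, so later changes to 'data' are not visible through the result.
    static RowColMatrix fromArray(double[][] data) {
        final int rows = data.length;
        final int cols = rows == 0 ? 0 : data[0].length;
        final double[][] copy = new double[rows][];
        for (int i = 0; i < rows; i++) copy[i] = data[i].clone();
        return new RowColMatrix() {
            public int rows() { return rows; }
            public int cols() { return cols; }
            public double get(int row, int col) { return copy[row][col]; }
        };
    }
}
```

Wrapping without copying (a view over the library's array) would avoid one of the two copies, but only if both sides agree on the memory layout.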

Re: Precision question

2013-01-28 Thread Zia mel
Any thoughts on this? On Sat, Jan 26, 2013 at 10:55 AM, Zia mel ziad.kame...@gmail.com wrote: OK, for precision, when we reduce the sample size to 0.1 or 0.05, would the results be related to those we get when checking with all the data? For example, if we have data1 and data2 and test them using 0.1

Re: Precision question

2013-01-28 Thread Sean Owen
Impossible to say. More data means a more reliable estimate, all else being equal. That's about it. On Jan 28, 2013 5:17 PM, Zia mel ziad.kame...@gmail.com wrote: Any thoughts on this? On Sat, Jan 26, 2013 at 10:55 AM, Zia mel ziad.kame...@gmail.com wrote: OK, for precision, when we reduce the

Re: Precision question

2013-01-28 Thread Zia mel
What about running several tests on small data? Can't that give an indication of how the big data will perform? Thanks On Mon, Jan 28, 2013 at 11:19 AM, Sean Owen sro...@gmail.com wrote: Impossible to say. More data means a more reliable estimate all else equal. That's about it. On Jan 28, 2013

Re: Precision question

2013-01-28 Thread Sean Owen
Yes, several independent samples of all the data will, together, give you a better estimate of the real metric value than any individual one. On Mon, Jan 28, 2013 at 5:41 PM, Zia mel ziad.kame...@gmail.com wrote: What about running several tests on small data? Can't that give an indicator of
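Combining several independent samples can be sketched as averaging their metric values and reporting the standard error of that average, which shrinks roughly as 1/√k for k independent runs. This is a generic statistics sketch with a hypothetical `SampleEstimates` class. The inputs are assumed to be precision values from separate evaluation runs, for instance each taken from `IRStatistics.getPrecision()` after an evaluator run on a different random sample.

```java
/** Hypothetical helper: combine metric estimates from several independent samples. */
final class SampleEstimates {
    private SampleEstimates() {}

    static double mean(double[] estimates) {
        if (estimates.length == 0) throw new IllegalArgumentException("no estimates");
        double sum = 0;
        for (double e : estimates) sum += e;
        return sum / estimates.length;
    }

    // Standard error of the mean: sample standard deviation divided by sqrt(k).
    static double standardError(double[] estimates) {
        int k = estimates.length;
        if (k < 2) throw new IllegalArgumentException("need at least two estimates");
        double m = mean(estimates);
        double ss = 0;
        for (double e : estimates) ss += (e - m) * (e - m);
        return Math.sqrt(ss / (k - 1)) / Math.sqrt(k);
    }
}
```

This only quantifies the run-to-run noise. It assumes the samples are independent, and it says nothing about whether a recommender behaves the same on 5% of the data as on all of it.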

Re: Heirarch clustering

2013-01-28 Thread jamal sasha
Sorry, accidentally sent. As I was saying, I was looking at the link https://cwiki.apache.org/confluence/display/MAHOUT/Top+Down+Clustering but it's not very clear how to perform hierarchical clustering. Also, in the end I would also want to get the ids where each of the cluster center
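Top-down (divisive) clustering first finds coarse clusters and then splits them further. The following is a minimal, generic sketch of one such scheme, bisecting 2-means: it repeatedly splits the largest cluster in two until k clusters exist, and it tracks point ids so each final cluster is reported as a list of ids. It is not Mahout's top-down clustering job. The class name, the seeding rule (the first point plus the point farthest from it), and the fixed iteration count are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: divisive clustering by repeatedly bisecting the largest cluster with 2-means. */
final class BisectingClustering {
    private BisectingClustering() {}

    // Returns clusters as lists of point ids (indices into 'points'); iterations must be >= 1.
    static List<List<Integer>> cluster(double[][] points, int k, int iterations) {
        List<List<Integer>> clusters = new ArrayList<>();
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < points.length; i++) all.add(i);
        clusters.add(all);
        while (clusters.size() < k) {
            int largest = 0;
            for (int c = 1; c < clusters.size(); c++) {
                if (clusters.get(c).size() > clusters.get(largest).size()) largest = c;
            }
            List<Integer> target = clusters.get(largest);
            if (target.size() < 2) break;
            List<List<Integer>> halves = twoMeans(points, target, iterations);
            if (halves.get(0).isEmpty() || halves.get(1).isEmpty()) break; // cannot split further
            clusters.remove(largest);
            clusters.addAll(halves);
        }
        return clusters;
    }

    private static List<List<Integer>> twoMeans(double[][] points, List<Integer> ids, int iterations) {
        // Seed with the first point and the point farthest from it.
        double[] c0 = points[ids.get(0)].clone();
        int far = ids.get(0);
        double best = -1;
        for (int id : ids) {
            double d = dist2(points[id], c0);
            if (d > best) { best = d; far = id; }
        }
        double[] c1 = points[far].clone();
        List<Integer> a = new ArrayList<>();
        List<Integer> b = new ArrayList<>();
        for (int it = 0; it < iterations; it++) {
            a = new ArrayList<>();
            b = new ArrayList<>();
            for (int id : ids) (dist2(points[id], c0) <= dist2(points[id], c1) ? a : b).add(id);
            if (a.isEmpty() || b.isEmpty()) break;
            c0 = centroid(points, a);
            c1 = centroid(points, b);
        }
        List<List<Integer>> out = new ArrayList<>();
        out.add(a);
        out.add(b);
        return out;
    }

    private static double dist2(double[] x, double[] y) {
        double s = 0;
        for (int d = 0; d < x.length; d++) s += (x[d] - y[d]) * (x[d] - y[d]);
        return s;
    }

    private static double[] centroid(double[][] points, List<Integer> ids) {
        double[] c = new double[points[ids.get(0)].length];
        for (int id : ids) for (int d = 0; d < c.length; d++) c[d] += points[id][d];
        for (int d = 0; d < c.length; d++) c[d] /= ids.size();
        return c;
    }
}
```

Recording each split (parent cluster to its two halves) instead of discarding the parent would turn this into an explicit hierarchy.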