date:20140218

Re: Mahout on Spark?

2014-02-18 Thread Sebastian Schelter

I'm also convinced that Spark is a superior platform for executing distributed ML algorithms. We've had a discussion about a change from Hadoop to another platform some time ago, but at that point in time it was not clear which of the upcoming dataflow processing systems (Spark, Hyracks, Strato

Re: Mahout on Spark?

2014-02-18 Thread Nick Pentreath

I know the Spark/MLlib devs can occasionally be quite set in their ways of doing certain things, but we'd welcome as many Mahout devs as possible to work together. It may be too late, but perhaps a GSoC project to look at a port of some stuff like the co-occurrence recommender and streaming k-means?

Re: Mahout on Spark?

2014-02-18 Thread Dmitriy Lyubimov

yes, this is a popular initiative. On Tue, Feb 18, 2014 at 1:08 PM, Ying Liao wrote: > Just wonder what is the future of Mahout. We are seeing new stuff from > 0xdata and skytree. And spark is also design for in-memory iterative > analysis. What about mahout? Will mahout run on top of spark in

Re: Mahout on Spark?

2014-02-18 Thread Ted Dunning

On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath wrote: > My (admittedly heavily biased) view is Spark is a superior platform overall > for ML. If the two communities can work together to leverage the strengths > of Spark, and the large amount of good stuff in Mahout (as well as the > fantastic dep

Re: Mahout on Spark?

2014-02-18 Thread Nick Pentreath

Spark provides a "lower-level" ML library called MLlib. MLI / MLBase is built on top of this and includes some high-level abstractions similar in nature to distributed matrices / dataframes. But it's still pretty new and rough at this point (https://github.com/amplab/MLI). MLlib already provides (

Re: Mahout on Spark?

2014-02-18 Thread Mohit Singh

In general, if you are interested in machine learning, I think there is already a machine-learning-specific initiative on Spark called MLbase (http://www.mlbase.org/) and GraphX (http://amplab.github.io/graphx/) for GraphLab-style ML. On Tue, Feb 18, 2014 at 1:14 PM, Harshit Bapna wrote: >

Re: Mahout on Spark?

2014-02-18 Thread Harshit Bapna

I am very eager to know the same from the community. Thanks for bringing it up. --Harshit On Tue, Feb 18, 2014 at 1:08 PM, Ying Liao wrote: > Just wonder what is the future of Mahout. We are seeing new stuff from > 0xdata and skytree. And spark is also design for in-memory iterative > analysis

Mahout on Spark?

2014-02-18 Thread Ying Liao

Just wondering what the future of Mahout is. We are seeing new stuff from 0xdata and Skytree. And Spark is also designed for in-memory iterative analysis. What about Mahout? Will Mahout run on top of Spark in the future? Thanks, Ying Liao

Apache Mahout 0.9 released

2014-02-18 Thread Suneel Marthi

The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the "3Cs"), as well as the necessary i

Re: reduce is too slow in StreamingKmeans

2014-02-18 Thread Suneel Marthi

Streaming KMeans runs with a single reducer that runs Ball KMeans, hence the slow performance that you have been experiencing. How did you come up with -km 63000? Given that you would like 10000 clusters (= k) and have 2,000,000 datapoints (= n), k * ln(n) = 10000 * ln(2 * 10^6) = 145087 (ro
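
The arithmetic in this reply can be checked with a short Python sketch. This is only an illustration of the k * ln(n) estimate, not Mahout code: the function name estimated_sketch_clusters is hypothetical, and k = 10000 is an assumption inferred from the quoted result of 145087.

```python
import math


def estimated_sketch_clusters(k: int, n: int) -> int:
    """Return k * ln(n), rounded to the nearest integer.

    k is the desired number of final clusters and n the number of
    datapoints. Hypothetical helper for illustration only; it is not
    part of Mahout.
    """
    if k <= 0 or n <= 0:
        raise ValueError("k and n must be positive")
    return round(k * math.log(n))


if __name__ == "__main__":
    # Assumed inputs: k = 10000 clusters, n = 2,000,000 datapoints.
    print(estimated_sketch_clusters(10_000, 2_000_000))
```

With these assumed inputs the estimate comes out at 145087, which matches the figure in the reply and is well above the -km 63000 that was asked about.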

Re: Mahout 0.8, Hadoop 1.2.1 and Lucene version

2014-02-18 Thread Suneel Marthi

You definitely don't have to mess with hadoop source. On Tuesday, February 18, 2014 10:28 AM, Stamatis Rapanakis wrote: I try to run an example and get the following error: eb 18, 2014 4:31:28 PM org.apache.hadoop.mapred.LocalJobRunner$Job run WARNING: job_local_0001 *java.lang.NoSuchFi

Mahout 0.8, Hadoop 1.2.1 and Lucene version

2014-02-18 Thread Stamatis Rapanakis

I try to run an example and get the following error: eb 18, 2014 4:31:28 PM org.apache.hadoop.mapred.LocalJobRunner$Job run WARNING: job_local_0001 *java.lang.NoSuchFieldError: LUCENE_43* at org.apache.mahout.common.lucene.AnalyzerUtils.createAnalyzer(AnalyzerUtils.java:35) at org.apache.mahout.ve

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Thanks Sean. I will check how to support 0.9 with CDH4. However 0.9 has solved my problem. On Tue, Feb 18, 2014 at 7:45 PM, Sean Owen wrote: > FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine > with CDH4. You do have to build with the Hadoop 2.x profile, as usual. > > On T

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Sean Owen

FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine with CDH4. You do have to build with the Hadoop 2.x profile, as usual. On Tue, Feb 18, 2014 at 2:06 PM, Ted Dunning wrote: > Bikash, > > Don't use that version. Use a more recent release. We can't help that > Cloudera has an old

RE: seqdumper output?

2014-02-18 Thread Allen, Ronald L.

Hello again, and sorry to bother you with this once again, I'm having a bit of trouble. My CSV files are just full of numbers (doubles). Each line looks something like this: 2.4135,1.1120. I'm not sure if this makes a big difference. But when I try to do step #2, I can't seem to figure out

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Yeah Ted, it seems there is a major change in 0.9. In 0.9 I found that clusteredPoints data is written as a Pair rather than only a Vector. It's good. Thanks to everyone for answering an unframed question correctly :) On Tue, Feb 18, 2014 at 7:36 PM, Ted Dunning wrote: > Bikash, > > Don't use th

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Ted Dunning

Bikash, Don't use that version. Use a more recent release. We can't help that Cloudera has an old version. On Tue, Feb 18, 2014 at 1:26 AM, Bikash Gupta wrote: > Suneel, > > Thanks for the information. > > I am using 0.7 packaged with CDH . > > On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Suneel, Thanks for the information. I am using 0.7 packaged with CDH. On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi wrote: > On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta wrote: > > Ted/Peter, > > Thanks for the response. > > This is exactly what I am trying to achieve.

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Suneel Marthi

On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta wrote: Ted/Peter, Thanks for the response. This is exactly what I am trying to achieve. May be I was not able to put my questions clearly. I am clustering on few variables of Customer/User(except their customer_id/user_id) and storing

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Ted/Peter, Thanks for the response. This is exactly what I am trying to achieve. Maybe I was not able to state my questions clearly. I am clustering on a few variables of Customer/User (except their customer_id/user_id) and storing the customer_id/user_id list in a separate place. Question) What is the

20 matches
