I'm also convinced that Spark is a superior platform for executing
distributed ML algorithms. We had a discussion some time ago about moving
from Hadoop to another platform, but at that point it
was not clear which of the upcoming dataflow processing systems (Spark,
Hyracks, Strato
I know the Spark/MLlib devs can occasionally be quite set in their ways of doing
certain things, but we'd welcome as many Mahout devs as possible to work
together.
It may be too late, but perhaps there could be a GSoC project to look at porting
some things like the co-occurrence recommender and streaming k-means?
yes, this is a popular initiative.
On Tue, Feb 18, 2014 at 1:08 PM, Ying Liao wrote:
> Just wondering what the future of Mahout is. We are seeing new stuff from
> 0xdata and Skytree. And Spark is also designed for in-memory iterative
> analysis. What about Mahout? Will Mahout run on top of Spark in
On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath wrote:
> My (admittedly heavily biased) view is Spark is a superior platform overall
> for ML. If the two communities can work together to leverage the strengths
> of Spark, and the large amount of good stuff in Mahout (as well as the
> fantastic dep
Spark provides a "lower-level" ML library called MLlib. MLI / MLBase is
built on top of this and includes some high-level abstractions similar in
nature to distributed matrices / dataframes. But it's still pretty new and
rough at this point (https://github.com/amplab/MLI).
MLlib already provides (
In general, if you are interested in machine learning, I think there is
already a machine-learning-specific initiative on Spark called MLbase
(http://www.mlbase.org/) and GraphX (http://amplab.github.io/graphx/) for
GraphLab-style ML.
On Tue, Feb 18, 2014 at 1:14 PM, Harshit Bapna wrote:
I am very eager to know the same from the community.
Thanks for bringing it up.
--Harshit
On Tue, Feb 18, 2014 at 1:08 PM, Ying Liao wrote:
Just wondering what the future of Mahout is. We are seeing new stuff from
0xdata and Skytree. And Spark is also designed for in-memory iterative
analysis. What about Mahout? Will Mahout run on top of Spark in the future?
Thanks,
Ying Liao
The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
Mahout's goal is to build scalable machine learning libraries focused
primarily in the areas of collaborative filtering (recommenders),
clustering and classification (known collectively as the "3Cs"), as well as the
necessary i
Streaming KMeans runs with a single reducer that runs Ball KMeans, hence the
slow performance that you have been experiencing.
How did you come up with -km 63000?
Given that you would like 10,000 clusters (= k) and have 2,000,000 data points
(= n), k * ln(n) = 10,000 * ln(2 * 10^6) = 145087 (ro
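The k * ln(n) estimate in the reply above is easy to check by hand. A minimal sketch in Python, assuming the estimate is simply rounded to the nearest integer; `sketch_size` is a hypothetical helper for illustration, not a Mahout API. For n = 2,000,000 points, the quoted result of 145087 corresponds to k = 10,000.

```python
import math


def sketch_size(k: int, n: int) -> int:
    """Rule-of-thumb number of intermediate clusters: k * ln(n), rounded.

    Hypothetical helper for illustration only; it is not part of Mahout.
    """
    return round(k * math.log(n))


# 10,000 final clusters over 2,000,000 data points
print(sketch_size(10_000, 2_000_000))  # 145087
```

Because ln(n) grows slowly, the estimate is driven mostly by k: doubling the number of data points adds only about 0.69 * k to it.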
You definitely don't have to mess with the Hadoop source.
On Tuesday, February 18, 2014 10:28 AM, Stamatis Rapanakis
wrote:
I'm trying to run an example and get the following error:
Feb 18, 2014 4:31:28 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local_0001
java.lang.NoSuchFieldError: LUCENE_43
    at org.apache.mahout.common.lucene.AnalyzerUtils.createAnalyzer(AnalyzerUtils.java:35)
    at org.apache.mahout.ve
Thanks, Sean.
I will check how to support 0.9 with CDH4.
However, 0.9 has solved my problem.
On Tue, Feb 18, 2014 at 7:45 PM, Sean Owen wrote:
> FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine
> with CDH4. You do have to build with the Hadoop 2.x profile, as usual.
>
> On T
FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine
with CDH4. You do have to build with the Hadoop 2.x profile, as usual.
On Tue, Feb 18, 2014 at 2:06 PM, Ted Dunning wrote:
> Bikash,
>
> Don't use that version. Use a more recent release. We can't help that
> Cloudera has an old
Hello again, and sorry to bother you with this once again,
I'm having a bit of trouble. My CSV files are just full of numbers (doubles).
Each line looks something like this: 2.4135,1.1120. I'm not sure if this makes
a big difference. But when I try to do step #2, I can't seem to figure out
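Mahout's clustering jobs read vectors (typically SequenceFiles of `VectorWritable`) rather than raw CSV, so a file like this has to be converted first. Before that conversion, it can help to confirm that every line parses to a numeric vector of the same length. The Python sketch below does only that check; it is not the Mahout conversion step the message refers to, and `parse_points` is a hypothetical name.

```python
import csv
import io
from typing import List


def parse_points(text: str) -> List[List[float]]:
    """Parse lines like '2.4135,1.1120' into lists of floats.

    Blank lines are skipped. Raises ValueError if a row has a different
    number of columns than the first row, since every point must have
    the same dimension to be clustered.
    """
    points: List[List[float]] = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # csv.reader yields [] for an empty line
            continue
        point = [float(x) for x in row]  # float() tolerates surrounding spaces
        if points and len(point) != len(points[0]):
            raise ValueError(
                f"data row {len(points) + 1} has {len(point)} columns, "
                f"expected {len(points[0])}"
            )
        points.append(point)
    return points


print(parse_points("2.4135,1.1120\n0.5,3.25"))  # [[2.4135, 1.112], [0.5, 3.25]]
```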
Yeah Ted, it seems there is a major change in 0.9.
In 0.9 I found that clusteredPoint data is being written as a Pair rather than
only a Vector. It's good.
Thanks to everyone for correctly answering a poorly framed question :)
On Tue, Feb 18, 2014 at 7:36 PM, Ted Dunning wrote:
> Bikash,
>
> Don't use th
Bikash,
Don't use that version. Use a more recent release. We can't help that
Cloudera has an old version.
On Tue, Feb 18, 2014 at 1:26 AM, Bikash Gupta wrote:
> Suneel,
>
> Thanks for the information.
>
> I am using 0.7 packaged with CDH.
>
> On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi
Suneel,
Thanks for the information.
I am using 0.7 packaged with CDH.
On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi wrote:
> On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta
> wrote:
>
> Ted/Peter,
>
> Thanks for the response.
>
> This is exactly what I am trying to achieve.
On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta
wrote:
Ted/Peter,
Thanks for the response.
This is exactly what I am trying to achieve. Maybe I was not able to
put my questions clearly.
I am clustering on a few variables of Customer/User (excluding their
customer_id/user_id) and storing the customer_id/user_id list in a
separate place.
Question) What is the
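Clustering on the feature columns only while keeping customer_id/user_id aside amounts to carrying the ID as a key next to each vector: it never enters the distance computation, but it can be read back once points are assigned. In Mahout, `NamedVector` is one way to attach such a name to a vector. The Python below is only a minimal sketch of the idea, assuming the centroids have already been computed (for example by a k-means run); `assign_to_centroids` and the sample data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence


def assign_to_centroids(
    records: Dict[str, Sequence[float]],
    centroids: List[Sequence[float]],
) -> Dict[int, List[str]]:
    """Group record IDs by their nearest centroid (Euclidean distance).

    The ID is used only as a key; distances are computed on the
    feature values alone, so the ID never influences the assignment.
    """
    clusters: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
    for record_id, features in records.items():
        # Index of the centroid closest to this record's features.
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(features, centroids[i]),
        )
        clusters[nearest].append(record_id)
    return clusters


# Hypothetical customers with two numeric features each.
customers = {"u1": [0.1, 0.2], "u2": [5.0, 5.1], "u3": [0.0, 0.3]}
print(assign_to_centroids(customers, [[0.0, 0.0], [5.0, 5.0]]))
# {0: ['u1', 'u3'], 1: ['u2']}
```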