Hi,
I am new to Mahout.
I have text data in the following format:
Id,age,income,perwt,sex,city,product
1,23,2200,40,2,Boston,product #1
I want to perform k-means clustering based on two fields, age and
income. I also want to produce a specific number of clusters.
I have already performed clustering by
How are you converting your data to a SequenceFile?
If you are not sure check this link:
http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program
Are you getting any clusteredpoints after running k-means?
It would help if you could list the commands you had
Thanks for your response.
Yes, I get clustered points after running k-means. I have done clustering
successfully with the 20newsgroups and Reuters data. Clusterdump also works
properly with the examples stated above.
Now,
I have text data in the following format:
Id,age,income,perwt,sex,city,product
[
https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691954#comment-13691954
]
Grant Ingersoll commented on MAHOUT-1214:
-
Hi,
Any progress on this? It is the
On Mon, Jun 24, 2013 at 12:14 PM, Rajan Gupta rajangupta0...@gmail.com wrote:
Do I need to create custom code for this? If yes, please help me.
Yes. You definitely need custom code for this.
You also need to think about your data and why you want clusters.
What does age mean to a cluster? Are
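For readers following this thread, here is a minimal sketch of what such custom code could look like. It is not Mahout code and not the approach anyone in the thread posted: it is plain Python using only the standard library, written to illustrate the two steps under discussion, namely picking only the age and income columns out of the CSV and then running k-means with a fixed number of clusters. The function names (load_points, scale, kmeans) and every sample row other than the first are made up for illustration. The min-max scaling step is an added assumption; it is there because income is numerically much larger than age and would otherwise dominate the Euclidean distance.

```python
import csv
import io
import random

def load_points(csv_text, fields=("age", "income")):
    """Parse CSV text with a header row; keep only the named numeric fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[float(row[f]) for f in fields] for row in reader]

def scale(points):
    """Min-max scale each column to [0, 1] (assumption: avoids income dominating age)."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - l) / s for v, l, s in zip(p, lo, span)] for p in points]

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# The first data row is from the original post; the other rows are hypothetical.
SAMPLE = """Id,age,income,perwt,sex,city,product
1,23,2200,40,2,Boston,product #1
2,25,2500,38,1,Boston,product #2
3,58,9100,41,2,Chicago,product #1
4,61,8700,39,1,Chicago,product #3
"""

points = load_points(SAMPLE)
centroids, assignments = kmeans(scale(points), k=2)
for point, cluster in zip(points, assignments):
    print(point, "-> cluster", cluster)
```

With Mahout itself, the equivalent of load_points would be the step that writes the selected fields as vectors into a SequenceFile, which is what the linked Stack Overflow question covers.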
[
https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691969#comment-13691969
]
Yiqun Hu commented on MAHOUT-1214:
--
Grant, we have addressed all review comments and
See https://builds.apache.org/job/Mahout-Quality/2102/
--
[...truncated 7204 lines...]
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
OK, so I was fairly easily able to build some DSL for our matrix
manipulation (similar to breeze) in scala:
inline matrix or vector:
val a = dense((1, 2, 3), (3, 4, 5))
val b:Vector = (1,2,3)
block views and assignments (element/row/vector/block/block of row or
vector)
a(::, 0)
a(1, ::)
a(0
Dmitriy,
This is very pretty.
On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
OK, so I was fairly easily able to build some DSL for our matrix
manipulation (similar to breeze) in scala:
inline matrix or vector:
val a = dense((1, 2, 3), (3, 4, 5))
val
See
https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/522/changes
Changes:
[smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports
[smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed
issue with not reading a directory list
[smarthi]
Yeah, I'm totally on board with a pretty scala DSL on top of some of our
stuff.
In particular, I've been experimenting with wrapping the
DistributedRowMatrix
in a scalding wrapper, so we can do things like
val matrixAsTypedPipe =
DistributedRowMatrixPipe(new DistributedRowMatrix(numRows,
That looks great Dmitry!
The thing about Breeze that drives the complexity in it is partly
specialization for Float, Double and Int matrices, and partly getting the
syntax to just work for all combinations of matrix types and operands etc.
Mostly it does just work, but occasionally not.
Hi,
I am Samiran. I participated in a 3-day local workshop at ICFOSS (
http://community.apache.org/mentoringprogramme-icfoss-pilot.html). I am
looking forward to contributing to the Mahout project.
I am a Java beginner and learning it fast. My domain of interest is data mining
and I am familiar with
On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath nick.pentre...@gmail.com wrote:
That looks great Dmitry!
The thing about Breeze that drives the complexity in it is partly
specialization for Float, Double and Int matrices, and partly getting the
syntax to just work for all combinations of
I think that contrib modules would be very interesting. Specifically, good
Scala DSL, pig integration and so on.
On Mon, Jun 24, 2013 at 9:55 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath nick.pentre...@gmail.com
wrote:
That looks great
You're right on that - so far doubles is all I've needed and all I can
currently see needing.
I'll take a look at your project and see how easy it is to integrate with my
Spark ALS and other code - syntax wise it looks almost the same so swapping out
the linear algebra backend would be
Well, one fundamental step to get there in the Mahout realm, the way I see it,
is to create DSLs for Mahout's DRMs in Spark. That's actually one of the
other reasons I chose not to follow Breeze. When we unwind Mahout DRMs, we
may see sparse or dense slices there with named vectors. To translate that
See https://builds.apache.org/job/Mahout-Quality/2103/
Hi!
Is the Google hangouts dev session tomorrow/Tuesday still happening?
Lurkingly,
Buro Mookerji
On Fri, Jun 14, 2013 at 3:37 AM, Grant Ingersoll gsing...@apache.org wrote:
It seems that 6 pm ET is the consensus time for the majority of
people, although my having screwed up the poll
Not sure, but if we are having it I think we should focus on what's left for
0.8 release.
From: Bhaskar Mookerji mooke...@spin-one.org
To: dev@mahout.apache.org
Cc: Suneel Marthi suneel_mar...@yahoo.com
Sent: Monday, June 24, 2013 6:35 PM
Subject: Re:
See
https://builds.apache.org/job/mahout-nightly/org.apache.mahout$mahout-integration/1272/changes
Changes:
[smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports
[smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed
issue with not reading a
See https://builds.apache.org/job/mahout-nightly/1272/changes
Changes:
[smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports
[smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed
issue with not reading a directory list
[smarthi] MAHOUT-833: Make
[
https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692518#comment-13692518
]
Robin Anil commented on MAHOUT-1214:
https://reviews.apache.org/r/11931/
I have
[
https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692589#comment-13692589
]
Yiqun Hu commented on MAHOUT-1214:
--
Hi, Robin,
We have also responded to your comments about
[
https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692593#comment-13692593
]
Yiqun Hu commented on MAHOUT-1214:
--
Robin, I just saw your response. Let us digest it then
I'd really like to, but had a trip come up. If possible, can we push for one
week? Otherwise, if others want to go forward, I can try to set things up and
share it w/ others.
On Jun 24, 2013, at 6:35 PM, Bhaskar Mookerji mooke...@spin-one.org wrote:
Hi!
Is the Google hangouts dev session
[
https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692620#comment-13692620
]
Yiqun Hu commented on MAHOUT-1214:
--
Robin, I understand the philosophy of Mahout. But
Can someone w/ more Hadoop experience look at this? We are getting:
java.lang.ClassCastException: org.apache.mahout.text.LuceneSegmentInputSplit
cannot be cast to org.apache.hadoop.mapred.InputSplit
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at
Never mind the noise here, I misread this!
Still, we have some error going on w/ random failures.
On Jun 24, 2013, at 8:33 PM, Grant Ingersoll gsing...@apache.org wrote:
Can someone w/ more Hadoop experience look at this? We are getting:
java.lang.ClassCastException:
I am fine with pushing by a week.
From: Grant Ingersoll gsing...@apache.org
To: dev@mahout.apache.org
Cc: Suneel Marthi suneel_mar...@yahoo.com
Sent: Monday, June 24, 2013 8:25 PM
Subject: Re: (Bi-)Weekly/Monthly Dev Sessions
I'd really like to, but had
[
https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhang da updated MAHOUT-1214:
-
Attachment: (was: MAHOUT-1214.patch)
Improve the accuracy of the Spectral KMeans Method
Sebastian Schelter created MAHOUT-1268:
--
Summary: Wrong output directory for CVB
Key: MAHOUT-1268
URL: https://issues.apache.org/jira/browse/MAHOUT-1268
Project: Mahout
Issue Type: Bug
[
https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Schelter updated MAHOUT-1268:
---
Attachment: MAHOUT-1268.patch
Wrong output directory for CVB
[
https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692745#comment-13692745
]
Jake Mannix commented on MAHOUT-1268:
-
has this been tested with cluster_reuters.sh?
[
https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692749#comment-13692749
]
Suneel Marthi commented on MAHOUT-1268:
---
[~jake.mannix] testing cluster_reuters.sh
[
https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692753#comment-13692753
]
Suneel Marthi commented on MAHOUT-1268:
---
[~ssc] Please commit this, applied the