Need Help in Clustering

2013-06-24 Thread Rajan Gupta
Hi, I am new to Mahout. I have text data in the format Id,age,income,perwt,sex,city,product, for example 1,23,2200,40,2,Boston,product #1. I want to perform k-means clustering based on 2 fields, age and income, and I also want to use a specific number of clusters. I have already performed clustering by
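A first step for a request like this is turning each row into a 2-D point. The sketch below assumes the rows look exactly like the sample above: comma-separated, with no header and no quoted commas. It reads age (column 1) and income (column 2) into a point. This is plain Java, not Mahout's API, and `AgeIncomeParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: extracts (age, income) from rows in the format
// Id,age,income,perwt,sex,city,product. Assumes no header and no quoted commas.
public class AgeIncomeParser {

    // Zero-based column positions of the two fields to cluster on.
    static final int AGE_COL = 1;
    static final int INCOME_COL = 2;

    // Parses one CSV row into a 2-element point {age, income}.
    public static double[] parse(String line) {
        String[] fields = line.split(",");
        if (fields.length <= INCOME_COL) {
            throw new IllegalArgumentException("Too few fields: " + line);
        }
        return new double[] {
            Double.parseDouble(fields[AGE_COL].trim()),
            Double.parseDouble(fields[INCOME_COL].trim())
        };
    }

    // Parses many rows and skips blank lines.
    public static List<double[]> parseAll(List<String> lines) {
        List<double[]> points = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                points.add(parse(line));
            }
        }
        return points;
    }

    public static void main(String[] args) {
        double[] p = parse("1,23,2200,40,2,Boston,product #1");
        System.out.println(Arrays.toString(p)); // prints [23.0, 2200.0]
    }
}
```

Inside Mahout, each such point would still need to be written out as a vector in a SequenceFile before the k-means driver can read it. That step is not shown here.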

Re: Need Help in Clustering

2013-06-24 Thread Suneel Marthi
How are you converting your data to a SequenceFile? If you are not sure, check this link: http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program Are you getting any clustered points after running k-means? It would help if you could list the commands you had

Re: Need Help in Clustering

2013-06-24 Thread Rajan Gupta
Thanks for your response. Yes, I get clustered points after running k-means. I have done clustering successfully with the 20 Newsgroups and Reuters data, and clusterdump also works properly with those examples. Now I have text data in the format Id,age,income,perwt,sex,city,product

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691954#comment-13691954 ] Grant Ingersoll commented on MAHOUT-1214: Hi, any progress on this? It is the

Re: Need Help in Clustering

2013-06-24 Thread Ted Dunning
On Mon, Jun 24, 2013 at 12:14 PM, Rajan Gupta rajangupta0...@gmail.com wrote: Do i need to create custom code for this, if yes do help me Yes. You definitely need custom code for this. You also need to think about your data and why you want clusters. What does age mean to a cluster? Are
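Ted's question about what age means to a cluster has a concrete numeric side. Age is in the tens and income is in the thousands, so with plain Euclidean distance, income would decide almost every cluster assignment. The sketch below is plain Java, not Mahout's k-means implementation. It z-scores each field, then runs Lloyd-style k-means with a caller-chosen number of clusters k. Seeding with the first k points is an assumption made for determinism. Real code would pick seeds more carefully, for example at random or with k-means++.

```java
import java.util.Arrays;

// Plain-Java illustration of Lloyd's k-means with a fixed number of clusters k.
// Not Mahout's API; seeding with the first k points is an assumption for determinism.
public class SimpleKMeans {

    // Scales each column to zero mean and unit (population) standard deviation,
    // so age and income contribute comparably to Euclidean distance.
    public static double[][] standardize(double[][] data) {
        int n = data.length, d = data[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0;
            for (double[] row : data) mean += row[j];
            mean /= n;
            double var = 0;
            for (double[] row : data) var += (row[j] - mean) * (row[j] - mean);
            double sd = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                out[i][j] = sd == 0 ? 0 : (data[i][j] - mean) / sd;
            }
        }
        return out;
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    // Returns the cluster index of each point after convergence or maxIter rounds.
    public static int[] cluster(double[][] points, int k, int maxIter) {
        int n = points.length, d = points[0].length;
        if (k < 1 || k > n) throw new IllegalArgumentException("need 1 <= k <= n");
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = points[c].clone();
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest center.
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centers[c]) < squaredDistance(points[i], centers[best])) {
                        best = c;
                    }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;
            // Update step: each center moves to the mean of its assigned points.
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < d; j++) sums[assign[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old center
                for (int j = 0; j < d; j++) centers[c][j] = sums[c][j] / counts[c];
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        // Hypothetical (age, income) rows; only the first comes from the thread.
        double[][] raw = { {23, 2200}, {25, 2400}, {60, 9000}, {58, 8800} };
        int[] labels = cluster(standardize(raw), 2, 20);
        System.out.println(Arrays.toString(labels)); // prints [0, 0, 1, 1]
    }
}
```

Z-scoring is only one choice. Whether age and income should weigh equally is exactly the kind of question about the data that the reply raises.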

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691969#comment-13691969 ] Yiqun Hu commented on MAHOUT-1214: Grant, we have addressed all review comments and

Build failed in Jenkins: Mahout-Quality #2102

2013-06-24 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2102/ -- [...truncated 7204 lines...] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
OK, so I was fairly easily able to build a DSL for our matrix manipulation (similar to Breeze) in Scala:
inline matrix or vector:
val a = dense((1, 2, 3), (3, 4, 5))
val b:Vector = (1,2,3)
block views and assignments (element/row/vector/block/block of row or vector):
a(::, 0)
a(1, ::)
a(0
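Readers less familiar with the slicing notation may find a translation useful. As in Breeze, `::` means "everything along this axis": `a(::, 0)` is all rows of column 0, and `a(1, ::)` is all columns of row 1. The sketch below is a rough plain-Java analogue built on plain arrays, not Mahout's `Matrix` classes. `Dense`, `ColumnView`, and `RowView` are hypothetical names. The views share storage with the matrix so that assignments write through, which is what "block views and assignments" suggests. How the Scala DSL actually implements views is not stated here.

```java
// Hypothetical plain-Java analogue of the Scala DSL's row/column views.
// Views share storage with the matrix, so writes through a view change the matrix.
public class MatrixViews {

    // Row-major dense matrix, analogous to dense((1, 2, 3), (3, 4, 5)).
    static final class Dense {
        final double[][] data;
        Dense(double[]... rows) { this.data = rows; }
        ColumnView col(int j) { return new ColumnView(this, j); } // like a(::, j)
        RowView row(int i) { return new RowView(this, i); }       // like a(i, ::)
    }

    // All rows of one column.
    static final class ColumnView {
        final Dense m;
        final int j;
        ColumnView(Dense m, int j) { this.m = m; this.j = j; }
        int size() { return m.data.length; }
        double get(int i) { return m.data[i][j]; }
        void set(int i, double v) { m.data[i][j] = v; }
    }

    // All columns of one row.
    static final class RowView {
        final Dense m;
        final int i;
        RowView(Dense m, int i) { this.m = m; this.i = i; }
        int size() { return m.data[i].length; }
        double get(int j) { return m.data[i][j]; }
        void set(int j, double v) { m.data[i][j] = v; }
    }

    public static void main(String[] args) {
        Dense a = new Dense(new double[] {1, 2, 3}, new double[] {3, 4, 5});
        ColumnView c0 = a.col(0);      // a(::, 0) -> (1, 3)
        RowView r1 = a.row(1);         // a(1, ::) -> (3, 4, 5)
        r1.set(0, 10);                 // assignment through the row view
        System.out.println(c0.get(1)); // prints 10.0: the column view sees the write
    }
}
```

The Scala version can offer the terse `a(1, ::)` syntax because Scala lets a class define `apply` and pass `::` as an argument. Java has no equivalent, hence the explicit `row`/`col` methods.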

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Ted Dunning
Dmitriy, This is very pretty. On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Ok, so i was fairly easily able to build some DSL for our matrix manipulation (similar to breeze) in scala: inline matrix or vector: val a = dense((1, 2, 3), (3, 4, 5)) val

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #522

2013-06-24 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/522/changes Changes: [smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports [smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed issue with not reading a directory list [smarthi]

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Jake Mannix
Yeah, I'm totally on board with a pretty Scala DSL on top of some of our stuff. In particular, I've been experimenting with wrapping the DistributedRowMatrix in a Scalding wrapper, so we can do things like val matrixAsTypedPipe = DistributedRowMatrixPipe(new DistributedRowMatrix(numRows,

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Nick Pentreath
That looks great, Dmitry! The thing about Breeze that drives its complexity is partly specialization for Float, Double and Int matrices, and partly getting the syntax to just work for all combinations of matrix types, operands, etc. Mostly it does just work, but occasionally not.

Development guide (ICFOSS)

2013-06-24 Thread Samiran Raj Boro
Hi, I am Samiran. I participated in a 3-day local workshop at ICFOSS (http://community.apache.org/mentoringprogramme-icfoss-pilot.html). I am looking forward to contributing to the Mahout project. I am a Java beginner and learning it fast. My area of interest is data mining, and I am familiar with

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath nick.pentre...@gmail.com wrote: That looks great Dmitry! The thing about Breeze that drives the complexity in it is partly specialization for Float, Double and Int matrices, and partly getting the syntax to just work for all combinations of

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Ted Dunning
I think that contrib modules would be very interesting. Specifically, good Scala DSL, pig integration and so on. On Mon, Jun 24, 2013 at 9:55 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath nick.pentre...@gmail.com wrote: That looks great

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Nick Pentreath
You're right on that: so far doubles are all I've needed and all I can currently see needing. I'll take a look at your project and see how easy it is to integrate with my Spark ALS and other code; syntax-wise it looks almost the same, so swapping out the linear algebra backend would be

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
Well, one fundamental step to get there in the Mahout realm, the way I see it, is to create DSLs for Mahout's DRMs in Spark. That's actually one of the other reasons I chose not to follow Breeze. When we unwind Mahout DRMs, we may see sparse or dense slices there with named vectors. To translate that

Jenkins build is back to normal : Mahout-Quality #2103

2013-06-24 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2103/

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Bhaskar Mookerji
Hi! Is the Google Hangouts dev session tomorrow/Tuesday still happening? Lurkingly, Buro Mookerji On Fri, Jun 14, 2013 at 3:37 AM, Grant Ingersoll gsing...@apache.org wrote: It seems to be that 6 pm ET is the consensus time for the majority of people, although my having screwed up the poll

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Suneel Marthi
Not sure, but if we are having it, I think we should focus on what's left for the 0.8 release. From: Bhaskar Mookerji mooke...@spin-one.org To: dev@mahout.apache.org Cc: Suneel Marthi suneel_mar...@yahoo.com Sent: Monday, June 24, 2013 6:35 PM Subject: Re:

Build failed in Jenkins: mahout-nightly » Mahout Integration #1272

2013-06-24 Thread Apache Jenkins Server
See https://builds.apache.org/job/mahout-nightly/org.apache.mahout$mahout-integration/1272/changes Changes: [smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports [smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed issue with not reading a

Build failed in Jenkins: mahout-nightly #1272

2013-06-24 Thread Apache Jenkins Server
See https://builds.apache.org/job/mahout-nightly/1272/changes Changes: [smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports [smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed issue with not reading a directory list [smarthi] MAHOUT-833: Make

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692518#comment-13692518 ] Robin Anil commented on MAHOUT-1214: https://reviews.apache.org/r/11931/ I have

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692589#comment-13692589 ] Yiqun Hu commented on MAHOUT-1214: Hi, Robin, we also responded to your comments about

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692593#comment-13692593 ] Yiqun Hu commented on MAHOUT-1214: Robin, just saw your response. Let us digest it then

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Grant Ingersoll
I'd really like to, but had a trip come up. If possible, can we push it back one week? Otherwise, if others want to go forward, I can try to set things up and share it w/ others. On Jun 24, 2013, at 6:35 PM, Bhaskar Mookerji mooke...@spin-one.org wrote: Hi! Is the Google hangouts dev session

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692620#comment-13692620 ] Yiqun Hu commented on MAHOUT-1214: Robin, I understand the philosophy of Mahout. But

Re: Build failed in Jenkins: mahout-nightly » Mahout Integration #1272

2013-06-24 Thread Grant Ingersoll
Can someone w/ more Hadoop experience look at this? We are getting: java.lang.ClassCastException: org.apache.mahout.text.LuceneSegmentInputSplit cannot be cast to org.apache.hadoop.mapred.InputSplit at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412) at

Re: Build failed in Jenkins: mahout-nightly » Mahout Integration #1272

2013-06-24 Thread Grant Ingersoll
Never mind the noise here, I misread this! Still, we have some error going on w/ random failures. On Jun 24, 2013, at 8:33 PM, Grant Ingersoll gsing...@apache.org wrote: Can someone w/ more Hadoop experience look at this? We are getting: java.lang.ClassCastException:

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Suneel Marthi
I am fine with pushing by a week. From: Grant Ingersoll gsing...@apache.org To: dev@mahout.apache.org Cc: Suneel Marthi suneel_mar...@yahoo.com Sent: Monday, June 24, 2013 8:25 PM Subject: Re: (Bi-)Weekly/Monthly Dev Sessions I'd really like to, but had

[jira] [Updated] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread zhang da (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhang da updated MAHOUT-1214: Attachment: (was: MAHOUT-1214.patch) Improve the accuracy of the Spectral KMeans Method

[jira] [Created] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1268: Summary: Wrong output directory for CVB Key: MAHOUT-1268 URL: https://issues.apache.org/jira/browse/MAHOUT-1268 Project: Mahout Issue Type: Bug

[jira] [Updated] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1268: Attachment: MAHOUT-1268.patch Wrong output directory for CVB

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692745#comment-13692745 ] Jake Mannix commented on MAHOUT-1268: Has this been tested with cluster_reuters.sh?

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692749#comment-13692749 ] Suneel Marthi commented on MAHOUT-1268: [~jake.mannix] testing cluster_reuters.sh

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692753#comment-13692753 ] Suneel Marthi commented on MAHOUT-1268: [~ssc] Please commit this, applied the