Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Pradeep Pujari
Is it only for committers? If not, I am looking forward to joining this hangout. On Wed, Jun 12, 2013 at 4:26 AM, Grant Ingersoll wrote: > Hi, > > One of the things we kicked around at Buzzwords was having a > weekly/bi-weekly/monthly dev session via Google hangout (Drill does this > with good succe

[jira] [Updated] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread zhang da (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhang da updated MAHOUT-1214: - Attachment: MAHOUT-1214.patch removed the string input format, let's not include in this fix then.

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692753#comment-13692753 ] Suneel Marthi commented on MAHOUT-1268: --- [~ssc] Please commit this, applied the pat

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692749#comment-13692749 ] Suneel Marthi commented on MAHOUT-1268: --- [~jake.mannix] testing cluster_reuters.sh

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692745#comment-13692745 ] Jake Mannix commented on MAHOUT-1268: - has this been tested with cluster_reuters.sh?

[jira] [Updated] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1268: --- Attachment: MAHOUT-1268.patch > Wrong output directory for CVB > ---

[jira] [Created] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1268: -- Summary: Wrong output directory for CVB Key: MAHOUT-1268 URL: https://issues.apache.org/jira/browse/MAHOUT-1268 Project: Mahout Issue Type: Bug

[jira] [Updated] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread zhang da (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhang da updated MAHOUT-1214: - Attachment: (was: MAHOUT-1214.patch) > Improve the accuracy of the Spectral KMeans Method >

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Suneel Marthi
I am fine with pushing by a week. From: Grant Ingersoll To: dev@mahout.apache.org Cc: Suneel Marthi Sent: Monday, June 24, 2013 8:25 PM Subject: Re: (Bi-)Weekly/Monthly Dev Sessions I'd really like to, but had a trip come up. If possible, can we push

Re: Build failed in Jenkins: mahout-nightly » Mahout Integration #1272

2013-06-24 Thread Grant Ingersoll
Never mind the noise here, I misread this! Still, we have some error going on w/ random failures. On Jun 24, 2013, at 8:33 PM, Grant Ingersoll wrote: > Can someone w/ more Hadoop experience look at this? We are getting: > > java.lang.ClassCastException: org.apache.mahout.text.LuceneSegmentInp

Re: Build failed in Jenkins: mahout-nightly » Mahout Integration #1272

2013-06-24 Thread Grant Ingersoll
Can someone w/ more Hadoop experience look at this? We are getting: java.lang.ClassCastException: org.apache.mahout.text.LuceneSegmentInputSplit cannot be cast to org.apache.hadoop.mapred.InputSplit at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412) at org.apache.

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Grant Ingersoll
I'd really like to, but had a trip come up. If possible, can we push for one week? Otherwise, if others want to go forward, I can try to set things up and share it w/ others. On Jun 24, 2013, at 6:35 PM, Bhaskar Mookerji wrote: > Hi! > > Is the Google hangouts dev session tomorrow/Tuesday s

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692620#comment-13692620 ] Yiqun Hu commented on MAHOUT-1214: -- Robin, I understand the philosophy of mahout. But wh

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692593#comment-13692593 ] Yiqun Hu commented on MAHOUT-1214: -- Robin, I just saw your response. Let us digest it then

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692589#comment-13692589 ] Yiqun Hu commented on MAHOUT-1214: -- Hi, Robin, We also responded to your comments about w

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692518#comment-13692518 ] Robin Anil commented on MAHOUT-1214: https://reviews.apache.org/r/11931/ I have actu

Build failed in Jenkins: mahout-nightly #1272

2013-06-24 Thread Apache Jenkins Server
See Changes: [smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports [smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed issue with not reading a directory list [smarthi] MAHOUT-833: Make co

Build failed in Jenkins: mahout-nightly » Mahout Integration #1272

2013-06-24 Thread Apache Jenkins Server
See Changes: [smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports [smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed issue with not reading a directo

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Suneel Marthi
Not sure, but if we are having it I think we should focus on what's left for the 0.8 release. From: Bhaskar Mookerji To: dev@mahout.apache.org Cc: Suneel Marthi Sent: Monday, June 24, 2013 6:35 PM Subject: Re: (Bi-)Weekly/Monthly Dev Sessions Hi! Is the Go

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-24 Thread Bhaskar Mookerji
Hi! Is the Google hangouts dev session tomorrow/Tuesday still happening? Lurkingly, Buro Mookerji On Fri, Jun 14, 2013 at 3:37 AM, Grant Ingersoll wrote: > It seems to be that 6 pm ET is the consensus time for the majority of > people, although my having screwed up the poll didn't help. > > Bi

Jenkins build is back to normal : Mahout-Quality #2103

2013-06-24 Thread Apache Jenkins Server
See

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
Well, one fundamental step to get there in the Mahout realm, the way I see it, is to create DSLs for Mahout's DRMs in Spark. That's actually one of the other reasons I chose not to follow Breeze. When we unwind Mahout DRMs, we may see sparse or dense slices there with named vectors. To translate that i

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Nick Pentreath
You're right on that - so far doubles is all I've needed and all I can currently see needing. I'll take a look at your project and see how easy it is to integrate with my Spark ALS and other code - syntax-wise it looks almost the same so swapping out the linear algebra backend would be quite

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Ted Dunning
I think that contrib modules would be very interesting. Specifically, good Scala DSL, pig integration and so on. On Mon, Jun 24, 2013 at 9:55 PM, Dmitriy Lyubimov wrote: > On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath >wrote: > > > That looks great Dmitry! > > > > > > The thing about Breeze

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath wrote: > That looks great Dmitry! > > > The thing about Breeze that drives the complexity in it is partly > specialization for Float, Double and Int matrices, and partly getting the > syntax to "just work" for all combinations of matrix types and ope

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
On Mon, Jun 24, 2013 at 1:24 PM, Jake Mannix wrote: > Yeah, I'm totally on board with a pretty Scala DSL on top of some of our > stuff. > > In particular, I've been experimenting with wrapping the > DistributedRowMatrix > in a Scalding wrapper, so we can do things like > > val matrixAsTypedP

Development guide (ICFOSS)

2013-06-24 Thread Samiran Raj Boro
Hi, I am Samiran. I participated in a 3-day local workshop at ICFOSS ( http://community.apache.org/mentoringprogramme-icfoss-pilot.html). I am looking forward to contributing to the Mahout project. I am a Java beginner and am learning it fast. My domain of interest is data mining and I am familiar with clustering

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Nick Pentreath
That looks great Dmitry! The thing about Breeze that drives the complexity in it is partly specialization for Float, Double and Int matrices, and partly getting the syntax to "just work" for all combinations of matrix types and operands, etc. Mostly it does "just work", but occasionally not.

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Jake Mannix
Yeah, I'm totally on board with a pretty Scala DSL on top of some of our stuff. In particular, I've been experimenting with wrapping the DistributedRowMatrix in a Scalding wrapper, so we can do things like val matrixAsTypedPipe = DistributedRowMatrixPipe(new DistributedRowMatrix(numRows,

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #522

2013-06-24 Thread Apache Jenkins Server
See Changes: [smarthi] MAHOUT-944: lucene2seq - more code cleanup, removed unused imports [smarthi] MAHOUT-833: Make conversion to sequence files map-reduce - fixed issue with not reading a directory list [smarthi]

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Ted Dunning
Dmitriy, This is very pretty. On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov wrote: > Ok, so i was fairly easily able to build some DSL for our matrix > manipulation (similar to breeze) in scala: > > inline matrix or vector: > > val a = dense((1, 2, 3), (3, 4, 5)) > > val b:Vector = (1,2

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
Ok, so I was fairly easily able to build some DSL for our matrix manipulation (similar to Breeze) in Scala:
inline matrix or vector:
val a = dense((1, 2, 3), (3, 4, 5))
val b:Vector = (1,2,3)
block views and assignments (element/row/vector/block/block of row or vector)
a(::, 0)
a(1, ::)
a(0

Build failed in Jenkins: Mahout-Quality #2102

2013-06-24 Thread Apache Jenkins Server
See -- [...truncated 7204 lines...] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcc

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691969#comment-13691969 ] Yiqun Hu commented on MAHOUT-1214: -- Grant, we have addressed all review comments and upl

Re: Need Help in Clustering

2013-06-24 Thread Ted Dunning
On Mon, Jun 24, 2013 at 12:14 PM, Rajan Gupta wrote: > Do i need to create custom code for this, if yes do help me > Yes. You definitely need custom code for this. You also need to think about your data and why you want clusters. What does age mean to a cluster? Are people with the same age s

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691954#comment-13691954 ] Grant Ingersoll commented on MAHOUT-1214: - Hi, Any progress on this? It is the

Re: Need Help in Clustering

2013-06-24 Thread Rajan Gupta
Thanks for your response. Yes, I get clustered points after running k-means. I have done clustering successfully with 20newsdata and Reuters data. Clusterdump also works properly with the examples stated above. Now, I have text data in the format Id,age,income,perwt,sex,city,product 1,23,2200,40,2,Boston,p

Re: Need Help in Clustering

2013-06-24 Thread Suneel Marthi
How are you converting your data to a SequenceFile? If you are not sure, check this link: http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program Are you getting any clustered points after running k-means? It would help if you could list the commands you executed

Need Help in Clustering

2013-06-24 Thread Rajan Gupta
Hi, I am new to Mahout. I have text data in the format Id,age,income,perwt,sex,city,product 1,23,2200,40,2,Boston,product #1 I want to perform k-means clustering based on two fields, age and income. I also want to cluster into a specific number of clusters. I have already performed clustering by
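
For readers following the clustering thread above, here is a rough sketch of the "custom code" step: pulling only age and income out of CSV rows and putting both columns on a comparable scale before clustering. This is plain Python, not Mahout code. The function names `select_features` and `min_max_scale` are hypothetical, the second and third sample rows are invented for illustration, and min-max scaling is an assumption added here (without some scaling, income would likely dominate age in a Euclidean distance).

```python
import csv
import io


def select_features(csv_text, fields=("age", "income")):
    """Parse CSV text with a header row and keep only the chosen numeric fields.

    Returns (ids, rows): the Id of each record and a list of float vectors.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ids, rows = [], []
    for record in reader:
        ids.append(record["Id"])
        rows.append([float(record[name]) for name in fields])
    return ids, rows


def min_max_scale(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(value - low) / span if span else 0.0
         for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


# First data row is from the thread; rows 2 and 3 are made up for illustration.
SAMPLE = """Id,age,income,perwt,sex,city,product
1,23,2200,40,2,Boston,product #1
2,45,5200,38,1,Boston,product #2
3,34,3700,41,2,Chicago,product #1
"""

ids, rows = select_features(SAMPLE)
scaled = min_max_scale(rows)
```

The scaled two-element vectors would still have to be written out in whatever input format the clustering tool expects (the thread mentions sequence files), and the number of clusters is chosen when k-means is run, not in this step.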