[jira] [Commented] (MAHOUT-1456) The wikipediaXMLSplitter example fails with "heap size" error

2014-03-17 Thread mahmood (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938856#comment-13938856 ] mahmood commented on MAHOUT-1456: - In that pastebin link, I see that only the last command

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938810#comment-13938810 ] Dmitriy Lyubimov commented on MAHOUT-1464: -- Also, just FYI, much as i love to us

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938805#comment-13938805 ] Dmitriy Lyubimov commented on MAHOUT-1464: -- 1. {code} val C = A.t %*% A {code}

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938799#comment-13938799 ] Dmitriy Lyubimov commented on MAHOUT-1464: -- What's the best way to share PDF sou

[jira] [Commented] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938735#comment-13938735 ] Hudson commented on MAHOUT-1346: SUCCESS: Integrated in Mahout-Quality #2527 (See [https

[jira] [Commented] (MAHOUT-1467) ClusterClassifier readPolicy leaks file handles

2014-03-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938736#comment-13938736 ] Hudson commented on MAHOUT-1467: SUCCESS: Integrated in Mahout-Quality #2527 (See [https

[jira] [Updated] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1464: --- Attachment: MAHOUT-1464.patch Luckily, Dmitriy's latest commit solved most of my pro

[jira] [Updated] (MAHOUT-1467) ClusterClassifier readPolicy leaks file handles

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1467: -- Resolution: Fixed Assignee: Suneel Marthi Status: Resolved (was: Patch Available

[jira] [Commented] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938553#comment-13938553 ] Hudson commented on MAHOUT-1466: SUCCESS: Integrated in Mahout-Quality #2526 (See [https

[jira] [Commented] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938552#comment-13938552 ] Hudson commented on MAHOUT-1346: SUCCESS: Integrated in Mahout-Quality #2526 (See [https

[jira] [Updated] (MAHOUT-1467) ClusterClassifier readPolicy leaks file handles

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1467: -- Attachment: MAHOUT-1467.patch > ClusterClassifier readPolicy leaks file handles >

[jira] [Updated] (MAHOUT-1467) ClusterClassifier readPolicy leaks file handles

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1467: -- Fix Version/s: 1.0 Status: Patch Available (was: Open) > ClusterClassifier readPol

Re: [jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-17 Thread Dmitriy Lyubimov
exponential law for simulated singular values is probably too aggressive. also Q normalizations are not needed. I need to poke the data simulation there a bit more. On Mon, Mar 17, 2014 at 3:26 PM, Dmitriy Lyubimov wrote: > Hm. yeah. i can do the version of distributed QR used in MR SSVD and >

Re: [jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-17 Thread Dmitriy Lyubimov
Hm. yeah. i can do the version of distributed QR used in MR SSVD and subsequently defined by Nathan Halko in his dissertation. That version seemed to be incredibly numerically stable. But i guess this is too much for a work not aligned with my current interest. Anyway, Cholesky-based SSVD should

[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1346: - Attachment: ScalaSparkBindings.pdf updating docs to reflect latest committed state. Bro

[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1346: - Attachment: (was: ScalaSparkBindings.pdf) > Spark Bindings (DRM) > -

[jira] [Commented] (MAHOUT-1467) ClusterClassifier readPolicy leaks file handles

2014-03-17 Thread Andrew Musselman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938367#comment-13938367 ] Andrew Musselman commented on MAHOUT-1467: -- Could you please submit a patch; let

[jira] [Updated] (MAHOUT-1467) ClusterClassifier readPolicy leaks file handles

2014-03-17 Thread Avi Shinnar (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avi Shinnar updated MAHOUT-1467: Description: The org.apache.mahout.clustering.classify.ClusterClassifier.readPolicy method leaks
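A common remedy for this kind of leak is Java's try-with-resources, which closes the stream even when reading throws. The sketch below is not the MAHOUT-1467 patch. It is a self-contained illustration using only java.io and java.nio, and the class name, method names, and file layout (a single UTF string) are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a policy reader/writer; not Mahout code.
public class ReadPolicySketch {

    // Writes one string and closes the stream on every exit path.
    static void writePolicyName(Path file, String name) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeUTF(name);
        }
    }

    // Reads the string back; try-with-resources releases the file handle
    // whether readUTF returns normally or throws.
    static String readPolicyName(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("policy", ".bin");
        try {
            writePolicyName(tmp, "example-policy");
            System.out.println(readPolicyName(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Closing the outermost wrapper stream also closes the underlying file stream, so one resource declaration per method is enough here.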

Re: Is Cholesky too sensitive to rank deficiency?

2014-03-17 Thread Ted Dunning
This may not be an issue that can actually be cured. The Cholesky trick is akin to squaring a number. Inherently you tend to lose precision by doing this. With the possibility of iteration we should consider more advanced methods for large QR. The great value of the Cholesky trick is that on
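To make the precision point concrete, here is a minimal sketch in plain Java (no Mahout classes; the class and method names are hypothetical). It uses a 3x2 matrix with columns (1, eps, 0) and (1, 0, eps), eps = 1e-8. Forming A'A rounds 1 + eps^2 to 1 in double precision, so the Cholesky factor ends up with an exact zero on its diagonal, while orthogonalizing the columns of A directly still recovers a small nonzero R entry. Gram-Schmidt stands in for Householder QR here only to keep the example short.

```java
// Hypothetical illustration; not taken from the Mahout code base.
public class CholeskyQrSketch {
    static final double EPS = 1e-8;
    // Columns of a 3x2 matrix A that has full rank but is ill-conditioned.
    static final double[] A1 = {1.0, EPS, 0.0};
    static final double[] A2 = {1.0, 0.0, EPS};

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * y[i];
        }
        return s;
    }

    // Second diagonal entry of R taken from the Cholesky factor of A'A.
    static double r22ViaCholesky() {
        double g11 = dot(A1, A1); // 1 + eps^2 rounds to 1.0
        double g12 = dot(A1, A2);
        double g22 = dot(A2, A2); // 1 + eps^2 rounds to 1.0
        double r11 = Math.sqrt(g11);
        double r12 = g12 / r11;
        // Clamp at zero to guard against a tiny negative value from rounding.
        return Math.sqrt(Math.max(0.0, g22 - r12 * r12));
    }

    // Same entry of R, computed by orthogonalizing the columns of A directly.
    static double r22ViaGramSchmidt() {
        double r11 = Math.sqrt(dot(A1, A1));
        double[] q1 = new double[A1.length];
        for (int i = 0; i < A1.length; i++) {
            q1[i] = A1[i] / r11;
        }
        double r12 = dot(q1, A2);
        double[] v = new double[A2.length];
        for (int i = 0; i < A2.length; i++) {
            v[i] = A2[i] - r12 * q1[i];
        }
        return Math.sqrt(dot(v, v));
    }

    public static void main(String[] args) {
        System.out.println("R22 via Cholesky of A'A: " + r22ViaCholesky());
        System.out.println("R22 via Gram-Schmidt:    " + r22ViaGramSchmidt());
    }
}
```

The information about the second column lives in terms of size eps^2 inside A'A, which is below double-precision resolution relative to 1, so no later step can recover it once the Gram matrix has been formed.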

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938333#comment-13938333 ] Pat Ferrel commented on MAHOUT-1464: OK, refreshed the repo and now I see all the Spa

[jira] [Updated] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1466: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Cluster visuali

[jira] [Comment Edited] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938259#comment-13938259 ] Sebastian Schelter edited comment on MAHOUT-1464 at 3/17/14 7:32 PM: --

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938259#comment-13938259 ] Sebastian Schelter commented on MAHOUT-1464: I'd like to rework my prototype

Re: [jira] [Created] (MAHOUT-1467) ClusterClassifier read/writePolicy leak file handles

2014-03-17 Thread Suneel Marthi
Could you submit a patch? Please work off of trunk, as some of the clustering code was moved around. > On Mar 17, 2014, at 3:13 PM, "Avi Shinnar (JIRA)" wrote: > > Avi Shinnar created MAHOUT-1467: > --- > > Summary: ClusterClassifi

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938254#comment-13938254 ] Sebastian Schelter commented on MAHOUT-1464: I haven't tested Spark on Hadoop

[jira] [Created] (MAHOUT-1467) ClusterClassifier read/writePolicy leak file handles

2014-03-17 Thread Avi Shinnar (JIRA)
Avi Shinnar created MAHOUT-1467: --- Summary: ClusterClassifier read/writePolicy leak file handles Key: MAHOUT-1467 URL: https://issues.apache.org/jira/browse/MAHOUT-1467 Project: Mahout Issue Typ

Is Cholesky too sensitive to rank deficiency?

2014-03-17 Thread Dmitriy Lyubimov
I still seem to get significant differences on the norm differences of Householder QR and QR via Cholesky trick. Our stock in-core QR seems to be comfortable populating some R values (and therefore Q columns) with values as small as 1e-16, whereas Cholesky computation for L seems to set these things

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-17 Thread Dmitriy Lyubimov
Yes, there's interest. Note that we are trying to unify linear algebra primitives and optimization on Spark as well. All new linear algebra and interaction with the Spark context should probably go through this layer. This is an ongoing thing, but some stuff is working [1] [1] MAHOUT-1346 https://issues.apac

[GSOC 2014] Uniform API for Mahout Clustering

2014-03-17 Thread chalitha udara Perera
Hi All, Going through the mail thread "Mahout 1.0 goals", I found that the main focus of Mahout is now on code refactoring and integration with Spark rather than implementing new algorithms. Recently I have used Mahout for implementing a document clustering module for a Content Management System.

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938020#comment-13938020 ] Pat Ferrel commented on MAHOUT-1464: So am I so no problem. My plan is to update the

[jira] [Commented] (MAHOUT-1461) The tour

2014-03-17 Thread Scott (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937997#comment-13937997 ] Scott commented on MAHOUT-1461: --- absolutely > The tour > > >

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937987#comment-13937987 ] Sebastian Schelter commented on MAHOUT-1464: @Pat I'm pretty busy with non-Ma

[jira] [Reopened] (MAHOUT-1461) The tour

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter reopened MAHOUT-1461: > The tour > > > Key: MAHOUT-1461 > URL: http

[jira] [Commented] (MAHOUT-1461) The tour

2014-03-17 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937979#comment-13937979 ] Sebastian Schelter commented on MAHOUT-1461: Could you create a version of th

[jira] [Commented] (MAHOUT-1461) The tour

2014-03-17 Thread Scott (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937963#comment-13937963 ] Scott commented on MAHOUT-1461: --- Sebastian, That particular page was instrumental in pulli

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937954#comment-13937954 ] Dmitriy Lyubimov commented on MAHOUT-1464: -- Ps spark module has cdh4 maven profi

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937951#comment-13937951 ] Dmitriy Lyubimov commented on MAHOUT-1464: -- I only ever ran spark code with hdfs

[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937925#comment-13937925 ] Pat Ferrel commented on MAHOUT-1464: Good news. At the danger of asking for too much,

[jira] [Comment Edited] (MAHOUT-1464) RowSimilarityJob on Spark

2014-03-17 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937925#comment-13937925 ] Pat Ferrel edited comment on MAHOUT-1464 at 3/17/14 3:33 PM: -

[jira] [Updated] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1466: -- Attachment: (was: MAHOUT-1466.patch) > Cluster visualization fails to execute > --

[jira] [Updated] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1466: -- Attachment: MAHOUT-1466.patch > Cluster visualization fails to execute > -

[jira] [Updated] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1466: -- Affects Version/s: 0.9 This issue seems to have been caused by the fix for M-1339 from 0.9 rel

[jira] [Updated] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1466: -- Attachment: MAHOUT-1466.patch > Cluster visualization fails to execute > -

[jira] [Updated] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1466: -- Status: Patch Available (was: Open) > Cluster visualization fails to execute > --

[jira] [Commented] (MAHOUT-1466) Cluster visualization fails to execute

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937593#comment-13937593 ] Suneel Marthi commented on MAHOUT-1466: --- Fix is to remove provided for all slf4j de

[jira] [Commented] (MAHOUT-1456) The wikipediaXMLSplitter example fails with "heap size" error

2014-03-17 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937586#comment-13937586 ] Suneel Marthi commented on MAHOUT-1456: --- I don't think this issue is related to run