0.8 and bug squashing on June 1

2013-06-01 Thread Grant Ingersoll
A few of us are at Berlin Buzzwords hanging out and working on Mahout, so if you are interested, feel free to jump on IRC (#mahout on freenode) for some discussion. Not all of our conversation will be translated to IRC, but we are happy to interact w/ others if interested. Also, sounds like

[jira] [Updated] (MAHOUT-1201) Some Mahout jobs do not pass user supplied Configuration object to sub jobs

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1201: Fix Version/s: 0.8 Some Mahout jobs do not pass user supplied Configuration object

[jira] [Updated] (MAHOUT-1154) Implementing Streaming KMeans

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-1154: Fix Version/s: 0.8 Implementing Streaming KMeans

[jira] [Created] (MAHOUT-1235) ParallelALSFactorizationJob does not use VectorSumCombiner

2013-06-01 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1235: Summary: ParallelALSFactorizationJob does not use VectorSumCombiner Key: MAHOUT-1235 URL: https://issues.apache.org/jira/browse/MAHOUT-1235 Project:

[jira] [Updated] (MAHOUT-1162) Adding BallKMeans and StreamingKMeans classes

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-1162: Fix Version/s: 0.8 Adding BallKMeans and StreamingKMeans classes

[jira] [Commented] (MAHOUT-1126) Mac builds won't unjar

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672126#comment-13672126 ] Grant Ingersoll commented on MAHOUT-1126: When I build the examples job jar, I

[jira] [Updated] (MAHOUT-1132) fpgrowth2 crash when have not unique items in one line

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-1132: Fix Version/s: Backlog fpgrowth2 crash when have not unique items in one line

[jira] [Commented] (MAHOUT-684) Topics regularization for LDA

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672128#comment-13672128 ] Grant Ingersoll commented on MAHOUT-684: Any update on this?

[jira] [Resolved] (MAHOUT-670) Provide a performance measurement framework for Mahout

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-670. Resolution: Won't Fix People who want this can get it off of Github, as there isn't a

[jira] [Commented] (MAHOUT-1126) Mac builds won't unjar

2013-06-01 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672129#comment-13672129 ] Pat Ferrel commented on MAHOUT-1126: Right you are and so the solution has changed to

[jira] [Updated] (MAHOUT-775) L2 does not work with TrainAdaptiveLogisticRegression

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-775: Fix Version/s: 0.8 L2 does not work with TrainAdaptiveLogisticRegression

[jira] [Updated] (MAHOUT-1235) ParallelALSFactorizationJob does not use VectorSumCombiner

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1235: Fix Version/s: 0.8 ParallelALSFactorizationJob does not use VectorSumCombiner

[jira] [Resolved] (MAHOUT-1235) ParallelALSFactorizationJob does not use VectorSumCombiner

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1235. Resolution: Fixed ParallelALSFactorizationJob does not use VectorSumCombiner

[jira] [Commented] (MAHOUT-804) Each page in Mahout's Confluence Wiki has 2 URLs, with differing page styles and search behaviours

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672132#comment-13672132 ] Grant Ingersoll commented on MAHOUT-804: Not sure what to do, perhaps we should

[jira] [Commented] (MAHOUT-836) On donating my Robust PCA Java code to Mahout

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672133#comment-13672133 ] Grant Ingersoll commented on MAHOUT-836: Hi Sujit, This is interesting, do you

[jira] [Resolved] (MAHOUT-865) Refactor Sequential Clustering algorithms

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-865. Resolution: Won't Fix We should open issues for individual instances as desired.

[jira] [Commented] (MAHOUT-874) Extract Writables into a separate module to allow smaller dependencies

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672143#comment-13672143 ] Ted Dunning commented on MAHOUT-874: Jake, Can you confirm that changing Hadoop to

[jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672144#comment-13672144 ] Ted Dunning commented on MAHOUT-884: Suneel, can you commit this if you think it is

[jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672145#comment-13672145 ] Sebastian Schelter commented on MAHOUT-884: regarding the patch: please make

[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-06-01 Thread Yexi Jiang (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672147#comment-13672147 ] Yexi Jiang commented on MAHOUT-1206: Still there is no comments?

[jira] [Resolved] (MAHOUT-942) Improbe the way to process the missing value for DF.

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-942. Resolution: Later Please reopen when you have a patch Improbe the way to

[jira] [Commented] (MAHOUT-874) Extract Writables into a separate module to allow smaller dependencies

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672149#comment-13672149 ] Jake Mannix commented on MAHOUT-874: So marking hadoop as provided is nice, a smaller

[jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672150#comment-13672150 ] Suneel Marthi commented on MAHOUT-884: Agree with Sebastian. I can work on this

[jira] [Commented] (MAHOUT-950) Change BtJob to use new MultipleOutputs API

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672151#comment-13672151 ] Grant Ingersoll commented on MAHOUT-950: I think we still need to support 1.0.X,

[jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672154#comment-13672154 ] Suneel Marthi commented on MAHOUT-884: Also will be adding unit tests as part of

[jira] [Updated] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-952: Fix Version/s: 0.8 I think we can add this to 0.8. Joe or Stuart, can you update this

[jira] [Updated] (MAHOUT-953) ArffVectorIterable does not gracefully handle duplicate attribute name

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-953: Fix Version/s: 0.8 ArffVectorIterable does not gracefully handle duplicate attribute

[jira] [Commented] (MAHOUT-953) ArffVectorIterable does not gracefully handle duplicate attribute name

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672158#comment-13672158 ] Grant Ingersoll commented on MAHOUT-953: Stuart, any chance you can get a patch

[jira] [Commented] (MAHOUT-966) Mismatch in the number of points given by the clusterDumper and ClusterOutputPostProcessor

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672161#comment-13672161 ] Grant Ingersoll commented on MAHOUT-966: Any update on this? Seems like it should

[jira] [Updated] (MAHOUT-966) Mismatch in the number of points given by the clusterDumper and ClusterOutputPostProcessor

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-966: Fix Version/s: 0.8 Mismatch in the number of points given by the clusterDumper and

[jira] [Updated] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-974: Affects Version/s: (was: 0.6) 0.8

[jira] [Commented] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672163#comment-13672163 ] Sebastian Schelter commented on MAHOUT-974: Saikat, are you still on this?

[jira] [Resolved] (MAHOUT-978) spectralkmeans utility fails when input filename begins with leading underscore

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-978. Resolution: Won't Fix I'd say, won't fix, as there is a workaround. Please re-open if

[jira] [Updated] (MAHOUT-992) Audit DistributedCache use to support EMR

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-992: --- Fix Version/s: 0.8 Audit DistributedCache use to support EMR

[jira] [Resolved] (MAHOUT-1234) Canopy Clustering

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1234. Resolution: Won't Fix Canopy Clustering

[jira] [Resolved] (MAHOUT-1025) Update documentation for LDA before the release.

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1025. Resolution: Fixed Update documentation for LDA before the release.

[jira] [Updated] (MAHOUT-1231) No input clusters found in error in kmeans

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1231: Affects Version/s: (was: 0.8) (was: 0.7)

[jira] [Resolved] (MAHOUT-1041) Support for PMML

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1041. Resolution: Won't Fix Without a patch, I don't see putting this in. Also, I don't see

[jira] [Updated] (MAHOUT-1204) Rewrite Benchmarks using Caliper

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-1204: Affects Version/s: 1.0 Rewrite Benchmarks using Caliper

[jira] [Resolved] (MAHOUT-1045) Cluster evaluators returning bad results

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1045. Resolution: Fixed Looks in and passing Cluster evaluators returning

[jira] [Commented] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Saikat Kanjilal (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672172#comment-13672172 ] Saikat Kanjilal commented on MAHOUT-974: Yes, although I could use some general

[jira] [Resolved] (MAHOUT-1053) Use KMeans++ for cluster Initialization

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1053. Resolution: Fixed This is resolved by the new streaming k-means stuff. Use

[jira] [Resolved] (MAHOUT-1054) Use ball KMeans for clustering

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1054. Resolution: Fixed This is resolved by the new streaming k-means stuff. Use

[jira] [Commented] (MAHOUT-1117) Vectors are not hashable

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672176#comment-13672176 ] Robin Anil commented on MAHOUT-1117: There is no single way good to hash a vector

[jira] [Resolved] (MAHOUT-1117) Vectors are not hashable

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil resolved MAHOUT-1117. Resolution: Won't Fix Vectors are not hashable

[jira] [Commented] (MAHOUT-1065) Add CassandraDataModelTest

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672173#comment-13672173 ] Grant Ingersoll commented on MAHOUT-1065: [~eduardo.gurgel] [~srowen] any update

[jira] [Updated] (MAHOUT-1070) DisplayKMeans example has transposed/mislabelled arguments

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1070: Fix Version/s: 0.8 DisplayKMeans example has transposed/mislabelled arguments

[jira] [Resolved] (MAHOUT-1060) Search for nearest neighbor

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1060. Resolution: Fixed All of this capability has been added by Dan's streaming k-means clustering

[jira] [Commented] (MAHOUT-1080) Kmeans clustered output losses vectorId given in the input

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672182#comment-13672182 ] Grant Ingersoll commented on MAHOUT-1080: Here's a thought: kill NamedVector,

[jira] [Commented] (MAHOUT-1070) DisplayKMeans example has transposed/mislabelled arguments

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672180#comment-13672180 ] Suneel Marthi commented on MAHOUT-1070: Is someone looking at this patch? I can

[jira] [Commented] (MAHOUT-1052) Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672183#comment-13672183 ] Suneel Marthi commented on MAHOUT-1052: I can get this patch in for the 0.8

[jira] [Resolved] (MAHOUT-1070) DisplayKMeans example has transposed/mislabelled arguments

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil resolved MAHOUT-1070. Resolution: Fixed Committed DisplayKMeans example has transposed/mislabelled

[jira] [Commented] (MAHOUT-1047) CVB hangs after completion

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672186#comment-13672186 ] Suneel Marthi commented on MAHOUT-1047: Tested this patch and committing to trunk.

[jira] [Assigned] (MAHOUT-1047) CVB hangs after completion

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned MAHOUT-1047: Assignee: Suneel Marthi CVB hangs after completion

[jira] [Updated] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-1206: Fix Version/s: Backlog Add density-based clustering algorithms to mahout

[jira] [Updated] (MAHOUT-1220) seqdirectory brings empty files out

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1220: Fix Version/s: (was: 0.7) Affects Version/s: (was: 0.7)

[jira] [Updated] (MAHOUT-1228) Cleanup .gitignore

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1228: Affects Version/s: (was: 0.7) 0.8 Cleanup

[jira] [Updated] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-974: Fix Version/s: 0.8

[jira] [Updated] (MAHOUT-1228) Cleanup .gitignore

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1228: Fix Version/s: 0.8 Cleanup .gitignore

[jira] [Created] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2013-06-01 Thread Ted Dunning (JIRA)
Ted Dunning created MAHOUT-1236: Summary: Need a cleaned up serialized format for Vectors to handle names and all other kinds of things Key: MAHOUT-1236 URL: https://issues.apache.org/jira/browse/MAHOUT-1236

[jira] [Updated] (MAHOUT-1047) CVB hangs after completion

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1047: Resolution: Fixed Fix Version/s: (was: 0.7) Status: Resolved (was:

[jira] [Assigned] (MAHOUT-1026) Add LDA (CVB implementation) to the cluster_reuters.sh example script

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned MAHOUT-1026: Assignee: Suneel Marthi (was: Jake Mannix) Add LDA (CVB implementation) to the

[jira] [Updated] (MAHOUT-1153) Implement streaming random forests

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-1153: Fix Version/s: Backlog Affects Version/s: (was: 0.7) Implement streaming random

[jira] [Updated] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1214: Fix Version/s: Backlog Improve the accuracy of the Spectral KMeans Method

[jira] [Resolved] (MAHOUT-1080) Kmeans clustered output losses vectorId given in the input

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1080. Resolution: Duplicate MAHOUT-1236 address this in the more general case

[jira] [Commented] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672188#comment-13672188 ] Jake Mannix commented on MAHOUT-1236: Why protobufs? Why not thrift or avro? Maybe

[jira] [Commented] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2013-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672190#comment-13672190 ] Sean Owen commented on MAHOUT-1236: There has always been a tension between all of the

[jira] [Commented] (MAHOUT-1065) Add CassandraDataModelTest

2013-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672191#comment-13672191 ] Sean Owen commented on MAHOUT-1065: AFAIK this is on hold until the dependencies are

[jira] [Commented] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672192#comment-13672192 ] Jake Mannix commented on MAHOUT-1236: Thrift leaves off optional fields pretty well

[jira] [Updated] (MAHOUT-1026) Add LDA (CVB implementation) to the cluster_reuters.sh example script

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1026: -- Attachment: MAHOUT-1026.patch Add LDA (CVB implementation) to the cluster_reuters.sh

[jira] [Commented] (MAHOUT-1026) Add LDA (CVB implementation) to the cluster_reuters.sh example script

2013-06-01 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672194#comment-13672194 ] Suneel Marthi commented on MAHOUT-1026: Jake, Attached patch takes care of (a)

[jira] [Commented] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2013-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672197#comment-13672197 ] Sean Owen commented on MAHOUT-1236: Yes it's probably very similar. The comment was

[jira] [Updated] (MAHOUT-1084) Kmeans for synthetic control example--there are 12 cluster during iterations.

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1084: Fix Version/s: 0.8 We should make sure the examples work, so adding this to 0.8. My env.

[jira] [Resolved] (MAHOUT-1208) Not able to get the distance from the cluster.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1208. Resolution: Won't Fix Not able to get the distance from the cluster.

[jira] [Updated] (MAHOUT-1204) Rewrite Benchmarks using Caliper

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1204: Fix Version/s: Backlog Rewrite Benchmarks using Caliper

[jira] [Resolved] (MAHOUT-1092) MultiNormal is slow in common case

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1092. Resolution: Fixed Fix Version/s: 0.8 Ted says it's fixed on 4944dcc7

[jira] [Commented] (MAHOUT-1094) when i am giving the testing data from the new set of data without using split ..it is giving the completely wrong confusion matrix

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672206#comment-13672206 ] Grant Ingersoll commented on MAHOUT-1094: Please provide more details and

[jira] [Updated] (MAHOUT-1103) clusterpp is not writing directories for all clusters

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1103: Fix Version/s: 0.8 clusterpp is not writing directories for all clusters

[jira] [Commented] (MAHOUT-1103) clusterpp is not writing directories for all clusters

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672209#comment-13672209 ] Grant Ingersoll commented on MAHOUT-1103: [~dlyubimov] or [~mmolek] any updates

[jira] [Updated] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1200: --- Fix Version/s: 0.8 Mahout tests depend on writing to /tmp/hadoop-$user

[jira] [Commented] (MAHOUT-1080) Kmeans clustered output losses vectorId given in the input

2013-06-01 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672210#comment-13672210 ] Pat Ferrel commented on MAHOUT-1080: +10 As a frequent user of named vectors I would

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

2013-06-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672212#comment-13672212 ] Grant Ingersoll commented on MAHOUT-1108: Elmer, can you supply a patch?

[jira] [Resolved] (MAHOUT-598) Downstream steps in the seq2sparse job flow looking in wrong location for output from previous steps when running in Elastic MapReduce (EMR) cluster

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil resolved MAHOUT-598. Resolution: Cannot Reproduce Downstream steps in the seq2sparse job flow looking in wrong

[jira] [Updated] (MAHOUT-684) Topics regularization for LDA

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-684: Fix Version/s: 0.8 Assignee: Jake Mannix Jake, please take a look at this one commit/close as

[jira] [Resolved] (MAHOUT-775) L2 does not work with TrainAdaptiveLogisticRegression

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil resolved MAHOUT-775. Resolution: Fixed L2 does not work with TrainAdaptiveLogisticRegression

[jira] [Updated] (MAHOUT-1196) LogisticModelParameters uses csv.getTargetCategories() even if csv is not used.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1196: Fix Version/s: 0.8 LogisticModelParameters uses csv.getTargetCategories() even

[jira] [Commented] (MAHOUT-1196) LogisticModelParameters uses csv.getTargetCategories() even if csv is not used.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672217#comment-13672217 ] Sebastian Schelter commented on MAHOUT-1196: Vineet, any progress on this?

[jira] [Updated] (MAHOUT-1179) GSOC 2013: Refactor and improve the classification APIs

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1179: Fix Version/s: Backlog GSOC 2013: Refactor and improve the classification APIs

[jira] [Updated] (MAHOUT-1177) GSOC 2013: Reform and simplify the clustering APIs

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1177: --- Fix Version/s: Backlog GSOC 2013: Reform and simplify the clustering APIs

[jira] [Updated] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1178: --- Fix Version/s: Backlog GSOC 2013: Improve Lucene support in Mahout

[jira] [Resolved] (MAHOUT-804) Each page in Mahout's Confluence Wiki has 2 URLs, with differing page styles and search behaviours

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil resolved MAHOUT-804. --- Resolution: Fixed Seems to be exporting correctly now. Each page in Mahout's

[jira] [Updated] (MAHOUT-1065) Add CassandraDataModelTest

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-1065: --- Affects Version/s: (was: 0.8) Fix Version/s: Backlog Going with what Sean said.

[jira] [Updated] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1193: --- Fix Version/s: Backlog We may want a BlockSparseMatrix

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

2013-06-01 Thread Elmer Garduno (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672224#comment-13672224 ] Elmer Garduno commented on MAHOUT-1108: --- I will submit it later today.

[jira] [Updated] (MAHOUT-1175) IllegalStateException and FileNotFoundException occures when running mahout inbuilt mapreduce implementation of frequent pattern mining.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1175: --- Fix Version/s: Backlog IllegalStateException and FileNotFoundException occures

[jira] [Resolved] (MAHOUT-1162) Adding BallKMeans and StreamingKMeans classes

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1162. - Resolution: Fixed This has been checked in. Adding BallKMeans and

[jira] [Updated] (MAHOUT-1152) mRMR feature selection algorithm

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1152: --- Component/s: (was: Integration) mRMR feature selection algorithm

[jira] [Resolved] (MAHOUT-1210) Fix URLs in mahout-collection-codegen-plugin pom

2013-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1210. - Resolution: Fixed Fix Version/s: 0.8 Committed this. Great (and obscure) catch, Stevo!

[jira] [Updated] (MAHOUT-1152) mRMR feature selection algorithm

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1152: --- Fix Version/s: (was: 0.8) Backlog mRMR feature

[jira] [Updated] (MAHOUT-950) Change BtJob to use new MultipleOutputs API

2013-06-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-950: -- Fix Version/s: 1.0 Change BtJob to use new MultipleOutputs API
