[jira] [Updated] (MAHOUT-1164) Make ARFF integration generate meta-data in JSON format

2013-06-09 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1164: --- Resolution: Fixed Assignee: Sebastian Schelter (was: Ted Dunning

[jira] [Updated] (MAHOUT-1163) Make random forest classifier meta-data file human readable

2013-06-09 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1163: --- Resolution: Fixed Assignee: Sebastian Schelter (was: Ted Dunning

[jira] [Updated] (MAHOUT-996) Support NamedVectors in arff.vector job by convention

2013-06-09 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-996: -- Fix Version/s: (was: 0.8) Backlog Moving this to the backlog

[jira] [Updated] (MAHOUT-1098) ColumnMeansJob broken

2013-06-09 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1098: --- Resolution: Fixed Status: Resolved (was: Patch Available) looks fixed to

Re: Adding twitter widget to the home page

2013-06-08 Thread Sebastian Schelter
I think it's a great idea. On 08.06.2013 19:26, Robin Anil wrote: > Anyone opposed to me adding the Twitter widget to our home page? It seems like there is a > lot of action over there, and it could feel more welcoming to users. >

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

2013-06-08 Thread Sebastian Schelter
Hi Grant, Very good release announcement. I propose that we deprecate a lot more; I think we should be aggressive here to pave the way for a clean and slim 1.0 release. I propose to additionally deprecate the following algorithms because, to my knowledge, they are not actively used: Collabor

Re: Work on ALS for future releases

2013-06-08 Thread Sebastian Schelter
Hi Saikat, Great that you want to work on the ALS code. I think it is very important to make it easier to use, ideally no knowledge of the papers and formulas should be necessary. As you know, the ALS code has a hyperparameter lambda that needs to be tuned in order to get a good factorization. Ar

[jira] [Resolved] (MAHOUT-1243) Dictionary file format in Lucene-Mahout integration is not in SequenceFileFormat

2013-06-08 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1243. Resolution: Fixed Added new option "seqDictOut" that trigger writ

[jira] [Updated] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-08 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-974: -- Resolution: Fixed Status: Resolved (was: Patch Available) test issue is fixed

Re: Random Errors

2013-06-07 Thread Sebastian Schelter
I did and got no errors; the errors only occur when all tests are executed together. On 07.06.2013 14:48, "Ted Dunning" wrote: > Note that you can run an entire class of tests from the mvn command line. > > > On Fri, Jun 7, 2013 at 2:03 PM, Sebastian Schelter > wrote: >

[jira] [Updated] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-07 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-974: -- Attachment: MAHOUT-974.patch Patch that adds the functionality. Tests don't wo

[jira] [Updated] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-07 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-974: -- Status: Patch Available (was: Open

[jira] [Commented] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-07 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677975#comment-13677975 ] Sebastian Schelter commented on MAHOUT-974: --- Saikat, I've had a de

Re: Random Errors

2013-06-07 Thread Sebastian Schelter
I'm also getting errors on a test when executing all tests. I don't get the error when I run the test in the IDE or via mvn on the command line. Do we now also have intra-test class parallelism? If yes, is there a way to disable this? --sebastian On 07.06.2013 09:11, Ted Dunning wrote: > This last

[jira] [Commented] (MAHOUT-992) Audit DistributedCache use to support EMR

2013-06-06 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677192#comment-13677192 ] Sebastian Schelter commented on MAHOUT-992: --- To my knowledge it doe

Re: 0.8 progress

2013-06-06 Thread Sebastian Schelter
Hi Grant, Here's my take: Will/Must be finished: M-944 [include] M-958 [include] M-975 [include] M-1084 [include] M-1098 [include] M-1103 [include] M-1126 [push if no one steps up] M-1147 [include] M-1211 [push if no one steps up] M-1233 [push if no one steps up] M-1241 [include] C

[jira] [Commented] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-06 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676816#comment-13676816 ] Sebastian Schelter commented on MAHOUT-974: --- Hi Saikat, The first two

[jira] [Commented] (MAHOUT-1241) Mailing list archives not available

2013-06-05 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675730#comment-13675730 ] Sebastian Schelter commented on MAHOUT-1241: The correct URLs are li

Re: Suggested 0.8 Code Freeze Date

2013-06-03 Thread Sebastian Schelter
+1 on that. On 03.06.2013 00:26, Grant Ingersoll wrote: > I'd like to suggest a code freeze of June 10th 2013 for finishing 0.8 bugs. > > If they aren't in by then, they will get pushed, unless they are blockers. > > After that, I will create the release candidates. > > -Grant >

[jira] [Resolved] (MAHOUT-1154) Implementing Streaming KMeans

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1154. Resolution: Fixed > Implementing Streaming KMe

[jira] [Resolved] (MAHOUT-1196) LogisticModelParameters uses csv.getTargetCategories() even if csv is not used.

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1196. Resolution: Fixed > LogisticModelParameters uses csv.getTargetCategor

[jira] [Updated] (MAHOUT-716) Implement Boosting

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-716: -- Fix Version/s: Backlog moving this to the backlog. [~hector.yee] if you find time

[jira] [Updated] (MAHOUT-732) Implement ranking autoencoder on top of gradient machine

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-732: -- Fix Version/s: Backlog moving this to the backlog. [~hector.yee] if you find time

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672560#comment-13672560 ] Sebastian Schelter commented on MAHOUT-880: --- Moving this to the Backlo

[jira] [Updated] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-880: -- Fix Version/s: Backlog > Add some matrix method(like addition, subtraction, n

[jira] [Commented] (MAHOUT-961) Modify the Tree/Forest Visualizer on DF.

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672552#comment-13672552 ] Sebastian Schelter commented on MAHOUT-961: --- Sorry for not looking at thi

[jira] [Updated] (MAHOUT-961) Modify the Tree/Forest Visualizer on DF.

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-961: -- Fix Version/s: 0.8 > Modify the Tree/Forest Visualizer on

[jira] [Commented] (MAHOUT-1098) ColumnMeansJob broken

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672541#comment-13672541 ] Sebastian Schelter commented on MAHOUT-1098: [~dlyubimov] can we close

[jira] [Updated] (MAHOUT-1004) Distributed User-based Collaborative Filtering

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1004: --- Fix Version/s: Backlog > Distributed User-based Collaborative Filter

[jira] [Commented] (MAHOUT-996) Support NamedVectors in arff.vector job by convention

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672535#comment-13672535 ] Sebastian Schelter commented on MAHOUT-996: --- I think that approach is re

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672527#comment-13672527 ] Sebastian Schelter commented on MAHOUT-976: --- What's the status on thi

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672524#comment-13672524 ] Sebastian Schelter commented on MAHOUT-975: --- [~tdunning], can you have a

[jira] [Updated] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-975: -- Fix Version/s: 0.8 > Bug in Gradient Machine - Computation of the gradi

[jira] [Commented] (MAHOUT-1211) Replace deprecated Closables.closeQuietly calls

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672514#comment-13672514 ] Sebastian Schelter commented on MAHOUT-1211: Can anyone provide a temp

[jira] [Commented] (MAHOUT-1211) Replace deprecated Closables.closeQuietly calls

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672513#comment-13672513 ] Sebastian Schelter commented on MAHOUT-1211: I think the problem comes

[jira] [Updated] (MAHOUT-993) Some vector dumper flags are expecting arguments.

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-993: -- Resolution: Fixed Assignee: Sebastian Schelter Status: Resolved (was

[jira] [Resolved] (MAHOUT-1228) Cleanup .gitignore

2013-06-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1228. Resolution: Fixed Assignee: Sebastian Schelter My bad, you are completely

[jira] [Resolved] (MAHOUT-1228) Cleanup .gitignore

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1228. Resolution: Invalid You are removing the target/* folders from the subprojects

[jira] [Commented] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672327#comment-13672327 ] Sebastian Schelter commented on MAHOUT-974: --- Saikat, In the preprocessing

[jira] [Resolved] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-952. --- Resolution: Invalid Seems like this has already been fixed

[jira] [Updated] (MAHOUT-1132) fpgrowth2 crash when have not unique items in one line

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1132: --- Fix Version/s: (was: Backlog) 0.8 > fpgrowth2 crash w

[jira] [Updated] (MAHOUT-1152) mRMR feature selection algorithm

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1152: --- Fix Version/s: (was: 0.8) Backlog > mRMR feat

[jira] [Updated] (MAHOUT-1152) mRMR feature selection algorithm

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1152: --- Component/s: (was: Integration) > mRMR feature selection algori

[jira] [Updated] (MAHOUT-1175) IllegalStateException and FileNotFoundException occures when running mahout inbuilt mapreduce implementation of frequent pattern mining.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1175: --- Fix Version/s: Backlog > IllegalStateException and FileNotFoundExcept

[jira] [Updated] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1193: --- Fix Version/s: Backlog > We may want a BlockSparseMat

[jira] [Updated] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1178: --- Fix Version/s: Backlog > GSOC 2013: Improve Lucene support in Mah

[jira] [Updated] (MAHOUT-1179) GSOC 2013: Refactor and improve the classification APIs

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1179: --- Fix Version/s: Backlog > GSOC 2013: Refactor and improve the classificat

[jira] [Updated] (MAHOUT-1177) GSOC 2013: Reform and simplify the clustering APIs

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1177: --- Fix Version/s: Backlog > GSOC 2013: Reform and simplify the clustering A

[jira] [Updated] (MAHOUT-1196) LogisticModelParameters uses csv.getTargetCategories() even if csv is not used.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1196: --- Fix Version/s: 0.8 > LogisticModelParameters uses csv.getTargetCategor

[jira] [Commented] (MAHOUT-1196) LogisticModelParameters uses csv.getTargetCategories() even if csv is not used.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672217#comment-13672217 ] Sebastian Schelter commented on MAHOUT-1196: Vineet, any progress on

[jira] [Updated] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1200: --- Fix Version/s: 0.8 > Mahout tests depend on writing to /tmp/hadoop-$u

[jira] [Updated] (MAHOUT-1204) Rewrite Benchmarks using Caliper

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1204: --- Fix Version/s: Backlog > Rewrite Benchmarks using Cali

[jira] [Resolved] (MAHOUT-1208) Not able to get the distance from the cluster.

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1208. Resolution: Won't Fix > Not able to get the distance from the

[jira] [Updated] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1214: --- Fix Version/s: Backlog > Improve the accuracy of the Spectral KMeans Met

[jira] [Updated] (MAHOUT-1228) Cleanup .gitignore

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1228: --- Fix Version/s: 0.8 > Cleanup .gitign

[jira] [Updated] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-974: -- Fix Version/s: 0.8

[jira] [Updated] (MAHOUT-1228) Cleanup .gitignore

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1228: --- Affects Version/s: (was: 0.7) 0.8 > Clea

[jira] [Updated] (MAHOUT-1220) seqdirectory brings empty files out

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1220: --- Fix Version/s: (was: 0.7) Affects Version/s: (was: 0.7

[jira] [Updated] (MAHOUT-1231) "No input clusters found in " error in kmeans

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1231: --- Affects Version/s: (was: 0.8) (was: 0.7

[jira] [Resolved] (MAHOUT-1234) Canopy Clustering

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1234. Resolution: Won't Fix > Canopy Cl

[jira] [Commented] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672163#comment-13672163 ] Sebastian Schelter commented on MAHOUT-974: --- Saikat, are you still on

[jira] [Updated] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-974: -- Affects Version/s: (was: 0.6) 0.8

[jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672145#comment-13672145 ] Sebastian Schelter commented on MAHOUT-884: --- regarding the patch: please

[jira] [Resolved] (MAHOUT-1235) ParallelALSFactorizationJob does not use VectorSumCombiner

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1235. Resolution: Fixed > ParallelALSFactorizationJob does not

[jira] [Updated] (MAHOUT-1235) ParallelALSFactorizationJob does not use VectorSumCombiner

2013-06-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1235: --- Fix Version/s: 0.8 > ParallelALSFactorizationJob does not use VectorSumCombi

[jira] [Created] (MAHOUT-1235) ParallelALSFactorizationJob does not use VectorSumCombiner

2013-06-01 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1235: -- Summary: ParallelALSFactorizationJob does not use VectorSumCombiner Key: MAHOUT-1235 URL: https://issues.apache.org/jira/browse/MAHOUT-1235 Project

[jira] [Resolved] (MAHOUT-1205) ParallelALSFactorizationJob should leverage the distributed cache

2013-05-07 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1205. Resolution: Fixed > ParallelALSFactorizationJob should leverage

Changelog entries

2013-05-06 Thread Sebastian Schelter
Please remember to add fixed issues to our changelog!

[jira] [Commented] (MAHOUT-1205) ParallelALSFactorizationJob should leverage the distributed cache

2013-05-06 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650095#comment-13650095 ] Sebastian Schelter commented on MAHOUT-1205: I already have it finishe

[jira] [Created] (MAHOUT-1205) ParallelALSFactorizationJob should leverage the distributed cache

2013-05-05 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1205: -- Summary: ParallelALSFactorizationJob should leverage the distributed cache Key: MAHOUT-1205 URL: https://issues.apache.org/jira/browse/MAHOUT-1205

Re: javadoc

2013-04-27 Thread Sebastian Schelter
Yes, that would be great. Please create a jira issue first. Best, Sebastian On 27.04.2013 11:19, Ángel Martínez González wrote: > Hi all, > There are a lot of classes with no Javadoc comments in Mahout. Maybe > I could help with that? > > I could start with the following packages: > > org.ap

Re: Performance of ALS

2013-04-18 Thread Sebastian Schelter
VD. Definitely took a search to figure out what > 'gelsd' does in LAPACK! I'll see if I can test-drive this too to see > if it bumps performance. That would be great, JNI is a much smaller > requirement than a GPU! > > On Thu, Apr 18, 2013 at 10:01 PM, Sebastian Schelter

Re: Performance of ALS

2013-04-18 Thread Sebastian Schelter
I was just emailing something similar on Mahout(See my email). I saw the >>>> TU Berlin name and I thought you would do something about it :) This is >>>> excellent. One of the next gen work on Vectors is maybe investigating >>> this. >>>> >>>> >>>>

Re: Performance of ALS

2013-04-18 Thread Sebastian Schelter
I saw the >> TU Berlin name and I thought you would do something about it :) This is >> excellent. One of the next gen work on Vectors is maybe investigating this. >> >> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. >> >> >> On Th

Performance of ALS

2013-04-18 Thread Sebastian Schelter
Hi there, with regard to Robin mentioning JBlas [1] recently when we talked about the performance of our vector operations, I ported the solving code for ALS to JBlas today and got some awesome results. For the movielens 1M dataset and a factorization of rank 100, the runtimes per iteration dropp

Re: immutable vs mutable vectors

2013-04-15 Thread Sebastian Schelter
Yes, look at the assign() method which allows you to apply functions directly to the vector. /s On 15.04.2013 12:01, Andy Twigg wrote: > Hello all, > > As far as I can tell, mahout.math vectors are generally immutable, i.e. > > Vector x = foo > Vector y = bar > x=x.plus(y) // creates a new vect
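The two styles discussed in this exchange (an operation that returns a new vector versus one that applies a function directly to an existing vector) can be sketched with plain arrays. This is only an illustration under stated assumptions: it does not use the Mahout API, and `AssignSketch`, `plus` and `assign` are hypothetical names chosen for this example.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical illustration, not Mahout code: contrasts an allocating
// "plus" with an in-place "assign" on plain double arrays.
class AssignSketch {

    // Immutable style: allocates a new array and leaves x untouched.
    static double[] plus(double[] x, double[] y) {
        checkSameLength(x, y);
        double[] result = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            result[i] = x[i] + y[i];
        }
        return result;
    }

    // In-place style: applies f element-wise and overwrites x.
    // Returns x itself so calls can be chained.
    static double[] assign(double[] x, double[] y, DoubleBinaryOperator f) {
        checkSameLength(x, y);
        for (int i = 0; i < x.length; i++) {
            x[i] = f.applyAsDouble(x[i], y[i]);
        }
        return x;
    }

    private static void checkSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] y = {3.0, 4.0};
        double[] sum = plus(x, y);                 // new array, x unchanged
        double[] same = assign(x, y, Double::sum); // x updated in place
        System.out.println((sum != x) + " " + (same == x));
    }
}
```

The in-place variant avoids one allocation per operation, which matters mostly in tight loops over large vectors; the allocating variant is easier to reason about when the operands are shared.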

Re: Call to action – Mahout needs your help

2013-04-13 Thread Sebastian Schelter
It would be great if you could write a little documentation on how to use the ALS recommenders in practice. On 05.04.2013 01:56, Andrew Musselman wrote: > In case this thread is still a good place to reply with an offer to help, > I'd love to pitch in. I have built a few production recommenders,

Re: PlusMult and other functions

2013-04-12 Thread Sebastian Schelter
+1 On 12.04.2013 18:17, "Robin Anil" wrote: > +1 to both the new methods and extending Guava functions > > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > > > On Fri, Apr 12, 2013 at 6:55 AM, Dan Filimon > wrote: > > > I'm adding interfaces for DoubleDoubleFunctions with the fol

[jira] [Comment Edited] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2013-04-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629000#comment-13629000 ] Sebastian Schelter edited comment on MAHOUT-1178 at 4/11/13 3:1

[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2013-04-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629000#comment-13629000 ] Sebastian Schelter commented on MAHOUT-1178: Gokhan, could you upload

[jira] [Comment Edited] (MAHOUT-1190) SequentialAccessSparseVector function assignment is very slow

2013-04-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628853#comment-13628853 ] Sebastian Schelter edited comment on MAHOUT-1190 at 4/11/13 11:4

[jira] [Commented] (MAHOUT-1190) SequentialAccessSparseVector function assignment is very slow

2013-04-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628853#comment-13628853 ] Sebastian Schelter commented on MAHOUT-1190: I took a look in the code

[jira] [Commented] (MAHOUT-974) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId

2013-04-10 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627925#comment-13627925 ] Sebastian Schelter commented on MAHOUT-974: --- I didn't do any work on i

[jira] [Updated] (MAHOUT-1050) mutable in-memory datamodel

2013-04-10 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1050: --- Resolution: Won't Fix Status: Resolved (was: Patch Avai

[jira] [Commented] (MAHOUT-1025) Update documentation for LDA before the release.

2013-04-10 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627766#comment-13627766 ] Sebastian Schelter commented on MAHOUT-1025: I'm no expert on

Re: Code reviews and reviewers

2013-04-09 Thread Sebastian Schelter
Dan, it's a pleasure to review your code. Ask me anytime :) On 09.04.2013 13:31, Dan Filimon wrote: > Hi everyone, > > Sebastian has been reviewing my code on ReviewBoard [1] for a while now and > I feel bad for always asking him to do it. :) > > Is there anyone else who could have a look (I'll

[jira] [Resolved] (MAHOUT-1161) Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception.

2013-04-08 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1161. Resolution: Fixed Fix Version/s: 0.8 Assignee: Sebastian Schelter

Re: Any interest in Data Preparation?

2013-04-06 Thread Sebastian Schelter
In general, I think it is great to have such tools. But they should be developed in context with a specific algorithm or problem. On 06.04.2013 17:47, Gokhan Capan wrote: > Hi, > > Are you guys interested in Weka like filters implementation, > like NominalToBinary, Discretize etc. > > I started

Re: Call to action – Mahout needs your help

2013-04-04 Thread Sebastian Schelter
Great to hear that you use Mahout in production! If you want to start working on it, you can either browse our jira issues or propose some issue to work on yourself. If you need some input, it would be awesome to enhance our ALS recommenders with cross-validation and tooling for finding a good reg

Re: CosineDistanceMeasure for 2 zero vectors?

2013-04-04 Thread Sebastian Schelter
On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter > wrote: > >> You can ignore the recommender stuff for the DistanceMeasure classes, as >> the recommenders use their own distance/similarity implementations. >> >> I just wanted to comment on the example that

Re: CosineDistanceMeasure for 2 zero vectors?

2013-04-04 Thread Sebastian Schelter
> But now, the code returns 1. Is that a special value? I'd guess it means > you like it by default...? > > > On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter > wrote: > >> In recommender systems, it's dangerous to interpret "no interaction" as >

Re: CosineDistanceMeasure for 2 zero vectors?

2013-04-04 Thread Sebastian Schelter
ans > literally nothing. No interaction. Which could be either "don't like", > "don't like today", "dislike", etc. Which adds to the meaninglessness of > it. > > > On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter > wrote: > >

Re: CosineDistanceMeasure for 2 zero vectors?

2013-04-04 Thread Sebastian Schelter
I think that in our recommender code, 0 should mean no rating or no interaction observed. I think modeling dislike with 0 creates a lot of unnecessary problems. On 04.04.2013 22:56, Andrew Musselman wrote: > I see the arguments for having it defined, just raising the point that it's > a very strange

Re: CosineDistanceMeasure for 2 zero vectors?

2013-04-04 Thread Sebastian Schelter
Dislike should not be modeled by a zero rating IMHO. This might also create problems with the iterateNonZero() method in our vectors. On 04.04.2013 22:40, Andrew Musselman wrote: > I think it should return an "undefined" symbol. There is no angle between > two zero vectors. > > In a practical
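The point raised in this thread, that there is no angle between two zero vectors, can be made concrete with a small sketch. The code below is not Mahout's `CosineDistanceMeasure`; `CosineSketch`, `cosineDistance` and `cosineDistanceOrDefault` are hypothetical names, and the choice of fallback value is deliberately left to the caller because the thread does not settle on one.

```java
// Hypothetical illustration, not Mahout code: cosine distance on plain
// arrays, showing why the all-zero case needs an explicit decision.
class CosineSketch {

    // Plain definition: 1 - (a . b) / (|a| * |b|).
    // If either vector is all zeros the denominator is 0 and the
    // floating-point result is NaN (0.0 / 0.0).
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Guarded variant: the caller decides what the undefined case means.
    static double cosineDistanceOrDefault(double[] a, double[] b, double fallback) {
        double d = cosineDistance(a, b);
        return Double.isNaN(d) ? fallback : d;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0};
        System.out.println(Double.isNaN(cosineDistance(zero, zero)));
        System.out.println(cosineDistanceOrDefault(zero, zero, 1.0));
    }
}
```

Making the fallback an explicit parameter keeps the policy question (treat the undefined case as maximally distant, as identical, or as an error) visible at the call site instead of burying it in the distance function.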

[jira] [Commented] (MAHOUT-1161) Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception.

2013-04-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622351#comment-13622351 ] Sebastian Schelter commented on MAHOUT-1161: @rohit did you apply the p

[jira] [Commented] (MAHOUT-1161) Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception.

2013-04-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622088#comment-13622088 ] Sebastian Schelter commented on MAHOUT-1161: Hi Rohit, can you test we

[jira] [Commented] (MAHOUT-1025) Update documentation for LDA before the release.

2013-04-02 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620267#comment-13620267 ] Sebastian Schelter commented on MAHOUT-1025: Regarding 3) it's perfectly

[jira] [Resolved] (MAHOUT-1184) Another take at pmd, findbugs and checkstyle

2013-04-01 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1184. Resolution: Fixed Fix Version/s: 0.8 > Another take at pmd, findb

[jira] [Created] (MAHOUT-1184) Another take at pmd, findbugs and checkstyle

2013-04-01 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1184: -- Summary: Another take at pmd, findbugs and checkstyle Key: MAHOUT-1184 URL: https://issues.apache.org/jira/browse/MAHOUT-1184 Project: Mahout

[jira] [Commented] (MAHOUT-1181) Adding StreamingKMeans MapReduce classes

2013-03-29 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617274#comment-13617274 ] Sebastian Schelter commented on MAHOUT-1181: ma
