[jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168194#comment-13168194 ] jirapos...@reviews.apache.org commented on MAHOUT-918: -- bq. On 201

Re: Review Request: MAHOUT-918 Parallelized SGD in MapReduce

2011-12-12 Thread issei yoshida
> On 2011-12-08 07:04:49, Ted Dunning wrote: > > > > Ted Dunning wrote: > This code got worse with these comments, not better. > > issei yoshida wrote: > Would you mind reviewing Diff revision 3? > You still seem to be looking at revision 2. Updated Diff revision 4 where I add some comm

[jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168193#comment-13168193 ] jirapos...@reviews.apache.org commented on MAHOUT-918: --

Re: Review Request: MAHOUT-918 Parallelized SGD in MapReduce

2011-12-12 Thread issei yoshida
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3072/ --- (Updated 2011-12-13 07:32:38.895973) Review request for mahout. Summary --

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168134#comment-13168134 ] jirapos...@reviews.apache.org commented on MAHOUT-923: --

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168133#comment-13168133 ] Raphael Cendrillon commented on MAHOUT-923: --- Thanks Lance. I've updated this on

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/ --- (Updated 2011-12-13 04:46:47.630950) Review request for mahout, lancenorskog and

[jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168118#comment-13168118 ] jirapos...@reviews.apache.org commented on MAHOUT-918: -- bq. On 201

[jira] [Commented] (MAHOUT-926) Adding the Tree/Forest Visualizer

2011-12-12 Thread Ikumasa Mukai (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168117#comment-13168117 ] Ikumasa Mukai commented on MAHOUT-926: -- We can get the result like this. i. iris - c

Re: Review Request: MAHOUT-918 Parallelized SGD in MapReduce

2011-12-12 Thread issei yoshida
> On 2011-12-08 07:04:49, Ted Dunning wrote: > > > > Ted Dunning wrote: > This code got worse with these comments, not better. Would you mind reviewing Diff revision 3? You still seem to be looking at revision 2. - issei --- This is an

[jira] [Created] (MAHOUT-926) Adding the Tree/Forest Visualizer

2011-12-12 Thread Ikumasa Mukai (Created) (JIRA)
Adding the Tree/Forest Visualizer - Key: MAHOUT-926 URL: https://issues.apache.org/jira/browse/MAHOUT-926 Project: Mahout Issue Type: Improvement Components: Classification Reporter: Ikum

[jira] [Commented] (MAHOUT-840) Decision Forests should support Regression problems

2011-12-12 Thread Ikumasa Mukai (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168110#comment-13168110 ] Ikumasa Mukai commented on MAHOUT-840: -- Thank you so much for adopting my patch! I am

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168111#comment-13168111 ] Lance Norskog commented on MAHOUT-923: -- The right way to set the vector class is to u

Re: [jira] [Commented] (MAHOUT-917) Build takes too long

2011-12-12 Thread Ted Dunning
This is much less true when applied to map-reduce programs. The point of scalability is that the code, well, scales. As such, with a few caveats testing small is actually pretty similar to testing large. This isn't quite true when it comes to convergence rates and such, but basic function should

Re: Tests running time

2011-12-12 Thread Lance Norskog
The Map/Reduce SGD patch includes a very nice trick which I did not know about. Here is an example: https://reviews.apache.org/r/3072/diff/2/?file=63195#file63195line36 It uses DummyRecordWriter to ship key/value pairs from mapper to reducer. On Mon, Dec 12, 2011 at 12:43 PM, Isabel Drost wrote
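
The idea in the message above (capture what a mapper emits in memory, then hand it to the reducer inside a single test) can be sketched in plain Java. This is only an analog written for illustration: it does not use Hadoop or Mahout, it is not the API of DummyRecordWriter, and the names InMemoryShuffleSketch, Collector, map and reduce are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InMemoryShuffleSketch {

    // Hypothetical stand-in for a record writer: keeps emitted pairs in memory,
    // grouped and sorted by key, roughly as a reducer would receive them.
    static class Collector<K extends Comparable<K>, V> {
        private final Map<K, List<V>> data = new TreeMap<>();

        void write(K key, V value) {
            data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        Map<K, List<V>> getData() {
            return data;
        }
    }

    // Toy mapper: emit (word, 1) for each whitespace-separated token.
    static void map(String line, Collector<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.write(word, 1);
            }
        }
    }

    // Toy reducer: sum all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Collector<String, Integer> collector = new Collector<>();
        map("a b a", collector);
        // Feed the captured mapper output straight into the reducer.
        for (Map.Entry<String, List<Integer>> e : collector.getData().entrySet()) {
            System.out.println(e.getKey() + " -> " + reduce(e.getValue()));
        }
    }
}
```

The point of the design is that mapper and reducer logic get exercised together without starting a job, which is where much of the test time discussed in this thread goes.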

Re: [jira] [Commented] (MAHOUT-917) Build takes too long

2011-12-12 Thread Lance Norskog
The current unit tests and small-scale end-to-end tests are fine. Maybe a few could be trimmed down. Some of the long-running ones use fair-sized datasets to verify that they crunch numbers correctly. Mahout has some algorithms for which a realistic test should take a few hours and several servers

Re: [jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread Lance Norskog
Suggestion: enhance examples/bin/classify-20newsgroups.sh to allow using this to generate the model, along with the online program. Lance On Mon, Dec 12, 2011 at 4:06 AM, jirapos...@reviews.apache.org (Commented) (JIRA) wrote: > >    [ > https://issues.apache.org/jira/browse/MAHOUT-918?page=com

[jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168092#comment-13168092 ] jirapos...@reviews.apache.org commented on MAHOUT-918: -- bq. On 201

[jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168091#comment-13168091 ] jirapos...@reviews.apache.org commented on MAHOUT-918: --

Jenkins build is still unstable: Mahout-Quality #1249

2011-12-12 Thread Apache Jenkins Server
See

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-12 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-922: Attachment: MAHOUT-922.patch combing up the style and comments, removing stale code, etc.

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168009#comment-13168009 ] jirapos...@reviews.apache.org commented on MAHOUT-923: --

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/ --- (Updated 2011-12-13 00:58:36.591798) Review request for mahout and Dmitriy Lyubi

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167980#comment-13167980 ] jirapos...@reviews.apache.org commented on MAHOUT-923: --

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Sebastian Schelter
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/#review3866 --- /trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowMean

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167972#comment-13167972 ] jirapos...@reviews.apache.org commented on MAHOUT-923: -- bq. On 201

[jira] [Updated] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raphael Cendrillon updated MAHOUT-923: -- Attachment: MAHOUT-923.patch > Row mean job for PCA > > >

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167967#comment-13167967 ] jirapos...@reviews.apache.org commented on MAHOUT-923: --

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
> On 2011-12-12 02:10:01, Dmitriy Lyubimov wrote: > > Hm. I hope i did not read the code or miss something. > > > > 1 -- i am not sure this will actually work as intended unless # of reducers > > is coerced to 1, of which i see no mention in the code. > > 2 -- mappers do nothing, passing on al

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/ --- (Updated 2011-12-13 00:10:57.848590) Review request for mahout and Dmitriy Lyubi

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167965#comment-13167965 ] jirapos...@reviews.apache.org commented on MAHOUT-923: --

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/ --- (Updated 2011-12-13 00:09:03.441301) Review request for mahout. Changes --

[jira] [Issue Comment Edited] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-12 Thread Dmitriy Lyubimov (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167907#comment-13167907 ] Dmitriy Lyubimov edited comment on MAHOUT-922 at 12/12/11 10:44 PM:

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-12 Thread Dmitriy Lyubimov (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167907#comment-13167907 ] Dmitriy Lyubimov commented on MAHOUT-922: - Another idea that may further relieve cp

Jenkins build is still unstable: Mahout-Quality #1248

2011-12-12 Thread Apache Jenkins Server
See

[jira] [Commented] (MAHOUT-924) Allow creation of symmetric adjacency matrices

2011-12-12 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167891#comment-13167891 ] Hudson commented on MAHOUT-924: --- Integrated in Mahout-Quality #1248 (See [https://builds.ap

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-12 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-922: Attachment: MAHOUT-922.patch Updates from Sebastien and further memory tweaks to reduce GC

[jira] [Commented] (MAHOUT-875) Allow to obtain Mahout version information through the Java API

2011-12-12 Thread Oliver B. Fischer (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167871#comment-13167871 ] Oliver B. Fischer commented on MAHOUT-875: -- Can someone please review my patch?

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167798#comment-13167798 ] jirapos...@reviews.apache.org commented on MAHOUT-923: -- bq. On 201

Re: Tests running time

2011-12-12 Thread Isabel Drost
On 08.12.2011 Grant Ingersoll wrote: > great, b/c these really are mainstream tests. I suspect most of our > overhead is simply due to running map reduce jobs. Is there anything to be gained by checking the code itself with mrunit (I know it does have limitations, but if those tests really only

Re: [jira] [Commented] (MAHOUT-917) Build takes too long

2011-12-12 Thread Isabel Drost
On 08.12.2011 Lance Norskog wrote: > Given all this (especially the third), a separate Maven sub-project makes > more sense; it should not be part of the default build. It may just be my personal preference, but I prefer having unit tests accompany the maven module they test. That also makes check

Jenkins build is unstable: Mahout-Quality #1247

2011-12-12 Thread Apache Jenkins Server
See

[jira] [Commented] (MAHOUT-925) Evaluate the reach of recommender algorithms

2011-12-12 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167573#comment-13167573 ] Sean Owen commented on MAHOUT-925: -- This is fine, though, don't you want to count like so

[jira] [Updated] (MAHOUT-925) Evaluate the reach of recommender algorithms

2011-12-12 Thread Anatoliy Kats (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoliy Kats updated MAHOUT-925: - Attachment: MAHOUT-925.patch > Evaluate the reach of recommender algorithms > ---

[jira] [Created] (MAHOUT-925) Evaluate the reach of recommender algorithms

2011-12-12 Thread Anatoliy Kats (Created) (JIRA)
Evaluate the reach of recommender algorithms Key: MAHOUT-925 URL: https://issues.apache.org/jira/browse/MAHOUT-925 Project: Mahout Issue Type: Improvement Components: Collaborative Filte

Build failed in Jenkins: Mahout-Quality #1246

2011-12-12 Thread Apache Jenkins Server
See -- [...truncated 777 lines...] A core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/FPTreeDepthCache.java A core/src/main/java/org/apache/mahout/fpm/pfpgrowth/CountDescending

[jira] [Updated] (MAHOUT-924) Allow creation of symmetric adjacency matrices

2011-12-12 Thread Sebastian Schelter (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-924: -- Status: Patch Available (was: Open) > Allow creation of symmetric adjacency matric

[jira] [Updated] (MAHOUT-924) Allow creation of symmetric adjacency matrices

2011-12-12 Thread Sebastian Schelter (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-924: -- Attachment: MAHOUT-924.patch > Allow creation of symmetric adjacency matrices > ---

Re: Tests running time

2011-12-12 Thread Sean Owen
On Mon, Dec 12, 2011 at 1:15 PM, Grant Ingersoll wrote: > I'm not sure if it is completely valid, but it seems to me that if our > tests can't run concurrently, it also raises a doubt as to whether some of > our classes can be run concurrently. > I don't think there's a cause for concern; it's re

Re: Tests running time

2011-12-12 Thread Grant Ingersoll
On Dec 12, 2011, at 7:11 AM, Sean Owen wrote: > On Mon, Dec 12, 2011 at 11:59 AM, Grant Ingersoll wrote: > >> In Lucene, we simply print out what the seed is if the tests fail and then >> you can rerun that test by saying ant -Dtestseed= test >> > > I like that -- it's a separate thing but

[jira] [Created] (MAHOUT-924) Allow creation of symmetric adjacency matrices

2011-12-12 Thread Sebastian Schelter (Created) (JIRA)
Allow creation of symmetric adjacency matrices -- Key: MAHOUT-924 URL: https://issues.apache.org/jira/browse/MAHOUT-924 Project: Mahout Issue Type: Improvement Components: Graph Affec

Re: Tests running time

2011-12-12 Thread Sean Owen
On Mon, Dec 12, 2011 at 11:59 AM, Grant Ingersoll wrote: > In Lucene, we simply print out what the seed is if the tests fail and then > you can rerun that test by saying ant -Dtestseed= test > I like that -- it's a separate thing but it's a fine idea too. It lets you at least try different se
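
The seed-reporting approach described above can be sketched roughly as follows. The property name testseed comes from the quoted message; everything else (the class SeededRandomSketch, the methods resolveSeed and runRandomizedCheck) is hypothetical and is not taken from Lucene or Mahout.

```java
import java.util.Random;

public class SeededRandomSketch {

    // Use the supplied seed when one is given (for example via -Dtestseed=...);
    // otherwise pick a fresh one so each run explores different values.
    static long resolveSeed(String property) {
        if (property == null || property.trim().isEmpty()) {
            return System.nanoTime();
        }
        return Long.parseLong(property.trim());
    }

    // Placeholder for a randomized test body.
    static void runRandomizedCheck(Random random) {
        int n = random.nextInt(100);
        if (n < 0 || n >= 100) {
            throw new AssertionError("out of range: " + n);
        }
    }

    public static void main(String[] args) {
        long seed = resolveSeed(System.getProperty("testseed"));
        Random random = new Random(seed);
        try {
            runRandomizedCheck(random);
        } catch (AssertionError e) {
            // Report the seed so the failing run can be reproduced exactly.
            System.err.println("Test failed; rerun with -Dtestseed=" + seed);
            throw e;
        }
    }
}
```

Because java.util.Random is deterministic for a given seed, rerunning with the printed value replays the same sequence of random draws.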

[jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167462#comment-13167462 ] jirapos...@reviews.apache.org commented on MAHOUT-918: -- bq. On 201

Re: Review Request: MAHOUT-918 Parallelized SGD in MapReduce

2011-12-12 Thread issei yoshida
> On 2011-12-08 07:04:49, Ted Dunning wrote: > > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveMapper.java, > > line 36 > > > > > > Needs a comment about how this works. Added comme

Re: Tests running time

2011-12-12 Thread Grant Ingersoll
On Dec 12, 2011, at 2:05 AM, Sean Owen wrote: > > It *seems* so much more like a test issue to me, solvable in the test > harness, and in a clear way: just split tests n ways across n JVMs instead > of 1 JVM with n threads. No (further) reliance on code being exactly well > behaved. It's just we

[jira] [Commented] (MAHOUT-918) Implement SGD based classifiers using MapReduce

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167459#comment-13167459 ] jirapos...@reviews.apache.org commented on MAHOUT-918: --

Re: Tests running time

2011-12-12 Thread Grant Ingersoll
On Dec 11, 2011, at 2:42 PM, Sean Owen wrote: > On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote: > >> The right way to handle this is to have instances get a random number >> generator that works like it should. Magic resets in the middle of >> operation are not a good idea. >> > > Why wo

Re: Review Request: MAHOUT-918 Parallelized SGD in MapReduce

2011-12-12 Thread issei yoshida
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3072/ --- (Updated 2011-12-12 11:51:59.547649) Review request for mahout. Summary --

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167443#comment-13167443 ] jirapos...@reviews.apache.org commented on MAHOUT-923: -- bq. On 201

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
> On 2011-12-12 02:10:01, Dmitriy Lyubimov wrote: > > Hm. I hope i did not read the code or miss something. > > > > 1 -- i am not sure this will actually work as intended unless # of reducers > > is coerced to 1, of which i see no mention in the code. > > 2 -- mappers do nothing, passing on al

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167440#comment-13167440 ] jirapos...@reviews.apache.org commented on MAHOUT-923: --

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/ --- (Updated 2011-12-12 10:41:46.013180) Review request for mahout. Summary --

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Dmitriy Lyubimov
if it's coherent with the rest of the code there, i guess it is benign to use it for this particular purpose. I can't think of a case where we'd want to pull exactly one vector into a MR job. On Mon, Dec 12, 2011 at 12:54 AM, Raphael Cendrillon wrote: > You've convinced me that this is probably

[jira] [Issue Comment Edited] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

2011-12-12 Thread Dmitriy Lyubimov (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167396#comment-13167396 ] Dmitriy Lyubimov edited comment on MAHOUT-797 at 12/12/11 8:54 AM: -

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
You've convinced me that this is probably a bad idea. You never know when this might come back to bite later. On 12 Dec, 2011, at 12:50 AM, Dmitriy Lyubimov wrote: > Oh now i remember what the deal with NullWritable was. > > yes sequence file would read it as in > >Configuration conf = ne

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Dmitriy Lyubimov
Oh now i remember what the deal with NullWritable was. yes sequence file would read it as in Configuration conf = new Configuration(); FileSystem fs = FileSystem.getLocal(new Configuration()); Path testPath = new Path("name.seq"); IntWritable iw = new IntWritable(); SequenceF

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Dmitriy Lyubimov
See suggestion in the review board (if i use it correctly, i am still not sure what to do about it :) On Mon, Dec 12, 2011 at 12:28 AM, Raphael Cendrillon wrote: > Thanks Dmitry. I think I understand more clearly now. Are you saying I should > make a map only job and then just use some post-proc

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167406#comment-13167406 ] jirapos...@reviews.apache.org commented on MAHOUT-923: -- bq. On 201

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
Thanks Dmitry. I think I understand more clearly now. Are you saying I should make a map only job and then just use some post-processing to manually combine the map outputs? How many rows should I process per map job? On Dec 12, 2011, at 12:13 AM, Dmitriy Lyubimov wrote: >> A combiner is defi

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Dmitriy Lyubimov
> A combiner is definitely the next step. It is definitely not. Why do you need to sort??? > One question, is there already a writable for tuples of e.g. int and Vector, > or should I just write one from scratch? From scratch. Or, you can save n as first element in the vector, why not. Your f
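
The suggestion above, saving n as the first element of the vector, can be sketched in plain Java with double[] arrays standing in for Mahout vectors. This is a minimal illustration under that assumption, not the code from the patch; RowMeanSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class RowMeanSketch {

    // Build one partial result per input split (the role a mapper would play).
    // Slot 0 carries the row count n; slots 1..dim carry the column sums.
    static double[] partial(List<double[]> rows, int dim) {
        double[] acc = new double[dim + 1];
        for (double[] row : rows) {
            acc[0] += 1.0;
            for (int j = 0; j < dim; j++) {
                acc[j + 1] += row[j];
            }
        }
        return acc;
    }

    // Merge two partials of equal length element-wise. Counts and sums add
    // the same way, so no separate (int, Vector) tuple type is needed.
    static double[] merge(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("partials must have equal length");
        }
        double[] out = Arrays.copyOf(a, a.length);
        for (int i = 0; i < b.length; i++) {
            out[i] += b[i];
        }
        return out;
    }

    // Divide the accumulated sums by the accumulated count.
    static double[] mean(double[] acc) {
        if (acc[0] == 0.0) {
            throw new IllegalStateException("no rows seen");
        }
        double[] m = new double[acc.length - 1];
        for (int j = 0; j < m.length; j++) {
            m[j] = acc[j + 1] / acc[0];
        }
        return m;
    }

    public static void main(String[] args) {
        double[] p1 = partial(Arrays.asList(new double[] {1, 2}, new double[] {3, 4}), 2);
        double[] p2 = partial(Arrays.asList(new double[] {5, 6}), 2);
        System.out.println(Arrays.toString(mean(merge(p1, p2)))); // prints [3.0, 4.0]
    }
}
```

Since merging is just element-wise addition, the partials can be combined in any order, which is why no sort by key is required for this computation.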

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Lance Norskog
The person using this job knows the right vector to use. It may be that it gets a lot of sparse vectors but will become a dense vector. Or a vector that writes to a database. Or something else. In fact, I may just want to turn a vector from Dense to Sparse, and I could achieve that with this job.

Re: [jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-12 Thread Lance Norskog
To use a combiner, TupleWritable should be fine. I have not used it. But it will copy the entire vector. You would want to minimize this. If this is a big problem, you can do an ugly trick: you store the counter as the key value, but make a custom Writable that always returns 'this equals the othe

Re: Review Request: Row mean job for PCA

2011-12-12 Thread Raphael Cendrillon
Thanks Lance. That makes a lot of sense. You're right regarding the need for combiners. What's the best way to create an Int + Vector writable pair? Should I just define one from scratch or is there some framework already in Mahout I should reuse? Thanks again! On Dec 11, 2011, at 11:59 PM, L