Re: Review Request: Row mean job for PCA

2011-12-11 Thread Lance Norskog
NullWritable is used as the key between mapper and reducer, and as the first value in the pairs saved in a SequenceFile. As the mapper->reducer key, it works. In Mahout, SequenceFile vectors and matrices are stored as pairs. Even though this job is in the middle of another job, it should follow

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

2011-12-11 Thread Dmitriy Lyubimov (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167396#comment-13167396 ] Dmitriy Lyubimov commented on MAHOUT-797: Thank you. Rank deficiency of the inp

Re: Tests running time

2011-12-11 Thread Sean Owen
On Mon, Dec 12, 2011 at 2:49 AM, Ted Dunning wrote: > > > Why would the caller care? It's all random numbers, whether "reset" or > > not. > > > > The care is about determinism. > Completely agree, it's the tests that care, not the caller itself. > If the RandUtils notes that the current threa

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

2011-12-11 Thread Ted Dunning (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167379#comment-13167379 ] Ted Dunning commented on MAHOUT-797: The problem you are having is that a rank deficie
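
The B-pipeline named in the MAHOUT-797 title, B = R'^{-1} Y'A, follows from a thin QR factorization. A short derivation, assuming the standard SSVD setup Y = AΩ with a random projection matrix Ω (Ω is an assumption here; it is not named in these messages) and Y = QR with Q having orthonormal columns:

```latex
Y = A\Omega = QR, \qquad Q = Y R^{-1},
\qquad
B = Q^{\top} A = \left(Y R^{-1}\right)^{\top} A = \left(R^{\top}\right)^{-1} Y^{\top} A
```

The step Q = YR^{-1} requires R to be invertible, i.e. Y must have full column rank. If Y is rank deficient, R is singular and this route to B breaks down, which is presumably the rank-deficiency issue raised in the comments on this issue.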

Re: Review Request: Row mean job for PCA

2011-12-11 Thread Dmitriy Lyubimov
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/#review3838 Hm. I hope I did not misread the code or miss something. 1 -- I am not

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread Raphael Cendrillon (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167341#comment-13167341 ] Raphael Cendrillon commented on MAHOUT-923: Thanks Lance. A combiner is definit

Re: Review Request: Row mean job for PCA

2011-12-11 Thread Raphael Cendrillon
> On 2011-12-12 02:10:01, Dmitriy Lyubimov wrote: > > Hm. I hope I did not misread the code or miss something. > > > > 1 -- I am not sure this will actually work as intended unless # of reducers > > is coerced to 1, of which I see no mention in the code. > > 2 -- mappers do nothing, passing on al
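
Dmitriy's first two points, together with the combiner mentioned above, can be illustrated without Hadoop. The sketch below simulates the job in plain Java, assuming "row mean" means the average of the matrix's row vectors (the centering vector PCA needs): each split is folded into a partial (sum, count) pair, as a combiner could do, and a single reducer merges the partials and divides once. The names `Partial`, `RowMeanSketch`, `combineSplit` and `reduce` are hypothetical; they come neither from the patch nor from the Hadoop or Mahout APIs.

```java
import java.util.List;

// Partial aggregate: element-wise sum of some rows plus the number of rows summed.
final class Partial {
  final double[] sum;
  final long count;

  Partial(double[] sum, long count) {
    this.sum = sum;
    this.count = count;
  }

  // Associative and commutative, so it can run in a combiner as well as the reducer.
  Partial merge(Partial other) {
    double[] merged = sum.clone();
    for (int i = 0; i < merged.length; i++) {
      merged[i] += other.sum[i];
    }
    return new Partial(merged, count + other.count);
  }
}

final class RowMeanSketch {
  // Map + combine for one input split: every row becomes (row, 1), folded into one partial.
  static Partial combineSplit(List<double[]> rows, int numCols) {
    Partial acc = new Partial(new double[numCols], 0);
    for (double[] row : rows) {
      acc = acc.merge(new Partial(row, 1));
    }
    return acc;
  }

  // Single reducer: merge all partials, then divide exactly once.
  // With zero rows the count is 0 and every entry comes out NaN.
  static double[] reduce(List<Partial> partials, int numCols) {
    Partial total = new Partial(new double[numCols], 0);
    for (Partial p : partials) {
      total = total.merge(p);
    }
    double[] mean = new double[numCols];
    for (int i = 0; i < numCols; i++) {
      mean[i] = total.sum[i] / total.count;
    }
    return mean;
  }
}
```

Carrying the count is what makes the combiner safe: for splits {(1,2),(3,4)} and {(5,6)} the true mean is (3,4), whereas averaging the two per-split means, (2,3) and (5,6), would give (3.5,4.5). The final division still has to happen in one place, which is why the reducer count matters.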

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167337#comment-13167337 ] Lance Norskog commented on MAHOUT-923: MatrixRowMeanJob writes but the convention f

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167338#comment-13167338 ] jirapos...@reviews.apache.org commented on MAHOUT-923: bq. On 201

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-11 Thread Dmitriy Lyubimov (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167329#comment-13167329 ] Dmitriy Lyubimov commented on MAHOUT-922: And oh yeah, I use double[][] for blocks

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

2011-12-11 Thread Dmitriy Lyubimov (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167327#comment-13167327 ] Dmitriy Lyubimov commented on MAHOUT-797: Hm... can't seem to get R to compute ei

Re: Tests running time

2011-12-11 Thread Ted Dunning
On Sun, Dec 11, 2011 at 6:48 PM, Lance Norskog wrote: > What about using ThreadLocal generators? > > On Sun, Dec 11, 2011 at 11:42 AM, Sean Owen wrote: > > On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning > wrote: > > > >> The right way to handle this is to have instances get a random number > >> g

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167312#comment-13167312 ] jirapos...@reviews.apache.org commented on MAHOUT-923:

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2011-12-11 Thread Raphael Cendrillon (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167309#comment-13167309 ] Raphael Cendrillon commented on MAHOUT-880: Thanks Dmitry. I've pulled the row

Re: [jira] [Issue Comment Edited] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2011-12-11 Thread Raphael Cendrillon
Hi Dmitry, I've pulled this out as a separate issue under MAHOUT-923. Could you please take a look? Thanks! On Dec 8, 2011, at 11:38 AM, "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" wrote: > >[ > https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-11 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-922: Attachment: MAHOUT-922.patch > SSVD: ABt Job tweaks for extra sparse inputs

Re: Tests running time

2011-12-11 Thread Lance Norskog
What about using ThreadLocal generators? On Sun, Dec 11, 2011 at 11:42 AM, Sean Owen wrote: > On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote: > >> The right way to handle this is to have instances get a random number >> generator that works like it should. Magic resets in the middle of >> o
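
Lance's ThreadLocal suggestion can be sketched with `java.lang.ThreadLocal`: each thread lazily creates its own `java.util.Random` from a fixed seed, so concurrent threads cannot disturb each other's sequences. `PerThreadRandom`, `TEST_SEED` and `draw` are hypothetical names for illustration, not Mahout's own test-seed code.

```java
import java.util.Random;

final class PerThreadRandom {
  // Hypothetical fixed test seed; every thread builds its own generator from it.
  static final long TEST_SEED = 1234L;

  // Each thread lazily gets a private Random, so threads never share generator state.
  private static final ThreadLocal<Random> RNG =
      ThreadLocal.withInitial(() -> new Random(TEST_SEED));

  // Draws n ints from the calling thread's own generator.
  static int[] draw(int n) {
    Random rng = RNG.get();
    int[] out = new int[n];
    for (int i = 0; i < n; i++) {
      out[i] = rng.nextInt();
    }
    return out;
  }
}
```

One caveat: a per-thread generator keeps its state between tasks, so if a thread pool reuses a thread for a second test, that test continues the earlier sequence instead of starting from the seed.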

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-11 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-922: Description: Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5m). AB' p

Jenkins build is still unstable: Mahout-Quality #1245

2011-12-11 Thread Apache Jenkins Server

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167292#comment-13167292 ] jirapos...@reviews.apache.org commented on MAHOUT-923:

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167293#comment-13167293 ] jirapos...@reviews.apache.org commented on MAHOUT-923:

Re: Review Request: Row mean job for PCA

2011-12-11 Thread Raphael Cendrillon
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/ (Updated 2011-12-12 00:30:24.091994) Review request for mahout. Summary

Review Request: Row mean job for PCA

2011-12-11 Thread Raphael Cendrillon
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3147/ Review request for mahout. Summary: Here's a patch with a simple job to

[jira] [Commented] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread Raphael Cendrillon (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167291#comment-13167291 ] Raphael Cendrillon commented on MAHOUT-923: --- It might be worthwhile to pull this

[jira] [Updated] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread Raphael Cendrillon (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raphael Cendrillon updated MAHOUT-923: Attachment: MAHOUT-923.patch > Row mean job for PCA

[jira] [Created] (MAHOUT-923) Row mean job for PCA

2011-12-11 Thread Raphael Cendrillon (Created) (JIRA)
Row mean job for PCA Key: MAHOUT-923 URL: https://issues.apache.org/jira/browse/MAHOUT-923 Project: Mahout Issue Type: Improvement Components: Math Affects Versions: 0.6 Reporter: Raphael Cendri

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-11 Thread Sebastian Schelter (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167205#comment-13167205 ] Sebastian Schelter commented on MAHOUT-922: A few details on the test case: I'm

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-11 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-922: Description: Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5m). AB' p

[jira] [Created] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

2011-12-11 Thread Dmitriy Lyubimov (Created) (JIRA)
SSVD: ABt Job tweaks for extra sparse inputs Key: MAHOUT-922 URL: https://issues.apache.org/jira/browse/MAHOUT-922 Project: Mahout Issue Type: Improvement Components: Math Affects Ve

[jira] [Commented] (MAHOUT-840) Decision Forests should support Regression problems

2011-12-11 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167189#comment-13167189 ] Hudson commented on MAHOUT-840: Integrated in Mahout-Quality #1244 (See [https://builds.ap

Jenkins build is still unstable: Mahout-Quality #1244

2011-12-11 Thread Apache Jenkins Server

[jira] [Updated] (MAHOUT-921) FPG uses a lot of boxed primitives - this patch eliminates a bunch of List

2011-12-11 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-921: Attachment: MAHOUT-921.patch Note: the patch assumes MAHOUT-920 has been applied! > FPG use

[jira] [Created] (MAHOUT-921) FPG uses a lot of boxed primitives - this patch eliminates a bunch of List

2011-12-11 Thread tom pierce (Created) (JIRA)
FPG uses a lot of boxed primitives - this patch eliminates a bunch of List Key: MAHOUT-921 URL: https://issues.apache.org/jira/browse/MAHOUT-921 Project: Mahout
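
The MAHOUT-921 title describes FPG (FP-Growth) code that boxes primitives into `List` collections. The patch itself is not shown here, but the usual remedy is a collection backed by a primitive array. A minimal sketch with a hypothetical `IntList` class (not the patch's code and not a Mahout class):

```java
import java.util.Arrays;

// Hypothetical growable list of primitive ints: add/get never box to Integer.
final class IntList {
  private int[] data = new int[8];
  private int size;

  void add(int value) {
    if (size == data.length) {
      data = Arrays.copyOf(data, size * 2); // doubling keeps add amortized O(1)
    }
    data[size++] = value;
  }

  int get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", size " + size);
    }
    return data[index];
  }

  int size() {
    return size;
  }

  // Copy of the live elements only.
  int[] toArray() {
    return Arrays.copyOf(data, size);
  }
}
```

In a generic `List<Integer>`, each element outside the small `Integer` cache is a separate heap object reached through a reference; `IntList` stores the values inline in one `int[]`.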

Re: Tests running time

2011-12-11 Thread Sean Owen
On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote: > The right way to handle this is to have instances get a random number > generator that works like it should. Magic resets in the middle of > operation are not a good idea. > Why would the caller care? It's all random numbers, whether "reset"

Re: Tests running time

2011-12-11 Thread Ted Dunning
The right way to handle this is to have instances get a random number generator that works like it should. Magic resets in the middle of operation are not a good idea. I think we need a better way to inject generators that doesn't involve statics. On Sun, Dec 11, 2011 at 6:24 AM, Sean Owen wrot
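
Ted's proposal, that instances receive their generator instead of reaching for a static one, is ordinary constructor injection. A minimal sketch with a hypothetical `Sampler` class: production code would pass `new Random()`, while a test passes `new Random(seed)` and gets the same draws on every run, with no global reset involved.

```java
import java.util.Random;

// Hypothetical component that needs randomness but never reaches for a static generator.
final class Sampler {
  private final Random random;

  // The caller (production code or a test) decides which generator to use.
  Sampler(Random random) {
    this.random = random;
  }

  // Draws k indices uniformly from [0, n), with replacement.
  int[] sample(int n, int k) {
    int[] picks = new int[k];
    for (int i = 0; i < k; i++) {
      picks[i] = random.nextInt(n);
    }
    return picks;
  }
}
```

Because each `Sampler` owns its generator, two tests running in parallel in one JVM cannot consume values from each other's sequence.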

[jira] [Commented] (MAHOUT-840) Decision Forests should support Regression problems

2011-12-11 Thread Deneche A. Hakim (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167151#comment-13167151 ] Deneche A. Hakim commented on MAHOUT-840: The examples all went fine, I just comm

[jira] [Commented] (MAHOUT-913) Style changes / discussion

2011-12-11 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167136#comment-13167136 ] Sean Owen commented on MAHOUT-913: You mean it has a parameter "a"? I would not write an

[jira] [Issue Comment Edited] (MAHOUT-913) Style changes / discussion

2011-12-11 Thread Deneche A. Hakim (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167133#comment-13167133 ] Deneche A. Hakim edited comment on MAHOUT-913 at 12/11/11 4:51 PM:

[jira] [Commented] (MAHOUT-913) Style changes / discussion

2011-12-11 Thread Deneche A. Hakim (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167133#comment-13167133 ] Deneche A. Hakim commented on MAHOUT-913: About Javadoc, if a method has an attri

Is there a bug in SpectralKMeansDriver?

2011-12-11 Thread 孙 明明
Hi, I found that the SpectralKMeansDriver writes: LanczosState state = new LanczosState(L, overshoot, numDims, solver.getInitialVector(L)); However, the LanczosState constructor is: LanczosState(VectorIterable corpus, int numCols, int desiredRank, Vector initialVector). I believe

[jira] [Updated] (MAHOUT-913) Style changes / discussion

2011-12-11 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-913: Attachment: Sean.xml My personal IntelliJ inspections config preferences > Style changes / dis

[jira] [Commented] (MAHOUT-913) Style changes / discussion

2011-12-11 Thread Deneche A. Hakim (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167102#comment-13167102 ] Deneche A. Hakim commented on MAHOUT-913: Sean, could you post your IntelliJ conf

Re: Tests running time

2011-12-11 Thread Sean Owen
Yes that's exactly what's happening -- not why the tests aren't running fast, but why running them in parallel in one JVM results in non-deterministic results. If by "not use statics" you mean hold a static reference to a Random in client code, yes, that could help, except that you'd also have to

Re: Tests running time

2011-12-11 Thread Grant Ingersoll
As a point of reference, if I comment out the reset() code in useTestSeed for the math package, all tests pass w/ parallel execution and fork once. Of course, that's just one piece. I guess I don't understand why we need to do all that reset stuff there anyway. If you are using the test see

Re: Tests running time

2011-12-11 Thread Grant Ingersoll
In working through what I _think_ will be the primary viable way to make this stuff faster (parallel execution, fork once) it appears to me that the primary concurrency issue is due to how we initialize the test seed and the fact that we loop over all RandomWrapper objects and reset them. So, i
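
Grant's observation here and Sean's explanation earlier in the thread come down to shared generator state. When several tests draw from one generator, what a given test sees depends on how many values other tests consumed first, and under parallel execution in one JVM that count varies from run to run. A minimal single-threaded sketch of the effect, with hypothetical names (`OrderDependence`, `testB`); it models a shared generator directly rather than Mahout's `RandomWrapper` reset logic:

```java
import java.util.Random;

final class OrderDependence {
  // Hypothetical seed used by the "tests" below.
  static final long SEED = 7L;

  // Stand-in test body: draws one value from whatever generator it is handed.
  static int testB(Random rng) {
    return rng.nextInt(1000);
  }

  // Shared generator: draws made by tests that ran earlier shift what test B sees.
  // Under parallel execution that number is not fixed, so B's input is not either.
  static int testBWithSharedGenerator(int drawsByEarlierTests) {
    Random shared = new Random(SEED);
    for (int i = 0; i < drawsByEarlierTests; i++) {
      shared.nextInt(1000);
    }
    return testB(shared);
  }

  // Per-test generator: B always starts from SEED, whatever ran before it.
  static int testBWithOwnGenerator() {
    return testB(new Random(SEED));
  }
}
```

Giving each test its own seeded generator, as in `testBWithOwnGenerator`, makes its input independent of scheduling without looping over and resetting every generator in the JVM.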