[jira] [Comment Edited] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646424#comment-13646424 ] Isabel Drost edited comment on MAHOUT-1200 at 5/1/13 6:52 AM: -

[jira] [Commented] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646422#comment-13646422 ] Isabel Drost commented on MAHOUT-1200: -- I'm happy with any naming schema, for add lo

[jira] [Commented] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646424#comment-13646424 ] Isabel Drost commented on MAHOUT-1200: -- I tried to loosely follow the ideas in http:/

Re: Cannot resolve symbol 'OpenIntObjectHashMap'

2013-04-30 Thread Ted Dunning
On Tue, Apr 30, 2013 at 2:59 PM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Upgrading to IntelliJ 12 has fixed this prob! > Happy days.

Re: Cannot resolve symbol 'OpenIntObjectHashMap'

2013-04-30 Thread Andrew Musselman
Upgrading to IntelliJ 12 has fixed this prob! Thanks. On Tue, Apr 30, 2013 at 2:39 PM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > PS IntelliJ Community Edition 11.1.5 for Linux > > > On Tue, Apr 30, 2013 at 2:36 PM, Andrew Musselman < > andrew.mussel...@gmail.com> wrote: > >> Get th

Re: Cannot resolve symbol 'OpenIntObjectHashMap'

2013-04-30 Thread Andrew Musselman
PS: IntelliJ Community Edition 11.1.5 for Linux. On Tue, Apr 30, 2013 at 2:36 PM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Get the svn repo from here? > https://cwiki.apache.org/MAHOUT/buildingmahout.html > > svn co http://svn.apache.org/repos/asf/mahout/trunk > > > > On Tue, Apr 16

Re: Cannot resolve symbol 'OpenIntObjectHashMap'

2013-04-30 Thread Andrew Musselman
Get the svn repo from here? https://cwiki.apache.org/MAHOUT/buildingmahout.html svn co http://svn.apache.org/repos/asf/mahout/trunk On Tue, Apr 16, 2013 at 8:11 PM, Ted Dunning wrote: > This still isn't right. > > What happens if you clone mahout again (to get a clean copy) and then open > th

[jira] [Commented] (MAHOUT-916) Make Mahout's tests run in parallel

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645974#comment-13645974 ] Isabel Drost commented on MAHOUT-916: - As for runtime: At least on my machine (4 cores

[jira] [Commented] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-04-30 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645969#comment-13645969 ] Robin Anil commented on MAHOUT-1200: Can't you create a new subdirectory for each qua
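For context, Hadoop's `hadoop.tmp.dir` setting defaults to `/tmp/hadoop-${user.name}`, so every test JVM run by the same user shares that directory unless the setting is overridden. A minimal sketch of a per-test scratch directory, assuming the tests are free to override that key: a plain `Map` stands in for Hadoop's `Configuration`, and `PerTestTmpDir` and `isolatedSettings` are hypothetical names, not the patch attached to this issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not Mahout's actual fix): give each test its own
// scratch directory instead of the shared /tmp/hadoop-$user default.
final class PerTestTmpDir {

  /** Hadoop's scratch-directory key; its stock default is /tmp/hadoop-${user.name}. */
  static final String HADOOP_TMP_DIR = "hadoop.tmp.dir";

  private PerTestTmpDir() {}

  /**
   * Creates a new directory under java.io.tmpdir and returns settings
   * (a stand-in for a Hadoop Configuration) that point hadoop.tmp.dir at it.
   */
  static Map<String, String> isolatedSettings(String testName) {
    try {
      Path dir = Files.createTempDirectory(testName + "-");
      Map<String, String> settings = new HashMap<>();
      settings.put(HADOOP_TMP_DIR, dir.toAbsolutePath().toString());
      return settings;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Each call creates a new directory, even for the same test name, so concurrently running tests do not collide; a real test base class would also delete the directory when the test finishes.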

[jira] [Updated] (MAHOUT-1201) Some Mahout jobs do not pass user supplied Configuration object to sub jobs

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-1201: - Attachment: MAHOUT-1201-solver.patch > Some Mahout jobs do not pass user supplied Configurat

[jira] [Updated] (MAHOUT-1201) Some Mahout jobs do not pass user supplied Configuration object to sub jobs

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-1201: - Attachment: MAHOUT-1201-pfpgrowth.patch Changes related to the pfpgrowth implementation (make su

[jira] [Updated] (MAHOUT-1201) Some Mahout jobs do not pass user supplied Configuration object to sub jobs

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-1201: - Attachment: MAHOUT-1201-entropy.patch Changes related to the entropy computation stuff (make sur

[jira] [Updated] (MAHOUT-1201) Some Mahout jobs do not pass user supplied Configuration object to sub jobs

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-1201: - Attachment: MAHOUT-1201-clustering.patch Changes related to our clustering code.

[jira] [Updated] (MAHOUT-916) Make Mahout's tests run in parallel

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-916: Attachment: MAHOUT-916.patch Updated version, note that you will also need MAHOUT-1200 and MAHOUT-1

[jira] [Updated] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-1200: - Attachment: MAHOUT-1200.patch Fixes the tests only, make sure to first apply changes for MAHOUT-

[jira] [Created] (MAHOUT-1201) Some Mahout jobs do not pass user supplied Configuration object to sub jobs

2013-04-30 Thread Isabel Drost (JIRA)
Isabel Drost created MAHOUT-1201: Summary: Some Mahout jobs do not pass user supplied Configuration object to sub jobs Key: MAHOUT-1201 URL: https://issues.apache.org/jira/browse/MAHOUT-1201 Project:
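To illustrate the pattern named in the summary: a driver that builds its sub-job settings from scratch silently drops anything the caller configured. A minimal sketch, assuming nothing about which Mahout jobs are affected; a `Map` stands in for Hadoop's `Configuration`, and `ConfPropagation` and its methods are hypothetical. With Hadoop itself, the fixed variant corresponds to deriving the sub-job's configuration from the caller's (for example with the `Configuration(Configuration)` copy constructor) rather than calling `new Configuration()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver illustrating the MAHOUT-1201 pattern. A Map stands in
// for Hadoop's Configuration; job names and keys are made up.
final class ConfPropagation {

  private ConfPropagation() {}

  /** Buggy pattern: the sub-job starts from an empty configuration. */
  static Map<String, String> subJobConfBuggy(Map<String, String> userConf) {
    Map<String, String> conf = new HashMap<>(); // userConf is ignored: settings are lost
    conf.put("job.name", "sub-job");
    return conf;
  }

  /** Fixed pattern: the sub-job starts from a copy of the caller's configuration. */
  static Map<String, String> subJobConfFixed(Map<String, String> userConf) {
    Map<String, String> conf = new HashMap<>(userConf); // keep user settings
    conf.put("job.name", "sub-job");                    // then add job-specific ones
    return conf;
  }
}
```

Copying rather than reusing the caller's map also keeps job-specific keys from leaking back to the caller.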

[jira] [Created] (MAHOUT-1200) Mahout tests depend on writing to /tmp/hadoop-$user

2013-04-30 Thread Isabel Drost (JIRA)
Isabel Drost created MAHOUT-1200: Summary: Mahout tests depend on writing to /tmp/hadoop-$user Key: MAHOUT-1200 URL: https://issues.apache.org/jira/browse/MAHOUT-1200 Project: Mahout Issue Ty

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Dan Filimon
Yeah, we talked about 7 a while back in a chat, thanks for reminding me! As for Times, that's really weird. That extra code should commute the vectors if one is denser than the other, and the performance of Seq.fn(Dense) and Dense.fn(Seq) should be the same. And, even weirder, when I ran th
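The swap relies on element-wise multiplication being commutative: only indices stored in both operands can produce a nonzero, so it suffices to walk whichever operand has fewer entries and look each index up in the other. A minimal sketch, assuming both operands support cheap lookups; sorted maps stand in for Mahout's vector classes, and `CommutingTimes` is a hypothetical name, not the code in `AbstractVector.times`.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative element-wise product that exploits commutativity
// (hypothetical; not Mahout's AbstractVector.times). Vectors are index -> value maps.
final class CommutingTimes {

  private CommutingTimes() {}

  static Map<Integer, Double> times(Map<Integer, Double> a, Map<Integer, Double> b) {
    // x * y == y * x, so walk whichever operand stores fewer entries and look
    // up the other one; times(a, b) and times(b, a) then cost the same.
    Map<Integer, Double> small = a.size() <= b.size() ? a : b;
    Map<Integer, Double> large = (small == a) ? b : a;
    Map<Integer, Double> result = new TreeMap<>();
    for (Map.Entry<Integer, Double> e : small.entrySet()) {
      Double other = large.get(e.getKey());
      if (other != null) { // a missing index is an implicit zero
        double product = e.getValue() * other;
        if (product != 0.0) {
          result.put(e.getKey(), product); // keep the result sparse
        }
      }
    }
    return result;
  }
}
```

Like any sparse shortcut, this treats a missing entry times x as 0, which is not what IEEE arithmetic gives when x is infinite or NaN.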

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Robin Anil
Before I forget, one more thing. 7. Add a test harness for functions: if, say, a function says isLikeLeftPlus() == true, take random values from the double range -inf to +inf to make sure it's true for those values. Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. On Tue, Apr 30, 2013 at
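Item 7 amounts to a randomized property check. A minimal sketch, assuming `isLikeLeftPlus()` is meant to promise f(0, y) == y for every y (that reading is an assumption, as are all names here); `java.util.function.DoubleBinaryOperator` stands in for Mahout's function interface. Zeros and both infinities are drawn explicitly, because random bit patterns almost never produce them.

```java
import java.util.Random;
import java.util.function.DoubleBinaryOperator;

// Hypothetical property harness (not Mahout code): checks a declared
// algebraic property of a two-argument function on random doubles.
final class FunctionPropertyHarness {

  private FunctionPropertyHarness() {}

  /** Draws a non-NaN double, forcing zeros and infinities to appear often. */
  static double randomDouble(Random rnd) {
    switch (rnd.nextInt(8)) {
      case 0: return Double.POSITIVE_INFINITY;
      case 1: return Double.NEGATIVE_INFINITY;
      case 2: return 0.0;
      case 3: return -0.0;
      default:
        double d;
        do {
          // Random bit patterns cover every exponent, including subnormals.
          d = Double.longBitsToDouble(rnd.nextLong());
        } while (Double.isNaN(d));
        return d;
    }
  }

  /**
   * Assumed meaning of "like left plus": f(0, y) == y for all y.
   * Uses ==, so 0.0 and -0.0 count as equal. Returns false at the first counterexample.
   */
  static boolean looksLikeLeftPlus(DoubleBinaryOperator f, long seed, int trials) {
    Random rnd = new Random(seed);
    for (int i = 0; i < trials; i++) {
      double y = randomDouble(rnd);
      if (f.applyAsDouble(0.0, y) != y) {
        return false;
      }
    }
    return true;
  }
}
```

A check like this can only find counterexamples: passing it is evidence, not proof, that a declared property holds.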

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Robin Anil
Tried with your changes pulled in. Other than some variance, maybe due to the process state of my MacBook, the benchmarks that had the regression don't show any marked improvement. See column Y https://docs.google.com/spreadsheet/ccc?key=0AochdzPoBmWodG9RTms1UG40YlNQd3ByUFpQY0FLWmc#gid=8

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #471

2013-04-30 Thread Apache Jenkins Server
See Changes: [robinanil] Disable generation of clusters and cluster benchmarks at a few more places [robinanil] Increase the lead time such that any operations faster than 1000/s get JITed (default threshold in O
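The second commit message concerns JIT warm-up: HotSpot compiles a method to native code only after it has been invoked often enough, so timing an operation too early mostly measures the interpreter. A minimal sketch of a warm-up-then-measure loop, assuming a `DoubleSupplier` workload; `WarmedUpTimer` and the iteration counts are hypothetical, and a dedicated harness such as OpenJDK's JMH handles these pitfalls more thoroughly.

```java
import java.util.function.DoubleSupplier;

// Illustrative warm-up-then-measure loop (hypothetical names and counts).
final class WarmedUpTimer {

  /** Written after measuring so the JIT cannot discard the benchmarked work. */
  static volatile double sink;

  private WarmedUpTimer() {}

  /**
   * Calls op warmupIters times without timing (giving the JIT a chance to
   * compile it), then returns mean nanoseconds per call over measuredIters calls.
   */
  static double nanosPerCall(DoubleSupplier op, int warmupIters, int measuredIters) {
    if (measuredIters <= 0) {
      throw new IllegalArgumentException("measuredIters must be positive");
    }
    double acc = 0.0;
    for (int i = 0; i < warmupIters; i++) {
      acc += op.getAsDouble();
    }
    long start = System.nanoTime();
    for (int i = 0; i < measuredIters; i++) {
      acc += op.getAsDouble();
    }
    long elapsed = System.nanoTime() - start;
    sink = acc;
    return (double) elapsed / measuredIters;
  }
}
```

The `sink` field keeps the accumulated result observable, so the measured loop cannot be treated as dead code.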

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Dan Filimon
Robin, regarding Times, I think it should work the same now. I changed the swapping condition in AbstractVector.times to something more readable. As for norm1, it looks like it's exactly the same. I don't see what's causing the slowdown other than the indirection. Could you please try the new vers

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Ted Dunning
On Tue, Apr 30, 2013 at 9:01 AM, Robin Anil wrote: > Apart from this I am not too happy with the names isLikeLeftMult, > isLikeRightPlus etc. But I don't have a good alternative either. Please run > this by Ted. > I had a similar reaction. Ultimately, if a name is explainable (and these are), I don

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Robin Anil
4, 5: I think it's a question of correctness. For isEquals() the two vectors have to be exactly equal; otherwise clients should call something like an isApproximatelyEquals method. For other places you have to check whether Math.abs(map.apply(0.0) - 0.0) is exactly zero; again, this has to be exact for t
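Why the zero check must be exact: a sparse vector may skip its implicit zeros under a map only if f(0) is exactly 0. With a tolerance such as `EPSILON`, a function like x + 1e-12 passes the check, and the 1e-12 that every implicit zero should become is silently lost. A minimal sketch, assuming sparse vectors stored as sorted maps; `SparseAssign` and its `EPSILON` value are hypothetical. Note that `Math.abs(f(0) - 0.0) == 0` is the same test as `f(0) == 0.0`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch (not Mahout code): a sparse assign that skips implicit
// zeros only when the function maps 0 to exactly 0.
final class SparseAssign {

  /** Hypothetical tolerance, used only to show what goes wrong with it. */
  static final double EPSILON = 1e-10;

  private SparseAssign() {}

  static boolean preservesZeroExactly(DoubleUnaryOperator f) {
    return f.applyAsDouble(0.0) == 0.0;
  }

  static boolean preservesZeroApprox(DoubleUnaryOperator f) {
    return Math.abs(f.applyAsDouble(0.0)) < EPSILON;
  }

  /** Applies f to a sparse vector of the given size stored as index -> value. */
  static Map<Integer, Double> assign(Map<Integer, Double> v, int size, DoubleUnaryOperator f) {
    Map<Integer, Double> out = new TreeMap<>();
    if (preservesZeroExactly(f)) {
      // f(0) == 0: implicit zeros stay zero, so touch stored entries only.
      for (Map.Entry<Integer, Double> e : v.entrySet()) {
        double r = f.applyAsDouble(e.getValue());
        if (r != 0.0) out.put(e.getKey(), r);
      }
    } else {
      // f(0) != 0: every index, stored or not, can become nonzero.
      for (int i = 0; i < size; i++) {
        double r = f.applyAsDouble(v.getOrDefault(i, 0.0));
        if (r != 0.0) out.put(i, r);
      }
    }
    return out;
  }
}
```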

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Dan Filimon
I'm looking at the norm1 and times regressions again, maybe there's something I missed. I agree with 1 through 3. About 4, 5, do you think we'd lose too much precision? About 6, you're giving examples of tests, not different special cases, right? As for the names, they're unfortunate, but I pic

Re: Writing tests to gain familiarity

2013-04-30 Thread Robin Anil
Tests+++. Creating toy dataset expectations across multiple classifiers will be a great thing. Right now we don't track regressions in the codebase with integration tests. You could maybe try and create a set of smallish datasets and run each classifier, spit out some metrics and put expect
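The suggestion can be sketched as a metric plus a recorded expectation: run a classifier over a toy dataset, compute a score, and fail the test if the score drops below what was previously recorded. A minimal sketch with a hypothetical classifier and made-up toy data, not Mahout's classifier API; accuracy is only one possible metric.

```java
import java.util.function.Function;

// Hypothetical regression check for classifiers on toy data (not Mahout's API).
final class ClassifierRegressionCheck {

  private ClassifierRegressionCheck() {}

  /** Fraction of inputs whose predicted label equals the expected label. */
  static <X> double accuracy(Function<X, Integer> classifier, X[] inputs, int[] labels) {
    if (inputs.length == 0 || inputs.length != labels.length) {
      throw new IllegalArgumentException("need equally sized, non-empty inputs and labels");
    }
    int correct = 0;
    for (int i = 0; i < inputs.length; i++) {
      if (classifier.apply(inputs[i]) == labels[i]) {
        correct++;
      }
    }
    return (double) correct / inputs.length;
  }

  /** True unless the metric fell more than 'tolerance' below the recorded expectation. */
  static boolean meetsExpectation(double metric, double expected, double tolerance) {
    return metric >= expected - tolerance;
  }
}
```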

Re: What about implementing ELM?

2013-04-30 Thread Ted Dunning
This really looks like a random projection followed by something like regularized regression. It is not news that many applications of neural nets don't need multiple layers, especially in large systems. Likewise, it isn't news that random projection preserves approximate metrics and thus allows lea
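The claim that random projection approximately preserves distances (the Johnson-Lindenstrauss lemma) is easy to check numerically. A minimal sketch, assuming a dense Gaussian projection matrix with entries drawn from N(0, 1/k), which preserves squared lengths in expectation; `RandomProjection` and the sizes used are illustrative only.

```java
import java.util.Random;

// Illustrative Gaussian random projection (hypothetical names and sizes).
final class RandomProjection {

  private RandomProjection() {}

  /** k x d matrix with i.i.d. N(0, 1/k) entries, so E[|Rx|^2] = |x|^2. */
  static double[][] gaussianMatrix(int k, int d, long seed) {
    Random rnd = new Random(seed);
    double scale = 1.0 / Math.sqrt(k);
    double[][] r = new double[k][d];
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < d; j++) {
        r[i][j] = rnd.nextGaussian() * scale;
      }
    }
    return r;
  }

  /** Computes R x. */
  static double[] project(double[][] r, double[] x) {
    double[] y = new double[r.length];
    for (int i = 0; i < r.length; i++) {
      double s = 0.0;
      for (int j = 0; j < x.length; j++) {
        s += r[i][j] * x[j];
      }
      y[i] = s;
    }
    return y;
  }

  /** Euclidean distance between two vectors of equal length. */
  static double distance(double[] a, double[] b) {
    double s = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      s += diff * diff;
    }
    return Math.sqrt(s);
  }
}
```

Because projection is linear, the projected distance is the length of R(u - v), and its relative error shrinks roughly like 1/sqrt(k) as the target dimension k grows.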

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Robin Anil
I see that the end is tantalizingly near. A few other review comments: 1) Remove all unused code. 2) Do not allow construction of empty vectors; it just makes no sense (unless someone strongly disagrees). 3) Comment all classes (AssignNonzerosIterateThisLookupThat etc.). 4) Change < Constants.EPSILON ch

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Robin Anil
Yes, incrementQuick is a known speed booster (it halves the number of key hash computations). You can leave that to me. I can make it faster after you check this in. It might require some refactoring of the incrementQuick interface. What about the regressions in SeqSparseVector norm1 and Dense.t
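The point about `incrementQuick` is the number of lookups: a get followed by a put hashes the key twice, while a combined increment hashes it once. A minimal analogy using `java.util.HashMap`, whose `merge` override finds or creates the entry in a single lookup; `IncrementDemo` is a hypothetical name, and Mahout's vectors use primitive-keyed maps, so this shows only the lookup count, not Mahout's interface or its performance.

```java
import java.util.Map;

// Illustrative contrast between two ways to increment a map entry (hypothetical).
final class IncrementDemo {

  private IncrementDemo() {}

  /** get followed by put: the key is hashed and looked up twice. */
  static void incrementTwoLookups(Map<Integer, Double> m, int key, double delta) {
    m.put(key, m.getOrDefault(key, 0.0) + delta);
  }

  /**
   * merge: java.util.HashMap's override finds (or creates) the entry in a
   * single lookup; the Map interface's default method does not.
   */
  static void incrementOneLookup(Map<Integer, Double> m, int key, double delta) {
    m.merge(key, delta, Double::sum);
  }
}
```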

Re: What about implementing ELM?

2013-04-30 Thread Reto Matter
Hmm, this sounds like a cool idea. On Tue, Apr 30, 2013 at 4:11 PM, Sean Owen wrote: > I've just skimmed it and so probably missed some key details, but this > looks like a hidden layer model where you just randomly pick values > for the hidden layer parameters, and then solve a simple linea

Re: What about implementing ELM?

2013-04-30 Thread Sean Owen
I've just skimmed it and so probably missed some key details, but this looks like a hidden layer model where you just randomly pick values for the hidden layer parameters, and then solve a simple linear regression model to predict outputs from the randomized hidden layer. The random values are neve

Re: What about implementing ELM?

2013-04-30 Thread Reto Matter
As far as I understand ELMs, the main difference is that learning in that particular setting comes down to three relatively simple steps, and no iteration (as in other learning algorithms, e.g. backpropagation) is needed. So, in that respect, the learning phase is blazingly fast compared to other appr
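The recipe described in this thread (random hidden-layer weights that are never trained, the hidden activations, then a single linear solve for the output weights, with no backpropagation-style iteration), read as regularized regression as Ted suggests, can be sketched as follows. This is a minimal sketch, not a reference ELM: the tanh activation, Gaussian initialization, ridge parameter `lambda` and all names are assumptions. The classic formulation uses a Moore-Penrose pseudoinverse, which the ridge solution approaches as `lambda` goes to 0.

```java
import java.util.Random;

// Minimal extreme-learning-machine sketch (hypothetical names; not a reference
// implementation): random hidden layer + ridge regression for output weights.
final class ElmSketch {

  private final double[][] w;  // random input weights, numHidden x numInputs (never trained)
  private final double[] b;    // random biases, one per hidden unit (never trained)
  private final double[] beta; // output weights: the only fitted parameters

  private ElmSketch(double[][] w, double[] b, double[] beta) {
    this.w = w;
    this.b = b;
    this.beta = beta;
  }

  /** Hidden activations h_j = tanh(w_j . x + b_j). */
  private static double[] hidden(double[][] w, double[] b, double[] x) {
    double[] h = new double[w.length];
    for (int j = 0; j < w.length; j++) {
      double s = b[j];
      for (int k = 0; k < x.length; k++) s += w[j][k] * x[k];
      h[j] = Math.tanh(s);
    }
    return h;
  }

  /** Trains on rows xs with targets ys; lambda > 0 keeps the linear system solvable. */
  static ElmSketch train(double[][] xs, double[] ys, int numHidden, double lambda, long seed) {
    if (xs.length == 0 || xs.length != ys.length || numHidden <= 0 || !(lambda > 0)) {
      throw new IllegalArgumentException("bad training arguments");
    }
    Random rnd = new Random(seed);
    int d = xs[0].length;
    // Step 1: draw the hidden-layer parameters at random; they stay fixed.
    double[][] w = new double[numHidden][d];
    double[] b = new double[numHidden];
    for (int j = 0; j < numHidden; j++) {
      b[j] = rnd.nextGaussian();
      for (int k = 0; k < d; k++) w[j][k] = rnd.nextGaussian();
    }
    // Step 2: compute hidden outputs H row by row, accumulating H^T H and H^T y.
    double[][] a = new double[numHidden][numHidden];
    double[] rhs = new double[numHidden];
    for (int i = 0; i < xs.length; i++) {
      double[] h = hidden(w, b, xs[i]);
      for (int p = 0; p < numHidden; p++) {
        rhs[p] += h[p] * ys[i];
        for (int q = 0; q < numHidden; q++) a[p][q] += h[p] * h[q];
      }
    }
    // Step 3: one regularized least-squares solve, (H^T H + lambda I) beta = H^T y.
    for (int p = 0; p < numHidden; p++) a[p][p] += lambda;
    return new ElmSketch(w, b, solve(a, rhs));
  }

  double predict(double[] x) {
    double[] h = hidden(w, b, x);
    double s = 0.0;
    for (int j = 0; j < h.length; j++) s += beta[j] * h[j];
    return s;
  }

  /** Gaussian elimination with partial pivoting; overwrites a and rhs. */
  private static double[] solve(double[][] a, double[] rhs) {
    int n = rhs.length;
    for (int col = 0; col < n; col++) {
      int piv = col;
      for (int r = col + 1; r < n; r++) {
        if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
      }
      double[] rowTmp = a[col]; a[col] = a[piv]; a[piv] = rowTmp;
      double tmp = rhs[col]; rhs[col] = rhs[piv]; rhs[piv] = tmp;
      for (int r = col + 1; r < n; r++) {
        double f = a[r][col] / a[col][col];
        rhs[r] -= f * rhs[col];
        for (int c = col; c < n; c++) a[r][c] -= f * a[col][c];
      }
    }
    double[] x = new double[n];
    for (int r = n - 1; r >= 0; r--) {
      double s = rhs[r];
      for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
      x[r] = s / a[r][r];
    }
    return x;
  }
}
```

Only `beta` is fitted, so training is one pass over the data plus one small linear solve of size numHidden.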

Re: What about implementing ELM?

2013-04-30 Thread Louis Hénault
I am not at home, where I have my course notes about it, but you can have a look here, for example: http://msrvideo.vo.msecnd.net/rmcvideos/144113/dl/144113.pdf On page 50 there is a comparison between SVM and ELM, in which ELM outperforms SVM in both testing and training time. It is not easy to give theore

Re: What about implementing ELM?

2013-04-30 Thread Sean Owen
If you care to work on it, you should work on it. Implementations exist or don't exist because someone created it, or nobody was interested in creating it. I have never heard of 'extreme learning' and found this summary: http://www.slideshare.net/formatc666/extreme-learning-machinetheory-and-appli

What about implementing ELM?

2013-04-30 Thread Louis Hénault
Hi everybody, many people are trying to integrate SVM into Mahout. I can understand that, since SVMs are really efficient in a "small data" context. But, as you may know, SVMs have: - slow learning speed - poor learning scalability. In contrast, ELMs give results which are usually at least as good as SVM's

[jira] [Commented] (MAHOUT-916) Make Mahout's tests run in parallel

2013-04-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645369#comment-13645369 ] Isabel Drost commented on MAHOUT-916: - Unfortunately it isn't quite as easy - Hadoop s

Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

2013-04-30 Thread Dan Filimon
So now RandomAccessSparseVector seems to be the most affected. Pretty much every regression is related to RASV. Could it be better to treat it as a Vector without constant-time updates and drop the in-place updates? Otherwise, the code that implements Minus is pretty much the same as