Re: [GSOC] Array of questions

2010-05-26 Thread Shannon Quinn
Hi again, Thanks for all the responses! Let's see if I can address what was said several hours ago... Blogging about your project is great, as is putting stuff on the wiki, but > make sure you post on this list a link to wherever you put it, because this > list is where all real communication bet

[jira] Updated: (MAHOUT-167) Convert clustering code to Hadoop 0.20 API

2010-05-26 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman updated MAHOUT-167: Attachment: MAHOUT-167.patch Work in progress checkpoint update that needs work to make a MockReduc

Re: Hudson build is unstable: Mahout-Quality #28

2010-05-26 Thread Robin Anil
All dashboards fixed. Plenty of work ahead On Thu, May 27, 2010 at 4:03 AM, Apache Hudson Server wrote: > See > > >

Hudson build is unstable: Mahout-Quality #28

2010-05-26 Thread Apache Hudson Server
See

Re: Moving to new Hadoop APIs

2010-05-26 Thread Jeff Eastman
NVM, I think I've got DistanceMeasures sorted out. Still working on creating proper MockContexts to feed to the mapper and reducer tests. I'll post my patch in whatever status it is at the end of today. On 5/26/10 1:35 PM, Jeff Eastman wrote: I've got most of Canopy converted but am exploding m

Re: Hudson build is unstable: Mahout-Quality #26

2010-05-26 Thread Robin Anil
Checkstyle and FindBugs weren't coming up in the dashboard. Doing some tweaks. Fired another build just now. Please disregard Hudson emails if they say something is broken

Hudson build is unstable: Mahout-Quality #26

2010-05-26 Thread Apache Hudson Server
See

Re: [jira] Updated: (MAHOUT-392) Test cases for logGamma, Distribution.normal and Distribution.beta, fix for Distribution.normal

2010-05-26 Thread Ted Dunning
You are my hero. On Wed, May 26, 2010 at 12:15 PM, Sean Owen (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/MAHOUT-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] > > Sean Owen updated MAHOUT-392: > - > >Status: Resolved

Re: Hudson build is unstable: Mahout-Quality #24

2010-05-26 Thread Drew Farris
Also, from what I remember, Hudson will mark builds unstable until it has completed the build successfully a certain number of times in a row. On Wed, May 26, 2010 at 4:33 PM, Robin Anil wrote: > Success. It was not using the install target so was using older core > jar file from the repo. > > >

Re: Hudson build is unstable: Mahout-Quality #24

2010-05-26 Thread Sean Owen
Sounds good. Yeah now that I see how many checkstyle / findbugs warnings are there, I will also work to reduce them and/or suggest we disable a few rules. On Wed, May 26, 2010 at 9:33 PM, Robin Anil wrote: > Success. It was not using the install target so was using older core > jar file from the

Re: Moving to new Hadoop APIs

2010-05-26 Thread Jeff Eastman
I've got most of Canopy converted but am exploding my brain trying to figure out how best to coax DistanceMeasures to support a configure(Configuration) method. There's some subtle inheritance in the parameters package and I can't decide just where to touch it. I'm running out of time before I

Re: Hudson build is unstable: Mahout-Quality #24

2010-05-26 Thread Robin Anil
Success. It was not using the install target so was using older core jar file from the repo. It's marked unstable because there are too many PMD and FindBugs warnings. Let's try to get it to stable over the summer. Robin

Hudson build is unstable: Mahout-Quality #24

2010-05-26 Thread Apache Hudson Server
See

[jira] Resolved: (MAHOUT-342) [GSOC] Implement Map/Reduce Enabled Neural Networks

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-342. -- Resolution: Duplicate This became MAHOUT-374 > [GSOC] Implement Map/Reduce Enabled Neural Networks > -

[jira] Resolved: (MAHOUT-365) [GSoC] Proposal to implement SimHash clustering on MapReduce

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-365. -- Resolution: Later I'm going to mark this 'Later' for now, because it didn't materialize into a GSoC pr

[jira] Resolved: (MAHOUT-345) [GSOC] integrate Mahout with Drupal/PHP

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-345. -- Resolution: Won't Fix Let's close this as it did not become a GSoC project and seems more like a Drupa

[jira] Resolved: (MAHOUT-333) Implement a visualization tool to help a user visualize the output of clustering and other algorithms

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-333. -- Resolution: Won't Fix Same, didn't seem to materialize for GSoC > Implement a visualization tool to he

[jira] Resolved: (MAHOUT-332) Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-332. -- Resolution: Later Archive this? It didn't turn into a GSoC project and there's not anyone obviously wo

Re: Build failed in Hudson: Mahout-Quality #23

2010-05-26 Thread Robin Anil
/home/hudson/tools/maven/latest/bin/mvn -f pom.xml -U clean javadoc:javadoc checkstyle:checkstyle findbugs:findbugs clover2:instrument clover2:aggregate clover2:clover pmd:pmd This is the whole command line. It's based on a fresh checkout

[jira] Resolved: (MAHOUT-328) Implement a cool clustering algorithm on map/reduce

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-328. -- Resolution: Won't Fix I believe this is closeable too. > Implement a cool clustering algorithm on map/

Re: Build failed in Hudson: Mahout-Quality #23

2010-05-26 Thread Sean Owen
Yes I get the mails too -- that was an unrelated change anyhow. Again, the compile error this reports is definitely wrong. I am looking at the source files now, which compile, and it is not consistent with what this build is doing. I think it's not quite using the latest artifacts somehow. On We

Re: Build failed in Hudson: Mahout-Quality #23

2010-05-26 Thread Robin Anil
Still failing.

Build failed in Hudson: Mahout-Quality #23

2010-05-26 Thread Apache Hudson Server
See Changes: [srowen] MAHOUT-392 -- [...truncated 7992 lines...] Generating

Re: Which configs to use

2010-05-26 Thread Robin Anil
I can do that, but need some pointer Branch Optional sonar.branch property. What is this value

Re: Build failed in Hudson: Mahout-Quality #22

2010-05-26 Thread Grant Ingersoll
On May 26, 2010, at 3:25 PM, Robin Anil wrote: > Ah our first report. Fun! > Should we have a separate email alias for build > related info ? -1. It's good to go to the dev@ so that it is noticed right away.

Re: Build failed in Hudson: Mahout-Quality #22

2010-05-26 Thread Robin Anil
Could be. There was a Java upgrade going on. I am running the job again. On Thu, May 27, 2010 at 1:09 AM, Sean Owen wrote: > Whatever you like I guess, but nightly checks seem OK. > > I am still claiming this compile error is faulty though, FWIW. > > On Wed, May 26, 2010 at 8:31 PM, Robin Anil w

[jira] Updated: (MAHOUT-371) [GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-371: - Assignee: Sean Owen Fix Version/s: 0.4 Due Date: 30/Aug/10 > [GSoC] Proposal to impleme

[jira] Resolved: (MAHOUT-260) An alternative approach to RNG management

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-260. -- Resolution: Fixed This was committed a while ago no? > An alternative approach to RNG management > ---

Re: Build failed in Hudson: Mahout-Quality #22

2010-05-26 Thread Sean Owen
Whatever you like I guess, but nightly checks seem OK. I am still claiming this compile error is faulty though, FWIW. On Wed, May 26, 2010 at 8:31 PM, Robin Anil wrote: > Should I configure a post commit hook? to trigger builds? or use poll scm? > > > On Thu, May 27, 2010 at 12:55 AM, Robin Anil

Re: Build failed in Hudson: Mahout-Quality #22

2010-05-26 Thread Robin Anil
Should I configure a post-commit hook to trigger builds, or use poll SCM? On Thu, May 27, 2010 at 12:55 AM, Robin Anil wrote: > Ah our first report. Should we have a separate email alias for build > related info ? >

Re: Build failed in Hudson: Mahout-Quality #22

2010-05-26 Thread Sean Owen
I dispute this. This compiles fine for me and I scoured my directories for uncommitted changes. Did this fail for anyone else? dev@ is fine. On Wed, May 26, 2010 at 8:25 PM, Robin Anil wrote: > Ah our first report. Should we have a separate email alias for build > related info ? >

Re: Which configs to use

2010-05-26 Thread Benson Margulies
we could also plug into sonar. On May 26, 2010, at 12:01 PM, Drew Farris wrote: Good to know that we don't need to depend on the site plugin for that stuff. Hooray for hudson. On Wed, May 26, 2010 at 1:54 PM, Robin Anil wrote: Finally!! Even though site:site doesn't work, Hudson plugi

Re: Build failed in Hudson: Mahout-Quality #22

2010-05-26 Thread Robin Anil
Ah our first report. Should we have a separate email alias for build related info ?

Build failed in Hudson: Mahout-Quality #22

2010-05-26 Thread Apache Hudson Server
See Changes: [srowen] Recommender and related jobs now exclusively use new Hadoop 0.20.x+ APIs -- [...truncated 6538 lines...] Generating

[jira] Resolved: (MAHOUT-346) [GSOC] Your Machine Learning Idea Here

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-346. -- Resolution: Won't Fix We can delete this right? > [GSOC] Your Machine Learning Idea Here > ---

[jira] Updated: (MAHOUT-392) Test cases for logGamma, Distribution.normal and Distribution.beta, fix for Distribution.normal

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-392: - Status: Resolved (was: Patch Available) Resolution: Fixed I committed this > Test cases for log

[jira] Resolved: (MAHOUT-337) Don't serialize cached length squared in JSON vector representation

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-337. -- Resolution: Won't Fix > Don't serialize cached length squared in JSON vector representation > -

[jira] Resolved: (MAHOUT-143) Refactor Hadoop deprecations

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-143. -- Resolution: Fixed Resolving this after my last change, since I think what's left is covered in MAHOUT-

[jira] Updated: (MAHOUT-167) Convert clustering code to Hadoop 0.20 API

2010-05-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-167: - Component/s: (was: Collaborative Filtering) I've updated CF code. > Convert clustering code to Hadoo

Re: Which configs to use

2010-05-26 Thread Drew Farris
Good to know that we don't need to depend on the site plugin for that stuff. Hooray for hudson. On Wed, May 26, 2010 at 1:54 PM, Robin Anil wrote: > Finally!! Even though site:site doesn't work, Hudson plugins pull in > all the stats and gives a better looking dashboard > > > Check this out > ht

Re: Moving to new Hadoop APIs

2010-05-26 Thread Sean Owen
I'm happy to report I converted all the recommender-related jobs to Hadoop 0.20.x and sorted out the issue I had before, and simply reworked the jobs to not need one job to have two mapper inputs. I don't think I broke anything, but, can't be 100% sure since the tests aren't exhaustive. I suppose

[jira] Commented: (MAHOUT-231) Upgrade QM reports to use Clover 2.6

2010-05-26 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871886#action_12871886 ] Robin Anil commented on MAHOUT-231: --- Stopping the old MahoutQM job from building periodic

[jira] Resolved: (MAHOUT-231) Upgrade QM reports to use Clover 2.6

2010-05-26 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil resolved MAHOUT-231. --- Resolution: Fixed The reports are up with nice looking graphs http://hudson.zones.apache.org/hudson/j

[jira] Commented: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871881#action_12871881 ] Grant Ingersoll commented on MAHOUT-396: Would be great if you could fill in the HM

Re: Which configs to use

2010-05-26 Thread Robin Anil
Finally!! Even though site:site doesn't work, Hudson plugins pull in all the stats and gives a better looking dashboard Check this out http://hudson.zones.apache.org/hudson/job/Mahout-Quality/ http://hudson.zones.apache.org/hudson/job/Mahout-Quality/20/pmdResult/ http://hudson.zones.apache.org/hu

Re: [GSOC] Array of questions

2010-05-26 Thread Grant Ingersoll
On May 26, 2010, at 12:20 PM, Jeff Eastman wrote: > On 5/26/10 7:55 AM, Grant Ingersoll wrote: >> On May 25, 2010, at 11:51 PM, Jake Mannix wrote: >> >> >>> 2) In getting a feel for Mahout, I've been running a few of the examples on my own, and have noticed that if I supply the

Re: [GSOC] Array of questions

2010-05-26 Thread Jeff Eastman
On 5/26/10 9:20 AM, Jeff Eastman wrote: On 5/26/10 7:55 AM, Grant Ingersoll wrote: On May 25, 2010, at 11:51 PM, Jake Mannix wrote: 2) In getting a feel for Mahout, I've been running a few of the examples on my own, and have noticed that if I supply the "-h" argument by itself to some of the

Re: [GSOC] Array of questions

2010-05-26 Thread Jeff Eastman
On 5/26/10 7:55 AM, Grant Ingersoll wrote: On May 25, 2010, at 11:51 PM, Jake Mannix wrote: 2) In getting a feel for Mahout, I've been running a few of the examples on my own, and have noticed that if I supply the "-h" argument by itself to some of the available programs, I get an exc

Re: [jira] Commented: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Robin Anil
> Benson Margulies commented on MAHOUT-396: > - > > Why inside 'classifier'? Are you defining all sequence problems as > classification? Most applications of sequence learning are used to recognize or predict. I would say it comes under classification

[jira] Commented: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871782#action_12871782 ] Benson Margulies commented on MAHOUT-396: - Why inside 'classifier'? Are you definin

[jira] Commented: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871780#action_12871780 ] Robin Anil commented on MAHOUT-396: --- A few comments on the code style. See Mahout core clas

Re: [GSOC] Array of questions

2010-05-26 Thread Jeff Eastman
On 5/25/10 8:51 PM, Jake Mannix wrote: Hi Shannon, On Tue, May 25, 2010 at 8:10 PM, Shannon Quinn wrote: (snip) 2) In getting a feel for Mahout, I've been running a few of the examples on my own, and have noticed that if I supply the "-h" argument by itself to some of the available programs,

Re: [GSOC] Array of questions

2010-05-26 Thread Grant Ingersoll
On May 25, 2010, at 11:51 PM, Jake Mannix wrote: > >> >> 2) In getting a feel for Mahout, I've been running a few of the examples on >> my own, and have noticed that if I supply the "-h" argument by itself to >> some of the available programs, I get an exception, followed by the list of >> avai

Re: [GSOC] Timeline

2010-05-26 Thread Isabel Drost
On Tue Grant Ingersoll wrote: > OK, the coding period has started: > http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/timeline. > I'd encourage all involved to ask questions on the mailing lists and > in JIRA, etc. I'd like to second that - there are a lot of awesome people o

Re: PMC stuff

2010-05-26 Thread Isabel Drost
On Tue Grant Ingersoll wrote: > Many of us are new at this PMC stuff, so I thought I would send out a > few helpful pointers: Thanks. > At the same time, we have a few "old hats" around, so don't be afraid > to ask if you have questions. :) Good to know. > The other thing to note, most ever

Re: Which configs to use

2010-05-26 Thread Benson Margulies
On Wed, May 26, 2010 at 8:50 AM, Drew Farris wrote: > Ok, I suspected this was the case and had been running site:site > > I'm also a little confused about what goes in the plugin configuration in > either build/plugins or build/pluginManagement vs what goes in the reports > section of the pom. >

Re: Which configs to use

2010-05-26 Thread Drew Farris
Ok, I suspected this was the case and had been running site:site I'm also a little confused about what goes in the plugin configuration in either build/plugins or build/pluginManagement vs what goes in the reports section of the pom. Also, was wondering if the execution for checkstyle and pmd tha

[jira] Updated: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Max Heimel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Heimel updated MAHOUT-396: -- Attachment: MAHOUT-396.diff renamed patch file to stick with naming conventions :) > Proposal for Impl

[jira] Updated: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Max Heimel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Heimel updated MAHOUT-396: -- Attachment: (was: hmm_base.patch) > Proposal for Implementing Hidden Markov Model > ---

[jira] Updated: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Max Heimel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Heimel updated MAHOUT-396: -- Attachment: hmm_base.patch > Proposal for Implementing Hidden Markov Model > --

[jira] Updated: (MAHOUT-396) Proposal for Implementing Hidden Markov Model

2010-05-26 Thread Max Heimel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Heimel updated MAHOUT-396: -- Status: Patch Available (was: Open) Affects Version/s: 0.4 Fix Version/s: 0.4 T

[SOT] Tika + Hadoop

2010-05-26 Thread Grant Ingersoll
https://issues.apache.org/jira/browse/TIKA-433 might be of interest to those people looking to extract text from Office/PDF, etc. and then convert into Mahout vectors. -Grant

Re: Moving to new Hadoop APIs

2010-05-26 Thread Sean Owen
Yah that's what I expected, and that's what we'd settled on to date. I recently heard maybe that wasn't the case, and want to make sure the project doesn't get stuck using a bit of both for long. So now I'm on the new APIs, but checking in since that move seems to decrease consistency rather than

Re: Moving to new Hadoop APIs

2010-05-26 Thread Jake Mannix
I made sure to write all the matrix and decomposer stuff in the old mapred.* hierarchy, so that is not on 0.20+ yet. But I don't know about the rest of it; I've seen lots of 0.18-based code as I dug around. On Wed, May 26, 2010 at 1:10 AM, Sean Owen wrote: > So, I converted to use the new APIs

Re: Moving to new Hadoop APIs

2010-05-26 Thread Sean Owen
So, I converted to use the new APIs since Robin had mentioned that most all the other code uses it. But I took a glance, and actually I don't see anything using the new APIs at all except LDA and one FPM implementation. Are we talking about the same thing? Basically we are talking about using not

[jira] Commented: (MAHOUT-231) Upgrade QM reports to use Clover 2.6

2010-05-26 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871551#action_12871551 ] Robin Anil commented on MAHOUT-231: --- Clover running on hudson http://hudson.zones.apache

Re: Which configs to use

2010-05-26 Thread Robin Anil
Checkstyle, PMD, Javadoc and Clover are running; you will have to go to the individual output folders to view them. site:site is crashing due to some dependency problem, so I removed it from the job pending further investigation. Overall coverage view http://hudson.zones.apache.org/hudson/job/Mahou

Re: Which configs to use

2010-05-26 Thread Sean Owen
I don't mind removing it. It's there to detect substantial slow-downs in common implementations. On Wed, May 26, 2010 at 1:08 AM, Robin Anil wrote: > Oh Found it. the clover coverage instrumentation was slowing down the test > and there was a time check that was failing > >    assertTrue(timeMS <