Re: Mahout Release : 0.7

2012-06-06 Thread tom pierce
It's going to be tough, because we actually planned to have the release out already and none of us really is a YARN expert. If there's some common trap we're falling into with how we're specifying paths or setting up jobs and we can fix it easily, I'm guessing no one would object. Of course, we'd

Re: [VOTE] Mahout 0.7

2012-05-22 Thread tom pierce
+1 for D. Many of those seem like worthwhile improvements, and if a week or so allows a few to get resolved, that seems like a good tradeoff to me. I'd be inclined to vote C (or maybe A) on 6/1, though. -t On 05/22/2012 12:33 PM, Jeff Eastman wrote: A - punt all non-critical issues to 0.

[jira] [Updated] (MAHOUT-987) Our build is unstable - this should reduce our style warnings by >200

2012-05-15 Thread tom pierce (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-987: -- Resolution: Fixed Status: Resolved (was: Patch Available) Those 2 commits got us back to

[jira] [Commented] (MAHOUT-987) Our build is unstable - this should reduce our style warnings by >200

2012-05-14 Thread tom pierce (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274827#comment-13274827 ] tom pierce commented on MAHOUT-987: --- Hi Folks - I've been heads-down on oth

[jira] [Updated] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-05-02 Thread tom pierce (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-994: -- Resolution: Fixed Fix Version/s: 0.7 Status: Resolved (was: Patch Available

[jira] [Assigned] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-04-27 Thread tom pierce (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce reassigned MAHOUT-994: - Assignee: tom pierce Anyone object to the most recent patch? It's a tiny edit off Ro

[jira] [Updated] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-04-27 Thread tom pierce (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-994: -- Attachment: MAHOUT-994.patch > mahout script shouldn't rely on HADOOP_HOME since

Re: Review Request: mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-04-25 Thread tom pierce
wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4632/ > --- > > (Updated 2012-04-04 00:26:36) > >

[jira] [Commented] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-03-27 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240066#comment-13240066 ] tom pierce commented on MAHOUT-994: --- You bet - happy to re

[jira] [Commented] (MAHOUT-994) mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all major Hadoop branches

2012-03-27 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240015#comment-13240015 ] tom pierce commented on MAHOUT-994: --- It would make a lot of sense to me to unify

[jira] [Created] (MAHOUT-993) Some vector dumper flags are expecting arguments.

2012-03-14 Thread tom pierce (Created) (JIRA)
Reporter: tom pierce Priority: Minor Fix For: 0.7 I ran VectorDumper from the command line like this: $MAHOUT_HOME/bin/mahout vectordump -i ${HDFS_VECTS} --csv -p > ${LOCAL_VECTS} I've used this command before to dump similar vectors, but I'm now get

Re: [jira] [Commented] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-03-13 Thread tom pierce
when I run it for second time. Explicitly deleting the file might help in the test case ( I agree that this is not the best solution, but, I am not able to find the actual problem for such behavior ). Paritosh On 13-03-2012 19:12, tom pierce wrote: Unfortunately, I can't duplica

[jira] [Commented] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-03-13 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228388#comment-13228388 ] tom pierce commented on MAHOUT-822: --- Unfortunately, I can't duplicate this tes

Re: [jira] [Commented] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-03-13 Thread tom pierce
Key: MAHOUT-822 URL: https://issues.apache.org/jira/browse/MAHOUT-822 Project: Mahout Issue Type: Improvement Components: build Affects Versions: 0.6 Reporter: Roman Shaposhnik Assignee: tom pierce

[jira] [Created] (MAHOUT-992) Audit DistributedCache use to support EMR

2012-03-12 Thread tom pierce (Created) (JIRA)
: tom pierce Priority: Minor Apparently some of our DistributedCache use is not EMR-safe. It would be great if someone could audit our uses of DC, and fix up this problem where it exists. For an example of problematic usage (and the fix), see MAHOUT-980. -- This message is

[jira] [Updated] (MAHOUT-980) Patch to make PFPGrowth run on Amazon MapReduce (also shows possible pattern to make other algorithms work in Amazon MapReduce)

2012-03-12 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-980: -- Resolution: Fixed Assignee: tom pierce Status: Resolved (was: Patch Available

[jira] [Updated] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-03-12 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-822: -- Resolution: Fixed Assignee: tom pierce Status: Resolved (was: Patch Available) Thanks

Re: ongoing jenkins failures

2012-03-12 Thread tom pierce
hecks, as we know the code will be pretty for every release and hopefully it won't drift too much in the interim. My bikeshedding proposal, -Grant On Mar 9, 2012, at 12:44 AM, Jake Mannix wrote: On Thu, Mar 8, 2012 at 9:10 PM, tom pierce wrote: I could support this plan, but it might

Re: svn commit: r1299770 - in /mahout/trunk: ./ core/ core/src/main/java/org/apache/mahout/common/ core/src/test/java/org/apache/mahout/classifier/df/mapreduce/partial/ core/src/test/java/org/apache/m

2012-03-12 Thread tom pierce
Can someone hook me up with JIRA privs so I can close tickets? (Or, if that isn't something all committers get, someone pls mark -822 and -980 closed) -tom On 03/12/2012 02:25 PM, t...@apache.org wrote: Author: tcp Date: Mon Mar 12 18:25:45 2012 New Revision: 1299770 URL: http://svn.apache.

Re: Review Request: MAHOUT-822: Mahout needs to be made compatible with Hadoop .23 releases

2012-03-09 Thread tom pierce
y had been doing it 100% with new api found in Cloudera distros and were hacked back to support 0.20.2 which Mahout was shipped with. so i think i'll take a look at it at some other issue separately. On Fri, Mar 9, 2012 at 10:50 AM, Dmitriy Lyubimov wrote: On Thu, Mar 8, 2012 at 8:43 PM,

Re: Review Request: MAHOUT-822: Mahout needs to be made compatible with Hadoop .23 releases

2012-03-09 Thread tom pierce
i don't think we care > > about classic hadoop 0.20.2 anymore. > > tom pierce wrote: > Hmm, I am not sure, but I think this is a comment we can omit. Without > the new listStatus methods (but with the changes to other files), there is at > least one test that won

Re: ongoing jenkins failures

2012-03-08 Thread tom pierce
the fb/pmd/cs bar too low does us no service unless we are prepared to take those warnings seriously. Is it possible to raise the bar to where we are "ok" again and then agree to lower it periodically to get us to improve our hygiene index? On 3/7/12 7:04 PM, Tom Pierce wrote: Well we

Re: Review Request: MAHOUT-822: Mahout needs to be made compatible with Hadoop .23 releases

2012-03-08 Thread tom pierce
that gave a consistent answer between 0.20.203 and 0.23.1. In the other cases I think I made the tests a little better, but in this case I made the test pass. Is there someone you'd nominate as an additional reviewer? - tom ----------- Th

Review Request: Simple patch to reduce our checkstyle warnings

2012-03-07 Thread tom pierce
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4238/ --- Review request for mahout. Summary --- Generated with: find . -name *java

[jira] [Updated] (MAHOUT-987) Our build is unstable - this should reduce our style warnings by >200

2012-03-07 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-987: -- Status: Patch Available (was: Open) > Our build is unstable - this should reduce our st

[jira] [Updated] (MAHOUT-987) Our build is unstable - this should reduce our style warnings by >200

2012-03-07 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-987: -- Attachment: MAHOUT-987.patch > Our build is unstable - this should reduce our style warnings

[jira] [Created] (MAHOUT-987) Our build is unstable - this should reduce our style warnings by >200

2012-03-07 Thread tom pierce (Created) (JIRA)
ype: Improvement Affects Versions: 0.7 Reporter: tom pierce Fix For: 0.7 If we're going to keep these Jenkins style rules, let's get our build stable! Here's about 200 small fixes created by: find . -name \*java | xargs perl -pi -e 's/(if|w

Re: ongoing jenkins failures

2012-03-07 Thread Tom Pierce
Well we already have that in a sense - all the tests still run and we can see which fail even if findbugs/pmd/checkstyle have lots of complaints. My concern would be having 2 separate Jenkins tasks would make it even easier to ignore the non-test warnings. I'd much rather make "mvn test" fail whe

[jira] [Commented] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-03-07 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224918#comment-13224918 ] tom pierce commented on MAHOUT-822: --- It's posted on review board (it's

Review Request: MAHOUT-822: Mahout needs to be made compatible with Hadoop .23 releases

2012-03-07 Thread tom pierce
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4237/ --- Review request for mahout and Dmitriy Lyubimov. Summary --- This is the cur

Re: ongoing jenkins failures

2012-03-06 Thread tom pierce
ay bug Sean for his styles on that side of things. But if you run eclipse i'd really like to use one shared style so autoformat doesn't act differently on something that has already been formatted. -d On Mon, Mar 5, 2012 at 4:51 PM, tom pierce wrote: I am starting to agree that Jenkins

[jira] [Commented] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-03-06 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223696#comment-13223696 ] tom pierce commented on MAHOUT-822: --- Thanks Bilung! All tests pass for me under

Re: ongoing jenkins failures

2012-03-05 Thread tom pierce
For some reason i concluded it was related to big number PMD/style warnings. but i may be wrong and i don't remember why i concluded that. I certainly know the least about Jenkins and ci stuff. On Mon, Mar 5, 2012 at 11:40 AM, Tom Pierce wrote: Hi folks, I spent a little time looking into o

ongoing jenkins failures

2012-03-05 Thread Tom Pierce
Hi folks, I spent a little time looking into our Jenkins failures. I am not very familiar with Jenkins, so this is probably a little remedial - pointers for other/better things to look at are appreciated! We've let this go for a long time: * Last stable build (#1218), 3 mo 3 days ago Before tha

[jira] [Updated] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-03-02 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-822: -- Attachment: MAHOUT-822.patch Updated patch to cover a new test failure under Hadoop 0.23.1-SNAPSHOT

Re: Build failed in Jenkins: Mahout-Examples-Cluster-Reuters #58

2012-03-01 Thread tom pierce
What's the best way to even begin tracking this down? To my eye, this looks like something has gone wrong on the server (full disk, maybe?). -tom On 03/01/2012 03:06 PM, Apache Jenkins Server wrote: See Changes: [tcp]

Re: [VOTE] Next Mahout Release Goals and Release Name

2012-02-29 Thread Tom Pierce
> Release Goals: > - refactoring and cleanup of existing functionality +1 > - new functionality 0 (would love to see some, but let's not make it a prereq) > Release Name: [I'd actually prefer to defer this choice and see how much changes... Though I will go all in for 0.7 if there are incompati

[jira] [Commented] (MAHOUT-980) Patch to make PFPGrowth run on Amazon MapReduce (also shows possible pattern to make other algorithms work in Amazon MapReduce)

2012-02-29 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219679#comment-13219679 ] tom pierce commented on MAHOUT-980: --- Thanks! This has been commi

[jira] [Commented] (MAHOUT-980) Patch to make PFPGrowth run on Amazon MapReduce (also shows possible pattern to make other algorithms work in Amazon MapReduce)

2012-02-29 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219587#comment-13219587 ] tom pierce commented on MAHOUT-980: --- Thanks Matteo! I'm almost ready to co

Re: [jira] [Created] (MAHOUT-980) Patch to make PFPGrowth run on Amazon MapReduce (also shows patterns for making other algorithms work in Amazon MapReduce)

2012-02-27 Thread Tom Pierce
I'm catching up on some mail and I came across this patch - this looks OK to me (though I'm not too familiar with the nuances of running on EMR). I'm unit testing it now, but I wanted to ask what the policy on committing patches delivered via link is? Should I request a resubmit as a JIRA attachm

Re: New Committer: Tom Pierce

2012-02-22 Thread Tom Pierce
Thanks everyone! -tom On Wed, Feb 22, 2012 at 6:07 AM, Grant Ingersoll wrote: > Welcome aboard! > > On Feb 21, 2012, at 6:11 PM, Jeff Eastman wrote: > >> The Project Management Committee (PMC) for Apache Mahout has asked Tom >> Pierce to become a committer and we are pleased to announce that he

[jira] [Commented] (MAHOUT-946) Map-reduce job status often left unchecked

2012-02-08 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203694#comment-13203694 ] tom pierce commented on MAHOUT-946: --- I agree that the cleanup stuff seems a bit awk

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

2012-01-28 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-947: -- Attachment: MAHOUT-947.patch Dropped the cluster dumping addition to VectorDumper

[jira] [Updated] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-27 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-946: -- Attachment: MAHOUT-946.patch I thought about this some more and realized in many cases you might want

[jira] [Updated] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2012-01-26 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-822: -- Attachment: MAHOUT-822.patch I addressed the clustering test failures. Mostly that involved not

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

2012-01-17 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187861#comment-13187861 ] tom pierce commented on MAHOUT-947: --- Hah - I agree on everything needing a quiet op

[jira] [Commented] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-17 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187755#comment-13187755 ] tom pierce commented on MAHOUT-946: --- Interesting thought - right now it seems to me

[jira] [Commented] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-16 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187264#comment-13187264 ] tom pierce commented on MAHOUT-946: --- Making sure the examples halt at appropr

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

2012-01-16 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-947: -- Attachment: MAHOUT-947-2.patch Adjusted to put vector options in VectorDumper. Also add ability to

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

2012-01-16 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186935#comment-13186935 ] tom pierce commented on MAHOUT-947: --- Oh nice- I hadn't seen VectorDumper befor

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

2012-01-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-947: -- Attachment: MAHOUT-947.patch > Improvements to seqdum

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

2012-01-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-947: -- Status: Patch Available (was: Open) > Improvements to seqdum

[jira] [Created] (MAHOUT-947) Improvements to seqdumper

2012-01-15 Thread tom pierce (Created) (JIRA)
Improvements to seqdumper - Key: MAHOUT-947 URL: https://issues.apache.org/jira/browse/MAHOUT-947 Project: Mahout Issue Type: Improvement Reporter: tom pierce Priority: Minor I'v

[jira] [Updated] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-946: -- Attachment: MAHOUT-946.patch > Map-reduce job status often left unchec

[jira] [Updated] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-946: -- Affects Version/s: 0.6 Status: Patch Available (was: Open) > Map-reduce job sta

[jira] [Created] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-15 Thread tom pierce (Created) (JIRA)
Map-reduce job status often left unchecked -- Key: MAHOUT-946 URL: https://issues.apache.org/jira/browse/MAHOUT-946 Project: Mahout Issue Type: Bug Reporter: tom pierce I've run i
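The problem named in this issue title can be illustrated with a small sketch. Hadoop's `Job.waitForCompletion(boolean)` returns a boolean success flag, and a driver that ignores it will carry on to the next step after a failure. The `JobChain` class below is hypothetical and is not Mahout code; each `BooleanSupplier` stands in for one job submission (a real `waitForCompletion` call throws checked exceptions and would need wrapping).

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Mahout or Hadoop.
final class JobChain {
    /**
     * Runs each step in order and stops at the first one that reports failure,
     * instead of silently continuing with missing or partial output.
     */
    static void runAll(List<BooleanSupplier> steps) {
        for (int i = 0; i < steps.size(); i++) {
            boolean succeeded = steps.get(i).getAsBoolean();
            if (!succeeded) {
                throw new IllegalStateException(
                    "Job " + i + " failed; aborting remaining steps");
            }
        }
    }
}
```

Throwing here is only one possible policy; a caller might equally return the failing step's index and decide what to clean up.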

[jira] [Updated] (MAHOUT-890) Performance issue in FPGrowth

2011-12-30 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-890: -- Attachment: MAHOUT-890-3.patch I decided to add a couple tests based on the synthetic data, and found

[jira] [Updated] (MAHOUT-890) Performance issue in FPGrowth

2011-12-30 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-890: -- Attachment: MAHOUT-890-2.patch This patch (MAHOUT-890-2) adds the new implementation (under fpgrowth2

Re: Maturity level annotations

2011-12-27 Thread Tom Pierce
The users I'm talking about are often quite advanced in many ways - familiar with R, SAS, etc., capable of coding up their own implementations based on papers, etc. They don't know Mahout, they aren't eager to study a new API out of curiosity, but they would like to find a suite of super-scalable

Re: Maturity level annotations

2011-12-27 Thread Tom Pierce
Is there a plan to bubble these annotations out further? Say to the wiki or as command-line feedback? I think it would be really helpful (and promote uptake of Mahout) to have metadata and prominent documentation that describes the general scaling/stability properties of the different methods. I

[jira] [Commented] (MAHOUT-920) Remove a mapreduce job from parallel FPGrowth workflow

2011-12-27 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176279#comment-13176279 ] tom pierce commented on MAHOUT-920: --- (Unfortunately, I don't think I can d

[jira] [Updated] (MAHOUT-920) Remove a mapreduce job from parallel FPGrowth workflow

2011-12-27 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-920: -- Attachment: MAHOUT-920.patch This is the correct patch for this issue- please disregard previous

[jira] [Commented] (MAHOUT-890) Performance issue in FPGrowth

2011-12-22 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174825#comment-13174825 ] tom pierce commented on MAHOUT-890: --- There are patches in other issues - it woul

[jira] [Updated] (MAHOUT-927) FPG saves a mapping from from feature to mining group, when this can be calculated on the fly

2011-12-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-927: -- Status: Patch Available (was: Open) > FPG saves a mapping from from feature to mining group, w

[jira] [Updated] (MAHOUT-927) FPG saves a mapping from from feature to mining group, when this can be calculated on the fly

2011-12-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-927: -- Attachment: MAHOUT-927.patch This patch assumes MAHOUT-920 and MAHOUT-921 have already been applied

[jira] [Created] (MAHOUT-927) FPG saves a mapping from from feature to mining group, when this can be calculated on the fly

2011-12-15 Thread tom pierce (Created) (JIRA)
Project: Mahout Issue Type: Improvement Components: Frequent Itemset/Association Rule Mining Affects Versions: 0.6 Reporter: tom pierce Priority: Minor The group membership list can easily be computed on the fly, rather than written out and

[jira] [Updated] (MAHOUT-921) FPG uses a lot of boxed primitives - this patch eliminates a bunch of List

2011-12-11 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-921: -- Attachment: MAHOUT-921.patch Note patch assumes MAHOUT-920 has been applied! >

[jira] [Created] (MAHOUT-921) FPG uses a lot of boxed primitives - this patch eliminates a bunch of List

2011-12-11 Thread tom pierce (Created) (JIRA)
Issue Type: Improvement Components: Frequent Itemset/Association Rule Mining Affects Versions: 0.6 Reporter: tom pierce Priority: Minor TransactionTree uses List internally and as part of its API; this patch changes that and pushes the use of List
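As a rough illustration of why boxed primitives are worth removing, the sketch below stores `int` values in a growable array rather than in a list of boxed objects. It assumes the lists in question hold integers, and `IntList` is a hypothetical name; it is not the class the patch uses.

```java
import java.util.Arrays;

// Hypothetical minimal primitive list, for illustration only.
final class IntList {
    private int[] data = new int[8];
    private int size;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    /** Returns the value at index without any boxing. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    int size() {
        return size;
    }
}
```

Each element costs one `int` slot in the array, with no per-element object header or pointer indirection.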

[jira] [Updated] (MAHOUT-920) Remove a mapreduce job from parallel FPGrowth workflow

2011-12-10 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-920: -- Status: Patch Available (was: Open) > Remove a mapreduce job from parallel FPGrowth workf

[jira] [Updated] (MAHOUT-920) Remove a mapreduce job from parallel FPGrowth workflow

2011-12-10 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-920: -- Attachment: MAHOUT-890.patch > Remove a mapreduce job from parallel FPGrowth workf

[jira] [Created] (MAHOUT-920) Remove a mapreduce job from parallel FPGrowth workflow

2011-12-10 Thread tom pierce (Created) (JIRA)
: Frequent Itemset/Association Rule Mining Affects Versions: 0.6 Reporter: tom pierce Priority: Minor The transaction sorting job could have been made map-only, and another mapreduce job follows, so it made sense to combine them. It would have been possible to use a

[jira] [Commented] (MAHOUT-890) Performance issue in FPGrowth

2011-12-09 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166452#comment-13166452 ] tom pierce commented on MAHOUT-890: --- Sure - I can do this. I can also separate out

[jira] [Commented] (MAHOUT-890) Performance issue in FPGrowth

2011-12-05 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162849#comment-13162849 ] tom pierce commented on MAHOUT-890: --- Thanks for the feedback. I agree with

[jira] [Updated] (MAHOUT-890) Performance issue in FPGrowth

2011-12-03 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-890: -- Status: Patch Available (was: Open) > Performance issue in FPGro

[jira] [Updated] (MAHOUT-890) Performance issue in FPGrowth

2011-12-03 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-890: -- Attachment: MAHOUT-890.patch > Performance issue in FPGro

[jira] [Commented] (MAHOUT-890) Performance issue in FPGrowth

2011-12-03 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162230#comment-13162230 ] tom pierce commented on MAHOUT-890: --- I've prepared a patch which replaces th

[jira] [Updated] (MAHOUT-911) Naive Bayes trains models that are too large to apply

2011-12-02 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-911: -- Attachment: example.wiki.categories.txt > Naive Bayes trains models that are too large to ap

[jira] [Created] (MAHOUT-911) Naive Bayes trains models that are too large to apply

2011-12-02 Thread tom pierce (Created) (JIRA)
: Classification Affects Versions: 0.6 Reporter: tom pierce I'm seeing the same issue that Lyall Morrison mentioned on the user list not too long ago; I can train a model that apparently has too many classes (or is otherwise too large) to read back in and apply to new documents.

[jira] [Updated] (MAHOUT-895) Make Wikipedia example set maker easier to mod

2011-11-23 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-895: -- Status: Patch Available (was: Open) > Make Wikipedia example set maker easier to

[jira] [Updated] (MAHOUT-895) Make Wikipedia example set maker easier to mod

2011-11-23 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-895: -- Attachment: MAHOUT-895.patch > Make Wikipedia example set maker easier to

[jira] [Created] (MAHOUT-895) Make Wikipedia example set maker easier to mod

2011-11-23 Thread tom pierce (Created) (JIRA)
, Examples Affects Versions: 0.6 Reporter: tom pierce Priority: Minor The WikipediaDatasetCreator uses 2 mechanisms to scrape out the text of articles; first an XmlInputFormat is used with the "text" tags as start/end markers (which demarcate the article content)

[jira] [Updated] (MAHOUT-894) NB testclassifier runs in sequential mode by default

2011-11-23 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-894: -- Status: Patch Available (was: Open) > NB testclassifier runs in sequential mode by defa

[jira] [Updated] (MAHOUT-894) NB testclassifier runs in sequential mode by default

2011-11-23 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-894: -- Attachment: MAHOUT-894.patch > NB testclassifier runs in sequential mode by defa

[jira] [Created] (MAHOUT-894) NB testclassifier runs in sequential mode by default

2011-11-23 Thread tom pierce (Created) (JIRA)
: Classification Affects Versions: 0.6 Reporter: tom pierce NB classifiers can only be trained in MR mode, but evaluation happens in sequential mode by default. I think this violates the principle of least surprise - anyone trying this out is likely to expect the opposite. I'm attach

[jira] [Updated] (MAHOUT-890) Performance issue in FPGrowth

2011-11-22 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-890: -- Attachment: simpleFPG.patch > Performance issue in FPGro

[jira] [Commented] (MAHOUT-890) Performance issue in FPGrowth

2011-11-22 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155489#comment-13155489 ] tom pierce commented on MAHOUT-890: --- I'm attaching another patch which adds a

[jira] [Commented] (MAHOUT-890) Performance issue in FPGrowth

2011-11-20 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153822#comment-13153822 ] tom pierce commented on MAHOUT-890: --- There's no fix patch up there (yet); it

[jira] [Updated] (MAHOUT-890) Performance issue in FPGrowth

2011-11-19 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-890: -- Attachment: logtrees.patch smallexample.dat addSynth.patch

[jira] [Created] (MAHOUT-890) Performance issue in FPGrowth

2011-11-19 Thread tom pierce (Created) (JIRA)
Versions: 0.6 Reporter: tom pierce I've encountered a dataset which indicates there is probably a performance bug lurking in the FPGrowth implementation. This set may be a bit of an unusual target for FPG - there's a relatively modest number of itemsets, and many items wi

FP Growth oddities

2011-11-14 Thread Tom Pierce
Hi, I've been playing around a bit with the FPGrowth implementation, and I have some questions. Is it intentional that all frequent itemsets including a given item are mined for each (frequent) item?  For example, consider a simple example from FPGrowthTest.java: X      (occurs 12 times) Y      

[jira] [Updated] (MAHOUT-886) FPtree nodes multiply-added (becoming siblings in tree)

2011-11-14 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-886: -- Attachment: MAHOUT-886.patch Keep nodes from getting multiply added (becoming own siblings). There

[jira] [Updated] (MAHOUT-886) FPtree nodes multiply-added (becoming siblings in tree)

2011-11-14 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-886: -- Status: Patch Available (was: Open) > FPtree nodes multiply-added (becoming siblings in t

[jira] [Created] (MAHOUT-886) FPtree nodes multiply-added (becoming siblings in tree)

2011-11-14 Thread tom pierce (Created) (JIRA)
: Frequent Itemset/Association Rule Mining Affects Versions: 0.6 Reporter: tom pierce In FPGrowth#traverseAndBuildConditionalFPTreeData, while creating a conditional FPtree sometimes nodes are multiply-added as children of the same node, becoming siblings in the conditional tree
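The kind of bug described here can be sketched in isolation: when a child for an item already exists under a parent, its count should be increased rather than a second sibling created. The `ConditionalTreeNode` class below is a hypothetical stand-in and does not mirror Mahout's actual FP-tree data structures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical tree node, for illustration only.
final class ConditionalTreeNode {
    final int item;
    long count;
    // Keyed by item id, so each item can appear at most once among the children.
    private final Map<Integer, ConditionalTreeNode> children = new LinkedHashMap<>();

    ConditionalTreeNode(int item) {
        this.item = item;
    }

    /** Reuses the existing child for childItem if present, otherwise creates it. */
    ConditionalTreeNode addOrGetChild(int childItem, long addCount) {
        ConditionalTreeNode child = children.get(childItem);
        if (child == null) {
            child = new ConditionalTreeNode(childItem);
            children.put(childItem, child);
        }
        child.count += addCount;
        return child;
    }

    int childCount() {
        return children.size();
    }
}
```

Adding item 7 twice under the same parent yields one child whose count is the sum of both additions, rather than two sibling nodes for the same item.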

[jira] [Updated] (MAHOUT-885) Freq pattern growth advertises wrong value for default

2011-11-14 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-885: -- Labels: patch (was: ) Status: Patch Available (was: Open) Changes the default to be 1000

[jira] [Updated] (MAHOUT-885) Freq pattern growth advertises wrong value for default

2011-11-14 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-885: -- Attachment: MAHOUT-885.patch > Freq pattern growth advertises wrong value for defa

[jira] [Created] (MAHOUT-885) Freq pattern growth advertises wrong value for default

2011-11-14 Thread tom pierce (Created) (JIRA)
: Frequent Itemset/Association Rule Mining Affects Versions: 0.6 Reporter: tom pierce FPG advertises that numgroups will default to 1000, but it actually uses 50. If you do not override the default, only 50 reducers get work even if you have many more. -- This message is
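One way to keep an advertised default and the value actually used from drifting apart is to derive both from a single constant. The sketch below is a minimal illustration under that assumption; `NumGroupsOption`, the `numGroups` key, and the help wording are hypothetical and are not taken from Mahout's driver code.

```java
import java.util.Map;

// Hypothetical option helper, for illustration only.
final class NumGroupsOption {
    // Single source of truth for both the help text and the fallback value.
    static final int DEFAULT_NUM_GROUPS = 1000;

    /** Help text built from the constant, so it cannot advertise a stale number. */
    static String helpText() {
        return "Number of groups the features should be divided into. Default is "
            + DEFAULT_NUM_GROUPS;
    }

    /** Returns the user-supplied value, or the shared default when none is given. */
    static int resolve(Map<String, String> args) {
        String raw = args.get("numGroups");
        return raw == null ? DEFAULT_NUM_GROUPS : Integer.parseInt(raw);
    }
}
```

With this arrangement, changing the default in one place updates the documentation string and the runtime behaviour together.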