Re: edition of hadoop for mahout 0.4

2010-08-18 Thread Cui tony
Sorry, I see now. It definitely uses the Hadoop 0.20.2 API; I was wrong. When I checked the k-means source code, I still had some questions. It seems to use fs to read the cluster file from HDFS in every iteration of the map process. See KMeansUtil.java line 73: reader = new SequenceFile.Reader(fs, path, conf
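Reading the cluster file once per map task is very different from reading it once per input record. The plain-Java sketch below has no Hadoop dependency. `AssignTask`, `loadsForIteration`, and the in-memory loader are hypothetical stand-ins, not Mahout's classes. It mimics the 0.20 `Mapper` lifecycle: the centers are loaded in a `setup`-like step once per task, and every `map` call reuses them. Each new k-means iteration still has to reload, because the previous iteration produced new centers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

final class ClusterLoadingSketch {

  /** Hypothetical stand-in for a k-means map task. */
  static final class AssignTask {
    private List<double[]> centers;

    // Analogous to Mapper.setup(): runs once per task, before any map() call.
    void setup(Supplier<List<double[]>> loader) {
      centers = loader.get();
    }

    // Analogous to map(): runs once per point and only reads the cached centers.
    int map(double[] point) {
      int best = -1;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int i = 0; i < centers.size(); i++) {
        double d = squaredDistance(point, centers.get(i));
        if (d < bestDist) {
          bestDist = d;
          best = i;
        }
      }
      return best;
    }

    private static double squaredDistance(double[] a, double[] b) {
      double sum = 0.0;
      for (int i = 0; i < a.length; i++) {
        double diff = a[i] - b[i];
        sum += diff * diff;
      }
      return sum;
    }
  }

  /** Simulates one iteration over several input splits; returns how often the centers were loaded. */
  static int loadsForIteration(List<List<double[]>> splits, List<double[]> centers) {
    int[] loads = {0};
    Supplier<List<double[]>> loader = () -> {
      loads[0]++; // stands in for reading the cluster file
      return centers;
    };
    for (List<double[]> split : splits) {
      AssignTask task = new AssignTask();
      task.setup(loader);
      for (double[] point : split) {
        task.map(point);
      }
    }
    return loads[0];
  }

  public static void main(String[] args) {
    List<double[]> centers = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
    List<List<double[]>> splits = Arrays.asList(
        Arrays.asList(new double[] {1, 1}, new double[] {2, 2}, new double[] {9, 9}),
        Arrays.asList(new double[] {8, 8}, new double[] {0, 1}));
    // Two tasks, five points: the centers are loaded twice, not five times.
    System.out.println(loadsForIteration(splits, centers));
  }
}
```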

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Sean Owen
Yeah, that's certainly right. It gave me pause, because a "Driver" class seems like the canonical functional, utility, non-OO class, consisting mostly of a main() method and support methods. I don't know if I convinced whoever changed that... I somehow suspect that within 12 months I'll for

Hudson build is still unstable: Mahout-Quality #203

2010-08-18 Thread Apache Hudson Server

[jira] Commented: (MAHOUT-479) Streamline classification/ clustering data structures

2010-08-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900071#action_12900071 ] Hudson commented on MAHOUT-479: --- Integrated in Mahout-Quality #202 (See [https://hudson.apac

[jira] Commented: (MAHOUT-482) Defaulting $HADOOP_CONF_DIR to $HADOOP_HOME/conf

2010-08-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900070#action_12900070 ] Hudson commented on MAHOUT-482: --- Integrated in Mahout-Quality #202 (See [https://hudson.apac

Hudson build is unstable: Mahout-Quality #202

2010-08-18 Thread Apache Hudson Server

[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899927#action_12899927 ] Han Hui Wen commented on MAHOUT-473: - It works now; the -D parameter can be passed to Ro
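In Hadoop, `ToolRunner` and `GenericOptionsParser` normally turn `-Dkey=value` arguments into configuration properties. The sketch below is a simplified, hypothetical stand-in, not Hadoop's code. It handles only the `-Dkey=value` form, separates those properties from job-specific arguments, and rebuilds them so that a parent job such as `RecommenderJob` could forward them to a sub-job such as `RowSimilarityJob`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GenericOptionsSketch {

  /** Holds the -D properties and the remaining job-specific arguments. */
  static final class Parsed {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();
  }

  // Hypothetical, simplified parser: only the "-Dkey=value" form is recognized.
  static Parsed parse(String[] args) {
    Parsed parsed = new Parsed();
    for (String arg : args) {
      int eq = arg.indexOf('=');
      if (arg.startsWith("-D") && eq > 2) {
        parsed.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
      } else {
        parsed.remaining.add(arg);
      }
    }
    return parsed;
  }

  // Re-emits the -D properties in front of a sub-job's own arguments.
  static List<String> toSubJobArgs(Parsed parsed, List<String> subJobArgs) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : parsed.properties.entrySet()) {
      out.add("-D" + e.getKey() + "=" + e.getValue());
    }
    out.addAll(subJobArgs);
    return out;
  }

  public static void main(String[] args) {
    Parsed parsed = parse(new String[] {"-Dmapred.reduce.tasks=20", "--input", "prefs"});
    System.out.println(toSubJobArgs(parsed, Arrays.asList("--output", "similarities")));
  }
}
```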

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Ted Dunning
I think that one of the style alarms goes off if you access a static via an object reference. For me, it pretty much comes down to intent. If the method really feels like it is applied to the object, then it should be non-static. If it is a pure function and is obviously such, then it should be
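The following sketch shows why a checker flags static access through an object reference. The call compiles, but it is bound to the reference's declared type rather than the object's runtime type, so it reads as if it were dynamically dispatched when it is not. The class and method names are made up for the example.

```java
final class StaticViaInstance {

  static class Base {
    static String kind() {
      return "Base";
    }

    String instanceKind() {
      return "Base";
    }
  }

  static class Derived extends Base {
    // Hides Base.kind(); static methods are not overridden.
    static String kind() {
      return "Derived";
    }

    @Override
    String instanceKind() {
      return "Derived";
    }
  }

  static String[] demo() {
    Base ref = new Derived();
    return new String[] {
      ref.kind(),          // legal but misleading: resolved from the declared type Base
      ref.instanceKind(),  // dynamic dispatch on the runtime type Derived
      Derived.kind()       // the clear way to call a static method
    };
  }

  public static void main(String[] args) {
    System.out.println(String.join(", ", demo()));
  }
}
```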

[jira] Commented: (MAHOUT-484) The RecommenderJob exit ,some sub-jobs can not be run.

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899916#action_12899916 ] Han Hui Wen commented on MAHOUT-484: - Yep. I see. I will discuss it by mail in

[jira] Resolved: (MAHOUT-484) The RecommenderJob exit ,some sub-jobs can not be run.

2010-08-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-484. -- Fix Version/s: (was: 0.4) Resolution: Incomplete I don't really understand what you're sayin

[jira] Updated: (MAHOUT-484) The RecommenderJob exit ,some sub-jobs can not be run.

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-484: Attachment: screenshot-2.jpg > The RecommenderJob exit ,some sub-jobs can not be run. > ---

[jira] Updated: (MAHOUT-484) The RecommenderJob exit ,some sub-jobs can not be run.

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-484: Attachment: screenshot-1.jpg > The RecommenderJob exit ,some sub-jobs can not be run. > ---

[jira] Created: (MAHOUT-484) The RecommenderJob exit ,some sub-jobs can not be run.

2010-08-18 Thread Han Hui Wen (JIRA)
The RecommenderJob exit ,some sub-jobs can not be run. -- Key: MAHOUT-484 URL: https://issues.apache.org/jira/browse/MAHOUT-484 Project: Mahout Issue Type: Test Reporter: Han Hu

[jira] Updated: (MAHOUT-484) The RecommenderJob exit ,some sub-jobs can not be run.

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-484: Affects Version/s: 0.4 Fix Version/s: 0.4 Component/s: Collaborative Filtering >

[jira] Updated: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-483: Attachment: screenshot-4.jpg > Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement > -

[jira] Commented: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899902#action_12899902 ] Han Hui Wen commented on MAHOUT-483: - Or we need to decrease the output data size of the

[jira] Issue Comment Edited: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899897#action_12899897 ] Han Hui Wen edited comment on MAHOUT-483 at 8/18/10 12:37 PM: --

[jira] Commented: (MAHOUT-480) Replace manual precondition checking with Precondition utility class from Guava

2010-08-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899898#action_12899898 ] Sean Owen commented on MAHOUT-480: -- Looking good, are you going further with it? > Replac
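MAHOUT-480 follows a common pattern: Guava's `Preconditions.checkArgument` and `Preconditions.checkNotNull` replace hand-written `if (...) throw ...` checks. So that the sketch runs without Guava on the classpath, it uses a minimal local class that mirrors the behavior of those two methods. `describe` and its parameters are hypothetical.

```java
final class PreconditionsSketch {

  // Minimal local stand-in for two Guava methods, so this compiles without Guava.
  // With Guava on the classpath, use com.google.common.base.Preconditions instead.
  static final class Preconditions {
    static void checkArgument(boolean expression, Object errorMessage) {
      if (!expression) {
        throw new IllegalArgumentException(String.valueOf(errorMessage));
      }
    }

    static <T> T checkNotNull(T reference, Object errorMessage) {
      if (reference == null) {
        throw new NullPointerException(String.valueOf(errorMessage));
      }
      return reference;
    }
  }

  // Before: manual precondition checking (hypothetical method).
  static String describeManual(String name, int k) {
    if (name == null) {
      throw new NullPointerException("name is null");
    }
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    return name + ":" + k;
  }

  // After: the same checks, one line each.
  static String describe(String name, int k) {
    Preconditions.checkNotNull(name, "name is null");
    Preconditions.checkArgument(k > 0, "k must be positive");
    return name + ":" + k;
  }

  public static void main(String[] args) {
    System.out.println(describe("kmeans", 10) + " " + describeManual("kmeans", 10));
  }
}
```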

[jira] Commented: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899897#action_12899897 ] Han Hui Wen commented on MAHOUT-483: - I remember that we already sorted the key in

[jira] Commented: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899894#action_12899894 ] Sean Owen commented on MAHOUT-483: -- Why would this change be faster, I wonder? > Job RowS

Re: edition of hadoop for mahout 0.4

2010-08-18 Thread Ted Dunning
Are you looking at the current trunk? On Wed, Aug 18, 2010 at 1:39 AM, Cui tony wrote: > But many algorithms still used 0.19 api, for example k-means. > Is there any plan to re-write this algorithm one by one? > > > 2010/8/17 Drew Farris > > > Mahout currently depends on 0.20.2, and the new 0

[jira] Commented: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899891#action_12899891 ] Sebastian Schelter commented on MAHOUT-483: --- Han Hui, can you provide a patch an

[jira] Updated: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-483: Attachment: (was: screenshot-2.jpg) > Job RowSimilarityJob-Mapper-EntriesToVectorsReducer impro

[jira] Updated: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-483: Attachment: screenshot-3.jpg > Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement > -

[jira] Updated: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-483: Attachment: (was: screenshot-1.jpg) > Job RowSimilarityJob-Mapper-EntriesToVectorsReducer impro

[jira] Updated: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-483: Attachment: screenshot-2.jpg > Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement > -

[jira] Updated: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-483: Attachment: screenshot-1.jpg > Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement > -

[jira] Created: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement Key: MAHOUT-483 URL: https://issues.apache.org/jira/browse/MAHOUT-483 Project: Mahout Issue Type: Test

[jira] Updated: (MAHOUT-483) Job RowSimilarityJob-Mapper-EntriesToVectorsReducer improvement

2010-08-18 Thread Han Hui Wen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Hui Wen updated MAHOUT-483: Affects Version/s: 0.4 Fix Version/s: 0.4 Component/s: Collaborative Filtering >

Re: edition of hadoop for mahout 0.4

2010-08-18 Thread Jeff Eastman
Huh? k-means, and all other algorithms except maybe classification, have been converted to 0.20.2 On 8/18/10 1:39 AM, Cui tony wrote: But many algorithms still used 0.19 api, for example k-means. Is there any plan to re-write this algorithm one by one? 2010/8/17 Drew Farris Mahout cur

[jira] Commented: (MAHOUT-479) Streamline classification/ clustering data structures

2010-08-18 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899874#action_12899874 ] Jeff Eastman commented on MAHOUT-479: - I don't see AbstractVectorClassifier as a super-

Virtualized test environment

2010-08-18 Thread Saikat Kanjilal
Hi Folks, I saw a thread about FreeBSD jails being set up a week or so ago. I was wondering whether you had thought about setting up a virtualized test environment, so that folks working on patches can test their code on a virtualized Hadoop cluster inside an actual data center. Is there a set

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Saikat Kanjilal
Got it, thanks for the clarification. On Aug 18, 2010, at 6:53 AM, Sean Owen wrote: > Non-static methods invisibly pass 'this' as an argument. Java can > optimize invocation of method calls with less than 3 args. It can > matter -- but probably only in super-critical blocks.

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Sean Owen
Non-static methods invisibly pass 'this' as an argument. Java can optimize invocation of method calls with less than 3 args. It can matter -- but probably only in super-critical blocks. I almost shouldn't mention it. It really is tiny compared to the design-level concerns. On Wed, Aug 18, 2010 at
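An instance method behaves like a static method with one extra leading parameter for the receiver, which is what the "invisible 'this'" means in practice. The sketch below writes the same computation both ways using a hypothetical `Offset` class. It does not demonstrate the invocation-cost effect mentioned above, which depends on the JVM and its JIT.

```java
final class ImplicitThis {

  static final class Offset {
    private final int base;

    Offset(int base) {
      this.base = base;
    }

    // Instance method: the receiver arrives as the hidden 'this' argument.
    int apply(int x) {
      return this.base + x;
    }

    // Same computation as a static method: the receiver becomes an explicit parameter.
    static int applyTo(Offset self, int x) {
      return self.base + x;
    }
  }

  public static void main(String[] args) {
    Offset offset = new Offset(5);
    System.out.println(offset.apply(3) + " " + Offset.applyTo(offset, 3));
  }
}
```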

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Saikat Kanjilal
I understand that static methods are not associated with the this keyword, since they are class-level methods. In your initial post you mentioned critical sections and how static methods improve performance. Do you mean sections that are accessed in a serialized fashion or ar

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Sean Owen
There aren't thread safety issues with static vs instance methods per se, no. You could write thread-safe or -unsafe code either way. It only concerns whether the methods are attached to a 'this'. On Wed, Aug 18, 2010 at 2:26 PM, Saikat Kanjilal wrote: > On this topic there are repercussions with
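The point that thread safety does not depend on `static` versus instance can be made concrete. In the sketch below (all names hypothetical), a static method that only uses its arguments is safe. A static method that updates shared state is safe only because that state is atomic. An instance method on an object shared between threads needs locking just the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class StaticThreadSafety {

  // Static and thread-safe: touches only its parameter and local variables.
  static long sumOfSquares(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++) {
      sum += (long) i * i;
    }
    return sum;
  }

  // Static with shared mutable state: safe here only because the counter is atomic.
  private static final AtomicLong CALLS = new AtomicLong();

  static void recordCall() {
    CALLS.incrementAndGet();
  }

  static long calls() {
    return CALLS.get();
  }

  // Instance state shared across threads needs the same care: without synchronized,
  // the count++ read-modify-write could lose updates.
  static final class Tally {
    private int count;

    synchronized void increment() {
      count++;
    }

    synchronized int get() {
      return count;
    }
  }

  /** Drives both counters from several threads; returns {static calls, instance tally}. */
  static long[] runConcurrently(int threads, int perThread) {
    Tally tally = new Tally(); // one object shared by all threads
    long before = calls();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          recordCall();
          tally.increment();
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return new long[] {calls() - before, tally.get()};
  }

  public static void main(String[] args) {
    long[] counts = runConcurrently(4, 10_000);
    System.out.println(counts[0] + " " + counts[1] + " " + sumOfSquares(3));
  }
}
```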

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Saikat Kanjilal
On this topic, using static methods has repercussions for thread safety: a static method that modifies a data structure shared between threads can behave unpredictably because of the associated race conditions. In general, do you all have guidelines as to when to use instance ve

[jira] Updated: (MAHOUT-482) Defaulting $HADOOP_CONF_DIR to $HADOOP_HOME/conf

2010-08-18 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drew Farris updated MAHOUT-482: --- Assignee: Drew Farris > Defaulting $HADOOP_CONF_DIR to $HADOOP_HOME/conf > --

Re: Test failure

2010-08-18 Thread Ted Dunning
I was just grabbing for a nice big hammer that I could be sure would work. Happy to back it off a bit. On Tue, Aug 17, 2010 at 11:51 PM, Sean Owen wrote: > Where this had happened before we just used the "Locale.ENGLISH" > locale as a slightly more neutral alternative. Is that OK? > > On Wed, A
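The failing test itself is not shown above. As a general illustration of why an explicit locale such as `Locale.ENGLISH` makes tests reproducible, the sketch below compares default-locale formatting with pinned formatting. Both number formatting and case conversion vary by locale. The method names are hypothetical.

```java
import java.util.Locale;

final class LocaleNeutralFormatting {

  // Depends on the JVM default locale: "1.50" on an English system, "1,50" on a German one.
  static String formatDefault(double value) {
    return String.format("%.2f", value);
  }

  // Pinned to Locale.ENGLISH: the same output on every machine.
  static String formatStable(double value) {
    return String.format(Locale.ENGLISH, "%.2f", value);
  }

  public static void main(String[] args) {
    System.out.println(formatDefault(1.5) + " vs " + formatStable(1.5));
    System.out.println(String.format(Locale.GERMANY, "%.2f", 1.5)); // 1,50
    System.out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr"))); // dotless i: tıtle
    System.out.println("TITLE".toLowerCase(Locale.ENGLISH)); // title
  }
}
```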

[jira] Updated: (MAHOUT-482) Defaulting $HADOOP_CONF_DIR to $HADOOP_HOME/conf

2010-08-18 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drew Farris updated MAHOUT-482: --- Status: Resolved (was: Patch Available) Fix Version/s: 0.4 Resolution: Fixed Commi
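The MAHOUT-482 patch itself is not shown here. To illustrate the rule in the issue title, the Java sketch below resolves the directory as follows: use `HADOOP_CONF_DIR` when it is set and non-empty, otherwise fall back to `$HADOOP_HOME/conf`. `resolveConfDir` is a hypothetical helper. It takes the environment as a `Map` so it can be tested without touching real environment variables.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

final class HadoopConfDirDefault {

  // Returns HADOOP_CONF_DIR if set and non-empty, else $HADOOP_HOME/conf, else empty.
  static Optional<Path> resolveConfDir(Map<String, String> env) {
    String confDir = env.get("HADOOP_CONF_DIR");
    if (confDir != null && !confDir.isEmpty()) {
      return Optional.of(Paths.get(confDir));
    }
    String home = env.get("HADOOP_HOME");
    if (home != null && !home.isEmpty()) {
      return Optional.of(Paths.get(home, "conf"));
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(resolveConfDir(System.getenv()).map(Path::toString).orElse("(unset)"));
  }
}
```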

Re: javax.persistence missing?

2010-08-18 Thread Steven Bourke
Ah this appears to be a ctrl-space mismatch! Everything is fine, nothing to see here. On 18 Aug 2010, at 14:01, Drew Farris wrote: > I'm not sure where that javax.persistence dependency is coming from, I > don't see it locally. Perhaps it is related to Steven's build > environment? Steven, cou

Re: javax.persistence missing?

2010-08-18 Thread Drew Farris
I'm not sure where that javax.persistence dependency is coming from; I don't see it locally. Perhaps it is related to Steven's build environment? Steven, could you provide any additional details about the error, for example which class or package (core, examples, etc.) is the source of the problem

Re: [jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Sean Owen
This brings up an interesting side question. I think all methods that can be static (i.e. use no instance methods or fields) should be static -- unless explicitly intended to be overrideable. It simply reflects reality and adds flexibility. In critical sections it can improve performance. If such
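Here is a sketch of the rule above, with hypothetical names that are not Mahout's actual reducer. A helper that touches no instance state is declared `static`. A method intended as an extension point stays an instance method so that subclasses can override it.

```java
final class StaticWhenPossible {

  /** Hypothetical base class. */
  static class SimilarityBase {
    // Uses no instance fields or methods, so it is static: callable and testable without an instance.
    static double dot(double[] a, double[] b) {
      double sum = 0.0;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    }

    // Explicitly meant to be overridden, so it stays an instance method.
    double similarity(double[] a, double[] b) {
      return dot(a, b);
    }
  }

  static final class CosineSimilarity extends SimilarityBase {
    @Override
    double similarity(double[] a, double[] b) {
      double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
      return norms == 0.0 ? 0.0 : dot(a, b) / norms;
    }
  }

  public static void main(String[] args) {
    double[] x = {1, 0};
    double[] y = {2, 0};
    System.out.println(SimilarityBase.dot(x, y) + " " + new CosineSimilarity().similarity(x, y));
  }
}
```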

Re: javax.persistence missing?

2010-08-18 Thread Sean Owen
Maven defines the entire build, including dependencies. It's the build system of record, and should manage all this. You can use an IDE but it really has to have Maven integration to make this work seamlessly. IntelliJ does and Eclipse does, and I assume but don't know if Netbeans does. (And, I sh

javax.persistence missing?

2010-08-18 Thread Steven Bourke
Hi, I've been using the compiled version of mahout 0.3. I now want to work from the source package so that I can incorporate some changes and so forth. Previously I just associated the JAR files from the compiled version of mahout with my IDE (Netbeans). It appears this is not so straight for

Hudson build is still unstable: Mahout-Quality #200

2010-08-18 Thread Apache Hudson Server

[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

2010-08-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899800#action_12899800 ] Hudson commented on MAHOUT-473: --- Integrated in Mahout-Quality #200 (See [https://hudson.apac

[jira] Commented: (MAHOUT-467) Change Iterable in org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.SimilarityReducer to list or array to improve the performance

2010-08-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899801#action_12899801 ] Hudson commented on MAHOUT-467: --- Integrated in Mahout-Quality #200 (See [https://hudson.apac

[jira] Commented: (MAHOUT-460) Add "maxPreferencesPerItemConsidered" option to o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob

2010-08-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899799#action_12899799 ] Hudson commented on MAHOUT-460: --- Integrated in Mahout-Quality #200 (See [https://hudson.apac

Re: edition of hadoop for mahout 0.4

2010-08-18 Thread Cui tony
But many algorithms still use the 0.19 API, for example k-means. Is there any plan to rewrite these algorithms one by one? 2010/8/17 Drew Farris > Mahout currently depends on 0.20.2, and the new 0.20.x api is used in > many cases, so 0.19 is no longer an option for 0.4 > > On Tue, Aug 17, 2010 at

[jira] Commented: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

2010-08-18 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899761#action_12899761 ] Sebastian Schelter commented on MAHOUT-473: --- Should be working now, can you give

[jira] Updated: (MAHOUT-473) add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in RecommenderJob

2010-08-18 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-473: -- Attachment: MAHOUT-473.patch > add parameter -Dmapred.reduce.tasks when call job RowSim