Re: new distance metric

2011-03-29 Thread Sebastian Schelter
Hi Daniel, We would also need a "distributed" implementation of this new metric. Could you do that too? Shouldn't be too hard, just have a look at the other implementations in org.apache.mahout.math.hadoop.similarity.vector. --sebastian On 30.03.2011 00:40, Sean Owen wrote: Great, the be

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012847#comment-13012847 ] Dmitriy Lyubimov commented on MAHOUT-633: - Actually deep copy iterator was using c

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012846#comment-13012846 ] Dmitriy Lyubimov commented on MAHOUT-633: - Ok.. i see just two functional changes:

Re: TestClusterDumper...

2011-03-29 Thread Dmitriy Lyubimov
I am not sure. It just says running the test and then after a while bails out with 'build failed'. I looked briefly at surefire reports but did not immediately see an error, which makes me think it is some sort of maven thing (which is why I asked about maven). Let me follow up on this. On Tue, Mar 2

Re: new distance metric

2011-03-29 Thread Ted Dunning
http://en.wikipedia.org/wiki/Taxicab_geometry On Tue, Mar 29, 2011 at 4:10 PM, Lance Norskog wrote: > Dennis, is there a cite somewhere explaining this algorithm? > > On Tue, Mar 29, 2011 at 3:55 PM, Ted Dunning > wrote: > > City block and Manhattan and L_1 metric are the names that I know for

Re: new distance metric

2011-03-29 Thread Lance Norskog
Dennis, is there a cite somewhere explaining this algorithm? On Tue, Mar 29, 2011 at 3:55 PM, Ted Dunning wrote: > City block and Manhattan and L_1 metric are the names that I know for it. > > On Tue, Mar 29, 2011 at 3:40 PM, Sean Owen wrote: > >> I know this as "Manhattan distance". Is that an

Re: new distance metric

2011-03-29 Thread Ted Dunning
City block and Manhattan and L_1 metric are the names that I know for it. On Tue, Mar 29, 2011 at 3:40 PM, Sean Owen wrote: > I know this as "Manhattan distance". Is that an Americanism or is that > actually the more common name to anyone? >
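For readers following the thread: city block (Manhattan, L_1) distance is the sum of the absolute coordinate differences between two vectors. A minimal Java sketch follows; the class name CityBlockDistanceSketch and its method are hypothetical illustrations and are not taken from Daniel's patch or from Mahout's API.

```java
public final class CityBlockDistanceSketch {

    private CityBlockDistanceSketch() {
        // utility class, not meant to be instantiated
    }

    /**
     * Returns the L_1 distance between a and b: the sum over all
     * coordinates of |a[i] - b[i]|. Hypothetical helper for illustration.
     */
    public static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}
```

For example, the vectors (1, 2) and (4, 6) are |1 - 4| + |2 - 6| = 7 apart. With 0/1 preference values the sum reduces to counting the items on which the two users differ.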

Re: new distance metric

2011-03-29 Thread Sean Owen
Great, the best place for this would be a JIRA issue: https://issues.apache.org/jira/browse/MAHOUT I think it needs a bit of style work. For example, it ought to be very much like TanimotoCoefficientSimilarity. If you copied that and edited a few key methods, you'd be a lot closer I think. I guess

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012741#comment-13012741 ] Grant Ingersoll commented on MAHOUT-633: You probably should take over the Sequenc

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012735#comment-13012735 ] Dmitriy Lyubimov commented on MAHOUT-633: - bq. In the tests I ran, not significant

[jira] [Issue Comment Edited] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012728#comment-13012728 ] Dmitriy Lyubimov edited comment on MAHOUT-633 at 3/29/11 9:51 PM: --

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012728#comment-13012728 ] Dmitriy Lyubimov commented on MAHOUT-633: - I guess i need to clarify that i am not

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012724#comment-13012724 ] Dmitriy Lyubimov commented on MAHOUT-633: - bq.How much of that time was due to GC?

new distance metric

2011-03-29 Thread Daniel McEnnis
Dear, Here is a patch of a new distance metric for the collaborative filtering modules - CityBlockDistance. With the 0 - 1 binary split on preference, KLDistance, AHDistance, and Symmetric KLDistance don't make sense. Daniel McEnnis. Index: core/src/main/java/org/apache/mahout/cf/taste/impl/sim

generating ground truth

2011-03-29 Thread Daniel McEnnis
Dear, I just wanted to confirm that there does not exist a utility program for generating ground truth in a format Bayes classifier can use for building a model from either the 20 newsgroup data set or a related text data set. Daniel McEnnis.

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012658#comment-13012658 ] Ted Dunning commented on MAHOUT-633: {quote} Not in SSVD, it packs parts of massive sc

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012639#comment-13012639 ] Sean Owen commented on MAHOUT-633: -- Sounds like we just need a new flag or something to s

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012637#comment-13012637 ] Dmitriy Lyubimov commented on MAHOUT-633: - bq. Oh yes, it's already split up that

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012634#comment-13012634 ] Dmitriy Lyubimov commented on MAHOUT-633: - bq. And, in about half the cases, the c

[jira] [Issue Comment Edited] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012629#comment-13012629 ] Dmitriy Lyubimov edited comment on MAHOUT-633 at 3/29/11 7:12 PM: --

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012629#comment-13012629 ] Dmitriy Lyubimov commented on MAHOUT-633: - I haven't looked thru the patch yet. Bu

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012626#comment-13012626 ] Sean Owen commented on MAHOUT-633: -- Oh yes, it's already split up that way and one delega

[jira] [Issue Comment Edited] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012621#comment-13012621 ] Dmitriy Lyubimov edited comment on MAHOUT-633 at 3/29/11 7:01 PM: --

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012621#comment-13012621 ] Dmitriy Lyubimov commented on MAHOUT-633: - yes -- it is often the case -- esp. in

[jira] [Updated] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Traupman updated MAHOUT-641: - Status: Patch Available (was: Open) Sorry for the duplicate message -- still figuring ou

Re: Build failed in Jenkins: Mahout-Quality #700

2011-03-29 Thread Dmitriy Lyubimov
Funny. One has to investigate failure reports these days to confirm one's commits are fine. On Mon, Mar 28, 2011 at 11:39 PM, Apache Hudson Server wrote: > See > > Changes: > > [dlyubimov] MAHOUT-638 first installment: the fix. I w

[jira] [Commented] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012615#comment-13012615 ] Dmitriy Lyubimov commented on MAHOUT-637: - Yes ok. ( It looked like you wanted to

[jira] [Commented] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012611#comment-13012611 ] Ted Dunning commented on MAHOUT-641: Nice work Jonathan! On Tue, Mar 29, 2011 at 11:4

[jira] [Updated] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Traupman updated MAHOUT-641: - Attachment: MAHOUT-641.patch Patch for MAHOUT-641. Diffed from revision 1086678. > Dist

Re: [jira] [Updated] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Ted Dunning
Nice work Jonathan! On Tue, Mar 29, 2011 at 11:42 AM, Jonathan Traupman (JIRA) wrote: > All unit tests pass. Added 3 new unit test cases to verify this bugfix.) >

[jira] [Updated] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Traupman updated MAHOUT-641: - Fix Version/s: 0.5 Status: Patch Available (was: Open) Patch fixes configurat

[jira] [Updated] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Traupman updated MAHOUT-641: - Comment: was deleted (was: Patch fixes configuration problem in our environment. I change

[jira] [Updated] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Traupman updated MAHOUT-641: - Status: Open (was: Patch Available) > DistributedRowMatrix hadoop jobs ignore Configurat

[jira] [Created] (MAHOUT-641) DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

2011-03-29 Thread Jonathan Traupman (JIRA)
DistributedRowMatrix hadoop jobs ignore Configuration set via setConf() --- Key: MAHOUT-641 URL: https://issues.apache.org/jira/browse/MAHOUT-641 Project: Mahout Issue Type:

Re: Lucene's tests

2011-03-29 Thread Ted Dunning
Yeah... here at MapR, we have been doing a lot of fault injection as well. We don't have the community side of the fault injection, but that will likely come. As far as Mahout is concerned, this would help the SGD code a lot. I don't know how much it would help the recommendation side of the hou

Lucene's tests

2011-03-29 Thread Grant Ingersoll
Others here might find this interesting: http://blog.mikemccandless.com/2011/03/your-test-cases-should-sometimes-fail.html Lucene's test framework is pretty awesome. Lots of random stuff running something like 100 times per day on Jenkins. Not only that, but the framework is a standalone JAR.

Re: TestClusterDumper...

2011-03-29 Thread Grant Ingersoll
What's the error you are getting? On Mar 29, 2011, at 11:28 AM, Dmitriy Lyubimov wrote: > Yes I was perplexed as well as I couldn't immediately see a critical change > there but I confirmed manually that tests are passing before it and failing > there on. > > Please don't worry. I highly suspe

[jira] [Commented] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012587#comment-13012587 ] Sean Owen commented on MAHOUT-637: -- I don't intend to do any more work like this so shoul

[jira] [Commented] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012567#comment-13012567 ] Dmitriy Lyubimov commented on MAHOUT-637: - I think it might make sense to unify de

[jira] [Issue Comment Edited] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012556#comment-13012556 ] Dmitriy Lyubimov edited comment on MAHOUT-637 at 3/29/11 5:28 PM: --

[jira] [Issue Comment Edited] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012556#comment-13012556 ] Dmitriy Lyubimov edited comment on MAHOUT-637 at 3/29/11 5:27 PM: --

[jira] [Commented] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012556#comment-13012556 ] Dmitriy Lyubimov commented on MAHOUT-637: - as of Mahout-622, it (hbase dep) is dec

[jira] [Commented] (MAHOUT-622) Mahout dependencies are unified under dependency management in parent pom

2011-03-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012553#comment-13012553 ] Dmitriy Lyubimov commented on MAHOUT-622: - bq. In fact there are a load of exclude

[jira] [Commented] (MAHOUT-622) Mahout dependencies are unified under dependency management in parent pom

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012545#comment-13012545 ] Sean Owen commented on MAHOUT-622: -- OK I will undo my removal of the kfs exclude then in

[jira] [Commented] (MAHOUT-622) Mahout dependencies are unified under dependency management in parent pom

2011-03-29 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012544#comment-13012544 ] Ted Dunning commented on MAHOUT-622: There are a boatload of transitive dependencies i

[jira] [Commented] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012537#comment-13012537 ] Sean Owen commented on MAHOUT-637: -- Charset.forName() won't throw a (checked) exception v
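As background on the point about Charset.forName(): it declares no checked exception (an unknown name raises the unchecked UnsupportedCharsetException), whereas String.getBytes(String charsetName) declares the checked UnsupportedEncodingException. A small sketch of the difference, using the hypothetical class name CharsetSketch; it is not the code from the MAHOUT-637 patch.

```java
import java.nio.charset.Charset;

public final class CharsetSketch {

    // Looked up once; no try/catch is needed because forName() only
    // throws unchecked exceptions for illegal or unsupported names.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    private CharsetSketch() {
        // utility class, not meant to be instantiated
    }

    /** Encodes s as UTF-8 via the Charset overload, which declares no checked exception. */
    public static byte[] encode(String s) {
        return s.getBytes(UTF8);
    }

    /** Decodes UTF-8 bytes back into a String, again without a checked exception. */
    public static String decode(byte[] bytes) {
        return new String(bytes, UTF8);
    }
}
```

Holding the Charset in a constant keeps call sites free of the try/catch blocks that the String-named overloads would require.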

[jira] [Commented] (MAHOUT-622) Mahout dependencies are unified under dependency management in parent pom

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012535#comment-13012535 ] Sean Owen commented on MAHOUT-622: -- I have a related question while looking at MAHOUT-637

[jira] [Commented] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012529#comment-13012529 ] Ted Dunning commented on MAHOUT-637: Regarding lines like this: {code} items.a

Re: TestClusterDumper...

2011-03-29 Thread Dmitriy Lyubimov
Yes I was perplexed as well as I couldn't immediately see a critical change there but I confirmed manually that tests are passing before it and failing there on. Please don't worry. I highly suspect it is something specific to me. Just thought you might have a quick guess. One question though --

[jira] [Updated] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-633: - Status: Patch Available (was: Open) > Add SequenceFileIterable; put Iterable stuff in one place > --

Re: TestClusterDumper...

2011-03-29 Thread Grant Ingersoll
What's the error you are getting? I don't have ready access to a Windows box. Also, I don't believe this commit touched TestClusterDumper. -Grant On Mar 28, 2011, at 11:45 PM, Dmitriy Lyubimov wrote: > Grant, > > these are Bisect results, this commit broke TestClusterDumper in tests > on wind

Re: TestClusterDumper...

2011-03-29 Thread Grant Ingersoll
I'll take a look On Mar 28, 2011, at 11:45 PM, Dmitriy Lyubimov wrote: > Grant, > > these are Bisect results, this commit broke TestClusterDumper in tests > on windows it seems.. My diagnostic might be wrong but that's what i > got. > > 86e6e1d64901cc0ce436d43a56fcadb8a2cb6c1d is the first bad

[jira] [Updated] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-637: - Assignee: Sean Owen (was: Robin Anil) Status: Patch Available (was: Open) > Remove direct HBase d

[jira] [Updated] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-637: - Attachment: MAHOUT-637.patch > Remove direct HBase dependency > -- > >

[jira] [Commented] (MAHOUT-637) Remove direct HBase dependency

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012503#comment-13012503 ] Sean Owen commented on MAHOUT-637: -- PS, we still have references to KosmoFS, but I see no

[jira] [Commented] (MAHOUT-640) Implementation of refresh in SVDRecommender

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012488#comment-13012488 ] Sean Owen commented on MAHOUT-640: -- Sounds fine to me, take a shot at a patch and post it

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012486#comment-13012486 ] Sean Owen commented on MAHOUT-633: -- On sorting the order of input -- Dmitriy I think I ke

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012482#comment-13012482 ] Sean Owen commented on MAHOUT-633: -- On new Writables: I agree, I don't think it can be fa

[jira] [Updated] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-633: - Attachment: MAHOUT-633.patch Here comes a new version of the patch. Yes I have in here glob and list sup

[jira] [Created] (MAHOUT-640) Implementation of refresh in SVDRecommender

2011-03-29 Thread Chris Newell (JIRA)
Implementation of refresh in SVDRecommender --- Key: MAHOUT-640 URL: https://issues.apache.org/jira/browse/MAHOUT-640 Project: Mahout Issue Type: Improvement Components: Collaborative Filteri

[jira] [Created] (MAHOUT-639) Need special case to handle creating a new SequentialAccessSparseVector from a large (> 1M dims) random/hashed vector

2011-03-29 Thread Timothy Potter (JIRA)
Need special case to handle creating a new SequentialAccessSparseVector from a large (> 1M dims) random/hashed vector - Key: MAHOUT-639 URL: https:

Re: AbstractFactorizer is not currently refreshable

2011-03-29 Thread Sebastian Schelter
Good point! Hadn't thought about that, then we should make Factorizer implement Refreshable as you proposed. --sebastian On 29.03.2011 14:12, Chris Newell wrote: Sebastian, Would it not be easier to just have SVDRecommender refresh the DataModel first and then just make it recompute the facto

Re: AbstractFactorizer is not currently refreshable

2011-03-29 Thread Chris Newell
Sebastian, Would it not be easier to just have SVDRecommender refresh the DataModel first and then just make it recompute the factorization? The problem is that when the DataModel is refreshed the userIDMapping and itemIDMapping in the Factorizer also need to be refreshed and I don't think th

Re: AbstractFactorizer is not currently refreshable

2011-03-29 Thread Sebastian Schelter
Hi Chris, Nice to see you take that task. Could you open a JIRA issue for it? Would it not be easier to just have SVDRecommender refresh the DataModel first and then just make it recompute the factorization? --sebastian On 29.03.2011 12:23, Chris Newell wrote: AbstractFactorizer (in packag

AbstractFactorizer is not currently refreshable

2011-03-29 Thread Chris Newell
AbstractFactorizer (in package org.apache.mahout.cf.taste.impl.recommender.svd) does not currently implement Refreshable. This makes it difficult to implement refresh in SVDRecommender (currently a "ToDo" which I'd like to fix). There are two options I can see: 1) remember which Factorizer Cl

[jira] [Commented] (MAHOUT-368) should package core ,math and collections to one Jar package for hadoop recommendations

2011-03-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012381#comment-13012381 ] Julien Nioche commented on MAHOUT-368: -- Moving the discussion to MAHOUT-621. > shou