Jenkins build is back to normal : mahout-nightly #1223

2013-05-07 Thread Apache Jenkins Server
See

Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-07 Thread 姜页希
There are several distinct implementations available online. Also, Yu Lee and I have some experience in developing hierarchical clustering (clustering data of arbitrary shape through connectivity-based clustering) on Hadoop. If you think this is OK, Yu Lee and I can take this task of impleme

Re: HBase backed matrices

2013-05-07 Thread Gokhan Capan
So if rows are small, blob is probably better; and if they get larger I can make blocks of blobs. I will experiment with this. On Wed, May 8, 2013 at 1:06 AM, Ted Dunning wrote: > It really depends on your access patterns. > > Blob storage of rows will be much faster for scans and will take much les

[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-07 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651369#comment-13651369 ] Ted Dunning commented on MAHOUT-1206: Do you know of scalable algorithms for these o

Re: HBase backed matrices

2013-05-07 Thread Ted Dunning
It really depends on your access patterns. Blob storage of rows will be much faster for scans and will take much less space. Column storage of values may or may not make things faster, but it is conceptually nicer to not have to update so much. In practice, I am not convinced that you will notic

Re: HBase backed matrices

2013-05-07 Thread Gokhan Capan
Nope, I simply thought that would make accessing and setting individual cells more difficult. Should I? Do you think it would perform better? I would also like to hear whether you have other design choices in mind. On Wed, May 8, 2013 at 12:22 AM, Ted Dunning wrote: > Have you experimented with

Re: HBase backed matrices

2013-05-07 Thread Ted Dunning
Have you experimented with, for instance, row number as id, value as binary serialized vector? On Tue, May 7, 2013 at 2:16 PM, Gokhan Capan wrote: > 2 options: > > 1- row index as the row key, column index as column identifier, and value > as value > 2- row index and column index combined as
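The row-as-blob layout suggested here can be sketched as follows. This is only an illustration under stated assumptions: the class name `RowBlobCodec`, the dense `double[]` row, and the length-prefixed byte format are hypothetical and are not taken from the thread. In Mahout itself a serializer such as `VectorWritable` would be the more natural choice; plain `ByteBuffer` is used so the sketch has no dependencies.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs one dense matrix row into a single byte[] value,
// so that a whole row can be stored under one HBase row key.
final class RowBlobCodec {

    // Layout (an assumption for this sketch): 4-byte length, then 8 bytes per value.
    static byte[] encode(double[] row) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + row.length * Double.BYTES);
        buf.putInt(row.length);
        for (double v : row) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    // Reads the length prefix, then that many doubles.
    static double[] decode(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        int n = buf.getInt();
        double[] row = new double[n];
        for (int i = 0; i < n; i++) {
            row[i] = buf.getDouble();
        }
        return row;
    }

    public static void main(String[] args) {
        double[] row = {1.5, 0.0, -2.25};
        byte[] blob = encode(row);
        System.out.println(blob.length + " bytes for " + decode(blob).length + " values");
    }
}
```

With this layout, setting a single cell means decoding, modifying, and rewriting the whole row, which is the trade-off raised in the replies.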

Re: HBase backed matrices

2013-05-07 Thread Gokhan Capan
2 options:
1- row index as the row key, column index as column identifier, and value as value
2- row index and column index combined as the row key, and value in a column called "value"
Row indices are kept in a member variable in memory, to make iteration fast. On Wed, May 8, 2013 at 12:11 AM
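The two key layouts described above might look roughly like this. It is a minimal sketch, not the code from the message: the class `CellKeyLayouts`, its method names, and the choice of 4-byte big-endian integers are all assumptions made for illustration. Big-endian encoding is chosen because HBase orders row keys by their bytes, so non-negative indices then sort in numeric order.

```java
import java.nio.ByteBuffer;

// Hypothetical helper showing how the two options could encode their keys.
final class CellKeyLayouts {

    // Option 1: the row index alone is the row key ...
    static byte[] rowKey(int row) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(row).array();
    }

    // ... and the column index is the column identifier within that row.
    static byte[] columnQualifier(int column) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(column).array();
    }

    // Option 2: row and column index combined into one row key;
    // the cell value would then live in a single column called "value".
    static byte[] cellKey(int row, int column) {
        return ByteBuffer.allocate(2 * Integer.BYTES).putInt(row).putInt(column).array();
    }

    // Recover the indices from a combined key.
    static int rowOf(byte[] cellKey) {
        return ByteBuffer.wrap(cellKey).getInt(0);
    }

    static int columnOf(byte[] cellKey) {
        return ByteBuffer.wrap(cellKey).getInt(Integer.BYTES);
    }

    public static void main(String[] args) {
        byte[] key = cellKey(3, 7);
        System.out.println("row=" + rowOf(key) + " column=" + columnOf(key));
    }
}
```

Under option 1 a row scan touches one HBase row with many columns; under option 2 it becomes a range scan over all keys sharing the same 4-byte row prefix.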

Re: HBase backed matrices

2013-05-07 Thread Ted Dunning
How did you store the matrix in HBase? On Tue, May 7, 2013 at 1:08 PM, Gokhan Capan wrote: > Hi, > > For taking large matrices as input and persisting large models (like factor > models), I created an HBase-backed version of Mahout matrix. > > It allows random access to cells and rows as well a

HBase backed matrices

2013-05-07 Thread Gokhan Capan
Hi, For taking large matrices as input and persisting large models (like factor models), I created an HBase-backed version of Mahout matrix. It allows random access to cells and rows as well as assignment, and iteration over rows. viewRow returns a view, and lazy loads actual data if a get is act

[jira] [Created] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-07 Thread Yexi (JIRA)
Yexi created MAHOUT-1206:
Summary: Add density-based clustering algorithms to mahout
Key: MAHOUT-1206
URL: https://issues.apache.org/jira/browse/MAHOUT-1206
Project: Mahout
Issue Type: Improvement

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #473

2013-05-07 Thread Apache Jenkins Server
See Changes: [ssc] MAHOUT-1205 ParallelALSFactorizationJob should leverage the distributed cache [ssc] MAHOUT-1205 ParallelALSFactorizationJob should leverage the distributed cache [ssc] adding missing changelog e

[jira] [Updated] (MAHOUT-1183) remove duplicate (masked) unused field

2013-05-07 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1183:
Resolution: Fixed
Fix Version/s: 0.8
Status: Resolved (was: Patch Available)

[jira] [Updated] (MAHOUT-775) L2 does not work with TrainAdaptiveLogisticRegression

2013-05-07 Thread Angel Martinez Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Angel Martinez Gonzalez updated MAHOUT-775:
Attachment: MAHOUT-775.patch
The problem is that the L2 variance ("s" member)

[jira] [Commented] (MAHOUT-1205) ParallelALSFactorizationJob should leverage the distributed cache

2013-05-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650672#comment-13650672 ] Hudson commented on MAHOUT-1205: Integrated in Mahout-Quality #1984 (See [https://builds

Build failed in Jenkins: Mahout-Examples-Classify-20News #186

2013-05-07 Thread Apache Jenkins Server
See Changes: [ssc] MAHOUT-1205 ParallelALSFactorizationJob should leverage the distributed cache [ssc] MAHOUT-1205 ParallelALSFactorizationJob should leverage the distributed cache [ssc] adding missing changelog entri

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters #301

2013-05-07 Thread Apache Jenkins Server
See Changes: [ssc] MAHOUT-1205 ParallelALSFactorizationJob should leverage the distributed cache [ssc] MAHOUT-1205 ParallelALSFactorizationJob should leverage the distributed cache [ssc] adding missing changelog entri

Re: Committing to mahout-git?

2013-05-07 Thread Isabel Drost-Fromm
On Thursday, May 02, 2013 12:57:53 PM Robin Anil wrote: > diffs from git can be applied on svn using > > patch -p1 < patch.file Some more background when dealing with patches that people might find useful: (linked from the "c

[jira] [Resolved] (MAHOUT-1205) ParallelALSFactorizationJob should leverage the distributed cache

2013-05-07 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1205. Resolution: Fixed > ParallelALSFactorizationJob should leverage the distribute