[ https://issues.apache.org/jira/browse/MAHOUT-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-322.
------------------------------

    Resolution: Won't Fix

Cleaning house. I don't think anyone's looked at this in 8 months, or will do more here.

> DistributedRowMatrix should live in SequenceFile<Writable,VectorWritable>
> instead of SequenceFile<IntWritable,VectorWritable>
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-322
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-322
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Danny Leshem
>            Assignee: Jake Mannix
>            Priority: Minor
>
> Class documentation for org.apache.mahout.math.hadoop.DistributedRowMatrix
> states that the matrix lives in SequenceFile<WritableComparable,
> VectorWritable>. Implementation, however, assumes SequenceFile<IntWritable,
> VectorWritable> is passed.
> Currently, usage of this class inside Mahout is limited to Jake Mannix's SVD
> package, mainly to perform PCA on a massive document corpus. Given such a
> corpus, it makes sense to not limit the user by forcing the document "key" to
> be an integer. Instead, users should be able to use Text keys (document name or
> id) or keys made of any other arbitrary class. One may even argue that
> forcing a WritableComparable key is too limiting, and a simple Writable key
> should be assumed.
> In fact, it would be best if DistributedRowMatrix did not read the
> SequenceFile key at all, to allow user-specific classes (unknown to Mahout)
> to be used as opaque keys even when their libraries are not available at
> runtime. Currently DistributedRowMatrix calls "reader.next(i, v)"... but
> reader has methods to query just the value, avoiding key deserialization
> altogether.

--
This message is automatically generated by JIRA.
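
The last point of the issue description (reading row vectors without deserializing their keys) can be illustrated with a small, self-contained sketch. It deliberately does not use Hadoop's SequenceFile format or its Reader API. The record layout, the class name OpaqueKeyRows and both method names are hypothetical, chosen only to show that a reader can step over key bytes it does not understand as long as the key's length is recorded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the SequenceFile on-disk format and NOT Mahout code.
class OpaqueKeyRows {

    // Append one record laid out as [keyLength][keyBytes][rowSize][doubles...].
    public static void writeRow(DataOutputStream out, byte[] opaqueKey, double[] row)
            throws IOException {
        out.writeInt(opaqueKey.length);
        out.write(opaqueKey);
        out.writeInt(row.length);
        for (double d : row) {
            out.writeDouble(d);
        }
    }

    // Return every row vector; key bytes are consumed but never interpreted,
    // so the class that produced them does not need to be on the classpath.
    public static List<double[]> readRowsIgnoringKeys(byte[] data) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int keyLength = in.readInt();
                in.readFully(new byte[keyLength]); // step over the opaque key
                double[] row = new double[in.readInt()];
                for (int i = 0; i < row.length; i++) {
                    row[i] = in.readDouble();
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        // One text-like key and one arbitrary binary key, as a user class might write.
        writeRow(out, "doc-a".getBytes(StandardCharsets.UTF_8), new double[] {1.0, 2.0});
        writeRow(out, new byte[] {0x7f, 0x00, 0x01}, new double[] {3.5});
        out.flush();

        List<double[]> rows = readRowsIgnoringKeys(buffer.toByteArray());
        System.out.println("rows read: " + rows.size());
    }
}
```

In this model the reader depends only on the value encoding, which is the property the issue asks for. Whether the same can be done with a particular Hadoop release depends on which value-only or raw-key methods its SequenceFile.Reader exposes, so that should be checked against the API documentation for the version in use.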