Yeah, in particular, a DistributedRowMatrix in its serialized form "is" simply a SequenceFile<IntWritable,VectorWritable>. As such, this "file" can be (and typically is) a series of part-* files in a directory, usually on HDFS.
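
To illustrate the directory-of-parts convention, here is a minimal sketch in Python that treats a directory as one logical matrix by collecting its part-* files in sorted order. It runs against a local temporary directory rather than HDFS. The helper names (list_part_files, make_demo_dir) and the example file names are hypothetical, chosen only for illustration. Actually decoding the rows would require Hadoop's SequenceFile reader, which is not shown here.

```python
import os
import tempfile


def list_part_files(matrix_dir):
    """Return the part-* files under matrix_dir, sorted by name.

    Anything that does not start with "part-" (for example a _SUCCESS
    marker left by a MapReduce job) is skipped.
    """
    names = sorted(
        name
        for name in os.listdir(matrix_dir)
        if name.startswith("part-")
        and os.path.isfile(os.path.join(matrix_dir, name))
    )
    return [os.path.join(matrix_dir, name) for name in names]


def make_demo_dir():
    """Create a throwaway directory laid out like a job output directory.

    The file names are assumptions for this demo; real part files would be
    SequenceFiles of (IntWritable, VectorWritable) pairs.
    """
    demo_dir = tempfile.mkdtemp()
    for name in ["part-r-00001", "part-r-00000", "_SUCCESS"]:
        with open(os.path.join(demo_dir, name), "w"):
            pass  # empty placeholder file
    return demo_dir


demo_dir = make_demo_dir()
parts = list_part_files(demo_dir)
print([os.path.basename(p) for p in parts])
# prints ['part-r-00000', 'part-r-00001']
```

A reader built on this idea would open each part file in turn and treat the concatenated rows as the full matrix, so callers only ever need to pass around the directory path.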

-jake

On Sun, Nov 13, 2011 at 10:23 PM, Dmitriy Lyubimov <[email protected]> wrote:

> It's my understanding drm can be multifile. In fact, stuff like seq2sparse
> will produce multifile output, being a MR job itself.
> On Nov 12, 2011 3:23 PM, "Lance Norskog" <[email protected]> wrote:
>
> > Is there a convention for multi-file matrices? For example, the
> > DistributedRowMatrix?
> >
> > --
> > Lance Norskog
> > [email protected]
