The ordering *can* be chosen to be that, i.e. ascending by the IntWritable keys. But nothing in our API documentation implies we will always do this, and in fact it depends entirely on whether the MR job used to create the matrix had its reducers write out row numbers sequentially.
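To make that concrete, here is a toy sketch in plain Python. It is not Mahout or Hadoop code: the part names, the row numbers, and the helpers `rows_in_file_order` and `rows_in_key_order` are all made up for illustration. It assumes a job whose reducers each received an interleaved subset of the row numbers, as a hash-style partitioner might produce.

```python
# Toy model of a DistributedRowMatrix on disk: a directory of part files,
# each a list of (row_number, vector) records. Plain Python, not the
# Hadoop/Mahout API; all names and values here are hypothetical.

# Assumed layout: each part is sorted by key internally, but the row
# numbers interleave across parts.
parts = {
    "part-00000": [(0, [1.0, 0.0]), (2, [0.0, 3.0]), (4, [5.0, 5.0])],
    "part-00001": [(1, [2.0, 2.0]), (3, [4.0, 0.0])],
}


def rows_in_file_order(parts):
    """Yield records part by part, in part-name order (what a naive reader sees)."""
    for name in sorted(parts):
        yield from parts[name]


def rows_in_key_order(parts):
    """Return all records sorted by row number, ignoring how they were split."""
    return sorted(rows_in_file_order(parts), key=lambda record: record[0])


file_order = [row for row, _ in rows_in_file_order(parts)]
key_order = [row for row, _ in rows_in_key_order(parts)]
```

With this layout, `file_order` is 0, 2, 4, 1, 3 while `key_order` is 0, 1, 2, 3, 4, so a consumer that needs rows in sequence would have to sort (or look rows up) by key rather than rely on how the parts happen to be laid out.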
-jake

On Sun, Nov 13, 2011 at 11:28 PM, Lance Norskog <[email protected]> wrote:

> So, a DRM is a set of one or more files, where each SequenceFile int/vector
> pair is a row number and a fully wide vector? Then ordering is in the
> IntWritable keys.
>
> On Sun, Nov 13, 2011 at 10:56 PM, Jake Mannix <[email protected]> wrote:
>
> > I don't think we currently make any guarantees about sort-order of the
> > parts themselves, or among the various part-files, as they may be created
> > by any number of map-reduce jobs, and are then consumed by map-reduce jobs
> > which have no inter-process communication.
> >
> > What would ordering even *mean* among map-inputs? Or are you just
> > referring to in each chunk itself? Or for non-MR use of the files?
> >
> > -jake
> >
> > On Sun, Nov 13, 2011 at 10:38 PM, Ted Dunning <[email protected]> wrote:
> >
> > > Make sure that the files can be ordered, of course. Losing the ordering
> > > can be really bad.
> > >
> > > On Sun, Nov 13, 2011 at 10:34 PM, Jake Mannix <[email protected]> wrote:
> > >
> > > > Yeah, in particular, DistributedRowMatrix "is" simply a
> > > > SequenceFile<IntWritable,VectorWritable>, when in its serialized form.
> > > > As such, this "file" can be (and typically is) a series of part-* files
> > > > in a directory (typically on HDFS).
> > > >
> > > > -jake
> > > >
> > > > On Sun, Nov 13, 2011 at 10:23 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > > >
> > > > > It's my understanding drm can be multifile. In fact, stuff like
> > > > > seq2sparse will produce multifile output, being a MR job itself.
> > > > >
> > > > > On Nov 12, 2011 3:23 PM, "Lance Norskog" <[email protected]> wrote:
> > > > >
> > > > > > Is there a convention for multi-file matrices? For example, the
> > > > > > DistributedRowMatrix?
> > > > > >
> > > > > > --
> > > > > > Lance Norskog
> > > > > > [email protected]
>
> --
> Lance Norskog
> [email protected]
