E.g. kmeans input: 1,2,3,4 (a text file); kmeans output: point1, point2, point3 (a text file of center points).
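The text-in, text-out shape described above can be sketched on a single machine in Java. The sketch assumes one point per input line with comma-separated coordinates, and writes one center per output line in the same format. The class name `TextKMeans`, the naive first-k initialization and the fixed iteration count are illustrative assumptions. This is textbook Lloyd's k-means run in memory, not Mahout's MapReduce implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TextKMeans {

    // Parse "x1,x2,...,xn" lines into dense double[] points, skipping blank lines.
    static List<double[]> parse(List<String> lines) {
        List<double[]> points = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] parts = line.split(",");
            double[] p = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                p[i] = Double.parseDouble(parts[i].trim());
            }
            points.add(p);
        }
        return points;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Lloyd's algorithm; the first k points serve as initial centers (assumed, naive init).
    static double[][] cluster(List<double[]> points, int k, int iterations) {
        int dim = points.get(0).length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = points.get(c).clone();
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: each point goes to its nearest center.
            for (double[] p : points) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) best = c;
                }
                counts[best]++;
                for (int i = 0; i < dim; i++) sums[best][i] += p[i];
            }
            // Update step: move each center to the mean of its points.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old center
                for (int i = 0; i < dim; i++) centers[c][i] = sums[c][i] / counts[c];
            }
        }
        return centers;
    }

    // Format each center as one comma-separated text line.
    static List<String> format(double[][] centers) {
        List<String> out = new ArrayList<>();
        for (double[] c : centers) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < c.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(c[i]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1,1", "1.5,2", "8,8", "9,8.5");
        double[][] centers = cluster(parse(input), 2, 10);
        format(centers).forEach(System.out::println);
    }
}
```

With the sample input in `main`, the two centers settle at `1.25,1.5` and `8.5,8.25` after a couple of iterations. A real deployment would read and write files (or HDFS) instead of in-memory lists.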
I just thought of one reason. The input data has to be stored in vector (dense or sparse) format, so a conversion step needs to be done before the algorithms can work with the data. Is that right?

2014-11-04 23:56 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
> What should the input be?
>
> On Tue, Nov 4, 2014 at 12:28 AM, Lee S <sle...@gmail.com> wrote:
>
> > Hi all,
> > I'm wondering why the input and output of most algorithms, like
> > kmeans and naivebayes, are all SequenceFiles. One more conversion step
> > needs to be done if we want the algorithm to work, and I think that step
> > is time consuming, because it's also a MapReduce job.
> > Is the reason to deal with small files and to compress them to save disk
> > space?
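The conversion step described in the reply can be sketched in plain Java, assuming the same one-point-per-line, comma-separated text input. Each line becomes either a dense array (every coordinate stored) or a sparse index-to-value map (zeros dropped). The class `TextToVectors` and its methods are hypothetical. In Mahout itself the vectors would additionally be wrapped in `VectorWritable` and written to a Hadoop SequenceFile, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class TextToVectors {

    // Dense form: every coordinate is stored, including zeros.
    static double[] toDense(String line) {
        String[] parts = line.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = Double.parseDouble(parts[i].trim());
        }
        return v;
    }

    // Sparse form: only non-zero coordinates are kept, keyed by their index.
    static Map<Integer, Double> toSparse(String line) {
        double[] dense = toDense(line);
        Map<Integer, Double> sparse = new LinkedHashMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) sparse.put(i, dense[i]);
        }
        return sparse;
    }

    public static void main(String[] args) {
        String line = "1,0,0,4";
        System.out.println(Arrays.toString(toDense(line))); // [1.0, 0.0, 0.0, 4.0]
        System.out.println(toSparse(line));                 // {0=1.0, 3=4.0}
    }
}
```

The sparse form only pays off when most coordinates are zero (e.g. term vectors for naive Bayes); for short numeric points like the kmeans example, the dense form is simpler and smaller.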