SequenceFile is/was also the standard for binary data on Hadoop. The question is rather: what else would you expect? Surely not a text format?
Bertrand

On Fri, Nov 7, 2014 at 3:51 AM, Lee S <sle...@gmail.com> wrote:
> any other reasons or can you give a thorough analysis?
>
> 2014-11-05 11:00 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
>
> > Yes, type conversion is a reason.
> >
> > Sent from my iPhone
> >
> > > On Nov 4, 2014, at 18:59, Lee S <sle...@gmail.com> wrote:
> > >
> > > e.g. kmeans input:
> > > 1,2,3,4 // text file
> > > kmeans output:
> > > point1, point2, point3 (text file of center points)
> > >
> > > I just thought of one reason. The input data should be stored in
> > > vector (dense or sparse) format, so a conversion step
> > > needs to be done before the algorithms deal with the data. Is that right?
> > >
> > > 2014-11-04 23:56 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
> > >
> > >> What should the input be?
> > >>
> > >>> On Tue, Nov 4, 2014 at 12:28 AM, Lee S <sle...@gmail.com> wrote:
> > >>>
> > >>> Hi all:
> > >>> I'm wondering why the input and output of most algorithms like
> > >>> kmeans and naive bayes are all SequenceFiles. One more conversion step
> > >>> needs to be done if we want the algorithm to work. And
> > >>> I think that step is time-consuming, because it's also a MapReduce job.
> > >>> Is the reason to handle small files and compress to save disk space?
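For readers following the quoted point about type conversion: Mahout's kmeans wants each input record as a vector (dense or sparse, wrapped in a SequenceFile as `VectorWritable`), which is why a text file of lines like "1,2,3,4" needs a conversion pass first. Below is a minimal, self-contained sketch of that conversion idea in plain Java; the names `parseDense` and `toSparse` are hypothetical helpers for illustration, not Mahout or Hadoop APIs, and plain stdlib types stand in for Mahout's vector classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorConversion {

    // Dense representation: one double per column, in column order.
    // This mirrors what a line "1,2,3,4" becomes before kmeans can use it.
    static double[] parseDense(String csvLine) {
        String[] parts = csvLine.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = Double.parseDouble(parts[i].trim());
        }
        return v;
    }

    // Sparse representation: keep only the non-zero entries as index -> value,
    // which is the usual choice when most columns are zero.
    static Map<Integer, Double> toSparse(double[] dense) {
        Map<Integer, Double> sparse = new LinkedHashMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                sparse.put(i, dense[i]);
            }
        }
        return sparse;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseDense("1,2,3,4")));
        // prints [1.0, 2.0, 3.0, 4.0]
        System.out.println(toSparse(new double[]{0.0, 5.0, 0.0, 7.0}));
        // prints {1=5.0, 3=7.0}
    }
}
```

In the real pipeline these vectors would then be written as `VectorWritable` values into a SequenceFile (e.g. via Mahout's seqdirectory/seq2sparse tools), which also answers the small-files/compression question: SequenceFile packs many small records into large, splittable, optionally compressed blocks.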