SequenceFile is/was also the standard for binary data on Hadoop. The question is rather: what else would you expect? Surely not a text format?
Bertrand

On Fri, Nov 7, 2014 at 3:51 AM, Lee S <sle...@gmail.com> wrote:
> any other reasons or can you give a thorough analysis?
>
> 2014-11-05 11:00 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
>
> > Yes, type conversion is a reason.
> >
> > Sent from my iPhone
> >
> > > On Nov 4, 2014, at 18:59, Lee S <sle...@gmail.com> wrote:
> > >
> > > e.g. kmeans input:
> > > 1,2,3,4 // text file
> > > kmeans output:
> > > point1, point2, point3 (text file of center points)
> > >
> > > I just thought of one reason. The input data should be stored in
> > > vector (dense or sparse) format, so a conversion step
> > > needs to be done before the algorithms deal with the data. Is that right?
> > >
> > > 2014-11-04 23:56 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
> > >
> > >> What should the input be?
> > >>
> > >>> On Tue, Nov 4, 2014 at 12:28 AM, Lee S <sle...@gmail.com> wrote:
> > >>>
> > >>> Hi all:
> > >>> I'm wondering why the input and output of most algorithms like
> > >>> kmeans and naive bayes are all SequenceFiles. One more conversion step
> > >>> needs to be done if we want the algorithm to work. And
> > >>> I think that step is time-consuming, because it's also a MapReduce job.
> > >>> Is the reason to handle small files and compress to save disk space?
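For readers following the quoted point about type conversion: Mahout's kmeans wants each input record as a vector (dense or sparse, wrapped in a SequenceFile as `VectorWritable`), which is why a text file of lines like "1,2,3,4" needs a conversion pass first. Below is a minimal, self-contained sketch of that conversion idea in plain Java; the names `parseDense` and `toSparse` are hypothetical helpers for illustration, not Mahout or Hadoop APIs, and plain stdlib types stand in for Mahout's vector classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorConversion {

    // Dense representation: one double per column, in column order.
    // This mirrors what a line "1,2,3,4" becomes before kmeans can use it.
    static double[] parseDense(String csvLine) {
        String[] parts = csvLine.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = Double.parseDouble(parts[i].trim());
        }
        return v;
    }

    // Sparse representation: keep only the non-zero entries as index -> value,
    // which is the usual choice when most columns are zero.
    static Map<Integer, Double> toSparse(double[] dense) {
        Map<Integer, Double> sparse = new LinkedHashMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                sparse.put(i, dense[i]);
            }
        }
        return sparse;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseDense("1,2,3,4")));
        // prints [1.0, 2.0, 3.0, 4.0]
        System.out.println(toSparse(new double[]{0.0, 5.0, 0.0, 7.0}));
        // prints {1=5.0, 3=7.0}
    }
}
```

In the real pipeline these vectors would then be written as `VectorWritable` values into a SequenceFile (e.g. via Mahout's seqdirectory/seq2sparse tools), which also answers the small-files/compression question: SequenceFile packs many small records into large, splittable, optionally compressed blocks.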