Also, it's the easiest way to SerDe complex types, and it gives you
splitting plus block compression for free, since SequenceFiles are
splittable and support compression out of the box. Look at the code:
there is genuinely complex data being passed between jobs.
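
A minimal sketch of what that buys you with the plain Hadoop API (the
path and key/value types below are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/example.seq"); // placeholder path

    // BLOCK compression compresses batches of records together, and the
    // resulting file stays splittable for downstream MapReduce jobs.
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(IntWritable.class),
        SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
      writer.append(new Text("key-1"), new IntWritable(42));
    }
  }
}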

2014-11-10 3:06 GMT+03:00 Bertrand Dechoux <decho...@gmail.com>:

> SequenceFile is/was also the standard for binary data on Hadoop. The
> question is rather: what else would you expect? Surely not a text format?
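>
> And a binary format doesn't lock the data away: dumping any SequenceFile
> back to text is a few lines against the plain Hadoop API (the path below
> is a placeholder), and Mahout ships a seqdumper utility that does the
> same job:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.SequenceFile;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.util.ReflectionUtils;
>
> public class SeqFileDumpSketch {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Path path = new Path("/tmp/example.seq"); // placeholder path
>     try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
>         SequenceFile.Reader.file(path))) {
>       // Instantiate whatever key/value types the file itself declares.
>       Writable key = (Writable)
>           ReflectionUtils.newInstance(reader.getKeyClass(), conf);
>       Writable value = (Writable)
>           ReflectionUtils.newInstance(reader.getValueClass(), conf);
>       while (reader.next(key, value)) {
>         System.out.println(key + "\t" + value);
>       }
>     }
>   }
> }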
>
> Bertrand
>
> On Fri, Nov 7, 2014 at 3:51 AM, Lee S <sle...@gmail.com> wrote:
>
> > Any other reasons? Or can you give a thorough analysis?
> >
> > 2014-11-05 11:00 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
> >
> > >
> > > Yes, type conversion is a reason.
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 4, 2014, at 18:59, Lee S <sle...@gmail.com> wrote:
> > > >
> > > > e.g. kmeans input:
> > > > 1,2,3,4  // text file
> > > > kmeans output:
> > > > point1, point2, point3  // text file of center points
> > > >
> > > >
> > > > I just thought of one reason. The input data should be stored in
> > > > vector (dense or sparse) format, so a conversion step needs to be
> > > > done before the algorithms can deal with the data. Is that right?
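> > > >
> > > > Roughly what I mean, as a minimal sketch (the class name is made
> > > > up; it assumes Mahout's DenseVector/VectorWritable and a job
> > > > configured with SequenceFileOutputFormat):
> > > >
> > > > import java.io.IOException;
> > > > import org.apache.hadoop.io.LongWritable;
> > > > import org.apache.hadoop.io.Text;
> > > > import org.apache.hadoop.mapreduce.Mapper;
> > > > import org.apache.mahout.math.DenseVector;
> > > > import org.apache.mahout.math.VectorWritable;
> > > >
> > > > // Parses text lines like "1,2,3,4" into Mahout vectors.
> > > > public class CsvToVectorMapper
> > > >     extends Mapper<LongWritable, Text, Text, VectorWritable> {
> > > >   @Override
> > > >   protected void map(LongWritable offset, Text line, Context ctx)
> > > >       throws IOException, InterruptedException {
> > > >     String[] parts = line.toString().split(",");
> > > >     double[] values = new double[parts.length];
> > > >     for (int i = 0; i < parts.length; i++) {
> > > >       values[i] = Double.parseDouble(parts[i].trim());
> > > >     }
> > > >     // Use the byte offset as a (hypothetical) record id.
> > > >     ctx.write(new Text(offset.toString()),
> > > >         new VectorWritable(new DenseVector(values)));
> > > >   }
> > > > }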
> > > >
> > > > 2014-11-04 23:56 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
> > > >
> > > >> What should the input be?
> > > >>
> > > >>
> > > >>
> > > >>> On Tue, Nov 4, 2014 at 12:28 AM, Lee S <sle...@gmail.com> wrote:
> > > >>>
> > > >>> Hi all:
> > > >>>  I'm wondering why the input and output of most algorithms like
> > > >>> kmeans and naivebayes are all sequencefiles. One more conversion
> > > >>> step needs to be done before the algorithm will work, and I think
> > > >>> that step is time-consuming, because it is itself a MapReduce job.
> > > >>>  Is the reason to deal with small files and to compress to save
> > > >>> disk space?
> > > >>
> > >
> >
>
