RE: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread java8964 java8964
n hadoop. I also like that its design offers both schema-less and schema-based objects. It gives us some flexibility in designing MR jobs. Thanks Yong > From: wolfgang.wyre...@hotmail.com > To: user@hadoop.apache.org > Subject: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC > Date: M

Re: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread Rahul Bhattacharjee
Sequence files are language neutral, as Avro is. Yes, but I am not sure how well libraries in other languages support processing sequence files. Thanks, Rahul On Mon, Sep 30, 2013 at 11:10 PM, Peyman Mohajerian wrote: > It is not recommended to keep the data at rest in sequence file format, > because it is Java

Re: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread Peyman Mohajerian
It is not recommended to keep data at rest in sequence file format, because it is Java-specific and cannot easily be shared with non-Java systems; it is, however, ideal for running map/reduce jobs. One approach would be to bring all the data of different formats into HDFS as is and then convert them t

Re: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread Raj K Singh
For XML file processing, Hadoop comes with a class called StreamXmlRecordReader. You can use it by setting your input format to StreamInputFormat and setting the stream.recordreader.class property to org.apache.hadoop.streaming.StreamXmlRecordReader. For JSON files, an open-source
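
The XML reader setup described in this message can be sketched as a small set of job properties. This is only an illustration, not a tested job configuration: the stream.recordreader.class property and its value come from the message, while the begin/end property names, the <record> tags, and the idea of passing everything as -D options are assumptions made for the example.

```python
# Sketch: collect the job properties for reading XML records with
# StreamXmlRecordReader. Only stream.recordreader.class and its value are
# taken from the message; everything else is an assumption for illustration.

XML_READER = "org.apache.hadoop.streaming.StreamXmlRecordReader"


def xml_reader_properties(begin_tag, end_tag):
    """Return properties that select the XML reader and mark record bounds."""
    return {
        "stream.recordreader.class": XML_READER,
        "stream.recordreader.begin": begin_tag,  # assumed property name
        "stream.recordreader.end": end_tag,      # assumed property name
    }


def as_generic_options(props):
    """Flatten the properties into '-D', 'key=value' argument pairs."""
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


# Hypothetical record tags; replace them with the element that wraps one record.
props = xml_reader_properties("<record>", "</record>")
options = as_generic_options(props)
```

The resulting list could be appended to a job's command line; the input format itself still has to be set to StreamInputFormat, as the message says.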

File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread Wolfgang Wyremba
Hello, the file format topic is still confusing me and I would appreciate it if you could share your thoughts and experience with me. From reading different books/articles/websites I understand that - Sequence files (used frequently but not only for binary data), - AVRO, - RC (was developed to work