n Hadoop. I also like that it supports both schema-less and schema-based object designs. It gives us some flexibility in designing MR jobs.
Thanks
Yong
> From: wolfgang.wyre...@hotmail.com
> To: user@hadoop.apache.org
> Subject: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC
> Date: M
Are sequence files language-neutral like Avro? Yes, but I am not sure about the
level of support in other languages' libraries for processing sequence files.
Thanks,
Rahul
On Mon, Sep 30, 2013 at 11:10 PM, Peyman Mohajerian wrote:
> It is not recommended to keep the data at rest in sequence format,
> because it is Java
It is not recommended to keep the data at rest in sequence format, because
it is Java-specific and cannot easily be shared with non-Java systems; it is,
however, ideal for running map/reduce jobs. One approach would be to
bring all the data of different formats into HDFS as-is and then convert them
t
For processing XML files, Hadoop comes with a class for this purpose called
StreamXmlRecordReader. You can use it by setting your input format to
StreamInputFormat and setting the stream.recordreader.class property to
org.apache.hadoop.streaming.StreamXmlRecordReader.
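As a rough sketch of how those properties fit together (the jar path, input/output paths, and the &lt;page&gt; begin/end tags below are placeholders, and this assumes a working Hadoop streaming setup), the configuration can be passed with -D flags on the streaming command line:

```shell
# Hypothetical hadoop-streaming invocation; paths and tag patterns are placeholders.
# StreamXmlRecordReader splits the input on the configured begin/end patterns,
# so each map task sees one whole XML element per record instead of one line.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D stream.recordreader.class=org.apache.hadoop.streaming.StreamXmlRecordReader \
  -D stream.recordreader.begin="<page>" \
  -D stream.recordreader.end="</page>" \
  -inputformat org.apache.hadoop.streaming.StreamInputFormat \
  -input /data/pages.xml \
  -output /data/pages-out \
  -mapper /bin/cat \
  -reducer /usr/bin/wc
```

The -D properties must appear before the other streaming options; StreamInputFormat reads stream.recordreader.class at runtime to pick the record reader.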
For JSON files, an open-source
Hello,
the file format topic is still confusing me, and I would appreciate it if you
could share your thoughts and experience with me.
From reading different books/articles/websites I understand that
- Sequence files (used frequently but not only for binary data),
- AVRO,
- RC (was developed to work