Re: File formats in Hadoop

2011-03-22 Thread Weishung Chung
Thank you, I will definitely take a look. Also, the TFile spec below helped me understand more; what exciting work! https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf

Re: File formats in Hadoop

2011-03-22 Thread Weishung Chung
My fellow superb HBase experts, I am looking at the HFile specs and have some questions. How is a particular table cell in an HBase table represented in the HFile? Does the key of the key-value pair represent the rowkey+column family:qualifier+timestamp and the value represent the corresponding

Re: File formats in Hadoop

2011-03-22 Thread Weishung Chung
I also found this informative article: http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html Would the key-value pairs be, e.g. for column family1 with one qualifier 1 with 2 versions, key1 : rowkey1+column
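To make the question concrete, here is a minimal Python sketch of one plausible layout. It assumes each cell version is stored as its own key-value entry whose key combines row key, column family, qualifier, and timestamp, with newer versions sorting first. The `CellKey` class, the `sorted_cells` helper, and the sample values are hypothetical and purely illustrative; this is not HBase's actual on-disk byte encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellKey:
    """Hypothetical composite key for one version of one cell."""
    row: str
    family: str
    qualifier: str
    timestamp: int

    def sort_key(self):
        # Ascending by row, family, qualifier; descending by timestamp,
        # so the newest version of a cell comes first (an assumption here).
        return (self.row, self.family, self.qualifier, -self.timestamp)


def sorted_cells(cells):
    """Return (key, value) pairs in the assumed storage order."""
    return sorted(cells, key=lambda kv: kv[0].sort_key())


# One row, one column family, one qualifier, two versions.
cells = [
    (CellKey("rowkey1", "family1", "qualifier1", 100), b"old value"),
    (CellKey("rowkey1", "family1", "qualifier1", 200), b"new value"),
]

ordered = sorted_cells(cells)
for key, value in ordered:
    print(f"{key.row}+{key.family}:{key.qualifier}+{key.timestamp} -> {value!r}")
```

Under these assumptions, two versions of the same cell become two separate entries that differ only in the timestamp part of the key.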

Re: File formats in Hadoop

2011-03-22 Thread Weishung Chung
I found this useful article that explains the internal storage of HFile: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html On Tue, Mar 22, 2011 at 11:31 AM, Weishung Chung weish...@gmail.com wrote: I also

Re: File formats in Hadoop

2011-03-22 Thread Vivek Krishna
http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained might help. Viv On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung weish...@gmail.com wrote: My fellow superb hbase experts, Looking at the HFile specs and have some questions. How is a particular table cell in a

Re: File formats in Hadoop

2011-03-22 Thread Ryan Rawson
Curious, why do you mention SequenceFile and TFile? Neither of those is in hbase.io, and TFile is not used anywhere in HBase. -ryan On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung weish...@gmail.com wrote: I am browsing through the hadoop.io package and was wondering what other

Re: File formats in Hadoop

2011-03-21 Thread Weishung Chung
I found this interesting article about sequence files and am sharing it here: http://www.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/ On Sun, Mar 20, 2011 at 6:04 AM, Niels Basjes ni...@basjes.nl wrote: And then there is the matter of how you put the data in the file. I've

Re: File formats in Hadoop

2011-03-20 Thread Niels Basjes
And then there is the matter of how you put the data in the file. I've heard that some people write the data as protocol buffers into the sequence file. 2011/3/19 Harsh J qwertyman...@gmail.com: Hello, On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung weish...@gmail.com wrote: I am browsing

File formats in Hadoop

2011-03-19 Thread Weishung Chung
I am browsing through the hadoop.io package and was wondering what other file formats are available in hadoop other than SequenceFile and TFile? Is all data written through hadoop including those from hbase saved in the above formats? It seems like SequenceFile is in key value pair format. Thank
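As a rough illustration of what "key value pair format" can mean for a flat file, the Python sketch below writes and reads length-prefixed key/value records. It is a toy format invented for this example, not the actual SequenceFile layout; the function names `write_records` and `read_records` are hypothetical.

```python
import io
import struct


def write_records(stream, records):
    """Write each (key, value) pair as: key length, value length, key, value."""
    for key, value in records:
        # Two big-endian unsigned 32-bit lengths precede the raw bytes.
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        stream.write(value)


def read_records(stream):
    """Yield (key, value) pairs until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if not header:
            return
        if len(header) < 8:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(">II", header)
        key = stream.read(key_len)
        value = stream.read(value_len)
        if len(key) < key_len or len(value) < value_len:
            raise ValueError("truncated record body")
        yield key, value


buffer = io.BytesIO()
write_records(buffer, [(b"k1", b"first"), (b"k2", b"second")])
buffer.seek(0)
result = list(read_records(buffer))
print(result)
```

Keys and values here are opaque bytes, so how a record is serialized into them is a separate choice from the file framing itself.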

Re: File formats in Hadoop

2011-03-19 Thread Harsh J
Hello, On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung weish...@gmail.com wrote: I am browsing through the hadoop.io package and was wondering what other file formats are available in hadoop other than SequenceFile and TFile? Additionally, on Hadoop, there're MapFiles/SetFiles (Derivative of