I do not understand what is stored in HBase after injecting a website. When I scan the webpage table in the hbase shell, I get:

hbase(main):028:0> scan '1_webpage'
ROW                        COLUMN+CELL
 com.xinhuanet.www:http/   column=f:fi, timestamp=1371110099941, value=\x00'\x8D\x00
 com.xinhuanet.www:http/   column=f:ts, timestamp=1371110099941, value=\x00\x00\x01?<\x87\xBA\x0A
 com.xinhuanet.www:http/   column=mk:_injmrk_, timestamp=1371110099941, value=y
 com.xinhuanet.www:http/   column=mk:dist, timestamp=1371110099941, value=0
 com.xinhuanet.www:http/   column=mtdt:_csh_, timestamp=1371110099941, value=?\x80\x00\x00
 com.xinhuanet.www:http/   column=s:s, timestamp=1371110099941, value=?\x80\x00\x00
1 row(s) in 0.0300 seconds
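
The escaped values in the scan output can at least be decoded by hand. Below is a minimal sketch, not anything taken from the Nutch source: it assumes the shell prints non-printable bytes as \xNN and printable bytes as plain ASCII, and that the cells are big-endian Java primitives (a 4-byte int for f:fi, an 8-byte long for f:ts, a 4-byte float for s:s). The class name DecodeCells and its helper methods are hypothetical, and reading f:fi, f:ts and s:s as fetchInterval, fetchTime and score is only a guess from the qualifier names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Nutch or HBase.
public class DecodeCells {

    // Turn an hbase shell value into raw bytes:
    // "\xNN" becomes one byte, any other character is taken as its ASCII code.
    static byte[] parse(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\x", i) && i + 4 <= s.length()) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(s.charAt(i));
                i++;
            }
        }
        return out.toByteArray();
    }

    // ByteBuffer reads big-endian by default (assumed to match how the cells were written).
    static int asInt(String s)     { return ByteBuffer.wrap(parse(s)).getInt(); }
    static long asLong(String s)   { return ByteBuffer.wrap(parse(s)).getLong(); }
    static float asFloat(String s) { return ByteBuffer.wrap(parse(s)).getFloat(); }

    public static void main(String[] args) {
        // f:fi, assumed to be an int
        System.out.println(asInt("\\x00'\\x8D\\x00"));                      // prints 2592000
        // f:ts, assumed to be a long
        System.out.println(asLong("\\x00\\x00\\x01?<\\x87\\xBA\\x0A"));     // prints 1371110095370
        // s:s and mtdt:_csh_, assumed to be floats
        System.out.println(asFloat("?\\x80\\x00\\x00"));                    // prints 1.0
    }
}
```

Under those assumptions, f:fi comes out as 2592000 (30 days if the unit is seconds), f:ts as 1371110095370 (a millisecond timestamp a few seconds before the cell timestamp), and s:s as 1.0.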

So, are only 6 columns set in HBase? And what is the real data stored in them?

I found that the source code has a WebPage class. I do not understand all of it, but I think there should be 24 fields in HBase for each website:

public static final String[] _ALL_FIELDS = {"baseUrl","status","fetchTime","prevFetchTime","fetchInterval","retriesSinceFetch","modifiedTime","prevModifiedTime","protocolStatus","content","contentType","prevSignature","signature","title","text","parseStatus","score","reprUrl","headers","outlinks","inlinks","markers","metadata","batchId",};

Thanks,
HeChuan