I do not understand what is stored in HBase after injecting a website.
When I scan the webpage table in the hbase shell, this is the result:
hbase(main):028:0> scan '1_webpage'
ROW                       COLUMN+CELL
 com.xinhuanet.www:http/  column=f:fi, timestamp=1371110099941, value=\x00'\x8D\x00
 com.xinhuanet.www:http/  column=f:ts, timestamp=1371110099941, value=\x00\x00\x01?<\x87\xBA\x0A
 com.xinhuanet.www:http/  column=mk:_injmrk_, timestamp=1371110099941, value=y
 com.xinhuanet.www:http/  column=mk:dist, timestamp=1371110099941, value=0
 com.xinhuanet.www:http/  column=mtdt:_csh_, timestamp=1371110099941, value=?\x80\x00\x00
 com.xinhuanet.www:http/  column=s:s, timestamp=1371110099941, value=?\x80\x00\x00
1 row(s) in 0.0300 seconds


So are only 6 columns set in HBase for this row? And what is the actual data
stored in them?
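
To make the second question concrete, here is a minimal sketch of how the binary values in the scan output could be decoded. It assumes the cells are plain big-endian Java primitives (a 4-byte int, an 8-byte long, a 4-byte float), and that the short qualifiers fi, ts and s correspond to fetchInterval, fetchTime and score; that mapping is my guess and is not confirmed by anything above. The class name DecodeInjectedCells is made up for illustration. The shell prints printable bytes literally, so ' is 0x27, ? is 0x3F and < is 0x3C.

```java
import java.nio.ByteBuffer;

public class DecodeInjectedCells {
    // Bytes copied from the scan output above, with printable characters
    // converted back to hex (' = 0x27, ? = 0x3F, < = 0x3C).
    static final byte[] FI = {0x00, 0x27, (byte) 0x8D, 0x00};
    static final byte[] TS = {0x00, 0x00, 0x01, 0x3F, 0x3C, (byte) 0x87, (byte) 0xBA, 0x0A};
    static final byte[] S = {0x3F, (byte) 0x80, 0x00, 0x00};

    // ByteBuffer reads big-endian by default; that byte order is assumed here.
    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    static float decodeFloat(byte[] b) {
        return ByteBuffer.wrap(b).getFloat();
    }

    public static void main(String[] args) {
        // Assumed meaning: fetchInterval in seconds (2592000 s = 30 days).
        System.out.println("f:fi = " + decodeInt(FI));   // 2592000
        // Assumed meaning: fetchTime in milliseconds since the epoch.
        System.out.println("f:ts = " + decodeLong(TS));  // 1371110095370
        // Assumed meaning: score; mtdt:_csh_ holds the same four bytes.
        System.out.println("s:s  = " + decodeFloat(S));  // 1.0
    }
}
```

If these assumptions hold, the decoded f:ts value lies a few seconds before the cell timestamp 1371110099941, which would fit a fetch time set at injection.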
I found that the source code contains a WebPage class. I could not
understand all of it, but I think there should be 24 fields in HBase for each
website:
  public static final String[] _ALL_FIELDS = {
    "baseUrl", "status", "fetchTime", "prevFetchTime", "fetchInterval",
    "retriesSinceFetch", "modifiedTime", "prevModifiedTime", "protocolStatus",
    "content", "contentType", "prevSignature", "signature", "title", "text",
    "parseStatus", "score", "reprUrl", "headers", "outlinks", "inlinks",
    "markers", "metadata", "batchId",
  };


Thanks 
HeChuan
