Under Phoenix 4.11 we are seeing storage discrepancies in HBase between a 
load via psql and a bulk load of the same data.

To illustrate with a simple case, we have modified the example table from the 
bulk load reference at https://phoenix.apache.org/bulk_dataload.html

CREATE TABLE example (
    my_pk bigint not null,
    m.first_name varchar(50),
    m.last_name varchar(50) 
    CONSTRAINT pk PRIMARY KEY (my_pk))
    IMMUTABLE_ROWS=true,
    IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
    COLUMN_ENCODED_BYTES = 1;

HBase rows when loading via psql:

 \\x80\\x00\\x00\\x00\\x00\\x0009     column=M:\\x00\\x00\\x00\\x00, 
timestamp=1524109827690, value=x              
 \\x80\\x00\\x00\\x00\\x00\\x0009     column=M:1, timestamp=1524109827690, 
value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
              
 \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:\\x00\\x00\\x00\\x00, 
timestamp=1524109827690, value=x              
 \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1, timestamp=1524109827690, 
value=xMaryPoppins\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
              

HBase rows when loading via MapReduce using CsvBulkLoadTool:

 \\x80\\x00\\x00\\x00\\x00\\x0009     column=M:1, timestamp=1524110486638, 
value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
              
 \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1, timestamp=1524110486638, 
value=xMaryPoppins\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
              


So the table loaded via psql has 4 cells for the two rows, whereas the 
bulk-loaded table has only 2 cells: each row is missing the cell with column 
qualifier M:\\x00\\x00\\x00\\x00 (value=x). Is this behavior correct? Thanks 
much for any insight.
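
In case it helps anyone reproduce the comparison, here is a minimal Python sketch that tallies the cells from the two scans above and reports which qualifiers are missing per row. It works on hand-transcribed (row, qualifier) pairs rather than a live HBase connection. The row labels 12345 and 67890 are our decoding of the row keys (Phoenix bigint with the sign bit flipped), and the names `EMPTY_Q`, `qualifiers_by_row` and `missing_qualifiers` are purely illustrative.

```python
from collections import defaultdict

# Label for the qualifier that shows up only in the psql load.
# This is just a readable tag, not the raw bytes.
EMPTY_Q = r"\x00\x00\x00\x00"

# (row, qualifier) pairs transcribed by hand from the two scans.
# Rows are labelled with the decoded my_pk values (our assumption).
psql_cells = [
    (12345, EMPTY_Q), (12345, "1"),
    (67890, EMPTY_Q), (67890, "1"),
]
bulk_cells = [
    (12345, "1"),
    (67890, "1"),
]


def qualifiers_by_row(cells):
    """Group column qualifiers by row."""
    grouped = defaultdict(set)
    for row, qualifier in cells:
        grouped[row].add(qualifier)
    return dict(grouped)


def missing_qualifiers(reference, candidate):
    """Return {row: qualifiers found in reference but not in candidate}."""
    ref = qualifiers_by_row(reference)
    cand = qualifiers_by_row(candidate)
    result = {}
    for row, quals in ref.items():
        diff = quals - cand.get(row, set())
        if diff:
            result[row] = diff
    return result


if __name__ == "__main__":
    print("psql cells:", len(psql_cells))
    print("bulk cells:", len(bulk_cells))
    print("missing from bulk load:", missing_qualifiers(psql_cells, bulk_cells))
```

Running it confirms the counts quoted above (4 cells versus 2) and shows that the only difference is the M:\\x00\\x00\\x00\\x00 cell on each row; the M:1 cells match in both loads.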