Loading with a regular upsert statement gives the same result as psql, i.e. the
extra cells are present. I will try the master branch next. Thanks for the tip.

---------- Original Message ----------
From: Sergey Soldatov <sergeysolda...@gmail.com>
To: user@phoenix.apache.org
Subject: Re: hbase cell storage different between bulk load and direct api
Date: Thu, 19 Apr 2018 12:26:25 +0600


Hi Lew, no. The first one looks incorrect. You may file a bug on that (I believe
that the second case is correct, but you may also check by uploading data
using regular upserts). Also, you may check whether the master branch has this
issue.

Thanks,
Sergey

On Thu, Apr 19, 2018 at 10:19 AM, Lew Jackman <lew9...@netzero.net> wrote:

Under Phoenix 4.11 we are seeing some storage discrepancies in HBase between a
load via psql and a bulk load.

To illustrate with a simple case, we have modified the example table from the
bulk load reference, https://phoenix.apache.org/bulk_dataload.html:

CREATE TABLE example (
    my_pk bigint not null,
    m.first_name varchar(50),
    m.last_name varchar(50)
    CONSTRAINT pk PRIMARY KEY (my_pk))
    IMMUTABLE_ROWS=true,
    IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
    COLUMN_ENCODED_BYTES = 1;

HBase rows when loading via psql:

 \x80\x00\x00\x00\x00\x0009    column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
 \x80\x00\x00\x00\x00\x0009    column=M:1, timestamp=1524109827690, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
 \x80\x00\x00\x00\x00\x01\x092 column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
 \x80\x00\x00\x00\x00\x01\x092 column=M:1, timestamp=1524109827690, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02
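
The row keys in the listing above are the my_pk values as HBase stores them. A
minimal sketch of decoding them, assuming the usual Phoenix BIGINT encoding
(8 bytes, big-endian, with the sign bit flipped so that values sort correctly
as raw bytes). The helper name decode_phoenix_bigint is made up for
illustration and is not a Phoenix API:

```python
import struct


def decode_phoenix_bigint(key: bytes) -> int:
    """Decode an 8-byte key, assuming big-endian with the sign bit flipped."""
    if len(key) != 8:
        raise ValueError("expected exactly 8 bytes")
    # Flip the top bit back, then read the bytes as a signed 64-bit integer.
    restored = bytes([key[0] ^ 0x80]) + key[1:]
    return struct.unpack(">q", restored)[0]


# Row keys transcribed from the listing; the trailing "09" and "2" are
# printable ASCII bytes (0x30 0x39 and 0x32), not hex digits.
print(decode_phoenix_bigint(b"\x80\x00\x00\x00\x00\x0009"))     # 12345
print(decode_phoenix_bigint(b"\x80\x00\x00\x00\x00\x01\x092"))  # 67890
```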

HBase rows when loading via MapReduce using CsvBulkLoadTool:

 \x80\x00\x00\x00\x00\x0009    column=M:1, timestamp=1524110486638, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
 \x80\x00\x00\x00\x00\x01\x092 column=M:1, timestamp=1524110486638, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02
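
The value of the M:1 cell can be taken apart to confirm that both load paths
store the same column data. The sketch below assumes a layout inferred only
from the bytes shown in the listings, not from the Phoenix source: element
bytes first, then one 2-byte offset per element, then a 4-byte position of the
offset table, a 4-byte element count and a 1-byte version. The function name
split_single_cell_value is hypothetical:

```python
import struct


def split_single_cell_value(value: bytes) -> list:
    """Split a single-cell value into its elements (layout is an assumption)."""
    # Trailer, read from the end: version (1 byte), element count (4 bytes),
    # position of the offset table (4 bytes).
    count = struct.unpack(">i", value[-5:-1])[0]
    table_start = struct.unpack(">i", value[-9:-5])[0]
    if count <= 0:
        raise ValueError("only the 2-byte offset form is handled in this sketch")
    # One unsigned 16-bit offset per element, each pointing into the data area.
    offsets = struct.unpack(f">{count}H", value[table_start:table_start + 2 * count])
    # Each element runs to the next offset; the last one ends at the table.
    bounds = list(offsets) + [table_start]
    return [value[bounds[i]:bounds[i + 1]] for i in range(count)]


john = b"xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02"
print(split_single_cell_value(john))  # [b'x', b'John', b'Doe']
```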


So the table has 4 cells for the two rows when loaded via psql, whereas the
bulk-loaded table has only 2: it lacks the cells with column qualifier
M:\x00\x00\x00\x00. Is this behavior correct? Thanks much for any insight.
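
The comparison can also be written down as a set difference over
(row key, column qualifier) pairs. This is only a sketch: the pairs are
transcribed by hand from the two listings, and missing_cells is a hypothetical
helper, not part of Phoenix or HBase:

```python
ROW_1 = b"\x80\x00\x00\x00\x00\x0009"
ROW_2 = b"\x80\x00\x00\x00\x00\x01\x092"
EXTRA_QUALIFIER = b"\x00\x00\x00\x00"  # the qualifier seen only after the psql load
DATA_QUALIFIER = b"1"                  # the qualifier holding the column values

# (row key, qualifier) pairs in column family M, copied from the listings.
psql_cells = {
    (ROW_1, EXTRA_QUALIFIER), (ROW_1, DATA_QUALIFIER),
    (ROW_2, EXTRA_QUALIFIER), (ROW_2, DATA_QUALIFIER),
}
bulk_cells = {
    (ROW_1, DATA_QUALIFIER),
    (ROW_2, DATA_QUALIFIER),
}


def missing_cells(reference: set, candidate: set) -> list:
    """Return the cells present in reference but absent from candidate."""
    return sorted(reference - candidate)


for row, qualifier in missing_cells(psql_cells, bulk_cells):
    print(row, qualifier)
```

Run against the two listings, this reports one missing cell per row, both with
the M:\x00\x00\x00\x00 qualifier.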
