I believe we still rely on that empty key value, even for compact storage formats (though in theory that dependency could probably be removed - JIRA, please?). A quick test would confirm:
- upsert a row with no last_name or first_name
- select * from T where last_name IS NULL

If the row isn't returned, then we need that empty key value.
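The quick test above could be sketched as the two statements below. This is only a sketch: it assumes the `example` table from the original message further down in this thread stands in for T, and the key value 1 is purely illustrative.

```sql
-- Step 1: upsert a row that sets only the primary key, leaving
-- first_name and last_name unset. The key value 1 is illustrative.
UPSERT INTO example (my_pk) VALUES (1);

-- Step 2: look the row up through a NULL check on one of the unset columns.
-- If the row does not come back, the empty key value is still required.
SELECT * FROM example WHERE last_name IS NULL;
```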
Thanks,
James

On Thu, Apr 19, 2018 at 1:58 PM, Sergey Soldatov <sergeysolda...@gmail.com> wrote:

> Heh. That looks like a bug actually. This is a 'dummy' KV
> (https://phoenix.apache.org/faq.html#Why_empty_key_value), but I have some
> doubts that we need it for compacted rows.
>
> Thanks,
> Sergey
>
> On Thu, Apr 19, 2018 at 11:30 PM, Lew Jackman <lew9...@netzero.net> wrote:
>
>> I have not tried the master branch yet; however, on Phoenix 4.13 this
>> storage discrepancy in hbase is still present with the extra
>> column=M:\x00\x00\x00\x00 cells in hbase when using psql or sqlline.
>>
>> Does anyone have an understanding of the meaning of the column qualifier
>> \x00\x00\x00\x00 ?
>>
>>
>> ---------- Original Message ----------
>> From: "Lew Jackman" <lew9...@netzero.net>
>> To: user@phoenix.apache.org
>> Cc: user@phoenix.apache.org
>> Subject: Re: hbase cell storage different bewteen bulk load and direct api
>> Date: Thu, 19 Apr 2018 13:59:16 GMT
>>
>> The upsert statement appears the same as the psql results - i.e. extra
>> cells. I will try the master branch next. Thanks for the tip.
>>
>> ---------- Original Message ----------
>> From: Sergey Soldatov <sergeysolda...@gmail.com>
>> To: user@phoenix.apache.org
>> Subject: Re: hbase cell storage different bewteen bulk load and direct api
>> Date: Thu, 19 Apr 2018 12:26:25 +0600
>>
>> Hi Lew,
>> no. The 1st one looks incorrect. You may file a bug on that (I believe
>> that the second case is correct, but you may also check by uploading data
>> using regular upserts). Also, you may check whether the master branch has
>> this issue.
>>
>> Thanks,
>> Sergey
>>
>> On Thu, Apr 19, 2018 at 10:19 AM, Lew Jackman <lew9...@netzero.net>
>> wrote:
>>
>>> Under Phoenix 4.11 we are seeing some storage discrepancies in hbase
>>> between a load via psql and a bulk load.
>>>
>>> To illustrate in a simple case we have modified the example table from
>>> the load reference https://phoenix.apache.org/bulk_dataload.html
>>>
>>> CREATE TABLE example (
>>>    my_pk bigint not null,
>>>    m.first_name varchar(50),
>>>    m.last_name varchar(50)
>>>    CONSTRAINT pk PRIMARY KEY (my_pk))
>>>    IMMUTABLE_ROWS=true,
>>>    IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
>>>    COLUMN_ENCODED_BYTES = 1;
>>>
>>> Hbase Rows when Loading via PSQL
>>>
>>> \x80\x00\x00\x00\x00\x0009
>>>     column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
>>> \x80\x00\x00\x00\x00\x0009
>>>     column=M:1, timestamp=1524109827690, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
>>> \x80\x00\x00\x00\x00\x01\x092
>>>     column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
>>> \x80\x00\x00\x00\x00\x01\x092
>>>     column=M:1, timestamp=1524109827690, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02
>>>
>>> Hbase Rows when Loading via MapReduce using CsvBulkLoadTool
>>>
>>> \x80\x00\x00\x00\x00\x0009
>>>     column=M:1, timestamp=1524110486638, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
>>> \x80\x00\x00\x00\x00\x01\x092
>>>     column=M:1, timestamp=1524110486638, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02
>>>
>>>
>>> So, the table has 4 cells for the two rows when loaded via psql,
>>> whereas a bulk load is missing two cells since it lacks the cells with col
>>> qualifier :\x00\x00\x00\x00
>>>
>>> Is this behavior correct?
>>>
>>> Thanks much for any insight.