I observe that there are some extra mutations in the batch for each of my UPSERTs.
For example, if the app calls executeUpdate() only 5 times, then on commit the log shows
"DEBUG MutationState:1046 - Sent batch of 10".
I can't figure out where these extra mutations come from, or why.
This means that the “useful” batch size is only phoenix.mutate.batchSize / 2.
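Until the source of the extra mutation is understood, one client-side workaround is to raise phoenix.mutate.batchSize for the writing connection. This is a minimal sketch that assumes the Phoenix client honours the property when it is passed in the JDBC connection Properties. If your version ignores it there, set it in the client-side hbase-site.xml instead. The class and helper names are hypothetical. The doubling factor is also an assumption: it simply mirrors the 5 calls -> "Sent batch of 10" ratio in the DEBUG log above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class PhoenixBatchConfig {
    // Property name as documented on http://phoenix.apache.org/tuning.html
    static final String BATCH_SIZE_PROP = "phoenix.mutate.batchSize";

    // Hypothetical helper: ask for 'usefulRows' application rows per batch by
    // doubling the configured value, assuming 1 extra mutation per UPSERT.
    static Properties batchProps(int usefulRows) {
        Properties props = new Properties();
        props.setProperty(BATCH_SIZE_PROP, Integer.toString(usefulRows * 2));
        return props;
    }

    // Opens a connection with the override; needs the Phoenix client jar on the
    // classpath. 'url' is something like "jdbc:phoenix:<zk-quorum>".
    static Connection connect(String url, int usefulRows) throws SQLException {
        return DriverManager.getConnection(url, batchProps(usefulRows));
    }

    public static void main(String[] args) {
        // Only builds the properties; no cluster is contacted here.
        System.out.println(batchProps(500).getProperty(BATCH_SIZE_PROP));
    }
}
```

Passing the override per connection keeps the change local to the bulk-writing job instead of changing the default for every client.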
> * What does your table DDL look like?
CREATE TABLE IF NOT EXISTS TABLE_CODES (
"id" VARCHAR NOT NULL PRIMARY KEY,
"d"."tg" VARCHAR,
"d"."drip" VARCHAR,
"d"."s" UNSIGNED_TINYINT,
"d"."se" UNSIGNED_TINYINT,
"d"."rle" UNSIGNED_TINYINT,
"d"."dme" TIMESTAMP,
"d"."dpa" TIMESTAMP,
"d"."p" VARCHAR,
"d"."pt" UNSIGNED_TINYINT,
"d"."x" VARCHAR,
"d"."pn" VARCHAR,
"d"."b" VARCHAR,
"d"."hc" VARCHAR ARRAY,
"d"."ns" VARCHAR(16),
"d"."tv" VARCHAR(10),
"d"."vcp" VARCHAR,
"d"."et" UNSIGNED_TINYINT,
"d"."xoa" BINARY(16),
"d"."j" VARCHAR
) SALT_BUCKETS=30, COLUMN_ENCODED_BYTES=NONE;
CREATE INDEX "IDX_CIS_O" ON "TABLE_CODES" ("d"."x", "d"."dme")
INCLUDE("d"."tg", "d"."rle", "d"."pt" ... ) SALT_BUCKETS=30;
CREATE INDEX "IDX_CIS_PRID" ON "TABLE_CODES" ("d"."drip", "d"."dme")
INCLUDE("d"."tg", "d"."rle", "d"."pt" ...) SALT_BUCKETS=30;
In my case, with SALT_BUCKETS=30, every batch with default settings carries only 50
“useful” rows, and they are split across 30 salt buckets (and so up to 30 servers),
so every server gets only 1-2 rows.
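The arithmetic behind "50 useful rows, 1-2 rows per server" can be checked quickly. This is a small sketch that assumes 2 mutations per UPSERT, as in the DEBUG log above, and rows spread evenly over the salt buckets. The class and method names are illustrative only.

```java
final class SaltBatchMath {
    // Application rows per batch when each UPSERT yields 'mutationsPerUpsert'
    // mutations (2 in the log above: 5 executeUpdate() calls -> "Sent batch of 10").
    static int usefulRowsPerBatch(int batchSize, int mutationsPerUpsert) {
        return batchSize / mutationsPerUpsert;
    }

    // Average rows landing in each salt bucket, assuming an even spread.
    static double rowsPerBucket(int usefulRows, int saltBuckets) {
        return (double) usefulRows / saltBuckets;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {100, 1000}) {
            int useful = usefulRowsPerBatch(batchSize, 2);
            System.out.printf("batchSize=%d useful=%d perBucket=%.1f%n",
                    batchSize, useful, rowsPerBucket(useful, 30));
        }
    }
}
```

With the default of 100 this gives 50 useful rows and about 1.7 rows per bucket. With 1000 it gives 500 useful rows and about 16.7 rows per bucket.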
> * How large is one mutation you're writing (in bytes)?
Any idea how to calculate it?
The metrics described at https://phoenix.apache.org/metrics.html will give me the total
mutation count and the total size in bytes of a batch. But, as I mentioned before, there
is an “extra” mutation that will skew these statistics.
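Independently of the metrics, one mutation's size can be roughly estimated from the HBase KeyValue layout. Each cell without tags costs a 4-byte key length, a 4-byte value length, a 2-byte row length, a 1-byte family length, an 8-byte timestamp and a 1-byte type, plus the row, family, qualifier and value bytes.

This sketch rests on several assumptions:

- For this salted table, the row key is 1 salt byte plus the `id` bytes.
- Phoenix adds one empty marker cell per row, taken here to have qualifier "_0" and a 1-byte value.
- The column value sizes in `main` are hypothetical.

It counts raw cell bytes only. The actual RPC adds protobuf/cell-block framing and may be compressed, and the index rows for IDX_CIS_O and IDX_CIS_PRID are not included.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class MutationSizeEstimate {
    // Fixed overhead of one HBase KeyValue without tags:
    // 4 (key len) + 4 (value len) + 2 (row len) + 1 (family len) + 8 (ts) + 1 (type)
    static final int KV_FIXED_OVERHEAD = 20;

    static int cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return KV_FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    static int utf8Len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // One cell per non-null column plus the assumed empty marker cell ("_0", 1 byte).
    static int rowSize(int rowKeyLen, String family, Map<String, Integer> valueLens) {
        int fam = utf8Len(family);
        int total = cellSize(rowKeyLen, fam, utf8Len("_0"), 1);
        for (Map.Entry<String, Integer> e : valueLens.entrySet()) {
            total += cellSize(rowKeyLen, fam, utf8Len(e.getKey()), e.getValue());
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new LinkedHashMap<>();
        cols.put("tg", 10);  // VARCHAR holding 10 ASCII chars (hypothetical)
        cols.put("s", 1);    // UNSIGNED_TINYINT
        cols.put("dme", 12); // TIMESTAMP: 8-byte millis + 4-byte nanos
        cols.put("xoa", 16); // BINARY(16)
        int rowKeyLen = 1 + 20; // salt byte + hypothetical 20-char id
        System.out.println(rowSize(rowKeyLen, "d", cols));
    }
}
```

Only non-null columns produce cells, so sparse rows are noticeably cheaper than the full column list suggests.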
> * How much data ends up being sent to a RegionServer in one RPC?
Where can I get this metric?
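Without a direct metric, the payload per RegionServer RPC can at least be bounded by hand. This is a rough sketch under three assumptions:

- The HBase client groups one batch's mutations into a single multi request per RegionServer.
- Salted rows spread evenly over the servers.
- Every row has the same estimated size.

Rounding up gives the load on the busiest server. The per-row size in `main` is a hypothetical figure, not a measurement.

```java
final class RpcPayloadEstimate {
    // Upper-bound bytes sent to one RegionServer for one Phoenix batch.
    static long bytesPerRpc(int usefulRows, int regionServers, int rowBytes) {
        long rowsPerServer = (usefulRows + regionServers - 1) / regionServers; // ceiling division
        return rowsPerServer * rowBytes;
    }

    public static void main(String[] args) {
        int rowBytes = 261; // hypothetical estimated size of one row
        System.out.println(bytesPerRpc(50, 30, rowBytes));  // default batchSize=100, half useful
        System.out.println(bytesPerRpc(500, 30, rowBytes)); // batchSize=1000, half useful
    }
}
```

Under these assumptions, the default setting sends about 2 rows (522 bytes) per RPC to each server, versus 17 rows (4437 bytes) with a batch size of 1000.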
> On 3 Sep 2019, at 17:19, Josh Elser wrote:
>
> Hey Alexander,
>
> Was just poking at the code for this: it looks like this is really just
> determining the number of mutations that get "processed together" (as opposed
> to a hard limit).
>
> Since you have done some work, I'm curious if you could generate some data to
> help back up your suggestion:
>
> * What does your table DDL look like?
> * How large is one mutation you're writing (in bytes)?
> * How much data ends up being sent to a RegionServer in one RPC?
>
> You're right in that we would want to make sure that we're sending an
> adequate amount of data to a RegionServer in an RPC, but this is tricky to
> balance for all cases (thus, setting a smaller value to avoid sending batches
> that are too large is safer).
>
> On 9/3/19 8:03 AM, Alexander Batyrshin wrote:
>> Hello all,
>> 1) There is a bug in the documentation at http://phoenix.apache.org/tuning.html:
>> phoenix.mutate.batchSize is 100 by default, not 1000 as documented
>> https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java#L164
>> It was changed in https://issues.apache.org/jira/browse/PHOENIX-541
>> 2) I want to discuss this default value. In PHOENIX-541 I read about an issue
>> with MR and wide rows (2MB per row), which looks like a rare case. In the most
>> common cases we can get much better write performance with batchSize = 1000,
>> especially when it is used with a SALT table.