Maybe there are differences in the default value of
USE_STATS_FOR_PARALLELIZATION between clusters running different Hadoop
versions. With Hadoop 3.1.x and Phoenix 5.0, useStatsForParallelization is
true by default, and the number of splits follows the guidepost count.
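Something like this should pin the behavior per job instead of relying on
cluster defaults; a minimal sketch, assuming the property name from
QueryServices.USE_STATS_FOR_PARALLELIZATION (the job name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class StatsParallelizationConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set explicitly rather than relying on per-cluster defaults;
        // "phoenix.use.stats.for.parallelization" defaults to true in Phoenix 5.0.
        conf.setBoolean("phoenix.use.stats.for.parallelization", true);
        Job job = Job.getInstance(conf, "phoenix-mr-example");
        System.out.println(job.getConfiguration()
                .getBoolean("phoenix.use.stats.for.parallelization", false));
    }
}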
I have done something similar in another case by subclassing
DBInputFormat, but I have no idea how this could be done with
PhoenixInputFormat without losing data locality (which is guaranteed as
long as one split represents one region).
Could this be achieved somehow? (The keys are salted.)
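The best I can come up with is the rough, untested sketch below. It assumes
PhoenixInputSplit exposes its scans via getScans() and can be rebuilt from a
scan list, and that each split carries exactly one Scan covering one region;
both halves of a cut split then still point at the same region, so locality
should survive:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.PhoenixInputSplit;
import org.apache.phoenix.mapreduce.PhoenixRecordWritable;

public class SubdividingPhoenixInputFormat
        extends PhoenixInputFormat<PhoenixRecordWritable> {

    @Override
    public List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException {
        List<InputSplit> halves = new ArrayList<>();
        for (InputSplit split : super.getSplits(context)) {
            PhoenixInputSplit ps = (PhoenixInputSplit) split;
            // Assumption: one Scan per split, i.e. one split per region.
            Scan scan = ps.getScans().get(0);
            byte[] start = scan.getStartRow();
            byte[] stop = scan.getStopRow();
            // Simplification: leave open-ended boundary ranges alone.
            if (start.length == 0 || stop.length == 0) {
                halves.add(split);
                continue;
            }
            // Cut the region's scan range at its byte midpoint.
            byte[][] keys = Bytes.split(start, stop, 1);
            if (keys == null) { // range too narrow to split further
                halves.add(split);
                continue;
            }
            Scan low = new Scan(scan).withStopRow(keys[1]);
            Scan high = new Scan(scan).withStartRow(keys[1]);
            halves.add(new PhoenixInputSplit(Collections.singletonList(low)));
            halves.add(new PhoenixInputSplit(Collections.singletonList(high)));
        }
        return halves;
    }
}

Since the keys are salted, the midpoint stays inside the region's salt-byte
range, so each half is still a contiguous chunk of one region. No idea how
PhoenixRecordReader reacts to rebuilt splits, though.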
Please do not take this advice lightly. Adding (or increasing) salt
buckets can have a serious impact on the execution of your queries.
On 1/30/19 5:33 PM, venkata subbarayudu wrote:
As Thomas said, the number of splits will be equal to the number of
guideposts available for the table, or to the ones required to cover the
filter. If you are seeing one split per region, then either stats are
disabled or the guidepost width is set higher than the region size, so try
reducing the guidepost width.
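For example (connection URL and table name are placeholders; the default
width is 100 MB):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReduceGuidepostWidth {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            // 10 MB guideposts instead of the 100 MB default -> roughly
            // 10x more guideposts, and therefore more splits.
            stmt.execute("ALTER TABLE MY_TABLE SET GUIDE_POSTS_WIDTH = 10485760");
            // Stats must be rebuilt before the new width takes effect.
            stmt.execute("UPDATE STATISTICS MY_TABLE ALL");
        }
    }
}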
You may recreate the table with the SALT_BUCKETS table option to get a
reasonable number of regions, and you may try adding a secondary index to
make the query run faster in case your MapReduce job performs specific filters.
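Something along these lines (table, columns, bucket count, and connection
URL are just examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SaltedTableWithIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            // Pre-split into 16 salted regions.
            stmt.execute("CREATE TABLE MY_TABLE ("
                    + " ID BIGINT NOT NULL PRIMARY KEY,"
                    + " CREATED DATE,"
                    + " PAYLOAD VARCHAR)"
                    + " SALT_BUCKETS = 16");
            // Covered secondary index so filters on CREATED avoid a
            // full-table scan.
            stmt.execute("CREATE INDEX MY_TABLE_CREATED_IDX"
                    + " ON MY_TABLE (CREATED) INCLUDE (PAYLOAD)");
        }
    }
}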
On Thu, 31 Jan 2019, 12:09 AM Thomas D'Silva wrote:
If stats are enabled, PhoenixInputFormat will generate a split per
guidepost.
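You can sanity-check how many guideposts the table currently has by querying
SYSTEM.STATS, which holds roughly one row per guidepost per column family
(the table name below is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CountGuideposts {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = ?")) {
            ps.setString(1, "MY_TABLE");
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                System.out.println("guideposts: " + rs.getLong(1));
            }
        }
    }
}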
On Wed, Jan 30, 2019 at 7:31 AM Josh Elser wrote:
You can extend/customize the PhoenixInputFormat with your own code to
increase the number of InputSplits and Mappers.
On 1/30/19 6:43 AM, Edwin Litterst wrote:
Hi,
I am using PhoenixInputFormat as the input source for MapReduce jobs.
The split count (which determines how many mappers are used for the job) is always equal to the number of regions of the table from which I select the input.
Is there a way to increase the number of splits? My job is running too slowly with only one mapper per region.
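For reference, my job setup looks roughly like this (table name and query
are simplified):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.phoenix.mapreduce.PhoenixRecordWritable;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class PhoenixMrJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "phoenix-input-example");
        // Wires up PhoenixInputFormat under the hood; the number of
        // InputSplits it produces is what my question is about.
        PhoenixMapReduceUtil.setInput(job, PhoenixRecordWritable.class,
                "MY_TABLE", "SELECT ID, PAYLOAD FROM MY_TABLE");
        // ... mapper/reducer and output configuration elided ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}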