I have done something similar in another case by subclassing
DBInputFormat, but I have no idea how this could be done with
PhoenixInputFormat without losing data locality (which is guaranteed as
long as one split represents one region).
Could this be achieved somehow? (The keys are salted.)
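
To make the question concrete, below is the kind of subclass I am
imagining (an untested sketch). It assumes PhoenixInputSplit exposes
getScans() and a List<Scan> constructor; the class name and the config
key are made up, and Bytes.split is just one way to carve a region's
key range into sub-ranges. Would something along these lines keep the
locality guarantee?

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;
    import org.apache.phoenix.mapreduce.PhoenixInputFormat;
    import org.apache.phoenix.mapreduce.PhoenixInputSplit;

    public class SubdividingPhoenixInputFormat<T extends DBWritable>
            extends PhoenixInputFormat<T> {

        // Made-up config key: sub-splits to carve out of each region split.
        public static final String SUBSPLITS = "custom.subsplits.per.region";

        @Override
        public List<InputSplit> getSplits(JobContext context)
                throws IOException, InterruptedException {
            int n = context.getConfiguration().getInt(SUBSPLITS, 1);
            List<InputSplit> out = new ArrayList<>();
            for (InputSplit parent : super.getSplits(context)) { // one per region
                for (Scan scan : ((PhoenixInputSplit) parent).getScans()) {
                    byte[] start = scan.getStartRow();
                    byte[] stop = scan.getStopRow();
                    // Bytes.split cannot divide open-ended ranges (the first
                    // and last region), so those are kept as a single split.
                    byte[][] bounds = (n > 1 && start.length > 0 && stop.length > 0)
                            ? Bytes.split(start, stop, n - 1)
                            : null;
                    if (bounds == null) {
                        out.add(new PhoenixInputSplit(
                                Collections.singletonList(scan)));
                        continue;
                    }
                    for (int i = 0; i < bounds.length - 1; i++) {
                        Scan sub = new Scan(scan); // copies filters etc.
                        sub.setStartRow(bounds[i]);
                        sub.setStopRow(bounds[i + 1]);
                        // Each sub-range lies inside the parent region, so the
                        // region's host remains the right locality hint. If
                        // your Phoenix version has the PhoenixInputSplit(scans,
                        // regionSize, regionLocation) constructor, pass the
                        // parent's location through it here.
                        out.add(new PhoenixInputSplit(
                                Collections.singletonList(sub)));
                    }
                }
            }
            return out;
        }
    }

(Since the salt byte is simply the first byte of the row key, splitting
the raw byte range should still stay within the original bucket/region.)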


On 31.01.2019 at 00:24, Josh Elser wrote:
> Please do not take this advice lightly. Adding (or increasing) salt
> buckets can have a serious impact on the execution of your queries.
>
> On 1/30/19 5:33 PM, venkata subbarayudu wrote:
>> You may recreate the table with the SALT_BUCKETS table option to get
>> a reasonable number of regions, and you may try adding a secondary
>> index to make the query run faster in case your MapReduce job performs
>> specific filters.
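>>
>> For example, a rough sketch via JDBC (the table, column, and index
>> names are placeholders, and 16 buckets is an arbitrary choice):
>>
>>     import java.sql.Connection;
>>     import java.sql.DriverManager;
>>     import java.sql.Statement;
>>
>>     public class RecreateSalted {
>>         public static void main(String[] args) throws Exception {
>>             try (Connection conn = DriverManager.getConnection(
>>                          "jdbc:phoenix:zk-host:2181");
>>                  Statement stmt = conn.createStatement()) {
>>                 // SALT_BUCKETS pre-splits the table into that many regions
>>                 // (and prepends a salt byte to every row key).
>>                 stmt.execute("CREATE TABLE MY_TABLE ("
>>                         + " ID BIGINT NOT NULL PRIMARY KEY,"
>>                         + " FILTER_COL VARCHAR, VAL VARCHAR)"
>>                         + " SALT_BUCKETS = 16");
>>                 // A secondary index on the filtered column spares Phoenix
>>                 // a full-table scan when the job filters on it.
>>                 stmt.execute("CREATE INDEX MY_IDX ON MY_TABLE (FILTER_COL)");
>>             }
>>         }
>>     }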
>>
>> On Thu 31 Jan, 2019, 12:09 AM Thomas D'Silva <tdsi...@salesforce.com> wrote:
>>
>>     If stats are enabled, PhoenixInputFormat will generate a split per
>>     guidepost.
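>>
>>     For instance (a sketch, assuming a Phoenix version that supports
>>     the GUIDE_POSTS_WIDTH table property; the table name and the 50 MB
>>     width are examples only), lowering the guidepost width and
>>     recollecting stats yields more guideposts and therefore more splits:
>>
>>         import java.sql.Connection;
>>         import java.sql.DriverManager;
>>         import java.sql.Statement;
>>
>>         public class TuneGuideposts {
>>             public static void main(String[] args) throws Exception {
>>                 try (Connection conn = DriverManager.getConnection(
>>                              "jdbc:phoenix:zk-host:2181");
>>                      Statement stmt = conn.createStatement()) {
>>                     // Per-table override of phoenix.stats.guidepost.width,
>>                     // in bytes: a smaller width means more guideposts.
>>                     stmt.execute(
>>                         "ALTER TABLE MY_TABLE SET GUIDE_POSTS_WIDTH = 52428800");
>>                     // Recollect statistics so the new width takes effect.
>>                     stmt.execute("UPDATE STATISTICS MY_TABLE ALL");
>>                 }
>>             }
>>         }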
>>
>>     On Wed, Jan 30, 2019 at 7:31 AM Josh Elser <els...@apache.org> wrote:
>>
>>         You can extend/customize the PhoenixInputFormat with your own
>>         code to increase the number of InputSplits and Mappers.
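>>
>>         Wiring such a subclass into the job might look roughly like
>>         this (MyWritable and MyPhoenixInputFormat are placeholders for
>>         your own classes):
>>
>>             import org.apache.hadoop.conf.Configuration;
>>             import org.apache.hadoop.mapreduce.Job;
>>             import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
>>
>>             public class JobSetup {
>>                 public static Job configure() throws Exception {
>>                     Job job = Job.getInstance(new Configuration(), "phoenix-mr");
>>                     // Configures PhoenixInputFormat and the table/query to read.
>>                     PhoenixMapReduceUtil.setInput(job, MyWritable.class,
>>                             "MY_TABLE", "SELECT ID, VAL FROM MY_TABLE");
>>                     // Swap in the subclass so its getSplits() is used instead.
>>                     job.setInputFormatClass(MyPhoenixInputFormat.class);
>>                     return job;
>>                 }
>>             }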
>>
>>         On 1/30/19 6:43 AM, Edwin Litterst wrote:
>>          > Hi,
>>          > I am using PhoenixInputFormat as the input source for
>>          > MapReduce jobs.
>>          > The split count (which determines how many mappers are used
>>          > for the job) is always equal to the number of regions of the
>>          > table from which I select the input.
>>          > Is there a way to increase the number of splits? My job runs
>>          > too slowly with only one mapper for every region.
>>          > (Increasing the number of regions is not an option.)
>>          > regards,
>>          > Eddie
>>
>
