Re: Does Sqoop support small queries?

Erzsebet Szilagyi Fri, 01 Jul 2016 07:49:31 -0700

Hi Wei,
Let us know whether fine-tuning the number of map tasks solved your problem or
whether we should dig further into it.
Thanks,
Liz


On Fri, Jul 1, 2016 at 7:57 AM, Wei Yan <[email protected]> wrote:

> Thanks, Erzsebet and Markus. Tuning the number of map tasks can be a
> reasonable solution here, and I'll try that.
> Since a Sqoop 1 import runs as a MapReduce job, I think it's hard to have
> both (1) many small queries and (2) a limited number of concurrently
> executing queries.
>
> -Wei
>
> On Thu, Jun 30, 2016 at 3:50 PM, Erzsebet Szilagyi <
> [email protected]> wrote:
>
>> Hi Wei,
>> Markus (in CC) offered the following explanation:
>>
>> "
>> The Sqoop1 default is 4 map tasks. When working with customers I usually
>> start with 1 and keep doubling the number of map tasks (e.g. 1, 2, 4, 8)
>> until I find a performance sweet spot, while keeping in mind the potential
>> RDBMS impact.
>>
>> Estimating the real RDBMS impact is often challenging for the following
>> reasons:
>> 1. DBAs are often not present
>> 2. Jobs are often reviewed in isolation (excluding other simultaneous
>> Sqoop or non-Sqoop workloads)
>> 3. Tests are often performed against smaller data volumes and/or virtual
>> resources than will be used in production (including the RDBMS, network
>> and Hadoop cluster)
>> 4. There is no uniform way to monitor/analyze impact across RDBMS
>> vendors.
>> 4.1. I have not really tried to review Sqoop console debug output from a
>> DB impact perspective; perhaps it could be used.
>> 5. Once a job is deployed, production data volumes often change
>>
>> Thanks, Markus
>> "
>>
>> On Wed, Jun 29, 2016 at 7:35 PM, Wei Yan <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I would like to check whether Sqoop supports this type of ingestion:
>>> suppose we have records in the range [1,12] and 3 mappers. By default,
>>> the 3 mappers will be assigned [1,4], [5,8], [9,12].
>>>
>>> I'm not sure whether we can split the range into smaller ones, like [1],
>>> [2], [3], ..., [12], while still using 3 mappers instead of 12. We want
>>> this feature because: (1) if we configure fewer mappers, each mapper is
>>> assigned a larger range and takes much longer to finish, and the
>>> infrastructure may kill long-running queries; (2) if we configure more
>>> mappers, each mapper gets a smaller range, but we then generate lots of
>>> network traffic to the database, which is also bad. What we want is:
>>> still 12 ranges, but 3 mappers, and at most 3 concurrent connections.
>>>
>>> Appreciate any help here.
>>>
>>> -Wei
>>>
>>
>>
>>
>> --
>> Erzsebet Szilagyi
>> Software Engineer
>> http://www.cloudera.com
>>
>
>
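Wei's question describes 12 single-value ranges processed by only 3 workers, with never more than 3 queries open against the database at once. The sketch below illustrates that scheduling idea in plain Python; it is not a Sqoop feature or API. `split_range`, `run_chunks` and `fake_query` are hypothetical names: `fake_query` only sleeps and returns the ids in its range in place of a real database query, and the even split mirrors the [1,4], [5,8], [9,12] example rather than reproducing Sqoop's own splitter.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into `parts` contiguous ranges.

    Sizes differ by at most one; the first ranges absorb any remainder.
    """
    total = hi - lo + 1
    if not 1 <= parts <= total:
        raise ValueError("parts must be between 1 and the number of values")
    base, extra = divmod(total, parts)
    ranges, start = [], lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def fake_query(chunk):
    """Hypothetical stand-in for 'SELECT ... WHERE id BETWEEN lo AND hi'."""
    lo, hi = chunk
    time.sleep(0.01)  # pretend the database is doing some work
    return list(range(lo, hi + 1))


def run_chunks(chunks, max_connections, query):
    """Run one small query per chunk, with at most `max_connections` in flight.

    Returns the per-chunk results (in chunk order) and the peak number of
    queries that were running at the same time.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker(chunk):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return query(chunk)
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the concurrency cap: with 3 workers, at most 3 queries
    # run at once (3 connections, if each worker held one), however many
    # chunks there are.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        results = list(pool.map(worker, chunks))
    return results, state["peak"]


if __name__ == "__main__":
    print(split_range(1, 12, 3))  # [(1, 4), (5, 8), (9, 12)]
    chunks = split_range(1, 12, 12)  # 12 single-value ranges
    results, peak = run_chunks(chunks, 3, fake_query)
    print(len(chunks), "chunks, peak concurrency", peak)
```

Small chunks keep each query short (Wei's concern about long-running queries being killed), while the fixed pool size bounds the load on the database. Here the two knobs are independent; as Wei notes, in Sqoop 1 both follow from the number of map tasks, which is what makes the combination hard to get.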


-- 
Erzsebet Szilagyi
Software Engineer
http://www.cloudera.com
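Markus's tuning approach (start with 1 map task and double until you reach a performance sweet spot, without overloading the RDBMS) can be sketched as a small search loop. `doubling_sweep` and `simulated_runtime` are hypothetical. In practice, `measure(n)` would time a real `sqoop import` run with `--num-mappers n` against production-like data. The 10% minimum-gain stopping rule is an arbitrary assumption, and `max_mappers` stands in for whatever concurrency the database owners will tolerate.

```python
def doubling_sweep(measure, start=1, max_mappers=64, min_gain=0.10):
    """Try mapper counts start, 2*start, 4*start, ... up to max_mappers.

    `measure(n)` returns the elapsed time of an import with n mappers.
    The sweep stops as soon as doubling improves the runtime by less than
    `min_gain` (as a fraction of the previous runtime). It returns the last
    count that still paid off, plus every measurement taken.
    """
    results = {}
    best = start
    prev = measure(start)
    results[start] = prev
    n = start
    while n * 2 <= max_mappers:
        n *= 2
        elapsed = measure(n)
        results[n] = elapsed
        if (prev - elapsed) / prev < min_gain:
            break  # diminishing (or negative) returns: stop doubling
        best, prev = n, elapsed
    return best, results


def simulated_runtime(n):
    """Toy model: the work splits across n mappers, but each extra
    connection adds contention on the database (2 time units per mapper)."""
    return 120 / n + 2 * n


if __name__ == "__main__":
    best, results = doubling_sweep(simulated_runtime)
    print(best)     # 8
    print(results)  # {1: 122.0, 2: 64.0, 4: 38.0, 8: 31.0, 16: 39.5}
```

Under this toy model the sweep settles on 8 mappers because 16 is slower. With real measurements, the caveats Markus lists (no DBA present, jobs tested in isolation, smaller test volumes) still apply, so the result is only as good as the conditions under which it was measured.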
