Hello Wei,

In addition to Liz's request, can you share the volumes (depth and width)
of data you are working with?

I am curious to know if they are really as small (12 rows) as previously
noted.


Markus Kemper
Customer Operations Engineer
<http://www.cloudera.com>


On Fri, Jul 1, 2016 at 10:48 AM, Erzsebet Szilagyi <
[email protected]> wrote:

> Hi Wei,
> Let us know whether fine-tuning the number of map tasks solved your
> problem, or whether we should dig further into it.
> Thanks,
> Liz
>
>
> On Fri, Jul 1, 2016 at 7:57 AM, Wei Yan <[email protected]> wrote:
>
>> Thanks, Erzsebet and Markus. Tuning the number of map tasks can be a
>> reasonable solution here, and I'll try that.
>> As Sqoop 1 is a MapReduce job, I think it's hard to have both (1) many
>> small queries and (2) a limited number of concurrently executing queries.
>>
>> -Wei
>>
>> On Thu, Jun 30, 2016 at 3:50 PM, Erzsebet Szilagyi <
>> [email protected]> wrote:
>>
>>> Hi Wei,
>>> Markus (in CC) offered the following explanation:
>>>
>>> "
>>> The Sqoop1 default is 4 map tasks. When working with customers I
>>> usually start with 1 and double the number of map tasks (e.g. 1, 2, 4, 8)
>>> until I find a performance sweet spot, while keeping in mind the
>>> potential RDBMS impact.
>>>
>>> Estimating the real RDBMS impact is often challenging for some of the
>>> following reasons:
>>> 1. DBAs are often not present.
>>> 2. Jobs are often reviewed in isolation (excluding other simultaneous
>>> Sqoop or non-Sqoop workloads).
>>> 3. Tests are often performed against smaller data volumes and/or virtual
>>> resources than what will be in production (includes RDBMS, network and
>>> Hadoop cluster).
>>> 4. There is not a uniform way to monitor/analyze impact across RDBMS
>>> vendors.
>>> 4.1. I have not really tried to review the Sqoop console debug output
>>> from a DB impact context; perhaps it could be used.
>>> 5. Once deployed, production job volumes often change.
>>>
>>> Thanks, Markus
>>> "
>>>
>>> On Wed, Jun 29, 2016 at 7:35 PM, Wei Yan <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I would like to check whether Sqoop supports this type of ingestion.
>>>> Suppose we have records covering the range [1, 12] and 3 mappers. By
>>>> default, the 3 mappers will be assigned [1, 4], [5, 8], and [9, 12].
>>>>
>>>> I am not sure whether we can split the range into smaller pieces, such as
>>>> [1], [2], [3], ..., [12], while still using 3 mappers instead of 12. We
>>>> want this feature because: (1) if we configure a smaller number of
>>>> mappers, each mapper is assigned a larger range and takes much longer to
>>>> finish, and the infra may kill the long-running query; (2) if we
>>>> configure a larger number of mappers, each mapper has a smaller range,
>>>> but we generate a lot of network traffic to the database, which is also
>>>> bad. Ideally we would still have 12 ranges but only 3 mappers, and at
>>>> most 3 concurrent connections.
>>>>
>>>> Appreciate any help here.
>>>>
>>>> -Wei
>>>>
>>>
>>>
>>>
>>> --
>>> Erzsebet Szilagyi
>>> Software Engineer
>>> <http://www.cloudera.com>
>>>
>>
>>
>
>
> --
> Erzsebet Szilagyi
> Software Engineer
> <http://www.cloudera.com>
>

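The range arithmetic in Wei's original question can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sqoop's splitter or its MapReduce scheduling: `split_range`, `run_with_limit`, and the `fetch` callback are hypothetical names, and a thread pool stands in for the mappers. It shows the even split of [1, 12] across 3 workers, and the requested variant of 12 small ranges processed by at most 3 concurrent workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, num_splits):
    """Split the inclusive integer range [lo, hi] into num_splits
    contiguous, near-equal inclusive (start, end) ranges.

    Hypothetical helper for illustration; not Sqoop's actual splitter.
    """
    total = hi - lo + 1
    if num_splits < 1 or num_splits > total:
        raise ValueError("num_splits must be between 1 and the range size")
    base, extra = divmod(total, num_splits)
    ranges = []
    start = lo
    for i in range(num_splits):
        # The first `extra` ranges absorb one leftover value each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def run_with_limit(ranges, max_concurrent, fetch):
    """Call fetch(range) for every range using at most max_concurrent
    worker threads, returning results in the order of `ranges`.

    The pool size caps how many fetches (for example, database queries)
    can be in flight at once.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(fetch, ranges))


# Default-style split: 3 workers, one large range each.
print(split_range(1, 12, 3))    # [(1, 4), (5, 8), (9, 12)]

# Requested variant: 12 small ranges, but never more than 3 at a time.
small_ranges = split_range(1, 12, 12)
print(run_with_limit(small_ranges, 3, lambda bounds: bounds))
```

Here each small range is short-lived, so no single query runs long, while the pool size of 3 bounds the number of simultaneous connections.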