Markus, the [1,12] is an example :)
The table has 100+ million records.

On Fri, Jul 1, 2016 at 7:57 AM, Markus Kemper <[email protected]> wrote:

> Hello Wei,
>
> In addition to Liz's request, can you share the volumes (depth and width)
> of data you are working with?
>
> I am curious to know if they are really as small (12 rows) as previously
> noted.
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
> On Fri, Jul 1, 2016 at 10:48 AM, Erzsebet Szilagyi <
> [email protected]> wrote:
>
>> Hi Wei,
>> Let us know whether fine-tuning the number of map tasks solved your problem
>> or whether we should dig further into it.
>> Thanks,
>> Liz
>>
>>
>> On Fri, Jul 1, 2016 at 7:57 AM, Wei Yan <[email protected]> wrote:
>>
>>> Thanks, Erzsebet and Markus. Tuning the number of map tasks can be a
>>> reasonable solution here, and I'll try that.
>>> As Sqoop 1 runs as a MapReduce job, I think it's hard to have both (1) many
>>> small queries and (2) a limited number of concurrently executing queries.
>>>
>>> -Wei
>>>
>>> On Thu, Jun 30, 2016 at 3:50 PM, Erzsebet Szilagyi <
>>> [email protected]> wrote:
>>>
>>>> Hi Wei,
>>>> Markus (in CC) offered the following explanation:
>>>>
>>>> "
>>>> The Sqoop 1 default is 4 map tasks. When working with customers, I
>>>> usually start with 1 and double the number of map tasks (e.g. 1, 2, 4, 8)
>>>> until finding a performance sweet spot, while keeping in mind the potential
>>>> RDBMS impact.
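The doubling procedure described above can be sketched in Java. This is a minimal sketch, assuming a hypothetical `measure` callback that runs one timed test import with a given mapper count (for example via Sqoop's `--num-mappers` option) and returns its throughput. The minimum-gain threshold and the `maxMappers` cap (standing in for whatever load the RDBMS can tolerate) are illustrative choices, not Sqoop settings.

```java
import java.util.function.IntToDoubleFunction;

class MapperDoubling {
    /**
     * Doubles the mapper count (1, 2, 4, 8, ...) while throughput keeps improving
     * by at least minGain (e.g. 0.10 = 10%), never exceeding maxMappers.
     * measure is a hypothetical callback, e.g. one timed test import run with
     * a given --num-mappers value, returning rows per second.
     */
    static int findSweetSpot(IntToDoubleFunction measure, int maxMappers, double minGain) {
        int best = 1;
        double bestThroughput = measure.applyAsDouble(best);
        // n > 0 guards against int overflow for very large caps
        for (int n = 2; n > 0 && n <= maxMappers; n *= 2) {
            double t = measure.applyAsDouble(n);
            if (t < bestThroughput * (1 + minGain)) {
                break; // gain too small: stop doubling, keep the previous count
            }
            best = n;
            bestThroughput = t;
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic throughput curve that saturates at 4 mappers (made-up numbers).
        IntToDoubleFunction fake = n -> 1000.0 * Math.min(n, 4);
        System.out.println(findSweetSpot(fake, 16, 0.10)); // prints 4
    }
}
```

The cap matters as much as the throughput curve: in practice the stopping point may be set by the database's tolerance rather than by diminishing returns.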
>>>>
>>>> Estimating the real RDBMS impact is often challenging for some of the
>>>> following reasons:
>>>> 1. DBAs are often not present.
>>>> 2. Jobs are often reviewed in isolation (excluding other simultaneous
>>>> Sqoop or non-Sqoop workloads).
>>>> 3. Tests are often performed against smaller data volumes and/or
>>>> virtual resources than production will have (including the RDBMS, network
>>>> and Hadoop cluster).
>>>> 4. There is no uniform way to monitor/analyze impact across RDBMS
>>>> vendors.
>>>> 4.1. I have not really tried reviewing the Sqoop console debug output from
>>>> a DB-impact perspective; perhaps it could be used.
>>>> 5. Once deployed, production job volumes often change.
>>>>
>>>> Thanks, Markus
>>>> "
>>>>
>>>> On Wed, Jun 29, 2016 at 7:35 PM, Wei Yan <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to check whether Sqoop supports this type of ingestion.
>>>>> Suppose we have records in the range [1,12] and 3 mappers. By default,
>>>>> the 3 mappers will be assigned [1,4], [5,8] and [9,12].
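For reference, the contiguous split in this example can be reproduced with a small helper. This is a hypothetical sketch, not Sqoop's own splitter (whose boundary arithmetic may differ); it simply divides an inclusive range [lo, hi] into k ranges of near-equal size.

```java
import java.util.ArrayList;
import java.util.List;

class RangeSplit {
    /** Hypothetical helper: splits [lo, hi] (inclusive) into at most k contiguous ranges. */
    static List<long[]> split(long lo, long hi, int k) {
        long total = hi - lo + 1;
        List<long[]> ranges = new ArrayList<>();
        long start = lo;
        for (int i = 0; i < k; i++) {
            // Spread any remainder over the first (total % k) ranges.
            long size = total / k + (i < total % k ? 1 : 0);
            if (size == 0) {
                break; // more ranges requested than values available
            }
            ranges.add(new long[] {start, start + size - 1});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints [1, 4], [5, 8], [9, 12] on separate lines
        for (long[] r : split(1, 12, 3)) {
            System.out.println("[" + r[0] + ", " + r[1] + "]");
        }
    }
}
```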
>>>>>
>>>>> I am not sure whether we can split the range into smaller ones, like [1],
>>>>> [2], [3], ..., [12], while still using 3 mappers instead of 12. We
>>>>> want this feature because: (1) if we configure fewer mappers, each
>>>>> mapper is assigned a larger range and takes much longer to finish,
>>>>> and the infrastructure may kill long-running queries; (2) but if we
>>>>> configure more mappers, each mapper has a smaller range, but we generate
>>>>> lots of network traffic to the database, which is also bad.
>>>>> What we want is: still 12 ranges, but 3 mappers, and at most 3
>>>>> concurrent connections.
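The requested scheduling (many small ranges, but a fixed cap on concurrent database connections) can be illustrated outside Sqoop with a fixed-size thread pool: 12 range tasks are submitted, but only 3 threads run them, so no more than 3 queries are in flight at once. This is a minimal sketch of the idea, not a Sqoop feature; `runQuery` is a hypothetical stand-in for a JDBC query and only sleeps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedChunkImport {
    // Track how many simulated queries are running at once, and the peak.
    static final AtomicInteger active = new AtomicInteger();
    static final AtomicInteger maxActive = new AtomicInteger();

    // Hypothetical stand-in for one short JDBC query over [lo, hi];
    // it only sleeps and returns the number of keys in the range.
    static long runQuery(long lo, long hi) throws InterruptedException {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(10); // simulated query latency
            return hi - lo + 1;
        } finally {
            active.decrementAndGet();
        }
    }

    // Splits [lo, hi] into `chunks` contiguous ranges and runs them on a pool of
    // `workers` threads, so at most `workers` queries are in flight at once.
    static long importAll(long lo, long hi, int chunks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            long total = hi - lo + 1;
            long start = lo;
            for (int i = 0; i < chunks; i++) {
                long size = total / chunks + (i < total % chunks ? 1 : 0);
                if (size == 0) {
                    break; // more chunks than keys
                }
                final long s = start;
                final long e = start + size - 1;
                Callable<Long> task = () -> runQuery(s, e);
                results.add(pool.submit(task));
                start += size;
            }
            long rows = 0;
            for (Future<Long> f : results) {
                try {
                    rows += f.get();
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException(ex.getCause());
                }
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long rows = importAll(1, 12, 12, 3);
        System.out.println("rows=" + rows + ", peak concurrent queries=" + maxActive.get());
    }
}
```

With 12 chunks, each query covers roughly 1/12 of the key range, while the pool size alone bounds the number of open connections.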
>>>>>
>>>>> Appreciate any help here.
>>>>>
>>>>> -Wei
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Erzsebet Szilagyi
>>>> Software Engineer
>>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>>
>>>
>>>
>>
>>
>> --
>> Erzsebet Szilagyi
>> Software Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>
>
