Hello Wei,

In addition to Liz's request, can you share the volumes (depth and width) of data you are working with? I am curious to know if they are really as small (12 rows) as previously noted.

Markus Kemper
Customer Operations Engineer
<http://www.cloudera.com>

On Fri, Jul 1, 2016 at 10:48 AM, Erzsebet Szilagyi <[email protected]> wrote:

> Hi Wei,
> Let us know if fine tuning the number of map tasks solved your problem or
> whether we should dig further into it.
> Thanks,
> Liz
>
> On Fri, Jul 1, 2016 at 7:57 AM, Wei Yan <[email protected]> wrote:
>
>> Thanks, Erzsebet and Markus. Tuning the number of map tasks can be a
>> reasonable solution here, and I'll try that.
>> As Sqoop 1 is a MapReduce job, I think it's hard to have both (1) many
>> small queries and (2) a limited number of concurrently executing queries.
>>
>> -Wei
>>
>> On Thu, Jun 30, 2016 at 3:50 PM, Erzsebet Szilagyi <[email protected]> wrote:
>>
>>> Hi Wei,
>>> Markus (in CC) offered the following explanation:
>>>
>>> "
>>> The Sqoop1 default is 4 map tasks. When working with customers I
>>> usually start with 1 and double the number of map tasks (e.g. 1, 2, 4, 8)
>>> until finding a performance sweet spot while keeping in mind the potential
>>> RDBMS impact.
>>>
>>> Estimating the real RDBMS impact is often challenging for some of the
>>> following reasons:
>>> 1. DBAs are often not present
>>> 2. Jobs are often reviewed in isolation (excluding other simultaneous
>>> Sqoop or non-Sqoop workloads)
>>> 3. Tests are often performed against smaller data volumes and/or virtual
>>> resources than what will be in production (includes RDBMS, network and
>>> Hadoop cluster)
>>> 4. There is not a uniform way to monitor/analyze impact across RDBMS
>>> vendors.
>>> 4.1. I have not really tried to review Sqoop console debug output from a
>>> DB impact context; perhaps it could be used.
>>> 5. Once deployed, production job volumes often change
>>>
>>> Thanks, Markus
>>> "
>>>
>>> On Wed, Jun 29, 2016 at 7:35 PM, Wei Yan <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Would like to check whether Sqoop supports this type of ingestion:
>>>> consider we have records with range [1,12], and we have 3 mappers. So by
>>>> default, the 3 mappers will be assigned [1,4], [5,8], [9,12].
>>>>
>>>> Not sure whether we can split the range into smaller ones, like [1], [2],
>>>> [3], ..., [12], but still use 3 mappers instead of 12 mappers. We want
>>>> this feature because: (1) if we configure a smaller mapper number, each
>>>> mapper will be assigned a larger range and take much longer to finish, and
>>>> the infra may kill a long-running query; (2) if we configure a larger
>>>> mapper number, each mapper has a smaller range, but meanwhile we generate
>>>> lots of network traffic to the database, which will also be bad. What we
>>>> would like is: still 12 ranges, but 3 mappers, and at most 3 concurrent
>>>> connections.
>>>>
>>>> Appreciate any help here.
>>>>
>>>> -Wei
>>>>
>>>
>>> --
>>> Erzsebet Szilagyi
>>> Software Engineer
>>> <http://www.cloudera.com>
>>>
>>
>
> --
> Erzsebet Szilagyi
> Software Engineer
> <http://www.cloudera.com>
>
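
For reference, the split arithmetic in Wei's original question can be sketched in a few lines of Python. This is only an illustration of the idea and not Sqoop's actual splitter: `split_range` and `run_chunk` are hypothetical names, and the thread pool merely stands in for "12 small ranges, at most 3 concurrent connections".

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into `parts` contiguous chunks."""
    total = hi - lo + 1
    base, extra = divmod(total, parts)
    chunks, start = [], lo
    for i in range(parts):
        # The first `extra` chunks absorb the remainder, one record each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def run_chunk(chunk):
    # Hypothetical placeholder for one bounded query,
    # e.g. "WHERE id BETWEEN lo AND hi"; here it only counts records.
    lo, hi = chunk
    return hi - lo + 1


# Default behaviour described in the thread: 3 mappers, one large range each.
default_splits = split_range(1, 12, 3)

# Desired behaviour: 12 small ranges, but never more than 3 running at once.
small_splits = split_range(1, 12, 12)
with ThreadPoolExecutor(max_workers=3) as pool:
    rows = sum(pool.map(run_chunk, small_splits))
```

With the values from the thread, `default_splits` holds the three ranges [1,4], [5,8], [9,12], while `small_splits` holds twelve single-record ranges that are worked through three at a time.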
