David,

Well, I think Sqoop is looking at "mapred.map.tasks". Do you have that set in mapred-site.xml? I thought that defaults to 2.
-Abe

On Wed, Jun 19, 2013 at 4:31 PM, Abraham Elmahrek <[email protected]> wrote:

> David,
>
> I've created https://issues.apache.org/jira/browse/SQOOP-1093 to track
> the documentation issue. Thanks for bringing this to the community's
> attention!
>
> -Abe
>
>
> On Wed, Jun 19, 2013 at 4:21 PM, Abraham Elmahrek <[email protected]> wrote:
>
>> Hey David,
>>
>> With Oracle, the BigDecimalSplitter will be used by default for all
>> number types.
>>
>> -Abe
>>
>>
>> On Wed, Jun 19, 2013 at 4:05 PM, David Kincaid <[email protected]> wrote:
>>
>>> Abe, the database is Oracle.
>>>
>>>
>>> On Wed, Jun 19, 2013 at 5:48 PM, Abraham Elmahrek <[email protected]> wrote:
>>>
>>>> David,
>>>>
>>>> What database are you importing from? The description I gave was for
>>>> datatypes that map to the BigDecimalSplitter. The user guide might be
>>>> referring to the IntegerSplitter, which will add the remainder to the
>>>> last value.
>>>>
>>>> -Abe
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 1:23 PM, David Kincaid <[email protected]> wrote:
>>>>
>>>>> Thanks. We didn't specify the number of mappers, so it's giving us 4.
>>>>> I understand your explanation, but it seems to conflict with the Sqoop
>>>>> user guide
>>>>> (http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_controlling_parallelism):
>>>>>
>>>>> "When performing parallel imports, Sqoop needs a criterion by which
>>>>> it can split the workload. Sqoop uses a *splitting column* to split
>>>>> the workload. By default, Sqoop will identify the primary key column (if
>>>>> present) in a table and use it as the splitting column. The low and high
>>>>> values for the splitting column are retrieved from the database, and the
>>>>> map tasks operate on evenly-sized components of the total range. For
>>>>> example, if you had a table with a primary key column of id whose
>>>>> minimum value was 0 and maximum value was 1000, and Sqoop was directed to
>>>>> use 4 tasks, Sqoop would run four processes which each execute SQL
>>>>> statements of the form SELECT * FROM sometable WHERE id >= lo AND id
>>>>> < hi, with (lo, hi) set to (0, 250), (250, 500), (500, 750), and
>>>>> (750, 1001) in the different tasks."
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 3:14 PM, Abraham Elmahrek <[email protected]> wrote:
>>>>>
>>>>>> Hey David,
>>>>>>
>>>>>> Here's the algorithm:
>>>>>> Split lengths are defined by (max - min)/(# mappers) and whatever is
>>>>>> left is tacked on at the end. So in this case, (288272191-2110)/3 =
>>>>>> 96090027.33... So I'm assuming the .33... is rounded down and split
>>>>>> lengths will be of length 96090027. Sqoop will then create splits with
>>>>>> the following points: (min) + (range length)*(n). We can see that
>>>>>> 2110 + 96090027*0 = 2110, 2110 + 96090027*1 = 96092137,
>>>>>> 2110 + 96090027*2 = 192182164, and 2110 + 96090027*3 = 288272191
>>>>>> will be generated based off of this algorithm. The last point to be
>>>>>> added will be 288272192 because the max value is not part of the
>>>>>> generated split points. Then Sqoop will distribute the work accordingly
>>>>>> based off of these points, as you've pointed out above.
>>>>>>
>>>>>> Just to be sure, did you configure Sqoop to use 3 mappers?
>>>>>>
>>>>>> Hope this helps,
>>>>>> -Abe
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 8:33 AM, David Kincaid <[email protected]> wrote:
>>>>>>
>>>>>>> We're seeing a strange thing happen with a Sqoop import job with the
>>>>>>> way the key range is getting distributed among the 4 mappers that are
>>>>>>> running. The minimum key value is 2110 and the maximum value is
>>>>>>> 288272191. We are getting one mapper that is only getting one record
>>>>>>> to import. Here is the distribution among the mappers:
>>>>>>>
>>>>>>> [2110, 96092137)
>>>>>>> [96092137, 192182164)
>>>>>>> [192182164, 288272191)
>>>>>>> [288272191, 288272192)
>>>>>>>
>>>>>>> You can see that the fourth mapper is given a range with only one
>>>>>>> value in it. Could someone help me understand what is going on?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dave

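For anyone reading this thread later, the split-point arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the description given in the thread and of the example quoted from the user guide, not Sqoop's actual splitter code. The function names described_split_ranges and guide_style_ranges are hypothetical, and integer keys with floor division are assumed. One detail worth noting: for the numbers in this thread, (288272191 - 2110) / 3 comes out to exactly 96090027, so the last generated point lands on the max itself, and the extra point at max + 1 leaves a final range holding a single key.

```python
def described_split_ranges(lo, hi, num_splits):
    """Half-open [start, end) ranges following the description in this thread.

    Assumptions (not taken from Sqoop's source): integer keys, floor division.
    """
    if num_splits < 1 or hi - lo < num_splits:
        raise ValueError("need num_splits >= 1 and a key range at least that wide")
    length = (hi - lo) // num_splits
    # Points min + length * n for n = 0 .. num_splits.
    points = [lo + length * n for n in range(num_splits + 1)]
    # One more point past the max, so the max key itself falls inside a range.
    points.append(hi + 1)
    return list(zip(points, points[1:]))


def guide_style_ranges(lo, hi, num_tasks):
    """Ranges shaped like the user guide example: the last range absorbs the rest."""
    if num_tasks < 1 or hi - lo < num_tasks:
        raise ValueError("need num_tasks >= 1 and a key range at least that wide")
    length = (hi - lo) // num_tasks
    bounds = [lo + length * i for i in range(num_tasks)] + [hi + 1]
    return list(zip(bounds, bounds[1:]))


if __name__ == "__main__":
    # Keys from this thread, with the 3 used in the calculation above.
    for start, end in described_split_ranges(2110, 288272191, 3):
        print(f"[{start}, {end})")
    # The user guide example: ids 0..1000 with 4 tasks.
    for start, end in guide_style_ranges(0, 1000, 4):
        print(f"[{start}, {end})")
```

With the thread's keys, the first function gives the same four ranges Dave reported, including the one-value range at the end. The second gives the (0, 250) through (750, 1001) ranges from the user guide quote.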