Hi Raymond,

PostgreSQL also has a direct mode that does not need a split-by column.
Direct mode is usually a good idea, since it does not load the source
database with multiple threads and has better performance.
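
As a rough sketch of what a direct-mode import could look like: the host,
database, user, password file, table and target directory below are all
placeholders I made up, not values from this thread. The helper only prints
the command so it can be reviewed first. It writes to a plain HDFS
directory; loading the result into Hive is left out here.

```shell
# Dry-run helper: prints a direct-mode import command instead of running it.
# Every connection detail here is a hypothetical placeholder.
print_direct_import() {
  echo sqoop import \
    --connect "jdbc:postgresql://dbhost.example.com:5432/mydb" \
    --username sqoop_user \
    --password-file /user/sqoop/pg.password \
    --table big_table \
    --direct \
    --target-dir /data/big_table
}

print_direct_import
```

Note that there is no --split-by in the command. To launch the import for
real, remove the "echo" in front of "sqoop".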


2018-08-14 10:42 GMT+02:00 Szabolcs Vasas <[email protected]>:

> Hi Raymond,
>
> Yes, I can confirm that splitting by a string field can cause issues in
> Sqoop, which is why the org.apache.sqoop.splitter.allow_text_splitter
> property was introduced (see SQOOP-2910).
> Which RDBMS do you use? If you use Oracle, you are in luck, because the
> Oracle direct connector does not require a split-by column. Otherwise, I am
> afraid there is no real solution to this problem currently.
>
> Regards,
> Szabolcs
>
> On Wed, Jul 11, 2018 at 4:06 PM Raymond Xie <[email protected]> wrote:
>
>> Good day,
>>
>> In my situation, my table has a billion rows and does not come with an
>> integer column as its key, which means that if I use Sqoop to do the import
>> (into Hive), I would not be able to use multiple mappers.
>> As the table is so big, it is not realistic to add a new integer
>> field to it.
>>
>> I did come across a post from Hortonworks which seems to suggest it is
>> possible; however, the comments there warn that:
>>
>> 1. There is no guarantee that Sqoop splits your records evenly over your
>> mappers.
>> 2. For a huge number of rows, the above options will cause duplicates in the
>> result set.
>>
>> https://community.hortonworks.com/questions/26961/sqoop-split-by-on-a-string-varchar-column.html
>>
>>
>> Any thoughts?
>>
>>
>> Thank you very much.
>>
>> *------------------------------------------------*
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>>
>
