Hi Raymond,

Yes, I can confirm that splitting by a string field can cause issues in
Sqoop; that is why the org.apache.sqoop.splitter.allow_text_splitter property
was introduced (see SQOOP-2910).
Which RDBMS do you use? If you use Oracle, you are in luck, because the
Oracle direct connector does not require a split-by column. Otherwise, I am
afraid there is currently no real solution to this problem.
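
For reference, here is a minimal sketch of how that property is usually passed
when you do want to split on a text column. It only builds the argument list
for a "sqoop import" call. The helper name, JDBC URL, table and column are
hypothetical placeholders, not values from your setup, and enabling the
property does not remove the uneven-split and duplicate risks mentioned in
the post you found.

```python
import shlex


def build_sqoop_import(connect, table, split_by, num_mappers=4,
                       allow_text_splitter=True):
    """Build the argv for a 'sqoop import' into Hive (illustrative helper)."""
    args = ["sqoop", "import"]
    if allow_text_splitter:
        # Generic Hadoop -D options go right after the tool name,
        # before any tool-specific arguments.
        args.append("-Dorg.apache.sqoop.splitter.allow_text_splitter=true")
    args += [
        "--connect", connect,
        "--table", table,
        "--split-by", split_by,        # the string (varchar) column
        "--num-mappers", str(num_mappers),
        "--hive-import",
    ]
    return args


# Hypothetical values, for illustration only.
cmd = build_sqoop_import("jdbc:mysql://dbhost/sales", "orders", "order_code")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to
subprocess.run(cmd, check=True) on a machine where Sqoop is installed.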

Regards,
Szabolcs

On Wed, Jul 11, 2018 at 4:06 PM Raymond Xie <[email protected]> wrote:

> Good day,
>
> In my situation, the table has a billion rows and does not have an integer
> column as its key, which means that if I use Sqoop to do the import (into
> Hive), I would not be able to use multiple mappers.
> As the table is so big, it is not realistic to add a new integer
> field to it.
>
> I did come across a post from Hortonworks which seems to suggest it is
> possible; however, the comments there noted that:
>
> 1. There are no guarantees that Sqoop splits your records evenly over your
> mappers.
> 2. For a huge number of rows, the above options will cause duplicates in the
> result set.
>
>
> https://community.hortonworks.com/questions/26961/sqoop-split-by-on-a-string-varchar-column.html
>
>
> Any thoughts?
>
>
> Thank you very much.
>
> *------------------------------------------------*
> *Sincerely yours,*
>
>
> *Raymond*
>
> * <http://www.cloudera.com>*
>
