Hi Raymond,

Yes, I can confirm that splitting by a string field can cause issues in Sqoop (for example, uneven splits, or missing or duplicate rows when the database sorts strings differently than Sqoop expects); that is why the org.apache.sqoop.splitter.allow_text_splitter property was introduced (see SQOOP-2910). Which RDBMS do you use? If you use Oracle, you are in luck, because the Oracle direct connector does not require a split-by column. Otherwise, I am afraid there is currently no real solution to this problem.
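To make the "issues" concrete, here is a rough Python model of how splitting on a text column works in principle: take MIN and MAX of the column, map the strings to numbers, cut that numeric range into equal pieces, and map the cut points back to strings that become range predicates (col >= lower AND col < upper, with the last split inclusive). This is a simplified sketch under stated assumptions, not Sqoop's actual code: PREFIX_LEN, BASE, to_number, to_string, split_bounds and rows_per_split are hypothetical names, and a case-insensitive database collation is modeled with str.lower. In Sqoop itself the text splitter is enabled by passing -Dorg.apache.sqoop.splitter.allow_text_splitter=true right after "sqoop import", together with --split-by.

```python
# Simplified model of range splitting on a text column.
# All names here are hypothetical; this is NOT Sqoop's implementation.

PREFIX_LEN = 4   # hypothetical: only the first 4 characters drive interpolation
BASE = 0x10000   # each character is treated as one digit in base 65536


def to_number(s: str) -> int:
    """Encode the first PREFIX_LEN characters of s as an integer (code-point order)."""
    n = 0
    for ch in s[:PREFIX_LEN].ljust(PREFIX_LEN, "\0"):
        n = n * BASE + min(ord(ch), BASE - 1)
    return n


def to_string(n: int) -> str:
    """Inverse of to_number: turn an integer back into a PREFIX_LEN-character string."""
    digits = []
    for _ in range(PREFIX_LEN):
        n, d = divmod(n, BASE)
        digits.append(chr(d))
    return "".join(reversed(digits)).rstrip("\0")


def split_bounds(lo: str, hi: str, num_splits: int) -> list[str]:
    """Cut [lo, hi] into num_splits equal numeric ranges; return the boundary strings."""
    a, b = to_number(lo), to_number(hi)
    inner = [to_string(a + (b - a) * i // num_splits) for i in range(1, num_splits)]
    return [lo, *inner, hi]


def rows_per_split(rows, bounds, key=lambda s: s):
    """Count rows matched by each split's predicate, comparing with `key` (the 'collation').

    Split i selects lower <= value < upper; the last split also includes the upper bound.
    """
    counts = []
    last = len(bounds) - 2
    for i in range(len(bounds) - 1):
        lower, upper = key(bounds[i]), key(bounds[i + 1])
        counts.append(sum(
            1 for r in rows
            if lower <= key(r) and (key(r) <= upper if i == last else key(r) < upper)
        ))
    return counts


rows = ["Aardvark", "banana", "kiwi", "mango", "zebra"]
# A case-insensitive database would report these values as MIN and MAX.
bounds = split_bounds(min(rows, key=str.lower), max(rows, key=str.lower), 3)

print(rows_per_split(rows, bounds))                 # code-point order: [1, 1, 3]
print(rows_per_split(rows, bounds, key=str.lower))  # case-insensitive: [4, 0, 3]
```

With these sample rows, the splits hold 1, 1 and 3 rows under code-point ordering (uneven, but every row exactly once), and 4, 0 and 3 rows under the case-insensitive comparison: 7 rows for 5 inputs, because "kiwi" and "mango" fall into two overlapping ranges. In miniature, that is both concerns quoted from the Hortonworks thread.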
Regards,
Szabolcs

On Wed, Jul 11, 2018 at 4:06 PM Raymond Xie wrote:
> Good day,
>
> In my situation, my table has a billion rows and doesn't have an integer
> column as its key, which means that if I use Sqoop to do the import (into Hive),
> I would not be able to use multiple mappers.
> As the table is big, it is not realistic to add an extra integer
> field to it.
>
> I did come across a post from Hortonworks which seems to suggest it is
> possible; however, it was commented that:
>
> 1. there are no guarantees that Sqoop splits your records evenly over your
> mappers.
> 2. for a huge number of rows, the above options will cause duplicates in the
> result set.
>
> https://community.hortonworks.com/questions/26961/sqoop-split-by-on-a-string-varchar-column.html
>
> Any thoughts?
>
> Thank you very much.
>
> Sincerely yours,
> Raymond
