Hi Raymond,

PostgreSQL also has a direct mode that doesn't need any split-by column. Direct mode is usually a good idea, since it does not load the source database with multiple threads and it performs better.
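For reference, here is a minimal sketch of the two options discussed in this thread, written in Python as a wrapper that only builds the sqoop import command line. The first option is direct mode with no split-by column. The second is a split on a string column, enabled through the org.apache.sqoop.splitter.allow_text_splitter property. The helper name sqoop_import_args, the JDBC URL, the table name and the target directory are hypothetical placeholders. --connect, --table, --target-dir, --num-mappers, --split-by and --direct are standard Sqoop import options, and generic -D properties have to come right after the tool name. Following the point above, direct mode is shown without any --split-by.

```python
import shlex
import subprocess  # only needed if you actually run the command


def sqoop_import_args(connect_url, table, target_dir, *, direct=False,
                      split_by=None, mappers=None, allow_text_splitter=False):
    """Build the argv list for a `sqoop import` run (hypothetical helper)."""
    args = ["sqoop", "import"]
    # Generic Hadoop -D properties must come right after the tool name.
    if allow_text_splitter:
        args.append("-Dorg.apache.sqoop.splitter.allow_text_splitter=true")
    args += ["--connect", connect_url, "--table", table,
             "--target-dir", target_dir]
    if mappers is not None:
        args += ["--num-mappers", str(mappers)]
    if direct:
        # Direct mode (e.g. PostgreSQL): no --split-by column is passed.
        args.append("--direct")
    elif split_by:
        # Splitting on a VARCHAR column; requires allow_text_splitter=True.
        args += ["--split-by", split_by]
    return args


if __name__ == "__main__":
    # Hypothetical connection details; the command is printed, not executed.
    cmd = sqoop_import_args("jdbc:postgresql://dbhost/sales", "events",
                            "/user/raymond/events", direct=True)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run Sqoop
```

Building an argument list rather than a single shell string avoids quoting problems with passwords or JDBC URLs when the list is passed straight to subprocess.run.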

2018-08-14 10:42 GMT+02:00 Szabolcs Vasas <[email protected]>:

> Hi Raymond,
>
> Yes, I can confirm that splitting by a string field can cause issues in
> Sqoop; that is why the org.apache.sqoop.splitter.allow_text_splitter property
> was introduced (see SQOOP-2910).
> Which RDBMS do you use? If you use Oracle, then you are lucky, because the
> Oracle direct connector does not require a split-by column. Otherwise I am
> afraid there is currently no real solution to this problem.
>
> Regards,
> Szabolcs
>
> On Wed, Jul 11, 2018 at 4:06 PM Raymond Xie <[email protected]> wrote:
>
>> Good day,
>>
>> In my situation the table has a billion rows and it doesn't have an
>> integer key column, which means that if I use Sqoop to do the import (into
>> Hive), I would not be able to use multiple mappers.
>> As the table is so big, it is not realistic to add an extra integer
>> column to it.
>>
>> I did come across a post on Hortonworks which seems to suggest it is
>> possible; however, someone commented that:
>>
>> 1. There is no guarantee that Sqoop splits your records evenly over your
>> mappers.
>> 2. For a huge number of rows, the above options will cause duplicates in the
>> result set.
>>
>> https://community.hortonworks.com/questions/26961/sqoop-split-by-on-a-string-varchar-column.html
>>
>> Any thoughts?
>>
>> Thank you very much.
>>
>> ------------------------------------------------
>> Sincerely yours,
>>
>> Raymond
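On point 1 in Raymond's message (uneven splits), here is a deliberately simplified model of range-based splitting on a string column. It is an assumption for illustration, not Sqoop's actual splitter code. Each key's first eight characters are read as base-65536 digits to get a number. The min-to-max range is then cut into equal-width pieces, and each row is counted into its piece. The helper names and sample keys are hypothetical. When most keys share a common prefix, almost all rows land in a single split, so one mapper ends up doing nearly all the work.

```python
from fractions import Fraction


def str_to_fraction(s, max_chars=8):
    """Read the first max_chars characters of s as base-65536 digits."""
    value, place = Fraction(0), Fraction(1)
    for ch in s[:max_chars]:
        place /= 65536
        value += ord(ch) * place
    return value


def split_counts(keys, num_splits):
    """Count how many keys fall into each of num_splits equal-width ranges."""
    nums = [str_to_fraction(k) for k in keys]
    lo, hi = min(nums), max(nums)
    width = (hi - lo) / num_splits
    counts = [0] * num_splits
    for x in nums:
        i = int((x - lo) / width) if width else 0
        counts[min(i, num_splits - 1)] += 1  # the max key goes in the last split
    return counts


if __name__ == "__main__":
    # Hypothetical skewed keys: 1000 ids starting with "A", one starting with "Z".
    keys = ["A%04d" % i for i in range(1000)] + ["Zebra"]
    print(split_counts(keys, 4))  # -> [1000, 0, 0, 1]
```

As for the duplicates in point 2, a plausible cause is that the split boundaries are computed using Java string ordering, while the resulting WHERE clauses are evaluated with the database's collation. A case-insensitive sort order, for example, can make ranges overlap or leave gaps.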
