Avoiding skew and determining optimal number of mappers in SQOOP import.

sreejesh s Sun, 21 Jun 2015 02:39:20 -0700

Hi,


If there is a primary key on the source table, a SQOOP import splits on it and the
data is usually not skewed. But what if no primary key is defined on the table and
we have to use the --split-by parameter to divide records among multiple mappers?
Depending on the column we choose for --split-by, there is a high chance of skewed
data. Could you please help me understand how to avoid skew in such scenarios, and
also how to determine the optimal number of mappers for a SQOOP import?
It would help if you could explain how many mappers you used in your use case,
along with the size and format of the data imported.
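As a rough illustration of why the --split-by column matters: for a numeric column, Sqoop's default splitting queries the MIN and MAX of that column and divides the range into equal-width intervals, one per mapper. The Python sketch below models only that simplified behavior. It is not Sqoop code, and the function names are hypothetical. It counts how many rows would land in each interval, which could help compare candidate split columns before running an import.

```python
def rows_per_split(values, num_mappers):
    """Count rows per mapper under a simplified model of Sqoop's
    equal-width range splitting on a numeric --split-by column."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_mappers
    counts = [0] * num_mappers
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in one split
        else:
            # Values equal to MAX are placed in the last split.
            idx = min(int((v - lo) / width), num_mappers - 1)
        counts[idx] += 1
    return counts


def skew_ratio(counts):
    """Largest split relative to an even share; 1.0 means perfectly balanced."""
    ideal = sum(counts) / len(counts)
    return max(counts) / ideal


# Dense, evenly distributed ids: each of 4 mappers gets the same number of rows.
dense = list(range(1, 1001))
print(rows_per_split(dense, 4))

# Clustered ids with a few outliers: most rows fall into the first range.
clustered = list(range(1, 901)) + list(range(10_000, 10_100))
counts = rows_per_split(clustered, 4)
print(counts, skew_ratio(counts))
```

A column whose skew ratio is close to 1.0 would spread the work evenly. In the clustered example, one mapper would receive 3.6 times its fair share while two mappers would receive no rows.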
 Thanks
