Hi,

If there is a primary key on the source table, a Sqoop import usually splits
the records fairly evenly across mappers. But what if no primary key is defined
on the table and we have to use the --split-by parameter to split the records
among multiple mappers? Depending on the column we pick for --split-by, there
is a high chance of skewed data. Could you please help me understand how to
avoid skew in such scenarios, and also how to determine the optimal number of
mappers for any Sqoop import?
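
To make the concern concrete, here is a rough Python sketch (not Sqoop code) of
how I understand the splitting to work. It assumes the range between the MIN
and MAX of the split column is cut into equal-width intervals, one per mapper.
The function name and the sample data are made up for illustration.

```python
def rows_per_mapper(values, num_mappers):
    """Count rows per mapper, assuming equal-width ranges over [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * num_mappers
    if hi == lo:
        # Every row has the same split value, so one mapper gets everything.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / num_mappers
    for v in values:
        # The maximum value falls on the upper edge; keep it in the last range.
        idx = min(int((v - lo) / width), num_mappers - 1)
        counts[idx] += 1
    return counts


# Hypothetical split columns: evenly spread ids vs. values piled up at the ends.
uniform_ids = list(range(1, 1001))
skewed_col = [1] * 970 + [1000] * 30

print(rows_per_mapper(uniform_ids, 4))  # [250, 250, 250, 250]
print(rows_per_mapper(skewed_col, 4))   # [970, 0, 0, 30]
```

In the second case one mapper does almost all of the work while two mappers get
no rows at all, which is the kind of skew I would like to avoid.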
It would help if you could share how many mappers you used in your own use
case, along with the size and format of the data imported.
Thanks