Pratik,

Sqoop calculates the split conditions by first running this query:

select min(split_by_col), max(split_by_col) from table;

The min and max are determined by how the split column sorts, and string sorting 
can differ from numeric sorting.
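To illustrate the sorting point, here is a small Python sketch. It assumes Python's lexicographic string comparison is a fair stand-in for a database's character collation, which may differ in detail. The same values give a different min and max depending on whether they are compared as numbers or as text:

```python
# Hypothetical split-column values, held once as integers and once as text.
values = [9, 10, 100, 25]
as_text = [str(v) for v in values]

# Numeric ordering: 9 < 10 < 25 < 100.
print(min(values), max(values))    # 9 100

# Lexicographic ordering compares character by character: "10" < "100" < "25" < "9".
print(min(as_text), max(as_text))  # 10 9
```

With text boundaries like these, the ranges derived from min and max can fail to line up with the actual values, which is one reason a character split column draws a warning.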

After retrieving the min and max values of the column, Sqoop calculates the split size:
split_size = (max - min) / no_of_mappers
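A simplified Python sketch of that calculation, assuming an integer split column. This is not Sqoop's actual splitter code; it only shows how a (max - min) / no_of_mappers split size turns into one WHERE range per mapper, with the last range closed so the row holding max is included. split_by_col is the placeholder from the query above; split_ranges and to_where are hypothetical helper names.

```python
def split_ranges(lo, hi, num_mappers):
    """Return (low, high, op) tuples covering [lo, hi], at most one per mapper."""
    # Integer split size, never below 1 so the loop always advances.
    split_size = max(1, (hi - lo) // num_mappers)
    ranges = []
    start = lo
    while start + split_size < hi and len(ranges) < num_mappers - 1:
        ranges.append((start, start + split_size, "<"))
        start += split_size
    # The final range is inclusive and absorbs any remainder.
    ranges.append((start, hi, "<="))
    return ranges


def to_where(low, high, op, col="split_by_col"):
    """Render one range as the condition a mapper would add to its query."""
    return f"{col} >= {low} AND {col} {op} {high}"


for low, high, op in split_ranges(0, 1000, 4):
    print(to_where(low, high, op))
```

With min = 0, max = 1000 and 4 mappers the split size is 250, so the first mapper gets split_by_col >= 0 AND split_by_col < 250 and the last gets split_by_col >= 750 AND split_by_col <= 1000.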


From: pratik khadloya [mailto:[email protected]]
Sent: Thursday, September 18, 2014 4:12 PM
To: [email protected]
Subject: Re: Complex free form queries

Thanks Venkat. Do you know of an example of "a complex query with a split by 
column that can generate incorrect data in each of the mappers"?
I haven't yet understood the corner case in which Sqoop will not work. If we 
know about it, we can avoid that pitfall and also tell others precisely how not 
to fall into it.

Thanks & Regards,
Pratik

On Thu, Sep 18, 2014 at 3:43 PM, Venkat Ranganathan 
<[email protected]> wrote:
There are a few scenarios where we warn about inconsistencies: using a 
character column as the split-by column, and using complex queries with a 
split-by column, which can potentially make each mapper fetch different data 
than intended.

If you use the -m 1 option, you won't have these inconsistency issues.
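One hypothetical reconstruction of such a corner case, using Python's built-in sqlite3 in place of MySQL. It assumes, for illustration, that each mapper textually replaces $CONDITIONS with its own range, and that a single mapper amounts to an always-true condition such as 1 = 1. The table, columns, query and ranges are invented. Because AND binds tighter than OR, an unparenthesized OR lets the range limit only one branch, so every mapper returns all rows matched by the other branch:

```python
import sqlite3

# Invented table: odd ids are in region EU, multiples of 3 are closed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 2 else "US", "closed" if i % 3 == 0 else "open")
     for i in range(1, 101)],
)

# Hypothetical free-form queries: the first has an unparenthesized OR.
buggy = "SELECT id FROM orders WHERE region = 'EU' OR status = 'closed' AND $CONDITIONS"
fixed = "SELECT id FROM orders WHERE (region = 'EU' OR status = 'closed') AND $CONDITIONS"

# Four simulated mappers, each owning one id range.
splits = ["id >= 1 AND id < 26", "id >= 26 AND id < 51",
          "id >= 51 AND id < 76", "id >= 76 AND id <= 100"]


def run(query, conditions):
    """Run the query once per mapper condition and concatenate the results."""
    rows = []
    for cond in conditions:
        sql = query.replace("$CONDITIONS", f"({cond})")
        rows.extend(conn.execute(sql).fetchall())
    return rows


print(len(run(buggy, splits)), len(set(run(buggy, splits))))  # 216 66
print(len(run(buggy, ["1 = 1"])))                             # 66
print(len(run(fixed, splits)))                                # 66
```

The four simulated mappers return 216 rows but only 66 distinct ids, while the single-mapper run and the parenthesized query both return the expected 66. Parenthesizing the original predicate before $CONDITIONS avoids this particular case; it does nothing for ambiguous projections from joins.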

Venkat

On Thu, Sep 18, 2014 at 2:40 PM, pratik khadloya 
<[email protected]> wrote:
I'm not facing any problem. I'm checking to see why complex joins with OR 
conditions are not supported.
I would like to know when they could cause a problem, and whether the problem 
would be solvable by using a "view" or by limiting the number of mappers to 1.
Is the problem, if any, due to the parallelism that comes with increasing the 
number of mappers?

~Pratik

On Thu, Sep 18, 2014 at 1:23 PM, Sambit Tripathy (RBEI/PJ-NBS) 
<[email protected]> wrote:
Pratik,

Are you facing a problem or trying to make a recommendation?


Regards,
Sambit.


From: pratik khadloya [mailto:[email protected]]
Sent: Thursday, September 18, 2014 1:09 PM
To: [email protected]
Subject: Complex free form queries

The sqoop docs say:

The facility of using free-form query in the current version of Sqoop is 
limited to simple queries where there are no ambiguous projections and no OR 
conditions in the WHERE clause. Use of complex queries such as queries that 
have sub-queries or joins leading to ambiguous projections can lead to 
unexpected results.

Does anyone know why such a case is not supported, and whether it can be avoided by:

a) Using only 1 mapper
or
b) Creating a view out of the complex query

I have tested a Hive text-file import of a very complex query and verified the 
data, and it seems to be correct. I compared the number of words, number of 
lines and file sizes of the dump from MySQL against the text file imported onto 
HDFS by Sqoop.
My query does have OR conditions. I have attached an obfuscated version of the 
query, and that screenshot is still only half of the complete query.
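Line, word and byte counts can all match even when rows are wrong, for example when one row is duplicated and another is missing. A stricter check, sketched here in Python, compares the two dumps as multisets of lines. It assumes both files were first normalized to the same delimiter, NULL representation and line format; compare_rows and compare_files are hypothetical names.

```python
from collections import Counter
from pathlib import Path


def compare_rows(source_lines, imported_lines):
    """Return (missing, extra): rows absent from the import, rows only in the import.

    Order is ignored, but duplicates count, so a doubled row shows up in `extra`.
    """
    src = Counter(source_lines)
    dst = Counter(imported_lines)
    return src - dst, dst - src


def compare_files(source_path, imported_path):
    """Compare two already-normalized text dumps line by line."""
    return compare_rows(
        Path(source_path).read_text().splitlines(),
        Path(imported_path).read_text().splitlines(),
    )


# Same line count, word count and byte size, different content.
missing, extra = compare_rows(["1,a", "2,b", "3,c"], ["1,a", "2,b", "2,b"])
print(missing, extra)
```

Here missing holds "3,c" and extra holds "2,b", even though a count-based comparison of the two dumps would report them as identical.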

Any info on this will be helpful.

Thanks,
Pratik


