I tried Option 2 and things are working fine. Thank you all for your valuable 
suggestions.

Meanwhile, I came across group_concat() in Impala, which does exactly what 
GROUP_CONCAT() does in MySQL. So if you have Impala configured on your cluster, 
you can try this:


1.      Import all the tables into Hive using Sqoop.

2.      Make the proper delimiter adjustments during import.

3.      Run the query in the Impala shell.
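
For reference, the final step might look something like this in the Impala 
shell. The table and column names (orders, customer_id, item) are placeholders 
for illustration, not the actual schema:

```sql
-- Hypothetical example: build a comma-separated list of items per customer,
-- the same shape of aggregation that MySQL's GROUP_CONCAT() produces.
SELECT customer_id,
       group_concat(item, ',') AS items
FROM orders
GROUP BY customer_id;
```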



Regards,
Sambit.



From: Chalcy [mailto:[email protected]]
Sent: Tuesday, January 21, 2014 10:37 PM
To: [email protected]
Subject: Re: Joins in Sqoop

I thought so, since you did not have space in the MySQL db.

You have to find a way to break the data into smaller buckets and then try to 
get it into Hive, or, as I suggested originally, bring the data into Hive and 
do what you have to do there.

The options for Hive are:
1. Get only the joined data into Hive, concatenate the columns in Hive, store 
the result into another table, and then do a group by on that concatenated field.
2. Get the tables individually into Hive, do the join, concatenate the fields, 
insert them into a new table, and then run the group by.
3. Write a simple Hive UDF to do what the MySQL function does after bringing 
the tables involved in the join into Hive.
4. Bring the tables involved in the join into Hive and write a MapReduce job to 
do what you want to do.
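
As a rough sketch of option 2, assuming two hypothetical tables table_a and 
table_b have already been imported into Hive (all table and column names here 
are made up for illustration):

```sql
-- Join the imported tables, concatenate the fields of interest,
-- and materialize the result into a new table.
CREATE TABLE joined_concat AS
SELECT concat(a.col1, '|', a.col2) AS concat_key,
       b.value AS value
FROM table_a a
JOIN table_b b ON (a.id = b.id);

-- Then run the group by on the concatenated field.
SELECT concat_key, count(*) AS cnt
FROM joined_concat
GROUP BY concat_key;
```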

Hope you find a way to do it and post it here.

Thanks,
Chalcy

On Tue, Jan 21, 2014 at 11:47 AM, Sambit Tripathy (RBEI/PJ-NBS) 
<[email protected]<mailto:[email protected]>> wrote:
I have tried it, but no luck. I got the same error again: "Errcode: 28 - No 
space left on device", i.e., the MySQL server is running out of space. It looks 
like the temporary table that is created also takes up a lot of storage space.

1. I created a temp table

sqoop eval  --connect jdbc:mysql://10.xx.xx.xx:3306/db_name --username user1 
--password ******  --query "CREATE TEMPORARY TABLE IF NOT EXISTS 
LIFECYCLE_DAT_TEMP AS (SELECT * FROM VIEW_JOIN)"

where VIEW_JOIN --> the view that contains the JOINs

2.  Then the import command

sqoop import --connect jdbc:mysql://10.xx.xx.xx:3306/db_name --table 
LIFECYCLE_DAT_TEMP --username user1 --password ******* --split-by timestamp


@ Jarcec: Is this what you have suggested?

Cloudera provides a Teradata Connector for Sqoop which does this by default, 
but I could not find anything similar for MySQL.



Regards,
Sambit

-----Original Message-----
From: Sambit Tripathy (RBEI/PJ-NBS) 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Saturday, January 18, 2014 12:30 AM
To: [email protected]<mailto:[email protected]>
Subject: RE: Joins in Sqoop

That sounds good; in the past I have used temp tables for some other things and 
they worked. I will try it out and post my observations.

-----Original Message-----
From: Jarek Jarcec Cecho [mailto:[email protected]<mailto:[email protected]>]
Sent: Friday, January 17, 2014 10:25 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: Joins in Sqoop

Large joins are better performed on the database side and stored in a temporary 
table (CREATE TABLE temp_tbl AS SELECT ...) that Sqoop can subsequently import 
without creating large temp files.

Jarcec

On Thu, Jan 16, 2014 at 09:20:04PM +0800, Sambit Tripathy (RBEI/PJ-NBS) wrote:
> Hi,
>
> I have written a query which has 5 JOIN clauses, and I am passing this query 
> to Sqoop import.
>
> Problem: This produces a large temp file in the MySQL server's temp directory 
> and throws an error saying "No space left on device". Yes, this can be fixed 
> by increasing the size of the temp directory on the MySQL server, but what if 
> you actually don't have any space left on the MySQL server? Are there any 
> workarounds for this? I mean something like a batch import which does not 
> create a big temp file on the server.
>
>
> Regards,
> Sambit.
>
