Re: exporting data from sequence files back into an RDBMS

Eric Hernandez Wed, 24 Jul 2013 18:12:18 -0700

Hi Jarcec,

If sequence files are not supported for Sqoop import, does that mean they are 
not supported for Sqoop export either? That was my original question.


A little background on my use case.

I inherited a Hadoop project from another team. They have been moving data 
from SQL Server to Hive using Sqoop, and now I need to get some of it back out 
to SQL Server. For simplicity, I am just trying to get it into MySQL first. My 
test data, "tablea", is a tiny table with 1000 rows. My production data is over 
a terabyte, and it's all stored as a sequence file in Hive.

To answer your question, this is how I believe my predecessors got the data 
into Hadoop.

Initial Import:
/usr/bin/sqoop import -D mapred.task.timeout=0 \
  --connect "jdbc:sqlserver://sqlSERVERIP;database=somedb;user=sanitized;password=sanitized" \
  --table tableA_201212 \
  --where "DateOccurred >= convert(datetime, '20121201', 112) and DateOccurred < convert(datetime, '20130101', 112)" \
  -m 1 \
  --fields-terminated-by '\001' \
  --target-dir /hiveexternal/dbo_tableA_2012_12_sqooped

 create external table dbo_tableA_2012_12_sqooped (
  `LogId` bigint,
  `BucketName` string,
  `YearMonth` string,
  `CalendarDate` string,
  `DateOccurred` string,
  `DateRecorded` string,
  `event_type_key` string,
  `track_key` string,
  `space_key` string,
  `page_id` int,
  `promo_id` int,
  `promo_sub_code` string,
  `event_log_id` int,
  `event_type_id` int,
  `track_id` int,
  `space_id` int,
  `ETLLoadDate` string
 )
 location '/hiveexternal/dbo_tableA_2012_12_sqooped';
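
As a rough illustration of the layout this import produces: with 
--fields-terminated-by '\001', each row should land in the target directory as 
one text line whose fields are separated by the Ctrl-A character. The Python 
sketch below splits such a line into the columns declared above. The function 
name split_record and the sample values are hypothetical; only the column names 
come from the table definition.

```python
# Column names copied from the CREATE EXTERNAL TABLE statement above.
COLUMNS = [
    "LogId", "BucketName", "YearMonth", "CalendarDate", "DateOccurred",
    "DateRecorded", "event_type_key", "track_key", "space_key", "page_id",
    "promo_id", "promo_sub_code", "event_log_id", "event_type_id",
    "track_id", "space_id", "ETLLoadDate",
]


def split_record(line, delimiter="\x01"):
    """Split one delimited text record into a dict keyed by column name."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# Invented sample values ("v0" .. "v16"), for illustration only.
sample = "\x01".join(f"v{i}" for i in range(len(COLUMNS))) + "\n"
record = split_record(sample)
print(record["LogId"], record["ETLLoadDate"])
```

A line with the wrong number of fields raises ValueError, which is a cheap way 
to spot a delimiter mismatch before attempting an export.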

Final table:
CREATE TABLE dbo_tableA STORED AS SEQUENCEFILE AS SELECT * FROM 
dbo_tableA_2012_12_sqooped;

Clean up:
DROP TABLE dbo_tableA_2012_12_sqooped;

Thanks,
-Eric


On Jul 24, 2013, at 5:29 PM, Jarek Jarcec Cecho wrote:

Hi Eric,
would you mind sharing with us your entire data flow? Starting with the exact 
Sqoop import command, Hive transformations if you are doing any and finally 
with the Sqoop export command?

Importing data into Hive using the SequenceFile format is not supported by 
Sqoop, so I would like to make sure that we are understanding your use case 
correctly.

Jarcec

On Wed, Jul 24, 2013 at 05:17:30PM -0700, Eric Hernandez wrote:
Here are my logs

sqoop export --connect 'jdbc:mysql://mysqlServer:3306/hadoop' --username=hadoop 
-P --table=dbo_tablea --export-dir /hive/dbo_tablea -m 1 
--input-fields-terminated-by  '\001'
Enter password:
13/07/24 17:07:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming 
resultset.
13/07/24 17:07:58 INFO tool.CodeGenTool: Beginning code generation
13/07/24 17:07:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM `dbo_tablea` AS t LIMIT 1
13/07/24 17:07:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM `dbo_tablea` AS t LIMIT 1
13/07/24 17:07:58 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
Note: /tmp/sqoop-erich/compile/5287b2ea7807ccef31ae33420fbbb7a0/dbo_tablea.java 
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/07/24 17:08:00 INFO orm.CompilationManager: Writing jar file: 
/tmp/sqoop-erich/compile/5287b2ea7807ccef31ae33420fbbb7a0/dbo_tablea.jar
13/07/24 17:08:00 INFO mapreduce.ExportJobBase: Beginning export of dbo_tablea
13/07/24 17:08:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
13/07/24 17:08:02 INFO input.FileInputFormat: Total input paths to process : 1
13/07/24 17:08:02 INFO input.FileInputFormat: Total input paths to process : 1
13/07/24 17:08:03 INFO mapred.JobClient: Running job: job_201302261137_303267
13/07/24 17:08:04 INFO mapred.JobClient:  map 0% reduce 0%
13/07/24 17:08:20 INFO mapred.JobClient: Task Id : 
attempt_201302261137_303267_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast 
to org.apache.hadoop.io.LongWritable
at 
org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:95)
at 
org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:38)
at 
org.apache.sqoop.mapreduce.CombineFileRecordReader.getCurrentKey(CombineFileRecordReader.java:77)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:436)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:66)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:75)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at 
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.ja
13/07/24 17:08:30 INFO mapred.JobClient: Task Id : 
attempt_201302261137_303267_m_000000_1, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast 
to org.apache.hadoop.io.LongWritable
at 
org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:95)
at 
org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:38)
at 
org.apache.sqoop.mapreduce.CombineFileRecordReader.getCurrentKey(CombineFileRecordReader.java:77)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:436)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:66)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:75)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at 
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.ja
13/07/24 17:08:38 INFO mapred.JobClient: Task Id : 
attempt_201302261137_303267_m_000000_2, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast 
to org.apache.hadoop.io.LongWritable
at 
org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:95)
at 
org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:38)
at 
org.apache.sqoop.mapreduce.CombineFileRecordReader.getCurrentKey(CombineFileRecordReader.java:77)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:436)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:66)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:75)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at 
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.ja
13/07/24 17:08:48 INFO mapred.JobClient: Job complete: job_201302261137_303267
13/07/24 17:08:48 INFO mapred.JobClient: Counters: 8
13/07/24 17:08:48 INFO mapred.JobClient:   Job Counters
13/07/24 17:08:48 INFO mapred.JobClient:     Failed map tasks=1
13/07/24 17:08:48 INFO mapred.JobClient:     Launched map tasks=4
13/07/24 17:08:48 INFO mapred.JobClient:     Data-local map tasks=1
13/07/24 17:08:48 INFO mapred.JobClient:     Rack-local map tasks=2
13/07/24 17:08:48 INFO mapred.JobClient:     Total time spent by all maps in 
occupied slots (ms)=30893
13/07/24 17:08:48 INFO mapred.JobClient:     Total time spent by all reduces in 
occupied slots (ms)=0
13/07/24 17:08:48 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
13/07/24 17:08:48 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
13/07/24 17:08:48 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.3496 
seconds (0 bytes/sec)
13/07/24 17:08:48 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
13/07/24 17:08:48 INFO mapreduce.ExportJobBase: Exported 0 records.
13/07/24 17:08:48 ERROR tool.ExportTool: Error during export: Export job failed!
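
The repeated ClassCastException above points at the key type stored in the 
files under /hive/dbo_tablea: the reader in the stack trace casts each key to 
LongWritable, but the file supplies BytesWritable. One way to confirm what a 
given file declares is to read its header, since a SequenceFile begins with the 
bytes SEQ, a version byte, and then the key and value class names. The Python 
sketch below is only an illustration under those assumptions. The function 
names are hypothetical, and it handles only the single-byte length prefix that 
short class names use.

```python
import io


def read_hadoop_string(stream):
    """Read one length-prefixed UTF-8 string from a SequenceFile header.

    Simplification: only the single-byte length form (0..127) is handled,
    which is enough for typical Java class names.
    """
    first = stream.read(1)
    if len(first) != 1:
        raise ValueError("unexpected end of header")
    length = first[0]
    if length > 127:
        raise ValueError("multi-byte length prefix not handled in this sketch")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of header")
    return data.decode("utf-8")


def sequence_file_classes(header_bytes):
    """Return (key_class, value_class) declared in a SequenceFile header."""
    stream = io.BytesIO(header_bytes)
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    if len(stream.read(1)) != 1:  # version byte, not interpreted here
        raise ValueError("unexpected end of header")
    key_class = read_hadoop_string(stream)
    value_class = read_hadoop_string(stream)
    return key_class, value_class
```

To try it on real data, copy one file out of HDFS first (for example with 
hadoop fs -get), read its first few hundred bytes in binary mode, and pass 
them to sequence_file_classes. If the first name returned is 
org.apache.hadoop.io.BytesWritable, that matches the cast failure in the log.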





On Jul 24, 2013, at 4:19 PM, Eric  wrote:

Yes, I can get the logs, but first I am going to have to mock it up in my lab 
with some dummy data and credentials. I should be able to provide full logs 
tomorrow.

My darn signature leaked out on my last reply. If anybody can scrub my last 
post and remove my signature that would be awesome.

Thanks,
-Eric


On Jul 24, 2013, at 3:51 PM, Abraham   wrote:

Eric,

The middle command seems right. Could you provide the rest of your logs? It 
will help us understand where in the process sqoop fails.

-Abe



I have tried many different variations, all with the same result:

sqoop export --connect 'jdbc:mysql://mysqlIP:3306/hadoop' --username=hadoop 
--password='sanitized' --table=tableA --export-dir /hive/tableA -m 1 
--fields-terminated-by '\001'

sqoop export --connect 'jdbc:mysql://mysqlIP:3306/hadoop' --username=hadoop 
--password='sanitized' --table=tableA --export-dir /hive/tableA -m 1 
--input-fields-terminated-by  '\001'

sqoop export --connect 'jdbc:mysql://mysqlIP:3306/hadoop' --username=hadoop 
--password='sanitized' --table=tableA --export-dir /hive/tableA -m 1





Hey Eric,

I believe it's possible. Can you provide the command you are using?

-Abe


On Wed, Jul 24, 2013 at 2:54 PM, Eric Hernandez  wrote:
Hi,
Is it possible to sqoop data out of Hive back into an RDBMS like MySQL or SQL 
Server when it has been imported via Sqoop as a sequence file?

I have been trying all day to get data back out of Hive, and I keep getting 
this error no matter what I try:

"java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be 
cast to org.apache.hadoop.io.LongWritable"

I am using Sqoop 1.4.1-cdh4.1.2

Thanks,

Eric H
