Hi Jarcec, If sequence files are not supported for sqoop import does that mean its not supported for sqoop export which was my original question?
A little background on my use case. I inherited a hadoop project from another team and they have been moving data from SQL Server to hive using sqoop and now I need to get some of it back out to SQL Server for simplicity I am just trying to get it into MySQL first. My test data "tablea" is a tiny table with 1000 rows. My production data is over a terabyte and its all stored as a sequence file in hive. To answer your question this is how I believe my predecessors got the data into hadoop. Initial Import: /usr/bin/sqoop import -D mapred.task.timeout=0 --connect "jdbc:sqlserver://sqlSERVERIP;database=somedb;user=sanitized;password=sanitized" --table tableA_201212 --where "DateOccurred >= convert(datetime, '20121201', 112) and DateOccurred < convert(datetime, '20130101', 112)" -m 1 --fields-terminated-by '\001' --target-dir /hiveexternal/dbo_tableA_2012_12_sqooped create external table dbo_tableA_2012_12_sqooped ( `LogId` bigint, `BucketName` string, `YearMonth` string, `CalendarDate` string, `DateOccurred` string, `DateRecorded` string, `event_type_key` string, `track_key` string, `space_key` string, `page_id` int, `promo_id` int, `promo_sub_code` string, `event_log_id` int, `event_type_id` int, `track_id` int, `space_id` int, `ETLLoadDate` string ) location '/hiveexternal/dbo_tableA_2012_12_sqooped'; Final table: CREATE TABLE dbo_tableA STORED AS SEQUENCEFILE AS SELECT * FROM dbo_tableA_2012_12_sqooped; Clean up: DROP TABLE dbo_tableA_2012_12_sqooped; Thanks, -Eric On Jul 24, 2013, at 5:29 PM, Jarek Jarcec Cecho <[email protected]<mailto:[email protected]>> wrote: Hi Eric, would you mind sharing with us your entire data flow? Starting with the exact Sqoop import command, Hive transformations if you are doing any and finally with the Sqoop export command? Importing data into Hive using the SequenceFile format is not supported by Sqoop, so I would like to make sure that we are understanding you use case correctly. Jarcec On Wed, Jul 24, 2013 at 05:17:30PM -0700, Eric Hernandez wrote: Here are my logs sqoop export --connect 'jdbc:mysql://mysqlServer:3306/hadoop' --username=hadoop -P --table=dbo_tablea --export-dir /hive/dbo_tablea -m 1 --input-fields-terminated-by '\001' Enter password: 13/07/24 17:07:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 13/07/24 17:07:58 INFO tool.CodeGenTool: Beginning code generation 13/07/24 17:07:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dbo_tablea` AS t LIMIT 1 13/07/24 17:07:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dbo_tablea` AS t LIMIT 1 13/07/24 17:07:58 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop Note: /tmp/sqoop-erich/compile/5287b2ea7807ccef31ae33420fbbb7a0/dbo_tablea.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 13/07/24 17:08:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-erich/compile/5287b2ea7807ccef31ae33420fbbb7a0/dbo_tablea.jar 13/07/24 17:08:00 INFO mapreduce.ExportJobBase: Beginning export of dbo_tablea 13/07/24 17:08:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 13/07/24 17:08:02 INFO input.FileInputFormat: Total input paths to process : 1 13/07/24 17:08:02 INFO input.FileInputFormat: Total input paths to process : 1 13/07/24 17:08:03 INFO mapred.JobClient: Running job: job_201302261137_303267 13/07/24 17:08:04 INFO mapred.JobClient: map 0% reduce 0% 13/07/24 17:08:20 INFO mapred.JobClient: Task Id : attempt_201302261137_303267_m_000000_0, Status : FAILED java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.LongWritable at org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:95) at org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:38) at org.apache.sqoop.mapreduce.CombineFileRecordReader.getCurrentKey(CombineFileRecordReader.java:77) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:436) at org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:66) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:75) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.ja 13/07/24 17:08:30 INFO mapred.JobClient: Task Id : attempt_201302261137_303267_m_000000_1, Status : FAILED java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.LongWritable at org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:95) at org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:38) at org.apache.sqoop.mapreduce.CombineFileRecordReader.getCurrentKey(CombineFileRecordReader.java:77) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:436) at org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:66) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:75) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.ja 13/07/24 17:08:38 INFO mapred.JobClient: Task Id : attempt_201302261137_303267_m_000000_2, Status : FAILED java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.LongWritable at org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:95) at org.apache.sqoop.mapreduce.CombineShimRecordReader.getCurrentKey(CombineShimRecordReader.java:38) at org.apache.sqoop.mapreduce.CombineFileRecordReader.getCurrentKey(CombineFileRecordReader.java:77) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:436) at org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:66) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:75) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.ja 13/07/24 17:08:48 INFO mapred.JobClient: Job complete: job_201302261137_303267 13/07/24 17:08:48 INFO mapred.JobClient: Counters: 8 13/07/24 17:08:48 INFO mapred.JobClient: Job Counters 13/07/24 17:08:48 INFO mapred.JobClient: Failed map tasks=1 13/07/24 17:08:48 INFO mapred.JobClient: Launched map tasks=4 13/07/24 17:08:48 INFO mapred.JobClient: Data-local map tasks=1 13/07/24 17:08:48 INFO mapred.JobClient: Rack-local map tasks=2 13/07/24 17:08:48 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=30893 13/07/24 17:08:48 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0 13/07/24 17:08:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/07/24 17:08:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/07/24 17:08:48 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.3496 seconds (0 bytes/sec) 13/07/24 17:08:48 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 13/07/24 17:08:48 INFO mapreduce.ExportJobBase: Exported 0 records. 13/07/24 17:08:48 ERROR tool.ExportTool: Error during export: Export job failed! On Jul 24, 2013, at 4:19 PM, Eric wrote: Yes I can get the logs but, first I am going to have to mock it up in my lab with some dummy data and credentials. I should be able to provide full logs tomorrow. My darn signature leaked out on my last reply. If anybody can scrub my last post and remove my signature that would be awesome. Thanks, -Eric On Jul 24, 2013, at 3:51 PM, Abraham wrote: Eric, The middle command seems right. Could you provide the rest of your logs? It will help us understand where in the process sqoop fails. -Abe I have tried many different variations all with the same result sqoop export --connect 'jdbc:mysql://mysqlIP:3306/hadoop' --username=hadoop --password='sanitized' --table=tableA --export-dir /hive/tableA -m 1 --fields-terminated-by '\001' sqoop export --connect 'jdbc:mysql://mysqlIP:3306/hadoop' --username=hadoop --password='sanitized' --table=tableA --export-dir /hive/tableA -m 1 --input-fields-terminated-by '\001' sqoop export --connect 'jdbc:mysql://mysqlIP:3306/hadoop' --username=hadoop --password='sanitized' --table=tableA --export-dir /hive/tableA -m 1 Hey Eric, I believe its possible. Can you provide the command you are using? -Abe On Wed, Jul 24, 2013 at 2:54 PM, Eric Hernandez wrote: Hi, Is it possible to sqoop data out of hive back into an RDBMS like MyQL or SQL Server when it has been imported via sqoop as a sequence file? I have been trying all day to get data back out of hive and I keep getting this error no matter what I try "java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.LongWritable" I am using Sqoop 1.4.1-cdh4.1.2 Thanks, Eric H
