Folks,
So I'm working on https://issues.apache.org/jira/browse/SQOOP-384,
trying to make sqoop backwards compatible with apache hadoop 0.20.x
clusters.
When I build sqoop, ivy downloads a bunch of jars, including a bunch of
hadoop 0.23 snapshot jars:
hadoop-annotations-0.23.0-SNAPSHOT.jar
hadoop-auth-0.23.0-SNAPSHOT.jar
hadoop-common-0.23.0-SNAPSHOT.jar
hadoop-common-0.23.0-SNAPSHOT-tests.jar
hadoop-hdfs-0.23.0-SNAPSHOT.jar
hadoop-hdfs-0.23.0-SNAPSHOT-tests.jar
hadoop-mapreduce-client-common-0.23.0-SNAPSHOT.jar
hadoop-mapreduce-client-core-0.23.0-SNAPSHOT.jar
There is a binary incompatibility around JobContext: it was a class in
0.20 and became an interface in 0.23. I get this stack trace when I run
my own build of sqoop against either CDH3 or apache hadoop 0.20:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.JobContext, but interface was expected
    at org.apache.sqoop.config.ConfigurationHelper.getJobNumMaps(ConfigurationHelper.java:49)
    at com.cloudera.sqoop.config.ConfigurationHelper.getJobNumMaps(ConfigurationHelper.java:37)
    at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:120)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:119)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:179)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:146)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:182)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:221)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:230)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:239)
This is due to hadoop 0.23 breaking binary compatibility with the prior
versions. From the web: "If you publish a public library, you should
avoid making incompatible binary changes as much as possible to preserve
what's known as 'binary backward compatibility'. Updating dependency
jars alone ideally shouldn't break the application or the build."
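A quick way to confirm which shape of JobContext a given classpath
provides is to ask the class loader. This is only a rough sketch: the
class name JobContextProbe and the method kindOf are made up for
illustration and are not part of sqoop or hadoop.

```java
public class JobContextProbe {
    /**
     * Reports whether the named type is an interface or a class on the
     * current classpath, or "missing" if it cannot be found.
     * (Hypothetical helper, for illustration only.)
     */
    public static String kindOf(String className) {
        try {
            // 'false' loads the type without running its static initializers.
            Class<?> c = Class.forName(
                className, false, JobContextProbe.class.getClassLoader());
            return c.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Per the trace above, this should report "class" with 0.20 jars
        // and "interface" with 0.23 jars; "missing" means no hadoop jars.
        System.out.println(kindOf("org.apache.hadoop.mapreduce.JobContext"));
    }
}
```

Running it with the cluster's hadoop jars on the classpath shows which
variant the sqoop jar would have to link against.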
This is going to be a problem, because if we ship a jar built against
hadoop 0.23.x, it won't run against any hadoop version where JobContext
is still a class. Perhaps the best solution is to rename the
JobContext interface in hadoop? Not sure what will break if that
happens, but at least it won't be a runtime error. More ideas?
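One other idea, as a rough sketch only: call the JobContext method
through reflection in ConfigurationHelper. javac emits invokevirtual for
a call on a class and invokeinterface for a call on an interface, which
is why a jar compiled against one shape fails on the other. A reflective
call is resolved at runtime, so the same compiled jar can work with
both. I'm assuming the failing call is something like
getConfiguration(); the names ContextCompat, invokeNoArg,
OldStyleContext and NewStyleContext below are made up, and the two
stand-in types only mimic the 0.20 and 0.23 shapes so the example runs
without hadoop.

```java
import java.lang.reflect.Method;

public class ContextCompat {
    /** Stand-in for the 0.20 shape: JobContext as a concrete class. */
    public static class OldStyleContext {
        public String getConfiguration() { return "conf-from-class"; }
    }

    /** Stand-in for the 0.23 shape: JobContext as an interface. */
    public interface NewStyleContext {
        String getConfiguration();
    }

    /**
     * Invokes a public no-argument method by name. The method is looked
     * up at runtime, so no invokevirtual/invokeinterface choice is baked
     * into the caller's bytecode. (Hypothetical helper.)
     */
    public static Object invokeNoArg(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            // Needed when the implementing class itself is not public.
            m.setAccessible(true);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Object oldCtx = new OldStyleContext();
        Object newCtx = new NewStyleContext() {
            @Override
            public String getConfiguration() { return "conf-from-interface"; }
        };
        // The same call site handles both shapes.
        System.out.println(invokeNoArg(oldCtx, "getConfiguration"));
        System.out.println(invokeNoArg(newCtx, "getConfiguration"));
    }
}
```

The cost is losing compile-time checking on those calls plus some
reflection overhead, so it would make sense to keep it confined to one
small helper class.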
--- wad