Fwiw - here are some changes that a friend said should make C*'s Hadoop support work with CDH4 - for ColumnFamilyRecordReader. https://gist.github.com/jeromatron/4967799
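
For anyone hitting the IncompatibleClassChangeError quoted below, a quick way to see which flavor of the Hadoop API is actually on the classpath is to load JobContext reflectively and ask whether it is a class or an interface. In Hadoop 1.x it is a class; in the 2.x line it is an interface. This is only a minimal sketch: the class name HadoopApiShape and the helper shapeOf are made up for illustration and are not part of Hadoop or Cassandra.

```java
// Hypothetical helper, not part of Hadoop or Cassandra.
class HadoopApiShape {

    /**
     * Reports whether the named type, as seen by this class's loader,
     * is an "interface", a "class", or "missing" from the classpath.
     */
    static String shapeOf(String className) {
        try {
            // initialize=false: load the type without running its static initializers
            Class<?> type = Class.forName(
                    className, false, HadoopApiShape.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but unloadable
            return "missing";
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.mapreduce.JobContext";
        // The result depends on which Hadoop jars (if any) are on the classpath.
        System.out.println(name + " -> " + shapeOf(name));
    }
}
```

Run it with the same classpath the job uses. If it reports "interface" while the Cassandra jar was built against a Hadoop where JobContext is a class, that mismatch is what the JVM is complaining about, and recompiling against the cluster's Hadoop is the likely fix.
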
On Feb 16, 2013, at 8:23 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Here is the deal.
>
> http://wiki.apache.org/hadoop/Defining%20Hadoop
>
> INAPPROPRIATE: Automotive Joe's Crankshaft: 100% compatible with Hadoop
>
> Bad, because "100% compatible" is a meaningless statement. Even Apache
> releases have regressions; cases where versions are incompatible *even
> when the Java interfaces don't change*. A statement about
> compatibility ought to be qualified: "Certified by Joe's brother Bob
> as 100% compatible with Apache Hadoop(TM)". In the US, the marketing
> team may be able to get away with the "100% compatible" claim, but in
> some EU countries, sticking that statement up on your web site is a
> claim that residents can demand the vendor justifies, or take it down.
>
> So as a result, if you are running something that is NOT Apache Hadoop
> (CDH, DSE, or whatever), it is NOT compatible with Hadoop, or with the
> others, by definition.
>
> Anyway, I have been using Hadoop for years, and its biggest problem is
> that it has never become happy with its own codebase. Old API, new
> API, JobTracker, YARN: all these things change, and there is really no
> upgrade/downgrade path because there are so many branches, etc. Open
> source products move swiftly and end users are normally left holding
> the bag, figuring out how to do it sanely. With Cassandra +
> Hadoop it is "double trouble".
>
> All that being said, I think it is unrealistic to count on vendors 100%
> to solve your problems. If something throws you an exception like:
>
> org.apache.cassandra.hadoop.ConfigHelper.setRpcPort(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)V
>
> Guess what? It is time to get out your compiler.
>
> On Sat, Feb 16, 2013 at 3:39 AM, Yang Song <xfil...@gmail.com> wrote:
>> Thanks Michael. I have attached the reply I got back from Harsh on the
>> CDH4 user group. Hope it helps to share the experience.
>>
>> "In CDH4, the MR1 and MR2 APIs are both fully compatible (such that
>> moving to YARN in future would require no recompilation from MR1
>> produced jars). You can consider it "2.0" API in binary form, and not
>> 0.20 exactly (i.e. it's not backwards compatible with CDH3).
>>
>> Cassandra is distributing binaries built on MR1 (Apache Hadoop 1,
>> CDH3, etc.), which wouldn't work on your CDH4 platform. You will have
>> to recompile against the proper platform to get binary-compatible
>> jars/etc."
>>
>> Interesting. Has anyone had issues with CDH4 and the newly released C*
>> 1.21?
>>
>> Thanks
>>
>> 2013/2/15 Michael Kjellman <mkjell...@barracuda.com>
>>>
>>> Sorry. I meant to say even though there *wasn't* a major change between
>>> 1.0.x and 0.22. The big change was 0.20 to 0.22. Sorry for the confusion.
>>>
>>> On Feb 15, 2013, at 9:53 PM, "Michael Kjellman" <mkjell...@barracuda.com>
>>> wrote:
>>>
>>> There were pretty big changes in Hadoop between 0.20 and 0.22 (which is
>>> now known as 1.0.x) even though there were major changes between 0.22 and
>>> 1.0.x. Cloudera hadn't yet upgraded to 0.22, which uses the new mapreduce
>>> framework instead of the old mapred API. I don't see the C* project
>>> backporting their code at this time, and if anything Cloudera should
>>> update their release!!
>>>
>>> On Feb 15, 2013, at 9:48 PM, "Yang Song" <xfil...@gmail.com> wrote:
>>>
>>> It is interesting though. I am using CDH4, which contains Hadoop 0.20,
>>> and I am using Cassandra 1.20.
>>> The previously mentioned errors still occur. Any suggestions? Thanks.
>>>
>>> 2013/2/15 Michael Kjellman <mkjell...@barracuda.com>
>>>>
>>>> That bug is kinda wrong though. 1.0.x has been current for like a year
>>>> now and C* works great with it :)
>>>>
>>>> On Feb 15, 2013, at 7:38 PM, "Dave Brosius" <dbros...@mebigfatguy.com>
>>>> wrote:
>>>>
>>>> see https://issues.apache.org/jira/browse/CASSANDRA-5201
>>>>
>>>>
>>>> On 02/15/2013 10:05 PM, Yang Song wrote:
>>>>
>>>> Hi,
>>>>
>>>> Does anyone use CDH4's Hadoop together with Cassandra? The goal is
>>>> simply to read/write to Cassandra from Hadoop directly using
>>>> ColumnFamilyInput(Output)Format, but there seems to be a compatibility
>>>> issue. There are two Java exceptions:
>>>>
>>>> 1. java.lang.IncompatibleClassChangeError: Found interface
>>>> org.apache.hadoop.mapreduce.JobContext, but class was expected
>>>> This shows up when I run a Hadoop jar to read directly from Cassandra.
>>>> It seems that Hadoop changed JobContext from a class to an interface.
>>>> Has anyone had a similar issue?
>>>> Does it mean the Hadoop version in CDH4 is old?
>>>>
>>>> 2. Another error is java.lang.NoSuchMethodError:
>>>> org.apache.cassandra.hadoop.ConfigHelper.setRpcPort(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)V
>>>> This shows up when the jar file specifies the RPC port for a remote
>>>> Cassandra cluster.
>>>>
>>>> Does anyone have a similar experience? Any comments are welcome. Thanks!
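
The NoSuchMethodError quoted above names the missing method by its JVM descriptor, which is easy to misread. Below is a minimal sketch of a decoder for that notation. DescriptorDecoder and decode are hypothetical names invented for this example; only the descriptor grammar itself comes from the JVM specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading JVM method descriptors; not a Hadoop or Cassandra class.
class DescriptorDecoder {

    /** Turns "(Ljava/lang/String;I)V" into "(java.lang.String, int) -> void". */
    static String decode(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        int[] pos = {1};                      // cursor, just past the opening '('
        while (pos[0] < close) {
            params.add(readType(descriptor, pos));
        }
        pos[0] = close + 1;                   // the return type follows ')'
        String returnType = readType(descriptor, pos);
        return "(" + String.join(", ", params) + ") -> " + returnType;
    }

    /** Reads one field or return type starting at pos[0] and advances the cursor. */
    private static String readType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'V': return "void";
            case 'Z': return "boolean";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case 'I': return "int";
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            case '[': return readType(d, pos) + "[]";   // array of the following type
            case 'L': {                                 // object type: L<binary name>;
                int end = d.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class name in: " + d);
                }
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("unexpected '" + c + "' in: " + d);
        }
    }

    public static void main(String[] args) {
        // prints "(org.apache.hadoop.conf.Configuration, java.lang.String) -> void"
        System.out.println(decode("(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)V"));
    }
}
```

Read that way, the error says the calling code was compiled against a ConfigHelper that had setRpcPort(Configuration, String) returning void, while the ConfigHelper on the runtime classpath has no method with exactly that signature. The thread does not say which version introduced the difference, so the practical step is to compare the Cassandra jar the job was built against with the one on the cluster.
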