Is 1.0.8 working for you?

You indicated that your last known good version was 1.0.0.

Maybe we can track down where it broke. 



> On Sep 16, 2014, at 12:25 AM, Paul Wais <pw...@yelp.com> wrote:
> 
> Thanks Christian!  I tried compiling from source but am still getting the 
> same Hadoop client version error when reading from HDFS.  I'll have to poke 
> deeper... perhaps I've got some classpath issues.  FWIW, I compiled using:
> 
> $ MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" mvn \
>     -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
> 
> and Hadoop 2.3 / CDH5 from 
> http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0.tar.gz
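> 
> As a classpath sanity check, something like this (just a rough sketch; the 
> class name is mine) should print whichever Hadoop client version actually 
> gets picked up at runtime -- I'd expect 2.3.0-cdh5.0.0 if the right jars win:
> 
>     import org.apache.hadoop.util.VersionInfo;
> 
>     public class HadoopVersionCheck {
>         public static void main(String[] args) {
>             // Prints the version of the Hadoop client library on the classpath
>             System.out.println("Hadoop client version: " + VersionInfo.getVersion());
>         }
>     }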
> 
> 
> 
> 
> 
>> On Mon, Sep 15, 2014 at 6:49 PM, Christian Chua <cc8...@icloud.com> wrote:
>> Hi Paul.
>> 
>> I would recommend building your own 1.1.0 distribution.
>> 
>>      ./make-distribution.sh --name hadoop-personal-build-2.4 --tgz \
>>          -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests
>> 
>> 
>> 
>> I downloaded the "Pre-built for Hadoop 2.4" binary, and it had this strange 
>> behavior where
>> 
>>      spark-submit --master yarn-cluster ...
>> 
>> works, but
>> 
>>      spark-submit --master yarn-client ...
>> 
>> fails.
>> 
>> 
>> But with the personal build from the command above, both work.
>> 
>> 
>> -Christian
>> 
>> 
>> 
>> 
>>> On Sep 15, 2014, at 6:28 PM, Paul Wais <pw...@yelp.com> wrote:
>>> 
>>> Dear List,
>>> 
>>> I'm having trouble getting Spark 1.1 to use the Hadoop 2 API for
>>> reading SequenceFiles.  In particular, I'm seeing:
>>> 
>>> Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
>>> Server IPC version 7 cannot communicate with client version 4
>>>        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>>        at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source)
>>>        ...
>>> 
>>> when invoking JavaSparkContext#newAPIHadoopFile() with args
>>> validSequenceFileURI, SequenceFileInputFormat.class, Text.class,
>>> BytesWritable.class, and new Job().getConfiguration() -- pretty close to
>>> the unit test here:
>>> https://github.com/apache/spark/blob/f0f1ba09b195f23f0c89af6fa040c9e01dfa8951/core/src/test/java/org/apache/spark/JavaAPISuite.java#L916
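>>> 
>>> Spelled out, the failing read looks roughly like this (just a sketch; the
>>> wrapper class and method names are mine):
>>> 
>>>     import org.apache.hadoop.io.BytesWritable;
>>>     import org.apache.hadoop.io.Text;
>>>     import org.apache.hadoop.mapreduce.Job;
>>>     import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
>>>     import org.apache.spark.api.java.JavaPairRDD;
>>>     import org.apache.spark.api.java.JavaSparkContext;
>>> 
>>>     public class SeqFileRead {
>>>         // Reads a (Text, BytesWritable) SequenceFile through the new
>>>         // (mapreduce) API, same shape as the JavaAPISuite test above.
>>>         static JavaPairRDD<Text, BytesWritable> read(JavaSparkContext sc,
>>>                                                      String validSequenceFileURI)
>>>                 throws Exception {
>>>             return sc.newAPIHadoopFile(
>>>                 validSequenceFileURI,
>>>                 SequenceFileInputFormat.class,
>>>                 Text.class,
>>>                 BytesWritable.class,
>>>                 new Job().getConfiguration());
>>>         }
>>>     }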
>>> 
>>> 
>>> This error suggests that Spark is using an old Hadoop client for reads.
>>> Oddly, /writes/ work fine, i.e. I'm able to write to my HDFS cluster via
>>> JavaPairRDD#saveAsNewAPIHadoopFile().
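>>> 
>>> For contrast, the write that succeeds is roughly this (a sketch; pairs and
>>> outputURI are placeholders, and SequenceFileOutputFormat is my guess at the
>>> output format class):
>>> 
>>>     import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
>>> 
>>>     // pairs is a JavaPairRDD<Text, BytesWritable>, outputURI an hdfs:// path
>>>     pairs.saveAsNewAPIHadoopFile(
>>>         outputURI, Text.class, BytesWritable.class, SequenceFileOutputFormat.class);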
>>> 
>>> 
>>> Do I need to explicitly build Spark for modern Hadoop?  I previously
>>> had an HDFS cluster running Hadoop 2.3.0 and was getting a similar
>>> error (server is using version 9, client is using version 4).
>>> 
>>> 
>>> I'm using the Spark 1.1 CDH4 build as well as Hadoop CDH4 from the links
>>> posted on Spark's site:
>>> * http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz
>>> * http://d3kbcqa49mib13.cloudfront.net/hadoop-2.0.0-cdh4.2.0.tar.gz
>>> 
>>> 
>>> What distro of Hadoop is used at Databricks?  Are there distros of
>>> Spark 1.1 and Hadoop that should work together out of the box?
>>> (Previously I had Spark 1.0.0 and Hadoop 2.3 working fine.)
>>> 
>>> Thanks for any help anybody can give me here!
>>> -Paul
>>> 
> 
