Is there any way that I can install the new one and remove the previous version? I installed Spark 1.3 on my EC2 master and set the Spark home to the new one.
But when I start the spark-shell I get -

java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
        at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method)

Is there no way to upgrade without creating a new cluster?

Thanks
Roni
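That UnsatisfiedLinkError typically appears when the Hadoop classes on Spark's classpath do not match the native libhadoop installed on the machine. A minimal diagnostic sketch (VersionInfo is a standard Hadoop utility class; running this via spark-shell or spark-submit on the master is an assumption about the setup):

    // Diagnostic sketch: report which Hadoop version is on the classpath and
    // where the Hadoop security classes are loaded from, to spot a stale jar.
    import org.apache.hadoop.util.VersionInfo

    object HadoopVersionCheck {
      def main(args: Array[String]): Unit = {
        println(s"Hadoop version on classpath: ${VersionInfo.getVersion}")
        val src = classOf[org.apache.hadoop.security.UserGroupInformation]
          .getProtectionDomain.getCodeSource.getLocation
        println(s"Hadoop classes loaded from: $src")
      }
    }

If the printed version disagrees with the Hadoop build the AMI ships (for example a CDH build versus Apache 2.6.0), the anchorNative link failure is the expected symptom, and a Spark 1.3 package built against the cluster's Hadoop version is the usual remedy.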
On Wed, Mar 25, 2015 at 1:18 PM, Dean Wampler <deanwamp...@gmail.com> wrote:

> Yes, that's the problem. The RDD class exists in both binary jar files,
> but the signatures probably don't match. The bottom line, as always for
> tools like this, is that you can't mix versions.
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Wed, Mar 25, 2015 at 3:13 PM, roni <roni.epi...@gmail.com> wrote:
>
>> My cluster is still on Spark 1.2 and in SBT I am using 1.3.
>> So probably it is compiling with 1.3 but running with 1.2?
>>
>> On Wed, Mar 25, 2015 at 12:34 PM, Dean Wampler <deanwamp...@gmail.com> wrote:
>>
>>> Weird. Are you running using the SBT console? It should have the spark-core
>>> jar on the classpath. Similarly, spark-shell or spark-submit should work,
>>> but be sure you're using the same version of Spark when running as when
>>> compiling. Also, you might need to add spark-sql to your SBT dependencies,
>>> but that shouldn't be this issue.
>>>
>>> On Wed, Mar 25, 2015 at 12:09 PM, roni <roni.epi...@gmail.com> wrote:
>>>
>>>> Thanks Dean and Nick.
>>>> So, I removed ADAM and H2O from my SBT as I was not using them.
>>>> I got the code to compile - only for it to fail while running with:
>>>>
>>>> SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21
>>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD$
>>>>         at preDefKmerIntersection$.main(kmerIntersetion.scala:26)
>>>>
>>>> This line is where I do a "JOIN" operation:
>>>>
>>>> val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>> val filtered = hgPair.filter(kv => kv._2 == 1)
>>>> val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>> *val joinRDD = bedPair.join(filtered)*
>>>>
>>>> Any idea what's going on?
>>>> I have data on the EC2, so I am avoiding creating a new cluster and am
>>>> instead just upgrading and changing the code to use 1.3 and Spark SQL.
>>>> Thanks
>>>> Roni
>>>>
>>>> On Wed, Mar 25, 2015 at 9:50 AM, Dean Wampler <deanwamp...@gmail.com> wrote:
>>>>
>>>>> For the Spark SQL parts, 1.3 breaks backwards compatibility, because
>>>>> before 1.3, Spark SQL was considered experimental, where API changes were
>>>>> allowed.
>>>>>
>>>>> So, H2O and ADAM compatible with 1.2.x might not work with 1.3.
>>>>>
>>>>> dean
>>>>>
>>>>> On Wed, Mar 25, 2015 at 9:39 AM, roni <roni.epi...@gmail.com> wrote:
>>>>>
>>>>>> Even if H2O and ADAM are dependent on 1.2.1, it should be backward
>>>>>> compatible, right?
>>>>>> So using 1.3 should not break them.
>>>>>> And the code is not using the classes from those libs.
>>>>>> I tried sbt clean compile .. same error
>>>>>> Thanks
>>>>>> _R
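One way to rule out the compile-with-1.3-but-run-with-1.2 mismatch diagnosed above is to compile against exactly the Spark version the cluster runs and mark it "provided", so the cluster's jar is the only copy on the runtime classpath. A minimal build.sbt sketch, assuming the cluster stays on 1.2.1 until it can be rebuilt (the version string is illustrative):

    // build.sbt sketch: compile against the same Spark the cluster runs, and
    // mark it "provided" so the application jar does not bundle a second,
    // conflicting copy of spark-core.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"

With provided scope, spark-submit supplies Spark at runtime; a plain `sbt run` will then fail locally unless Spark is added back on a separate test classpath.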
>>>>>>
>>>>>> On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>>>>
>>>>>>> What version of Spark do the other dependencies rely on (ADAM and
>>>>>>> H2O)? That could be it.
>>>>>>>
>>>>>>> Or try sbt clean compile.
>>>>>>>
>>>>>>> On Wed, Mar 25, 2015 at 5:58 PM, roni <roni.epi...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I have an EC2 cluster created using Spark version 1.2.1,
>>>>>>>> and I have an SBT project.
>>>>>>>> Now I want to upgrade to Spark 1.3 and use the new features.
>>>>>>>> Below are the issues.
>>>>>>>> Sorry for the long post.
>>>>>>>> Appreciate your help.
>>>>>>>> Thanks
>>>>>>>> -Roni
>>>>>>>>
>>>>>>>> Question - Do I have to create a new cluster using Spark 1.3?
>>>>>>>>
>>>>>>>> Here is what I did -
>>>>>>>>
>>>>>>>> In my SBT file I changed to -
>>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
>>>>>>>>
>>>>>>>> But then I started getting a compilation error, along with eviction
>>>>>>>> warnings. Here are some of the libraries that were evicted:
>>>>>>>>
>>>>>>>> [warn] * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0
>>>>>>>> [warn] * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0) -> 2.6.0
>>>>>>>> [warn] Run 'evicted' to see detailed eviction warnings
>>>>>>>>
>>>>>>>> constructor cannot be instantiated to expected type;
>>>>>>>> [error] found   : (T1, T2)
>>>>>>>> [error] required: org.apache.spark.sql.catalyst.expressions.Row
>>>>>>>> [error] val ty = joinRDD.map{case(word, (file1Counts, file2Counts)) => KmerIntesect(word, file1Counts,"xyz")}
>>>>>>>> [error]                          ^
>>>>>>>>
>>>>>>>> Here is my SBT and code --
>>>>>>>>
>>>>>>>> SBT -
>>>>>>>>
>>>>>>>> version := "1.0"
>>>>>>>>
>>>>>>>> scalaVersion := "2.10.4"
>>>>>>>>
>>>>>>>> resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
>>>>>>>> resolvers += "Maven Repo1" at "https://repo1.maven.org/maven2"
>>>>>>>> resolvers += "Maven Repo" at "https://s3.amazonaws.com/h2o-release/h2o-dev/master/1056/maven/repo/"
>>>>>>>>
>>>>>>>> /* Dependencies - %% appends Scala version to artifactId */
>>>>>>>> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
>>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
>>>>>>>> libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0"
>>>>>>>> libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10"
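The eviction warnings in the message above come from adam-core and sparkling-water dragging in their own spark-core 1.2.x and a CDH-flavored hadoop-client. If those libraries were still needed, one option is to exclude their transitive Spark artifact - a sketch only; whether these two exclusions suffice depends on what else each library pulls in:

    // build.sbt sketch: keep ADAM and Sparkling Water but drop their transitive
    // spark-core so only the explicit 1.3.0 pin remains on the classpath.
    libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0" exclude("org.apache.spark", "spark-core_2.10")
    libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10" exclude("org.apache.spark", "spark-core_2.10")

Note that an exclusion only removes the duplicate jar; it does not make a library compiled against 1.2 binary-compatible with 1.3, which is Dean's point earlier in the thread.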
>>>>>>>>
>>>>>>>> CODE --
>>>>>>>>
>>>>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>>>>
>>>>>>>> case class KmerIntesect(kmer: String, kCount: Int, fileName: String)
>>>>>>>>
>>>>>>>> object preDefKmerIntersection {
>>>>>>>>   def main(args: Array[String]) {
>>>>>>>>     val sparkConf = new SparkConf().setAppName("preDefKmer-intersect")
>>>>>>>>     val sc = new SparkContext(sparkConf)
>>>>>>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>>>>>>     import sqlContext.createSchemaRDD // must come after sqlContext is defined
>>>>>>>>     val bedFile = sc.textFile("s3n://a/b/c", 40)
>>>>>>>>     val hgfasta = sc.textFile("hdfs://a/b/c", 40)
>>>>>>>>     val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>>>>>     val filtered = hgPair.filter(kv => kv._2 == 1)
>>>>>>>>     val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>>>>>     val joinRDD = bedPair.join(filtered)
>>>>>>>>     val ty = joinRDD.map { case (word, (file1Counts, file2Counts)) => KmerIntesect(word, file1Counts, "xyz") }
>>>>>>>>     ty.registerTempTable("KmerIntesect")
>>>>>>>>     ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
>>>>>>>>   }
>>>>>>>> }
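For context on the "required: org.apache.spark.sql.catalyst.expressions.Row" compile error: in 1.3 the experimental SchemaRDD/createSchemaRDD path gave way to DataFrames, so the tail of the program needs a small port once spark-core is bumped. A hedged sketch of a 1.3-style equivalent (paths kept from the original; spark-sql 1.3.0 on the compile classpath is assumed):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

    object PreDefKmerIntersection13 {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("preDefKmer-intersect"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._ // replaces createSchemaRDD in 1.3

        val bedFile = sc.textFile("s3n://a/b/c", 40)
        val hgfasta = sc.textFile("hdfs://a/b/c", 40)
        val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
        val filtered = hgPair.filter(_._2 == 1)
        val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
        val joinRDD = bedPair.join(filtered)

        // toDF() turns the RDD of case classes into a DataFrame; registerTempTable
        // and saveAsParquetFile are DataFrame methods in 1.3, not SchemaRDD ones.
        val ty = joinRDD.map { case (word, (file1Counts, _)) => KmerIntesect(word, file1Counts, "xyz") }.toDF()
        ty.registerTempTable("KmerIntesect")
        ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
      }
    }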