You could stop the running processes and start the same processes under the new version, beginning with the master and then the slaves. You would have to snoop around a bit to get the command-line arguments right, but it's doable. Use `ps -efw` to find the command lines used. Be sure to rerun them as the same user. Or look at what the EC2 scripts do.
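
If memory serves, the spark-ec2 scripts just drive the standard sbin scripts, so something like this might work (untested; it assumes the default spark-ec2 layout, with Spark under /root/spark and the helper scripts under /root/spark-ec2):

  # after downloading the 1.3 tarball onto the master
  /root/spark/sbin/stop-all.sh                  # stop the master and the slaves
  mv /root/spark /root/spark-1.2                # keep the old install around
  tar xzf spark-1.3.0-bin-hadoop1.tgz -C /root  # pick the build matching your cluster's Hadoop version
  mv /root/spark-1.3.0-bin-hadoop1 /root/spark
  cp /root/spark-1.2/conf/* /root/spark/conf/   # carry over the old configuration
  /root/spark-ec2/copy-dir /root/spark          # rsync the new build out to the slaves
  /root/spark/sbin/start-all.sh                 # start the master, then the slaves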
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Wed, Mar 25, 2015 at 4:54 PM, roni <[email protected]> wrote:

> Is there any way that I can install the new one and remove the previous
> version?
> I installed Spark 1.3 on my EC2 master and set the spark home to the new
> one.
> But when I start the spark-shell I get -
> java.lang.UnsatisfiedLinkError:
> org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
>         at
> org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
> Method)
>
> Is there no way to upgrade without creating a new cluster?
> Thanks
> Roni
>
> On Wed, Mar 25, 2015 at 1:18 PM, Dean Wampler <[email protected]>
> wrote:
>
>> Yes, that's the problem. The RDD class exists in both binary jar files,
>> but the signatures probably don't match. The bottom line, as always for
>> tools like this, is that you can't mix versions.
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition
>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>> Typesafe <http://typesafe.com>
>> @deanwampler <http://twitter.com/deanwampler>
>> http://polyglotprogramming.com
>>
>> On Wed, Mar 25, 2015 at 3:13 PM, roni <[email protected]> wrote:
>>
>>> My cluster is still on Spark 1.2, and in SBT I am using 1.3.
>>> So it is probably compiling with 1.3 but running with 1.2?
>>>
>>> On Wed, Mar 25, 2015 at 12:34 PM, Dean Wampler <[email protected]>
>>> wrote:
>>>
>>>> Weird. Are you running using the SBT console? It should have the
>>>> spark-core jar on the classpath. Similarly, spark-shell or spark-submit
>>>> should work, but be sure you're using the same version of Spark when
>>>> running as when compiling. Also, you might need to add spark-sql to your
>>>> SBT dependencies, but that shouldn't be this issue.
>>>>
>>>> Dean Wampler, Ph.D.
>>>> Author: Programming Scala, 2nd Edition
>>>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>>>> Typesafe <http://typesafe.com>
>>>> @deanwampler <http://twitter.com/deanwampler>
>>>> http://polyglotprogramming.com
>>>>
>>>> On Wed, Mar 25, 2015 at 12:09 PM, roni <[email protected]> wrote:
>>>>
>>>>> Thanks Dean and Nick.
>>>>> So, I removed ADAM and H2O from my SBT, as I was not using them.
>>>>> I got the code to compile - only for it to fail at runtime with -
>>>>> SparkContext: Created broadcast 1 from textFile at
>>>>> kmerIntersetion.scala:21
>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> org/apache/spark/rdd/RDD$
>>>>>         at preDefKmerIntersection$.main(kmerIntersetion.scala:26)
>>>>>
>>>>> This line is where I do a "JOIN" operation.
>>>>> val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>> val filtered = hgPair.filter(kv => kv._2 == 1)
>>>>> val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>> *val joinRDD = bedPair.join(filtered)*
>>>>> Any idea what's going on?
>>>>> I have data on EC2, so I am avoiding creating a new cluster and
>>>>> instead just upgrading and changing the code to use 1.3 and Spark SQL.
>>>>> Thanks
>>>>> Roni
>>>>>
>>>>> On Wed, Mar 25, 2015 at 9:50 AM, Dean Wampler <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> For the Spark SQL parts, 1.3 breaks backwards compatibility, because
>>>>>> before 1.3, Spark SQL was considered experimental, so API changes were
>>>>>> allowed.
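>>>>>>
>>>>>> For example, 1.3 replaced the implicit createSchemaRDD conversion
>>>>>> with an explicit toDF() call. A rough, untested sketch of what that
>>>>>> snippet might look like on 1.3, reusing the sc, joinRDD, and
>>>>>> KmerIntesect names from your code:
>>>>>>
>>>>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>>>> import sqlContext.implicits._  // brings in toDF() for RDDs of case classes
>>>>>> val ty = joinRDD.map { case (word, (file1Counts, file2Counts)) =>
>>>>>>   KmerIntesect(word, file1Counts, "xyz")
>>>>>> }.toDF()                       // DataFrame replaces SchemaRDD in 1.3
>>>>>> ty.registerTempTable("KmerIntesect")
>>>>>> ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")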
>>>>>>
>>>>>> So, H2O and ADAM versions compatible with 1.2.x might not work with 1.3.
>>>>>>
>>>>>> dean
>>>>>>
>>>>>> Dean Wampler, Ph.D.
>>>>>> Author: Programming Scala, 2nd Edition
>>>>>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>>>>>> Typesafe <http://typesafe.com>
>>>>>> @deanwampler <http://twitter.com/deanwampler>
>>>>>> http://polyglotprogramming.com
>>>>>>
>>>>>> On Wed, Mar 25, 2015 at 9:39 AM, roni <[email protected]> wrote:
>>>>>>
>>>>>>> Even if H2O and ADAM are dependent on 1.2.1, they should be backward
>>>>>>> compatible, right?
>>>>>>> So using 1.3 should not break them.
>>>>>>> And the code is not using the classes from those libs.
>>>>>>> I tried sbt clean compile .. same error
>>>>>>> Thanks
>>>>>>> _R
>>>>>>>
>>>>>>> On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> What version of Spark do the other dependencies (ADAM and H2O) rely
>>>>>>>> on? That could be it.
>>>>>>>>
>>>>>>>> Or try sbt clean compile
>>>>>>>>
>>>>>>>> —
>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>
>>>>>>>> On Wed, Mar 25, 2015 at 5:58 PM, roni <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have an EC2 cluster created using Spark version 1.2.1,
>>>>>>>>> and I have an SBT project.
>>>>>>>>> Now I want to upgrade to Spark 1.3 and use the new features.
>>>>>>>>> Below are the issues.
>>>>>>>>> Sorry for the long post.
>>>>>>>>> Appreciate your help.
>>>>>>>>> Thanks
>>>>>>>>> -Roni
>>>>>>>>>
>>>>>>>>> Question - Do I have to create a new cluster using Spark 1.3?
>>>>>>>>>
>>>>>>>>> Here is what I did -
>>>>>>>>>
>>>>>>>>> In my SBT file I changed to -
>>>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
>>>>>>>>>
>>>>>>>>> But then I started getting compilation errors, along with
>>>>>>>>> eviction warnings. Here are some of the libraries that were evicted:
>>>>>>>>> [warn] * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0
>>>>>>>>> [warn] * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0)
>>>>>>>>> -> 2.6.0
>>>>>>>>> [warn] Run 'evicted' to see detailed eviction warnings
>>>>>>>>>
>>>>>>>>> constructor cannot be instantiated to expected type;
>>>>>>>>> [error] found   : (T1, T2)
>>>>>>>>> [error] required: org.apache.spark.sql.catalyst.expressions.Row
>>>>>>>>> [error] val ty = joinRDD.map{case(word, (file1Counts, file2Counts)) =>
>>>>>>>>> KmerIntesect(word, file1Counts,"xyz")}
>>>>>>>>> [error]                     ^
>>>>>>>>>
>>>>>>>>> Here are my SBT file and code --
>>>>>>>>>
>>>>>>>>> SBT -
>>>>>>>>>
>>>>>>>>> version := "1.0"
>>>>>>>>>
>>>>>>>>> scalaVersion := "2.10.4"
>>>>>>>>>
>>>>>>>>> resolvers += "Sonatype OSS Snapshots" at
>>>>>>>>> "https://oss.sonatype.org/content/repositories/snapshots"
>>>>>>>>> resolvers += "Maven Repo1" at "https://repo1.maven.org/maven2"
>>>>>>>>> resolvers += "Maven Repo" at
>>>>>>>>> "https://s3.amazonaws.com/h2o-release/h2o-dev/master/1056/maven/repo/"
>>>>>>>>>
>>>>>>>>> /* Dependencies - %% appends Scala version to artifactId */
>>>>>>>>> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
>>>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
>>>>>>>>> libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0"
>>>>>>>>> libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10"
>>>>>>>>>
>>>>>>>>> CODE --
>>>>>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>>>>>
>>>>>>>>> case class KmerIntesect(kmer: String, kCount: Int, fileName: String)
>>>>>>>>>
>>>>>>>>> object preDefKmerIntersection {
>>>>>>>>>   def main(args: Array[String]) {
>>>>>>>>>     val sparkConf = new SparkConf().setAppName("preDefKmer-intersect")
>>>>>>>>>     val sc = new SparkContext(sparkConf)
>>>>>>>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>>>>>>>     import sqlContext.createSchemaRDD
>>>>>>>>>     val bedFile = sc.textFile("s3n://a/b/c", 40)
>>>>>>>>>     val hgfasta = sc.textFile("hdfs://a/b/c", 40)
>>>>>>>>>     val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>>>>>>     val filtered = hgPair.filter(kv => kv._2 == 1)
>>>>>>>>>     val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>>>>>>     val joinRDD = bedPair.join(filtered)
>>>>>>>>>     val ty = joinRDD.map { case (word, (file1Counts, file2Counts)) =>
>>>>>>>>>       KmerIntesect(word, file1Counts, "xyz") }
>>>>>>>>>     ty.registerTempTable("KmerIntesect")
>>>>>>>>>     ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
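P.S. To avoid the compile-versus-run mismatch in the first place, it helps to pin the Spark artifacts in SBT to exactly the version the cluster runs, and to scope them "provided" so spark-submit supplies the cluster's own jars at runtime. A sketch (the "provided" scoping is a suggestion, not something in your build today; it assumes you launch through spark-submit rather than a bare java main):

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided"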
