Hi Ted,

I meant: given that we have spark-shell and spark-sql, what is the advantage of building self-contained applications? We still need to submit them via spark-submit. Basically, what is the use case for self-contained programs? That is, we build the code, create the class and run it independently of spark-shell? I can also run the code from Apache Zeppelin through the notebook.
Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 5 March 2016 at 00:32, Ted Yu <yuzhih...@gmail.com> wrote:

Answers to the first two questions are 'yes'.

Not clear on what the 3rd question is asking.

On Fri, Mar 4, 2016 at 4:28 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks, now all working. Also, SELECTs from temp tables are part of sqlContext, not HiveContext.

This is the final code that works (below).

A couple of questions if I may:

1. This works pretty effortlessly in spark-shell. Is this because $CLASSPATH already includes all the needed jars?
2. The import section imports the needed classes. So basically import org.apache.spark.sql.functions._ imports all the methods of the functions object?
3. What is the reason we should use sbt to build custom jars from Spark code, as opposed to running the code in a file against spark-shell? Any particular use case for it?

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
//
object Sequence {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)
    // Note that the import below should be done only after an instance of
    // org.apache.spark.sql.SQLContext is created. It should be written as:
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
    // Sort option 1 using tempTable
    val b = a.toDF("Name","score").registerTempTable("tmp")
    sqlContext.sql("select Name,score from tmp order by score desc").show
    // Sort option 2 with FP
    a.toDF("Name","score").sort(desc("score")).show
  }
}

On 4 March 2016 at 23:58, Chandeep Singh <c...@chandeep.com> wrote:

That is because an instance of org.apache.spark.sql.SQLContext doesn’t exist in the current context and is required before you can use any of its implicit methods.

As Ted mentioned, importing org.apache.spark.sql.functions._ will take care of the below error.

On Mar 4, 2016, at 11:35 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks. It is like a war of attrition. I always thought that you add imports before the class itself, not within the class? What is the reason for it, please?
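[Editorial aside: the import-placement question above comes down to Scala allowing imports from a value (a stable identifier) — the value must exist before its members can be imported, which is why import sqlContext.implicits._ has to sit inside the method, after the val. A minimal plain-Scala sketch, with hypothetical names (Ctx, RichInt, doubled) and no Spark required:]

```scala
object InstanceImportDemo {
  class Ctx {
    // mirrors sqlContext.implicits: an object nested inside an instance
    object implicits {
      implicit class RichInt(n: Int) {
        def doubled: Int = n * 2
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val ctx = new Ctx      // the instance must exist first...
    import ctx.implicits._ // ...only then can its members be imported
    println(3.doubled)     // prints 6: the implicit class is now in scope
  }
}
```

[The same rule explains the earlier "not found: object sqlContext" error: a file-level import sqlContext.implicits._ fails because no value named sqlContext exists at that point.]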
This is my code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
//
object Sequence {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)
    // Note that the import below should be done only after an instance of
    // org.apache.spark.sql.SQLContext is created. It should be written as:
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
    // Sort option 1 using tempTable
    val b = a.toDF("Name","score").registerTempTable("tmp")
    HiveContext.sql("select Name,score from tmp order by score desc").show
    // Sort option 2 with FP
    a.toDF("Name","score").sort(desc("score")).show
  }
}

And now the last failure is:

[info] [SUCCESSFUL ] org.scala-lang#jline;2.10.5!jline.jar (104ms)
[info] Done updating.
[info] Compiling 1 Scala source to /home/hduser/dba/bin/scala/Sequence/target/scala-2.10/classes...
[info] 'compiler-interface' not yet compiled for Scala 2.10.5. Compiling...
[info] Compilation completed in 15.779 s
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:21: not found: value desc
[error] a.toDF("Name","score").sort(desc("score")).show
[error]                             ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed

On 4 March 2016 at 23:25, Chandeep Singh <c...@chandeep.com> wrote:

This is what you need:

val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

On Mar 4, 2016, at 11:03 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Ted,

This is my code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
//
object Sequence {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
    // Sort option 1 using tempTable
    val b = a.toDF("Name","score").registerTempTable("tmp")
    sql("select Name,score from tmp order by score desc").show
    // Sort option 2 with FP
    a.toDF("Name","score").sort(desc("score")).show
  }
}

And the error I am getting now is:

[info] downloading
https://repo1.maven.org/maven2/org/scala-lang/jline/2.10.5/jline-2.10.5.jar ...
[info] [SUCCESSFUL ] org.scala-lang#jline;2.10.5!jline.jar (103ms)
[info] Done updating.
[info] Compiling 1 Scala source to /home/hduser/dba/bin/scala/Sequence/target/scala-2.10/classes...
[info] 'compiler-interface' not yet compiled for Scala 2.10.5. Compiling...
[info] Compilation completed in 12.462 s
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:16: value toDF is not a member of Seq[(String, Int)]
[error] val b = a.toDF("Name","score").registerTempTable("tmp")
[error] ^
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:17: not found: value sql
[error] sql("select Name,score from tmp order by score desc").show
[error] ^
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:19: value toDF is not a member of Seq[(String, Int)]
[error] a.toDF("Name","score").sort(desc("score")).show
[error] ^
[error] three errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 88 s, completed Mar 4, 2016 11:12:46 PM

On 4 March 2016 at 22:52, Ted Yu <yuzhih...@gmail.com> wrote:

Can you show your code snippet?

Here is an example:

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

On Fri, Mar 4, 2016 at 1:55 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Ted,

I am getting the following error after adding that import:

[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:5: not found: object sqlContext
[error] import sqlContext.implicits._
[error] ^
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:15: value toDF is not a member of Seq[(String, Int)]

On 4 March 2016 at 21:39, Ted Yu <yuzhih...@gmail.com> wrote:

Can you add the following into your code?
import sqlContext.implicits._

On Fri, Mar 4, 2016 at 1:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

I have a simple Scala program as below:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
object Sequence {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sequence")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
    // Sort option 1 using tempTable
    val b = a.toDF("Name","score").registerTempTable("tmp")
    sql("select Name,score from tmp order by score desc").show
    // Sort option 2 with FP
    a.toDF("Name","score").sort(desc("score")).show
  }
}

I build this using the sbt tool as below:

cat sequence.sbt
name := "Sequence"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.5.0"

But it fails compilation as below:

[info] Compilation completed in 12.366 s
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:15: value toDF is not a member of Seq[(String, Int)]
[error] val b = a.toDF("Name","score").registerTempTable("tmp")
[error] ^
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:16: not found: value sql
[error] sql("select Name,score from tmp order by score desc").show
[error] ^
[error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:18: value toDF is not a member of Seq[(String, Int)]
[error] a.toDF("Name","score").sort(desc("score")).show
[error] ^
[error] three errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 95 s, completed Mar 4, 2016 9:06:40 PM

I think I am missing some dependencies here.
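[Editorial aside: the descending sort the thread is building toward can be sanity-checked in plain Scala, without Spark. This is a hypothetical standalone sketch using sortBy on the tuples, not the thread's DataFrame sort(desc("score")):]

```scala
object SortDemo {
  def main(args: Array[String]): Unit = {
    // the same data the thread uses
    val a = Seq(("Mich", 20), ("Christian", 18), ("James", 13), ("Richard", 16))
    // sortBy on the negated score sorts descending, mirroring the
    // intended output of the DataFrame version
    val sorted = a.sortBy(-_._2)
    println(sorted) // List((Mich,20), (Christian,18), (Richard,16), (James,13))
  }
}
```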