The answers to the first two questions are 'yes'. I am not clear on what the third question is asking.
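
On question 2, and on the earlier question about why one import has to sit inside the class: here is a minimal plain-Scala sketch, without Spark, of the two kinds of import involved. Functions and Context are hypothetical stand-ins for org.apache.spark.sql.functions and SQLContext; they are not Spark's classes and only illustrate the scoping rule.

```scala
// Hypothetical stand-in for a library object such as org.apache.spark.sql.functions.
object Functions {
  def desc(column: String): String = column + " DESC"
  def asc(column: String): String = column + " ASC"
}

// Hypothetical stand-in for SQLContext: the implicits belong to an instance, not to the class.
class Context(val name: String) {
  object implicits {
    implicit class SeqOps[A](val rows: Seq[A]) {
      def describe: String = name + " holds " + rows.size + " rows"
    }
  }
}

object ImportScopeDemo {
  // Wildcard import of every member of an object; this also works at the top of a file.
  import Functions._

  def run(): (String, String) = {
    val ctx = new Context("ctx")
    // This import path starts at a value, so it can only appear once `ctx` is in scope.
    import ctx.implicits._
    val rows = Seq(("Mich", 20), ("Christian", 18))
    (desc("score"), rows.describe)
  }

  def main(args: Array[String]): Unit = {
    val (ordering, summary) = run()
    println(ordering) // prints "score DESC"
    println(summary)  // prints "ctx holds 2 rows"
  }
}
```

Moving `import ctx.implicits._` to the top of the file fails to compile for the same reason as the "not found: object sqlContext" error quoted further down: at that point no value with that name exists.
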
On Fri, Mar 4, 2016 at 4:28 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks, now all working. Also, select from tmp tables is part
> of sqlContext, not HiveContext.
>
> The final code that works is below.
>
> Couple of questions if I may:
>
> 1. This works pretty effortlessly in spark-shell. Is this because
>    $CLASSPATH already includes all the needed jars?
> 2. The import section imports the needed classes. So basically
>    import org.apache.spark.sql.functions._ imports all the methods of
>    class functions?
> 3. What is the reason why we should use sbt to build custom jars from
>    Spark code, as opposed to running the code from a file against
>    spark-shell? Any particular use case for it?
>
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.functions._
> //
> object Sequence {
>   def main(args: Array[String]) {
>     val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]")
>       .set("spark.driver.allowMultipleContexts", "true")
>     val sc = new SparkContext(conf)
>     // Note that this should be done only after an instance of
>     // org.apache.spark.sql.SQLContext is created.
>     // It should be written as:
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>     import sqlContext.implicits._
>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>     val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>     // Sort option 1 using tempTable
>     val b = a.toDF("Name","score").registerTempTable("tmp")
>     sqlContext.sql("select Name,score from tmp order by score desc").show
>     // Sort option 2 with FP
>     a.toDF("Name","score").sort(desc("score")).show
>   }
> }
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 4 March 2016 at 23:58, Chandeep Singh <c...@chandeep.com> wrote:
>
>> That is because an instance of org.apache.spark.sql.SQLContext doesn’t
>> exist in the current context and is required before you can use any of its
>> implicit methods.
>>
>> As Ted mentioned, importing org.apache.spark.sql.functions._ will take
>> care of the error below.
>>
>> On Mar 4, 2016, at 11:35 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Thanks. It is like a war of attrition. I always thought that you add
>> imports before the class itself, not within the class? What is the reason
>> for it, please?
>>
>> This is my code:
>>
>> import org.apache.spark.SparkContext
>> import org.apache.spark.SparkConf
>> import org.apache.spark.sql.Row
>> import org.apache.spark.sql.hive.HiveContext
>> import org.apache.spark.sql.types._
>> import org.apache.spark.sql.SQLContext
>> //
>> object Sequence {
>>   def main(args: Array[String]) {
>>     val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]")
>>       .set("spark.driver.allowMultipleContexts", "true")
>>     val sc = new SparkContext(conf)
>>     // Note that this should be done only after an instance of
>>     // org.apache.spark.sql.SQLContext is created.
>>     // It should be written as:
>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>     import sqlContext.implicits._
>>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>     val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>>     // Sort option 1 using tempTable
>>     val b = a.toDF("Name","score").registerTempTable("tmp")
>>     HiveContext.sql("select Name,score from tmp order by score desc").show
>>     // Sort option 2 with FP
>>     a.toDF("Name","score").sort(desc("score")).show
>>   }
>> }
>>
>> And now the last failure is in
>>
>> [info] [SUCCESSFUL ] org.scala-lang#jline;2.10.5!jline.jar (104ms)
>> [info] Done updating.
>> [info] Compiling 1 Scala source to /home/hduser/dba/bin/scala/Sequence/target/scala-2.10/classes...
>> [info] 'compiler-interface' not yet compiled for Scala 2.10.5. Compiling...
>> [info] Compilation completed in 15.779 s
>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:21: not found: value desc
>> [error] a.toDF("Name","score").sort(desc("score")).show
>> [error] ^
>> [error] one error found
>> [error] (compile:compileIncremental) Compilation failed
>>
>> Dr Mich Talebzadeh
>>
>> On 4 March 2016 at 23:25, Chandeep Singh <c...@chandeep.com> wrote:
>>
>>> This is what you need:
>>>
>>> val sc = new SparkContext(sparkConf)
>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>> import sqlContext.implicits._
>>>
>>> On Mar 4, 2016, at 11:03 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Hi Ted,
>>>
>>> This is my code
>>>
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.sql.Row
>>> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.sql.types._
>>> import org.apache.spark.sql.SQLContext
>>> //
>>> object Sequence {
>>>   def main(args: Array[String]) {
>>>     val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]")
>>>       .set("spark.driver.allowMultipleContexts", "true")
>>>     val sc = new SparkContext(conf)
>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>     val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>>>     // Sort option 1 using tempTable
>>>     val b = a.toDF("Name","score").registerTempTable("tmp")
>>>     sql("select Name,score from tmp order by score desc").show
>>>     // Sort option 2 with FP
>>>     a.toDF("Name","score").sort(desc("score")).show
>>>   }
>>> }
>>>
>>> And the error I am getting now is
>>>
>>> [info] downloading https://repo1.maven.org/maven2/org/scala-lang/jline/2.10.5/jline-2.10.5.jar ...
>>> [info] [SUCCESSFUL ] org.scala-lang#jline;2.10.5!jline.jar (103ms)
>>> [info] Done updating.
>>> [info] Compiling 1 Scala source to /home/hduser/dba/bin/scala/Sequence/target/scala-2.10/classes...
>>> [info] 'compiler-interface' not yet compiled for Scala 2.10.5. Compiling...
>>> [info] Compilation completed in 12.462 s
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:16: value toDF is not a member of Seq[(String, Int)]
>>> [error] val b = a.toDF("Name","score").registerTempTable("tmp")
>>> [error] ^
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:17: not found: value sql
>>> [error] sql("select Name,score from tmp order by score desc").show
>>> [error] ^
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:19: value toDF is not a member of Seq[(String, Int)]
>>> [error] a.toDF("Name","score").sort(desc("score")).show
>>> [error] ^
>>> [error] three errors found
>>> [error] (compile:compileIncremental) Compilation failed
>>> [error] Total time: 88 s, completed Mar 4, 2016 11:12:46 PM
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 4 March 2016 at 22:52, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> Can you show your code snippet?
>>>> Here is an example:
>>>>
>>>> val sqlContext = new SQLContext(sc)
>>>> import sqlContext.implicits._
>>>>
>>>> On Fri, Mar 4, 2016 at 1:55 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Ted,
>>>>>
>>>>> I am getting the following error after adding that import
>>>>>
>>>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:5: not found: object sqlContext
>>>>> [error] import sqlContext.implicits._
>>>>> [error] ^
>>>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:15: value toDF is not a member of Seq[(String, Int)]
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 4 March 2016 at 21:39, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>>> Can you add the following into your code?
>>>>>> import sqlContext.implicits._
>>>>>>
>>>>>> On Fri, Mar 4, 2016 at 1:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a simple Scala program as below
>>>>>>>
>>>>>>> import org.apache.spark.SparkContext
>>>>>>> import org.apache.spark.SparkContext._
>>>>>>> import org.apache.spark.SparkConf
>>>>>>> import org.apache.spark.sql.SQLContext
>>>>>>> object Sequence {
>>>>>>>   def main(args: Array[String]) {
>>>>>>>     val conf = new SparkConf().setAppName("Sequence")
>>>>>>>     val sc = new SparkContext(conf)
>>>>>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>>>>>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>>>>     val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>>>>>>>     // Sort option 1 using tempTable
>>>>>>>     val b = a.toDF("Name","score").registerTempTable("tmp")
>>>>>>>     sql("select Name,score from tmp order by score desc").show
>>>>>>>     // Sort option 2 with FP
>>>>>>>     a.toDF("Name","score").sort(desc("score")).show
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> I build this using sbt tool as below
>>>>>>>
>>>>>>> cat sequence.sbt
>>>>>>> name := "Sequence"
>>>>>>> version := "1.0"
>>>>>>> scalaVersion := "2.10.5"
>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0"
>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"
>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.5.0"
>>>>>>>
>>>>>>> But it fails compilation as below
>>>>>>>
>>>>>>> [info] Compilation completed in 12.366 s
>>>>>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:15: value toDF is not a member of Seq[(String, Int)]
>>>>>>> [error] val b = a.toDF("Name","score").registerTempTable("tmp")
>>>>>>> [error] ^
>>>>>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:16: not found: value sql
>>>>>>> [error] sql("select Name,score from tmp order by score desc").show
>>>>>>> [error] ^
>>>>>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:18: value toDF is not a member of Seq[(String, Int)]
>>>>>>> [error] a.toDF("Name","score").sort(desc("score")).show
>>>>>>> [error] ^
>>>>>>> [error] three errors found
>>>>>>> [error] (compile:compileIncremental) Compilation failed
>>>>>>> [error] Total time: 95 s, completed Mar 4, 2016 9:06:40 PM
>>>>>>>
>>>>>>> I think I am missing some dependencies here
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
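
A note on the sequence.sbt quoted in the first message of the thread: it lists spark-sql at 1.0.0 next to spark-core and spark-hive at 1.5.0. Assuming that mismatch was not intentional, a sketch that keeps all three modules on one version might look like the following. The name sparkVersion is introduced here for illustration only, and the thread does not establish whether the mismatch contributed to the compile errors.

```scala
// sequence.sbt (sketch): a single shared version for every Spark module
name := "Sequence"

version := "1.0"

scalaVersion := "2.10.5"

// Assumption: 1.5.0 is the intended release, as already used for spark-core and spark-hive
val sparkVersion = "1.5.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion
)
```

Holding the version in one value means a later upgrade touches a single line, and the three artifacts cannot drift apart again.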