I know I can arrive at the same result with this code,

    val range100 = spark.range(1, 101).agg(sum('id) as "sum").first.get(0)
    println(f"sum of range100 = $range100")

so I am not stuck, I was just curious 😯 why the code breaks using the
current linked libraries:

    spark.range(1,101).reduce(_ + _)

spark-submit test (/opt/spark/spark-submit):

    spark.range(1,101).reduce(_+_)
    <console>:24: error: overloaded method value reduce with alternatives:
      (func: org.apache.spark.api.java.function.ReduceFunction[java.lang.Long])java.lang.Long <and>
      (func: (java.lang.Long, java.lang.Long) => java.lang.Long)java.lang.Long
     cannot be applied to ((java.lang.Long, java.lang.Long) => scala.Long)
           spark.range(1,101).reduce(_+_)
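For what it's worth, the failure looks like a Scala 2.12 overload-resolution
quirk rather than a problem with the linked libraries: spark.range returns a
Dataset[java.lang.Long], reduce is overloaded to take either a Scala function
or a Java ReduceFunction, and _ + _ unboxes its arguments and returns a
scala.Long, so the lambda matches neither overload exactly. A minimal sketch
of two workarounds that should compile (assuming Spark 3.x with Scala 2.12;
outside the shell you also need import spark.implicits._ for the encoder):

    // Workaround 1: convert to Dataset[scala.Long] first, so (_ + _) is
    // exactly the (Long, Long) => Long that the function overload expects.
    val total = spark.range(1, 101).as[Long].reduce(_ + _)

    // Workaround 2: keep java.lang.Long but ascribe the result type, so the
    // lambda is (java.lang.Long, java.lang.Long) => java.lang.Long.
    val total2 = spark.range(1, 101).reduce((x, y) => (x + y): java.lang.Long)

    println(total)   // 5050
    println(total2)  // 5050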
<http://www.backbutton.co.uk/>

On Wed, 24 Jun 2020, 19:54 Anwar AliKhan, <anwaralikhan...@gmail.com> wrote:

> I am using the method described on this page for Scala development in
> Eclipse:
>
> https://data-flair.training/blogs/create-spark-scala-project/
>
> In the middle of the page you will find:
>
> "you will see lots of errors due to missing libraries.
> viii. Add Spark Libraries"
>
> Now that I have my own build I will be pointing to the jars (Spark
> libraries) in the directory /opt/spark/assembly/target/scala-2.12/jars.
>
> This way I know exactly which jar libraries I am using to remove the
> aforementioned errors.
>
> At the same time I am trying to set up a template environment as shown
> here:
>
> https://medium.com/@faizanahemad/apache-spark-setup-with-gradle-scala-and-intellij-2eeb9f30c02a
>
> so that I can have the sc and spark variables in the Eclipse editor, the
> same as you would have the spark and sc variables in spark-shell.
>
> I used the word "trying" because the following code is broken with the
> latest Spark:
>
>     spark.range(1,101).reduce(_ + _)
>
> If I use the Gradle method as described then the code does work, because
> it is pulling the libraries from the Maven repository as stipulated in
> gradle.properties
> <https://github.com/faizanahemad/spark-gradle-template/blob/master/gradle.properties>.
>
> In my previous post I forgot that with a Maven pom.xml you can actually
> specify the version number of the jar you want to pull from the Maven
> repository using the mvn clean package command.
>
> So even if I use Maven with Eclipse, any new libraries uploaded to the
> Maven repository by developers will have newer version numbers, so they
> will not affect my project.
>
> Can you please tell me why the code spark.range(1,101).reduce(_ + _) is
> broken with the latest Spark?
>
> <http://www.backbutton.co.uk/>
>
> On Wed, 24 Jun 2020, 17:07 Jeff Evans, <jeffrey.wayne.ev...@gmail.com>
> wrote:
>
>> If I'm understanding this correctly, you are building Spark from source
>> and using the built artifacts (jars) in some other project. Correct? If
>> so, then why are you concerning yourself with the directory structure
>> that Spark, internally, uses when building its artifacts? It should be
>> a black box to your application, entirely. You would pick the profiles
>> (ex: Scala version, Hadoop version, etc.) you need, then the install
>> phase of Maven will take care of building the jars and putting them in
>> your local Maven repo. After that, you can resolve them from your other
>> project seamlessly (simply by declaring the org/artifact/version).
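To make the point about declaring coordinates concrete, here is a minimal
sketch in sbt syntax (an sbt build file is plain Scala; the version numbers
are assumptions for illustration, and Maven or Gradle take the same
group/artifact/version triple):

    // build.sbt sketch: pin exact released Spark artifacts. Released
    // versions in Maven Central are immutable, so the build is reproducible.
    scalaVersion := "2.12.10"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.0.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "3.0.0" % "provided"
    )

A locally built Spark published with mvn install resolves the same way, just
from the local repository under ~/.m2 instead of Maven Central.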
>> Maven artifacts are immutable, at least released versions in Maven
>> Central. If "someone" (unclear who you are talking about) is "swapping
>> out" jars in a Maven repo then they're doing something extremely
>> strange and broken, unless they're simply replacing snapshot versions,
>> which is a different beast entirely
>> <https://maven.apache.org/guides/getting-started/index.html#What_is_a_SNAPSHOT_version>.
>>
>> On Wed, Jun 24, 2020 at 10:39 AM Anwar AliKhan <anwaralikhan...@gmail.com>
>> wrote:
>>
>>> THANKS
>>>
>>> It appears the directory containing the jars has been switched between
>>> the download version and the source version.
>>>
>>> In the download version it is just below the parent directory, in a
>>> directory called jars (level 1). In the git source version it is 4
>>> levels down, in the directory /spark/assembly/target/scala-2.12/jars.
>>>
>>> The issue I have with using Maven is that the linked libraries can be
>>> changed at the Maven repository without my knowledge, so an
>>> application that compiled and worked previously could just break.
>>>
>>> It is not as if the developers run a change to the linked libraries by
>>> me first 😢; they just upload it to the Maven repository without
>>> asking me whether their change is going to impact my app.
>>>
>>> On Wed, 24 Jun 2020, 16:07 ArtemisDev, <arte...@dtechspace.com> wrote:
>>>
>>>> If you are using Maven to manage your jar dependencies, the jar files
>>>> are located in the Maven repository in your home directory. It is
>>>> usually the .m2 directory.
>>>>
>>>> Hope this helps.
>>>>
>>>> -ND
>>>>
>>>> On 6/23/20 3:21 PM, Anwar AliKhan wrote:
>>>>
>>>> Hi,
>>>>
>>>> I prefer to do most of my projects in Python, and for that I use
>>>> Jupyter. I have been downloading the compiled version of Spark.
>>>>
>>>> I do not normally like the source code version, because the build
>>>> process makes me nervous. You know, with lines of stuff scrolling up
>>>> the screen. What am I going to do if a build fails? I am a user!
>>>>
>>>> I decided to risk it, and it was only one mvn command to build (45
>>>> minutes later). Everything is great. Success.
>>>>
>>>> I removed all JVMs except JDK 8 for compilation. I used JDK 8 so I
>>>> know which libraries were linked in the build process. I also used my
>>>> local version of Maven, not the apt install version.
>>>>
>>>> I used JDK 8 because if you go to this Scala site,
>>>> http://scala-ide.org/download/sdk.html, they say the requirement is
>>>> JDK 8 for the IDE, even for Scala 2.12. They don't say JDK 8 or
>>>> higher, just JDK 8.
>>>>
>>>> So anyway, once in a while I do Spark projects in Scala with Eclipse.
>>>> For that I don't use Maven or anything; I prefer to make use of the
>>>> build path and external jars. This way I know exactly which libraries
>>>> I am linking to. Creating a jar in Eclipse is straightforward for
>>>> spark-submit.
>>>>
>>>> Anyway, as you can see (below), I am pointing Jupyter to Spark with
>>>> findspark.init('/opt/spark'). That's OK, everything is fine.
>>>>
>>>> With the compiled version of Spark there is a jars directory, which I
>>>> have been using in Eclipse. With my own compiled-from-source version
>>>> there is no jars directory.
>>>>
>>>> Where have all the jars gone?
>>>>
>>>> I am not sure how findspark.init('/opt/spark') is locating the
>>>> libraries, unless it is finding them from Anaconda.
>>>>
>>>>     import findspark
>>>>     findspark.init('/opt/spark')
>>>>     from pyspark.sql import SparkSession
>>>>     spark = SparkSession \
>>>>         .builder \
>>>>         .appName('Titanic Data') \
>>>>         .getOrCreate()
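A quick way to answer the "which libraries am I actually linking?" question
at runtime, whether the jars came from /opt/spark, the ~/.m2 repository, or
Anaconda, is to ask the JVM directly. A minimal sketch for spark-shell using
the standard java.security ProtectionDomain API (the printed path below is
illustrative, not taken from the thread):

    // Prints the jar that the SparkSession class was actually loaded from.
    val jarUrl = classOf[org.apache.spark.sql.SparkSession]
      .getProtectionDomain.getCodeSource.getLocation
    println(jarUrl)
    // e.g. file:/opt/spark/assembly/target/scala-2.12/jars/spark-sql_2.12-3.0.0.jar

A similar check can be made from the PySpark side through the Py4J gateway,
which would confirm what findspark.init('/opt/spark') picked up.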