I can run it now with the suggested method. However, I have encountered a new problem that I have not faced before (I sent another email about it, but here it goes again ...)

I ran SparkKMeans with a big file (~7 GB of data) for one iteration on spark-0.8.0, with this line in .bashrc:

export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"

It finished in a decent time, ~50 seconds, and I got only a few "Full GC ..." messages from Java (a maximum of 4-5).

Now, using the same export in .bashrc but with spark-1.0.0 (and running it with spark-submit), the first loop never finishes and I get a lot of:

18.537: [GC (Allocation Failure) --[PSYoungGen: 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311 secs] [Times: user=5.81 sys=2.12, real=2.85 secs]

or

31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11, real=2.31 secs]

I tried passing different parameters to the JVM through spark-submit, but the results are the same. This happens with Java 1.7 and also with Java 1.8. I do not know what "Ergonomics" stands for ...

How can I get decent performance from spark-1.0.0, considering that spark-0.8.0 did not need any fine-tuning of the garbage collection method (the default worked well)?
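In case it helps, this is the shape of the spark-submit invocation I have been experimenting with (the examples jar path is the one from my sbt build and the master/memory values are illustrative; in local mode the driver and executor share a single JVM, so the driver options cover both):

    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkKMeans \
      --master local[8] \
      --driver-memory 15g \
      --driver-java-options "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" \
      examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar \
      <input file> <k> <convergence delta>

(On a real cluster the executor JVMs do not see --driver-java-options; spark.executor.extraJavaOptions in conf/spark-defaults.conf would have to carry the GC flags instead.)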
Thank you

On Wednesday, July 2, 2014 4:45 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

The scripts that Xiangrui mentions set up the classpath ... Can you run ./run-example for the provided example successfully?

What you can try is to set SPARK_PRINT_LAUNCH_COMMAND=1 and then call run-example -- that will show you the exact java command used to run the example at the start of execution. Assuming you can run the examples successfully, you should be able to just copy that and add your jar to the front of the classpath. If that works, you can start removing extra jars (run-example puts all the example jars on the classpath, which you won't need).

As you said, the error you see is indicative of the class not being available/seen at runtime, but it's hard to tell why.
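Concretely, something along these lines (the classpath that gets printed depends entirely on your build, so treat this as a sketch):

    export SPARK_PRINT_LAUNCH_COMMAND=1
    ./bin/run-example SparkKMeans <input file> <k> <convergence delta>
    # the first line printed should look like:
    #   Spark Command: /path/to/java -cp <long classpath> ...
    # copy that java command and re-run it by hand with your jar prepended:
    java -cp /path/to/myjar.jar:<long classpath> <main class> <input file> <k> <convergence delta>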
On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:

> I want to make some minor modifications in SparkKMeans.scala, so running the basic example won't do.
> I have also packed my code into a jar file with sbt. It completes successfully, but when I try to run it with "java -jar myjar.jar" I get the same error:
>
> "Exception in thread "main" java.lang.NoClassDefFoundError: breeze/linalg/Vector
>         at java.lang.Class.getDeclaredMethods0(Native Method)
>         at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>         at java.lang.Class.getMethod0(Class.java:2774)
>         at java.lang.Class.getMethod(Class.java:1663)
>         at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>         at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)"
>
> If "scalac -d classes/ SparkKMeans.scala" can't see my classpath, why does it succeed in compiling and not give the same error?
> The error itself, "NoClassDefFoundError", means that the files are available at compile time but, for some reason I cannot figure out, not available at run time. Does anyone know why?
>
> Thank you
>
>
> On Tuesday, July 1, 2014 7:03 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
> You can use either bin/run-example or bin/spark-submit to run the example code. "scalac -d classes/ SparkKMeans.scala" doesn't recognize the Spark classpath. There are examples in the official doc:
> http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-here
>
> -Xiangrui
>
> On Tue, Jul 1, 2014 at 4:39 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:
>> Hello,
>>
>> I have installed spark-1.0.0 with scala 2.10.3. I have built Spark with "sbt/sbt assembly" and added
>> "/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar"
>> to my CLASSPATH variable.
>>
>> Then I went to "../spark-1.0.0/examples/src/main/scala/org/apache/spark/examples", created a new directory "classes", and compiled SparkKMeans.scala with "scalac -d classes/ SparkKMeans.scala".
>>
>> Then I navigated to "classes" (I commented out this line in the scala file: package org.apache.spark.examples) and tried to run it with "java -cp . SparkKMeans", and I get the following error:
>>
>> "Exception in thread "main" java.lang.NoClassDefFoundError: breeze/linalg/Vector
>>         at java.lang.Class.getDeclaredMethods0(Native Method)
>>         at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>>         at java.lang.Class.getMethod0(Class.java:2774)
>>         at java.lang.Class.getMethod(Class.java:1663)
>>         at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>>         at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
>> Caused by: java.lang.ClassNotFoundException: breeze.linalg.Vector
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>         ... 6 more"
>>
>> The jar under "/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar" contains the breeze/linalg/Vector* path. I even tried to unpack it and put it on the CLASSPATH, but it does not seem to get picked up.
>>
>> I am currently running Java 1.8:
>> "java version "1.8.0_05"
>> Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)"
>>
>> What am I doing wrong?
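For what it's worth, there is a simple explanation for the compile-time/run-time difference: scalac honors the CLASSPATH environment variable, but "java -jar myjar.jar" ignores CLASSPATH entirely, and "java -cp . SparkKMeans" replaces it with just "." rather than appending to it. A sketch of a run that puts the assembly jar back on the runtime classpath (jar path as quoted above; the -Dspark.master setting is an assumption, since the 1.0.0 example reads the master from the configuration rather than from a command-line argument):

    cd classes
    # "." for the compiled example class, plus the assembly jar for Spark and breeze
    java -Dspark.master=local[8] \
      -cp .:/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar \
      SparkKMeans <input file> <k> <convergence delta>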