Hi all, Question from a newbie here about your excellent Spark:
I've just installed Spark 1.5.2, pre-built for Hadoop 2.4 and later, and I'm working through the introductory documentation using local[4] to begin with. In pyspark, I'm able to run examples such as the simple application from the quick-start guide, provided that I remove the sc initialisation. However, if I try to run any Python script using spark-submit, I get the verbose error message shown below and no output. I haven't been able to fix this, so any assistance would be very gratefully received.

My machine runs Windows 10 Home with 8 GB RAM on a 64-bit Intel Core i3 @ 3.4 GHz. I'm using Python 2.7.11 under Anaconda 2.4.1.

Source, from http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications:

    from pyspark import SparkContext

    logFile = "README.md"  # Should be some file on your system
    sc = SparkContext("local", "Simple App")
    logData = sc.textFile(logFile).cache()
    numAs = logData.filter(lambda s: 'a' in s).count()
    numBs = logData.filter(lambda s: 'b' in s).count()
    print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

Error output:

    Traceback (most recent call last):
      File "c:/Users/Peter/spark-1.5.2-bin-hadoop2.4/SimpleApp.py", line 3, in <module>
        sc = SparkContext("local", "Simple App")
      File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\pyspark.zip\pyspark\context.py", line 113, in __init__
      File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\pyspark.zip\pyspark\context.py", line 170, in _do_init
      File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\pyspark.zip\pyspark\context.py", line 224, in _initialize_context
      File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 701, in __call__
      File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.NullPointerException
        at java.lang.ProcessBuilder.start(Unknown Source)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
        at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:381)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1387)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1341)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:484)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Unknown Source)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-5-2-prebuilt-for-2-4-spark-submit-standalone-Python-scripts-not-running-tp25804.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
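For reference, the quick-start page suggests submitting the script along these lines; the install path below is just the one that appears in the traceback, and the exact command may differ on your setup:

```shell
# Illustrative spark-submit invocation per the quick-start guide
# (path taken from the traceback above; adjust for your install).
cd c:/Users/Peter/spark-1.5.2-bin-hadoop2.4
bin/spark-submit --master "local[4]" SimpleApp.py
```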
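In case it helps to see what the job is supposed to compute: the script is just a filter-and-count over lines of text. A plain-Python sketch of the same logic, with made-up sample lines instead of README.md (no Spark needed), would be:

```python
# Plain-Python sketch of the filter/count logic the Spark job performs.
# The sample lines are illustrative; the real job reads README.md.
lines = ["apache spark", "big data", "hello world"]

num_as = sum(1 for s in lines if 'a' in s)  # count lines containing 'a'
num_bs = sum(1 for s in lines if 'b' in s)  # count lines containing 'b'

print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))
# prints "Lines with a: 2, lines with b: 1"
```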
--------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org