Hi,

I am just getting started with Spark and am trying to count the number
of lines in a text file on my Mac, but I get

an org.apache.spark.SparkException: Task not serializable error. The
stack trace (below) points at the filter(...) call rather than the
textFile(...) line.

Please see below for a sample of the code and the stack trace.

Any idea why this error is thrown?

Best regards,

Mina

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

System.out.println("Creating Spark Configuration");
SparkConf javaConf = new SparkConf();
javaConf.setAppName("My First Spark Java Application");
// Note: setMaster() expects a master URL such as "local[*]" or
// "spark://host:7077", not a filesystem path to the Spark install.
javaConf.setMaster("PATH to my spark");
System.out.println("Creating Spark Context");
JavaSparkContext javaCtx = new JavaSparkContext(javaConf);
System.out.println("Loading the Dataset and will further process it");
String file = "file:///file.txt";
JavaRDD<String> logData = javaCtx.textFile(file);

// Keep every line, then count; the anonymous Function below is what
// Spark serializes and ships to the executors.
long numLines = logData.filter(new Function<String, Boolean>() {
   public Boolean call(String s) {
      return true;
   }
}).count();

System.out.println("Number of Lines in the Dataset " + numLines);

javaCtx.close();
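For what it's worth, I can reproduce the capture behaviour outside Spark with plain java.io serialization. This is only my own stdlib sketch (class and interface names here are mine, not Spark's): an anonymous inner class keeps a hidden reference to its enclosing instance, so if that instance is not Serializable, serializing the function fails, which I understand is what Spark's ClosureCleaner is reporting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureDemo {
    // A serializable single-method interface, standing in for Spark's Function.
    interface SerFn extends Serializable {
        Boolean call(String s);
    }

    // Anonymous inner classes capture the enclosing instance. CaptureDemo
    // itself is NOT Serializable, so serializing this function fails.
    SerFn anonymous() {
        return new SerFn() {
            public Boolean call(String s) { return true; }
        };
    }

    // A static nested class holds no enclosing-instance reference,
    // so it serializes cleanly.
    static class KeepAll implements SerFn {
        public Boolean call(String s) { return true; }
    }

    // Try to Java-serialize an object; report success or failure.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {   // NotSerializableException lands here
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new CaptureDemo().anonymous())); // false
        System.out.println(serializes(new KeepAll()));                 // true
    }
}
```

If that is indeed the problem, moving the function into a static nested class (or a lambda that touches no instance fields) would avoid the capture.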

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
        at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
        at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
