Using Spark shell:

scala> import scala.collection.mutable.MutableList
import scala.collection.mutable.MutableList
scala> val lst = MutableList[(String,String,Double)]()
lst: scala.collection.mutable.MutableList[(String, String, Double)] = MutableList()

scala> Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))

scala> val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
<console>:27: error: not found: value a
       val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
                                          ^

scala> val rdd=sc.makeRDD(lst).map(i=> if(i._1==10) 1 else 0)
rdd: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:27

scala> rdd.count()
...
15/08/30 06:53:40 INFO DAGScheduler: Job 0 finished: count at <console>:30, took 0.478350 s
res1: Long = 10000

Ashish:
Please refine your example to mimic more closely what your code actually did.

Thanks

On Sun, Aug 30, 2015 at 12:24 AM, Sean Owen <so...@cloudera.com> wrote:

> That can't cause any error, since there is no action in your first
> snippet. Even calling count on the result doesn't cause an error. You
> must be executing something different.
>
> On Sun, Aug 30, 2015 at 4:21 AM, ashrowty <ashish.shro...@gmail.com> wrote:
> > I am running the Spark shell (1.2.1) in local mode and I have a simple
> > RDD[(String,String,Double)] with about 10,000 objects in it. I get a
> > StackOverflowError each time I try to run the following code (the code
> > itself is just representative of other logic where I need to pass in a
> > variable). I tried broadcasting the variable too, but no luck; I must be
> > missing something basic here -
> >
> > val rdd = sc.makeRDD(List(<Data read from file>))
> > val a=10
> > rdd.map(r => if (a==10) 1 else 0)
> >
> > This throws -
> >
> > java.lang.StackOverflowError
> >     at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:318)
> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1133)
> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> >     ...
> >
> > More experiments .. this works -
> >
> > val lst = Range(0,10000).map(i=>("10","10",i:Double)).toList
> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
> >
> > But the below doesn't, and throws the StackOverflowError -
> >
> > val lst = MutableList[(String,String,Double)]()
> > Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))
> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
> >
> > Any help appreciated!
> >
> > Thanks,
> > Ashish
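A note on the quoted trace, for anyone who hits the same repeating
writeObject0 / defaultWriteFields / writeSerialData frames: that pattern is
what default Java serialization produces when it recurses through a linked
structure, one node per recursion level. scala.collection.mutable.MutableList
is a singly linked list with, as far as I can tell, no custom writeObject, so
serializing a 10,000-element instance descends roughly 10,000 levels, while
scala.collection.immutable.List writes its elements iteratively, which would
explain why Ashish's toList variant works. A standalone sketch, no Spark
involved (LinkedSerializationDemo and serialize are my own names, and whether
the second call actually overflows depends on your JVM's stack size):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable.MutableList

object LinkedSerializationDemo {
  // Serialize to a throwaway buffer, mimicking what Spark's JavaSerializer
  // does to a task closure before shipping it.
  private def serialize(obj: AnyRef): Unit = {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    try oos.writeObject(obj) finally oos.close()
  }

  def main(args: Array[String]): Unit = {
    val lst = MutableList[(String, String, Double)]()
    Range(0, 10000).foreach(i => lst += (("10", "10", i: Double)))

    serialize(lst.toList) // immutable List: written iteratively, succeeds
    serialize(lst)        // linked MutableList: one recursion per node,
                          // expect a StackOverflowError on a default stack
  }
}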
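As for why referencing a matters at all: one plausible reading, not a
confirmed diagnosis, is that spark-shell compiles each input line into a
wrapper object, so a closure that reads a can drag its enclosing wrapper,
lst included, into the serialized task; that would also square with
broadcasting a not helping. If that is the cause, the practical shell-side
fix is the one Ashish's own experiment already points to: materialize the
linked list into an immutable collection before building the RDD. A minimal
sketch, assuming a Spark 1.x spark-shell with the predefined sc:

import scala.collection.mutable.MutableList

val lst = MutableList[(String, String, Double)]()
Range(0, 10000).foreach(i => lst += (("10", "10", i: Double)))

// Convert the linked list to an immutable List (a Vector or Array works
// too) so nothing deeply recursive rides along with the closure.
val data = lst.toList

val a = 10
val rdd = sc.makeRDD(data).map(r => if (a == 10) 1 else 0)
rdd.count() // expected: 10000

The same reasoning applies outside the shell: when a task closure has to
carry a large collection, prefer one whose serialized form is written
iteratively rather than a node-by-node linked structure.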