There could be some other character in the line, such as a space or ^M (carriage return). You could try the following to see the actual row:

    val newstream = datastream.map(row => {
      try {
        // Use the lambda parameter `row`; `str` is not defined in this scope.
        val strArray = row.trim().split(",")
        (strArray(0).toInt, strArray(1).toInt)
        // Instead, try this:
        // (strArray(0).trim().toInt, strArray(1).trim().toInt)
      } catch {
        case e: Exception =>
          // Print the offending row, then fall back to a placeholder pair.
          println("W000t!! Exception!! => " + e + "\n The line was :" + row)
          (0, 0)
      }
    })

Thanks
Best Regards

On Tue, Dec 16, 2014 at 3:19 AM, yu <yuz1...@iastate.edu> wrote:
>
> Hello, everyone
>
> I know 'NumberFormatException' means that a String cannot be parsed
> properly, but I really cannot find any mistakes in my code. I hope
> someone may kindly help me.
> My hdfs file is as follows:
> 8,22
> 3,11
> 40,10
> 49,47
> 48,29
> 24,28
> 50,30
> 33,56
> 4,20
> 30,38
> ...
>
> So each line contains an integer + "," + an integer + "\n"
> My code is as follows:
> object StreamMonitor {
>   def main(args: Array[String]): Unit = {
>     val myFunc = (str: String) => {
>       val strArray = str.trim().split(",")
>       (strArray(0).toInt, strArray(1).toInt)
>     }
>     val conf = new SparkConf().setAppName("StreamMonitor");
>     val ssc = new StreamingContext(conf, Seconds(30));
>     val datastream = ssc.textFileStream("/user/yu/streaminput");
>     val newstream = datastream.map(myFunc)
>     newstream.saveAsTextFiles("output/", "");
>     ssc.start()
>     ssc.awaitTermination()
>   }
>
> }
>
> The exception info is:
> 14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
> (TID 0, h3): java.lang.NumberFormatException: For input string: "8"
>
>   java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   java.lang.Integer.parseInt(Integer.java:492)
>   java.lang.Integer.parseInt(Integer.java:527)
>   scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
>   scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
>   StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:9)
>   StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:7)
>   scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)
>   org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
>   org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>   org.apache.spark.scheduler.Task.run(Task.scala:54)
>   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   java.lang.Thread.run(Thread.java:745)
>
> So based on the above info, "8" is the first number in the file and I think
> it should be parsed to an integer without any problems.
> I know it may be a very stupid question and the answer may be very easy, but
> I really cannot find the reason. I am thankful to anyone who helps!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/NumberFormatException-tp20694.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
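
Since the failing input prints as a plain "8", the extra character is probably invisible, and printing the row as-is may not reveal it. Below is a minimal sketch of one way to make such characters visible. It assumes (this is not confirmed anywhere in the thread) that the file starts with a byte order mark (U+FEFF) or has carriage returns at the line ends. Note that String.trim() only removes characters up to U+0020, so it would not remove a BOM. The names RevealHidden, reveal and parseField are hypothetical helpers, not part of Spark.

```scala
import scala.util.Try

object RevealHidden {
  // True for printable, non-space ASCII characters only.
  private def isVisibleAscii(c: Char): Boolean = c > 0x20 && c < 0x7f

  // Show every other character (space, control, BOM, ...) as <U+XXXX>.
  def reveal(s: String): String =
    s.map(c => if (isVisibleAscii(c)) c.toString else f"<U+${c.toInt}%04X>").mkString

  // Hypothetical parser: drop invisible/non-ASCII characters, then parse.
  // Returns None instead of throwing when the cleaned field is not an Int.
  def parseField(field: String): Option[Int] =
    Try(field.filter(isVisibleAscii).toInt).toOption

  def main(args: Array[String]): Unit = {
    // Assumed sample row: a BOM before the 8 and a trailing carriage return.
    val row = "\uFEFF" + "8,22\r"
    println(reveal(row))                         // <U+FEFF>8,22<U+000D>
    println(row.split(",").map(parseField).toList) // List(Some(8), Some(22))
  }
}
```

Inside the catch block, printing RevealHidden.reveal(row) instead of row would show exactly which character breaks toInt. Whether to strip such characters silely or to fix the input file is a separate decision; the sketch only shows how to find them.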