There could be some other character in the line, like a space or a ^M (carriage
return), etc. You could try the following and see the actual row when the parse
fails:

val newstream = datastream.map(row => {
  try {
    val strArray = row.trim().split(",")
    (strArray(0).toInt, strArray(1).toInt)
    // Instead, try trimming each field as well:
    // (strArray(0).trim().toInt, strArray(1).trim().toInt)
  } catch {
    case e: Exception =>
      println("W000t!! Exception!! => " + e + "\nThe line was: " + row)
      (0, 0)
  }
})
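
If it still complains about something that looks like a plain "8", the invisible
character usually shows up once you print each character together with its code
point. A minimal sketch you can run in a plain Scala shell (dumpChars is just an
illustrative helper, not part of your job):

// Print every character of a suspect line together with its Unicode code
// point, so invisible characters like '\r' (13) or a BOM (65279) become visible.
def dumpChars(line: String): Unit =
  line.foreach(c => println("'" + c + "' -> " + c.toInt))

// Example: a carriage return hiding inside the first field
dumpChars("8\r,22")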


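If you would rather drop malformed rows than turn them into (0, 0), one option
(just a sketch, not from your original code) is to parse with scala.util.Try and
flatMap away the failures:

import scala.util.Try

// Keep only the rows that parse cleanly; anything that throws inside Try
// (bad number, missing field, etc.) is dropped from the stream.
val parsed = datastream.flatMap { row =>
  val fields = row.trim().split(",")
  Try((fields(0).trim().toInt, fields(1).trim().toInt)).toOption
}
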
Thanks
Best Regards

On Tue, Dec 16, 2014 at 3:19 AM, yu <yuz1...@iastate.edu> wrote:
>
> Hello, everyone
>
> I know a 'NumberFormatException' means the String could not be parsed
> properly, but I really cannot find any mistake in my code. I hope
> someone may kindly help me.
> My hdfs file is as follows:
> 8,22
> 3,11
> 40,10
> 49,47
> 48,29
> 24,28
> 50,30
> 33,56
> 4,20
> 30,38
> ...
>
> So each line contains an integer + "," + an integer + "\n"
> My code is as follows:
> object StreamMonitor {
>   def main(args: Array[String]): Unit = {
>     val myFunc = (str: String) => {
>       val strArray = str.trim().split(",")
>       (strArray(0).toInt, strArray(1).toInt)
>     }
>     val conf = new SparkConf().setAppName("StreamMonitor");
>     val ssc = new StreamingContext(conf, Seconds(30));
>     val datastream = ssc.textFileStream("/user/yu/streaminput");
>     val newstream = datastream.map(myFunc)
>     newstream.saveAsTextFiles("output/", "");
>     ssc.start()
>     ssc.awaitTermination()
>   }
>
> }
>
> The exception info is:
> 14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
> (TID 0, h3): java.lang.NumberFormatException: For input string: "8"
>         java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         java.lang.Integer.parseInt(Integer.java:492)
>         java.lang.Integer.parseInt(Integer.java:527)
>         scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
>         scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
>         StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:9)
>         StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:7)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)
>         org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
>
> So based on the above info, "8" is the first number in the file, and I think
> it should be parsed to an integer without any problems.
> I know it may be a very stupid question and the answer may be very easy, but
> I really cannot find the reason. I am thankful to anyone who helps!
>
>
>
