Hi Yu,

Try this:

    val data = csv.map(line => line.split(",").map(elem => elem.trim)) // one row of trimmed fields per line
    data.map(rec => (rec(0).toInt, rec(1).toInt))

The second line converts the two fields of each row to integers.

On 16 December 2014 at 10:49, yu [via Apache Spark User List] <ml-node+s1001560n20694...@n3.nabble.com> wrote:

> Hello, everyone
>
> I know 'NumberFormatException' is due to the reason that String can not be
> parsed properly, but I really can not find any mistakes for my code. I hope
> someone may kindly help me.
> My hdfs file is as follows:
> 8,22
> 3,11
> 40,10
> 49,47
> 48,29
> 24,28
> 50,30
> 33,56
> 4,20
> 30,38
> ...
>
> So each line contains an integer + "," + an integer + "\n"
> My code is as follows:
> object StreamMonitor {
>   def main(args: Array[String]): Unit = {
>     val myFunc = (str: String) => {
>       val strArray = str.trim().split(",")
>       (strArray(0).toInt, strArray(1).toInt)
>     }
>     val conf = new SparkConf().setAppName("StreamMonitor");
>     val ssc = new StreamingContext(conf, Seconds(30));
>     val datastream = ssc.textFileStream("/user/yu/streaminput");
>     val newstream = datastream.map(myFunc)
>     newstream.saveAsTextFiles("output/", "");
>     ssc.start()
>     ssc.awaitTermination()
>   }
>
> }
>
> The exception info is:
> 14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, h3): java.lang.NumberFormatException: For input string: "8"
>     java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     java.lang.Integer.parseInt(Integer.java:492)
>     java.lang.Integer.parseInt(Integer.java:527)
>     scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
>     scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
>     StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:9)
>     StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:7)
>     scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)
>     org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
>     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>     org.apache.spark.scheduler.Task.run(Task.scala:54)
>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     java.lang.Thread.run(Thread.java:745)
>
> So based on the above info, "8" is the first number in the file and I
> think it should be parsed to integer without any problems.
> I know it may be a very stupid question and the answer may be very easy.
> But I really can not find the reason. I am thankful to anyone who helps!

--
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email: hna...@wynyardgroup.com | Extn: 8019

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NumberFormatException-tp20694p20696.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
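
A note on the parsing step discussed in this message, offered as a sketch rather than a confirmed diagnosis. The exception reports the input as "8", which looks like a valid integer, so one plausible cause (not verified anywhere in this thread) is an invisible character at the start of the file, such as a byte-order mark (\uFEFF). trim() would not remove it, because it only strips characters up to U+0020. The helper name parsePair and the sample lines below are hypothetical and used only for illustration; the code drops everything except ASCII digits, minus signs and commas before converting, and returns None for lines it cannot parse instead of throwing.

```scala
import scala.util.Try

object SafeParse {
  // Hypothetical helper: parse "a,b" into a pair of Ints,
  // or return None if the line does not have exactly two integer fields.
  def parsePair(line: String): Option[(Int, Int)] = {
    // Keep only ASCII digits, minus signs and commas. This discards
    // whitespace as well as invisible characters such as a BOM (\uFEFF).
    val cleaned = line.filter(c => (c >= '0' && c <= '9') || c == '-' || c == ',')
    cleaned.split(",") match {
      case Array(a, b) => Try((a.toInt, b.toInt)).toOption
      case _           => None
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample input: a BOM-prefixed line, a padded line,
    // a malformed line and a clean line.
    val lines = Seq("\uFEFF8,22", " 3 , 11 ", "bad line", "40,10")
    val parsed = lines.flatMap(line => parsePair(line))
    println(parsed) // prints List((8,22), (3,11), (40,10))
  }
}
```

In the streaming job quoted above, something like datastream.flatMap(line => SafeParse.parsePair(line)) in place of datastream.map(myFunc) should skip malformed lines rather than fail the task, assuming the flatMap of the Spark version in use accepts an Option through Scala's implicit Option-to-Iterable conversion. Silently dropping bad lines is a design choice; logging them instead would make it easier to see what the offending characters actually are.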