Thanks all. Finally I am able to run my code successfully. It is running on Spark 1.2.1; I will try it on Spark 1.3 too.

The major cause of all the errors I faced was that the delimiter was not declared correctly. I was originally using:

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

Now I am using the following, and that solved most of the issues:

    val Delimeter = "\\|"
    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split(Delimeter)).map(p => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
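The underlying reason: String.split takes a regular expression, and an unescaped "|" is the regex alternation operator, which matches the empty string at every position, so every single character comes back as its own field. A quick illustration with made-up values (the exact output can vary slightly by JDK version; older JDKs also prepend an empty string):

    "123|abc".split("|")     // Array(1, 2, 3, |, a, b, c) -- splits at every position
    "123|abc".split("\\|")   // Array(123, abc)            -- splits only at the pipe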
Thanks again. My first code ran successfully, which gives me some confidence; now I will explore more.

Regards
Ananda

From: BASAK, ANANDA
Sent: Thursday, March 26, 2015 4:55 PM
To: Dean Wampler
Cc: Yin Huai; user@spark.apache.org
Subject: RE: Date and decimal datatype not working

Thanks all. I am installing Spark 1.3 now. I thought I had better keep in sync with the daily evolution of this new technology. Once I install it, I will try to use the Spark-CSV library.

Regards
Ananda

From: Dean Wampler [mailto:deanwamp...@gmail.com]
Sent: Wednesday, March 25, 2015 1:17 PM
To: BASAK, ANANDA
Cc: Yin Huai; user@spark.apache.org
Subject: Re: Date and decimal datatype not working

Recall that the input isn't actually read until you do something that forces evaluation, like calling saveAsTextFile. You didn't show the whole stack trace here, but it probably occurred while parsing an input line where one of your Long fields is actually an empty string.

Because this is such a common problem, I usually define a "parse" method that converts the input text to the desired schema. It catches parse exceptions like this and at least reports the bad line. If you can return a default Long in this case, say 0, that makes it easier to return something.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
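A minimal sketch of the kind of parse method Dean describes, using the ROW_A case class defined at the bottom of this thread (the fallback defaults and the error reporting are illustrative choices, not code from the thread):

    // Parse one pipe-delimited line into ROW_A; on any parse failure,
    // report the bad line and fall back to default values.
    def parse(line: String): ROW_A = {
      val p = line.split("\\|")
      try {
        ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3),
              BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6)))
      } catch {
        case e: Exception =>   // e.g. NumberFormatException on an empty field
          System.err.println(s"Bad line: $line")
          ROW_A(0L, "", 0, "", BigDecimal(0), BigDecimal(0), BigDecimal(0))
      }
    }

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(parse)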
On Wed, Mar 25, 2015 at 11:48 AM, BASAK, ANANDA <ab9...@att.com> wrote:

Thanks. This library is only available with Spark 1.3. I am using version 1.2.1, and before I upgrade to 1.3 I want to try what can be done in 1.2.1. So I am using the following:

    val MyDataset = sqlContext.sql("my select query")
    MyDataset.map(t => t(0)+"|"+t(1)+"|"+t(2)+"|"+t(3)+"|"+t(4)+"|"+t(5)).saveAsTextFile("/my_destination_path")

But it is giving the following error:

    15/03/24 17:05:51 ERROR Executor: Exception in task 1.0 in stage 13.0 (TID 106)
    java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:453)
        at java.lang.Long.parseLong(Long.java:483)
        at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)

Is there something wrong with the TSTAMP field, which is of the Long datatype?

Thanks & Regards
-----------------------
Ananda Basak

From: Yin Huai [mailto:yh...@databricks.com]
Sent: Monday, March 23, 2015 8:55 PM
To: BASAK, ANANDA
Cc: user@spark.apache.org
Subject: Re: Date and decimal datatype not working

To store to a CSV file, you can use the Spark-CSV <https://github.com/databricks/spark-csv> library.

On Mon, Mar 23, 2015 at 5:35 PM, BASAK, ANANDA <ab9...@att.com> wrote:

Thanks. This worked well, as per your suggestions. I had to run the following:

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3), BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6))))

Now I am stuck at another step. I have run a SQL query, selecting all the fields with a WHERE clause that filters TSTAMP on a date range and an ORDER BY TSTAMP clause. That runs fine. Then I am trying to store the output in a CSV file using the saveAsTextFile("filename") function, but it is giving an error. Can you please help me with the proper syntax to store the output in a CSV file?

Thanks & Regards
-----------------------
Ananda Basak

From: BASAK, ANANDA
Sent: Tuesday, March 17, 2015 3:08 PM
To: Yin Huai
Cc: user@spark.apache.org
Subject: RE: Date and decimal datatype not working

Ok, thanks for the suggestions. Let me try, and I will confirm.

Regards
Ananda

From: Yin Huai [mailto:yh...@databricks.com]
Sent: Tuesday, March 17, 2015 3:04 PM
To: BASAK, ANANDA
Cc: user@spark.apache.org
Subject: Re: Date and decimal datatype not working

p(0) is a String, so you need to explicitly convert it to a Long, e.g. p(0).trim.toLong. You also need to do that for p(2). For the BigDecimal values, you need to create BigDecimal objects from your String values.

On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA <ab9...@att.com> wrote:

Hi All,
I am very new to the Spark world; I just started some test coding last week. I am using spark-1.2.1-bin-hadoop2.4 and coding in Scala. I am having issues while using the Date and decimal data types.

Following is my code, which I am simply running at the Scala prompt. I am trying to define a table and point it to my flat file containing raw data (pipe-delimited format). Once that is done, I will run some SQL queries and put the output data into another flat file in pipe-delimited format.

*******************************************************
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD

    // Define row and table
    case class ROW_A(
      TSTAMP:    Long,
      USIDAN:    String,
      SECNT:     Int,
      SECT:      String,
      BLOCK_NUM: BigDecimal,
      BLOCK_DEN: BigDecimal,
      BLOCK_PCT: BigDecimal)

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

    TABLE_A.registerTempTable("TABLE_A")
***************************************************

The second-to-last command is giving an error like the following:

    <console>:17: error: type mismatch;
     found   : String
     required: Long

It looks like the contents of my flat file are always treated as String, and never as Date or decimal. How can I make Spark take them as Date or decimal types?

Regards
Ananda
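Putting the fixes from this thread together, a minimal end-to-end sketch for Spark 1.2.x: the escaped delimiter, the explicit toLong/toInt/BigDecimal conversions, and pipe-joined output. The SELECT statement is a placeholder, and mkString("|") is simply a compact form of the string concatenation used earlier in the thread:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD

    case class ROW_A(TSTAMP: Long, USIDAN: String, SECNT: Int, SECT: String,
                     BLOCK_NUM: BigDecimal, BLOCK_DEN: BigDecimal, BLOCK_PCT: BigDecimal)

    // Load the pipe-delimited file with an escaped delimiter and explicit conversions.
    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt")
      .map(_.split("\\|"))
      .map(p => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3),
                      BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6))))
    TABLE_A.registerTempTable("TABLE_A")

    // Query, then write each result row back out as one pipe-delimited line.
    val result = sqlContext.sql("SELECT * FROM TABLE_A ORDER BY TSTAMP")  // placeholder query
    result.map(_.mkString("|")).saveAsTextFile("/my_destination_path")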