To store the output to a CSV file, you can use the spark-csv library: <https://github.com/databricks/spark-csv>
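If you want to stay on the core RDD API instead, a rough sketch (the query and output path below are only illustrative; RESULT is just a name I made up for the SchemaRDD your query returns) would be something like:

// RESULT is the SchemaRDD returned by sqlContext.sql(...); the filter value is made up
val RESULT = sqlContext.sql(
  "SELECT * FROM TABLE_A WHERE TSTAMP >= 20150101 ORDER BY TSTAMP")

// Turn each Row into one delimited line and write it out as text.
// Note: saveAsTextFile writes a directory of part-* files, not a single file.
RESULT.map(row => row.mkString("|"))    // use "," for a true CSV
      .saveAsTextFile("/Myhome/SPARK/files/table_a_output")

This should work on Spark 1.2.x because a SchemaRDD is an RDD of Rows and a Row behaves like a Seq there, so mkString joins the column values.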
On Mon, Mar 23, 2015 at 5:35 PM, BASAK, ANANDA <ab9...@att.com> wrote:

> Thanks. This worked well as per your suggestions. I had to run the following:
>
> val TABLE_A =
> sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p
> => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3), BigDecimal(p(4)),
> BigDecimal(p(5)), BigDecimal(p(6))))
>
> Now I am stuck at another step. I have run a SQL query where I select all
> the fields with a WHERE clause (TSTAMP filtered by a date range) and an
> ORDER BY TSTAMP clause. That is running fine.
>
> Then I am trying to store the output in a CSV file. I am using the
> saveAsTextFile("filename") function, but it is giving an error. Can you
> please help me with the proper syntax to store the output in a CSV file?
>
> Thanks & Regards
> -----------------------
> Ananda Basak
> Ph: 425-213-7092
>
> *From:* BASAK, ANANDA
> *Sent:* Tuesday, March 17, 2015 3:08 PM
> *To:* Yin Huai
> *Cc:* user@spark.apache.org
> *Subject:* RE: Date and decimal datatype not working
>
> Ok, thanks for the suggestions. Let me try and I will confirm.
>
> Regards
> Ananda
>
> *From:* Yin Huai [mailto:yh...@databricks.com]
> *Sent:* Tuesday, March 17, 2015 3:04 PM
> *To:* BASAK, ANANDA
> *Cc:* user@spark.apache.org
> *Subject:* Re: Date and decimal datatype not working
>
> p(0) is a String, so you need to convert it to a Long explicitly, e.g.
> p(0).trim.toLong. You also need to do this for p(2). For the BigDecimal
> values, you need to create BigDecimal objects from your String values.
>
> On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA <ab9...@att.com> wrote:
>
> Hi All,
>
> I am very new to the Spark world and just started some test coding last
> week. I am using spark-1.2.1-bin-hadoop2.4 and coding in Scala.
>
> I am having issues with the Date and decimal data types. Following is my
> code, which I am simply running at the Scala prompt. I am trying to define
> a table and point it to my flat file containing raw data (pipe-delimited
> format). Once that is done, I will run some SQL queries and put the output
> data into another flat file, also pipe-delimited.
>
> *******************************************************
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
>
> // Define row and table
> case class ROW_A(
>   TSTAMP: Long,
>   USIDAN: String,
>   SECNT: Int,
>   SECT: String,
>   BLOCK_NUM: BigDecimal,
>   BLOCK_DEN: BigDecimal,
>   BLOCK_PCT: BigDecimal)
>
> val TABLE_A =
> sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p
> => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
>
> TABLE_A.registerTempTable("TABLE_A")
>
> ***************************************************
>
> The second-to-last command is giving an error like the following:
>
> <console>:17: error: type mismatch;
>  found   : String
>  required: Long
>
> It looks like the content from my flat file is always treated as String
> and never as Date or decimal. How can I make Spark take these fields as
> Date or decimal types?
>
> Regards
> Ananda
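For reference, the load step with the explicit conversions suggested above could look like the sketch below (same ROW_A case class and file path as in the thread). One caveat not discussed in the thread: String.split takes a regular expression, so the pipe delimiter usually needs escaping as "\\|".

val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt")
  .map(_.split("\\|"))            // an unescaped "|" would split on every character
  .map(p => ROW_A(
    p(0).trim.toLong,             // TSTAMP: String -> Long
    p(1),                         // USIDAN stays a String
    p(2).trim.toInt,              // SECNT: String -> Int
    p(3),                         // SECT stays a String
    BigDecimal(p(4)),             // BLOCK_NUM
    BigDecimal(p(5)),             // BLOCK_DEN
    BigDecimal(p(6))))            // BLOCK_PCT

TABLE_A.registerTempTable("TABLE_A")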