Thanks all. Finally I am able to run my code successfully. It is running on Spark 1.2.1; I will try it on Spark 1.3 too.

The major cause of all the errors I faced was that the delimiter was not declared correctly. I was originally using:

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

Now I am using the following, and that solved most of the issues:

    val Delimeter = "\\|"
    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split(Delimeter)).map(p => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
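The underlying reason: String.split takes a regular expression, and an unescaped "|" is the regex alternation operator, which matches the empty string at every position, so every single character comes back as its own field. A quick illustration with made-up values (the exact output can vary slightly by JDK version; older JDKs also prepend an empty string):

    "123|abc".split("|")     // Array(1, 2, 3, |, a, b, c) -- splits at every position
    "123|abc".split("\\|")   // Array(123, abc)            -- splits only at the pipe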
Thanks again. My first code ran successfully, which gives me some confidence; now I will explore more.

Regards
Ananda

From: BASAK, ANANDA
Sent: Thursday, March 26, 2015 4:55 PM
To: Dean Wampler
Cc: Yin Huai; user@spark.apache.org
Subject: RE: Date and decimal datatype not working

Thanks all. I am installing Spark 1.3 now. I thought I had better keep in sync with the daily evolution of this new technology. Once I install it, I will try to use the Spark-CSV library.

Regards
Ananda

From: Dean Wampler [mailto:deanwamp...@gmail.com]
Sent: Wednesday, March 25, 2015 1:17 PM
To: BASAK, ANANDA
Cc: Yin Huai; user@spark.apache.org
Subject: Re: Date and decimal datatype not working

Recall that the input isn't actually read until you do something that forces evaluation, like calling saveAsTextFile. You didn't show the whole stack trace here, but it probably occurred while parsing an input line where one of your Long fields is actually an empty string.

Because this is such a common problem, I usually define a "parse" method that converts the input text to the desired schema. It catches parse exceptions like this and at least reports the bad line. If you can return a default Long in this case, say 0, that makes it easier to return something.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
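A minimal sketch of the kind of parse method Dean describes, using the ROW_A case class defined at the bottom of this thread (the fallback defaults and the error reporting are illustrative choices, not code from the thread):

    // Parse one pipe-delimited line into ROW_A; on any parse failure,
    // report the bad line and fall back to default values.
    def parse(line: String): ROW_A = {
      val p = line.split("\\|")
      try {
        ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3),
              BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6)))
      } catch {
        case e: Exception =>   // e.g. NumberFormatException on an empty field
          System.err.println(s"Bad line: $line")
          ROW_A(0L, "", 0, "", BigDecimal(0), BigDecimal(0), BigDecimal(0))
      }
    }

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(parse)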
On Wed, Mar 25, 2015 at 11:48 AM, BASAK, ANANDA <ab9...@att.com> wrote:

Thanks. This library is only available with Spark 1.3. I am using version 1.2.1, and before I upgrade to 1.3 I want to try what can be done in 1.2.1. So I am using the following:

    val MyDataset = sqlContext.sql("my select query")
    MyDataset.map(t => t(0)+"|"+t(1)+"|"+t(2)+"|"+t(3)+"|"+t(4)+"|"+t(5)).saveAsTextFile("/my_destination_path")

But it is giving the following error:

    15/03/24 17:05:51 ERROR Executor: Exception in task 1.0 in stage 13.0 (TID 106)
    java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:453)
        at java.lang.Long.parseLong(Long.java:483)
        at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)

Is there something wrong with the TSTAMP field, which is of the Long datatype?

Thanks & Regards
-----------------------
Ananda Basak

From: Yin Huai [mailto:yh...@databricks.com]
Sent: Monday, March 23, 2015 8:55 PM
To: BASAK, ANANDA
Cc: user@spark.apache.org
Subject: Re: Date and decimal datatype not working

To store to a CSV file, you can use the Spark-CSV <https://github.com/databricks/spark-csv> library.

On Mon, Mar 23, 2015 at 5:35 PM, BASAK, ANANDA <ab9...@att.com> wrote:

Thanks. This worked well, as per your suggestions. I had to run the following:

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3), BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6))))

Now I am stuck at another step. I have run a SQL query, selecting all the fields with a WHERE clause that filters TSTAMP on a date range and an ORDER BY TSTAMP clause. That runs fine. Then I am trying to store the output in a CSV file using the saveAsTextFile("filename") function, but it is giving an error. Can you please help me with the proper syntax to store the output in a CSV file?

Thanks & Regards
-----------------------
Ananda Basak

From: BASAK, ANANDA
Sent: Tuesday, March 17, 2015 3:08 PM
To: Yin Huai
Cc: user@spark.apache.org
Subject: RE: Date and decimal datatype not working

Ok, thanks for the suggestions. Let me try, and I will confirm.

Regards
Ananda

From: Yin Huai [mailto:yh...@databricks.com]
Sent: Tuesday, March 17, 2015 3:04 PM
To: BASAK, ANANDA
Cc: user@spark.apache.org
Subject: Re: Date and decimal datatype not working

p(0) is a String, so you need to explicitly convert it to a Long, e.g. p(0).trim.toLong. You also need to do that for p(2). For the BigDecimal values, you need to create BigDecimal objects from your String values.

On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA <ab9...@att.com> wrote:

Hi All,
I am very new to the Spark world; I just started some test coding last week. I am using spark-1.2.1-bin-hadoop2.4 and coding in Scala. I am having issues while using the Date and decimal data types.

Following is my code, which I am simply running at the Scala prompt. I am trying to define a table and point it to my flat file containing raw data (pipe-delimited format). Once that is done, I will run some SQL queries and put the output data into another flat file in pipe-delimited format.

*******************************************************
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD

    // Define row and table
    case class ROW_A(
      TSTAMP:    Long,
      USIDAN:    String,
      SECNT:     Int,
      SECT:      String,
      BLOCK_NUM: BigDecimal,
      BLOCK_DEN: BigDecimal,
      BLOCK_PCT: BigDecimal)

    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

    TABLE_A.registerTempTable("TABLE_A")
***************************************************

The second-to-last command is giving an error like the following:

    <console>:17: error: type mismatch;
     found   : String
     required: Long

It looks like the contents of my flat file are always treated as String, and never as Date or decimal. How can I make Spark take them as Date or decimal types?

Regards
Ananda
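Putting the fixes from this thread together, a minimal end-to-end sketch for Spark 1.2.x: the escaped delimiter, the explicit toLong/toInt/BigDecimal conversions, and pipe-joined output. The SELECT statement is a placeholder, and mkString("|") is simply a compact form of the string concatenation used earlier in the thread:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD

    case class ROW_A(TSTAMP: Long, USIDAN: String, SECNT: Int, SECT: String,
                     BLOCK_NUM: BigDecimal, BLOCK_DEN: BigDecimal, BLOCK_PCT: BigDecimal)

    // Load the pipe-delimited file with an escaped delimiter and explicit conversions.
    val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt")
      .map(_.split("\\|"))
      .map(p => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3),
                      BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6))))
    TABLE_A.registerTempTable("TABLE_A")

    // Query, then write each result row back out as one pipe-delimited line.
    val result = sqlContext.sql("SELECT * FROM TABLE_A ORDER BY TSTAMP")  // placeholder query
    result.map(_.mkString("|")).saveAsTextFile("/my_destination_path")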