Hi,
I'm not an expert on Spark SQL, but from what I understand, the problem is that
you're trying to access an RDD inside another RDD. Statements 1 and 2 both call
sqlContext.sql("select rate ...") inside file.map(line => ...).
The function passed to map runs in tasks on the executors, and RDDs (and the
SQLContext that creates them) can't be used from inside another RDD's tasks,
which is why you're getting the NullPointerException.
More specifically, you are trying to create a SchemaRDD inside an RDD
operation, which you can't do.
If the file isn't huge, you can call .collect() to turn the RDD into a local
Array and then use .map() on that Array, so the queries run on the driver.
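
A minimal sketch of the collect-then-map idea, in case it helps. The helper
name lookupOnDriver and the buildQuery parameter are made up for illustration
(buildQuery stands in for the string concatenation in statement 1), and I
haven't run this against your data:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical helper, not from the original post: runs one lookup query per
// input line, entirely on the driver.
def lookupOnDriver(
    file: RDD[String],
    sqlContext: SQLContext,
    buildQuery: String => String): Array[Array[Row]] = {
  // collect() pulls every line into driver memory, so this is only safe
  // when the file is small.
  val lines: Array[String] = file.collect()
  // This is Array.map, an ordinary local loop, so using sqlContext here is
  // fine: nothing in this closure is shipped to the executors.
  lines.map(line => sqlContext.sql(buildQuery(line)).collect())
}
```

Each call to sqlContext.sql(...).collect() launches its own Spark job, so this
gets slow quickly as the number of lines grows.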
If the file is huge, you can run statement 3 first, join the resulting RDD
with file using 'txCurCode' as the key, and then apply the remaining
filtering operations, etc.
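
Roughly, the join could look like the sketch below. The Rate case class and
the joinRates helper are hypothetical names I'm using for illustration, and
I'm assuming fxCurCodesMap is a plain Map[String, String] and that
effectiveDate is a yyyyMMdd string, as the '20140901' literal suggests.
Building the rates RDD from the rows returned by the query depends on your
Spark version, so I've left that part out:

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark 1.x releases
import org.apache.spark.rdd.RDD

// Hypothetical record type for one row of CurrencyCodeRates.
case class Rate(txCurCode: String, fxCurCode: String, effectiveDate: String, rate: Double)

// Hypothetical helper: pairs every input line with the rates that match it.
def joinRates(
    file: RDD[String],
    rates: RDD[Rate],
    fxCurCodesMap: Map[String, String]): RDD[(String, Rate)] = {
  // Key both sides by txCurCode so they can be joined.
  val eventsByCode = file.keyBy(line => line.substring(202, 205))
  val ratesByCode = rates.keyBy(_.txCurCode)

  eventsByCode.join(ratesByCode)
    // Apply the remaining conditions from the where clause after the join.
    .filter { case (_, (line, r)) =>
      fxCurCodesMap.get(line.substring(77, 82)).exists(_ == r.fxCurCode) &&
        r.effectiveDate >= line.substring(221, 229) // yyyyMMdd strings sort by date
    }
    .map { case (_, (line, r)) => (line, r) }
}
```

This returns every matching (line, rate) pair. Picking a single rate per line
(what extractCurRate presumably does with the ordered result) would be one
more step on top, which I can't write without seeing that function.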
Best,
Burak
----- Original Message -----
From: "rkishore999" <[email protected]>
To: [email protected]
Sent: Saturday, September 13, 2014 10:29:26 PM
Subject: Spark SQL
val file =
  sc.textFile("hdfs://ec2-54-164-243-97.compute-1.amazonaws.com:9010/user/fin/events.txt")

1. val xyz = file.map(line => extractCurRate(sqlContext.sql(
     "select rate from CurrencyCodeRates where txCurCode = '" + line.substring(202,205) +
     "' and fxCurCode = '" + fxCurCodesMap(line.substring(77,82)) +
     "' and effectiveDate >= '" + line.substring(221,229) +
     "' order by effectiveDate desc")))

2. val xyz = file.map(line => sqlContext.sql(
     "select rate, txCurCode, fxCurCode, effectiveDate from CurrencyCodeRates " +
     "where txCurCode = 'USD' and fxCurCode = 'CSD' and effectiveDate >= '20140901' " +
     "order by effectiveDate desc"))

3. val xyz = sqlContext.sql(
     "select rate, txCurCode, fxCurCode, effectiveDate from CurrencyCodeRates " +
     "where txCurCode = 'USD' and fxCurCode = 'CSD' and effectiveDate >= '20140901' " +
     "order by effectiveDate desc")

xyz.saveAsTextFile("/user/output")
In statements 1 and 2 I'm getting a NullPointerException, but statement 3
works fine. I'm guessing the Spark context and the SQL context are not working
well together.
Any suggestions regarding how I can achieve this?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-tp14183.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]