One thing to be aware of is that you had better convert Oracle NUMBER and 
NUMBER(m,n) columns to VARCHAR (--> TO_CHAR()) at source, as Spark will 
otherwise throw overflow errors.

 

It is better to use TO_CHAR() in Oracle rather than writing a UDF in Spark. 
UDFs in any language are slower than the built-in functions.
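For example, the conversion can be pushed down inside the JDBC subquery and, if arithmetic is still needed on the Spark side, cast back to a bounded decimal there. A sketch only, following the pattern further down this email; the table sh.orders and its NUMBER column PRICE are hypothetical names, not from the example below:

```scala
// spark-shell (Spark 1.x) sketch; sh.orders and PRICE are made-up names
val d = HiveContext.load("jdbc",
  Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
      // TO_CHAR() at source: the JDBC reader then never sees a NUMBER
      // wider than Spark's DecimalType can represent
      "dbtable" -> "(SELECT ORDER_ID, TO_CHAR(PRICE) AS PRICE FROM sh.orders)",
      "user" -> "sh",
      "password" -> "xx"))
d.registerTempTable("t_d")
// Cast back to a decimal with explicit precision on the Spark side
HiveContext.sql("SELECT ORDER_ID, CAST(PRICE AS DECIMAL(20,4)) AS PRICE FROM t_d")
```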

 

HTH

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Technology Ltd, its subsidiaries nor their employees 
accept any responsibility.

 

 

From: Gaurav Agarwal [mailto:gaurav130...@gmail.com] 
Sent: 16 February 2016 15:52
To: Mich Talebzadeh <m...@peridale.co.uk>
Cc: user <user@spark.apache.org>
Subject: Re: Stored proc with spark

 

Thanks I will try with the options

On Feb 16, 2016 9:15 PM, "Mich Talebzadeh" <m...@peridale.co.uk> wrote:

You can use JDBC to Oracle to get that data from a given table. What does the 
Oracle stored procedure do anyway? How many tables are involved?

JDBC is pretty neat. In the example below I use JDBC to load two dimension 
tables from Oracle in the Spark shell and read the FACT table of 100 million 
rows from Hive:

val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

println("\nStarted at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)

//
var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb"
var _username : String = "sh"
var _password : String = "xx"
//

// Get the FACT table from Hive
//
var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM oraclehadoop.sales")

//Get Oracle tables via JDBC

val c = HiveContext.load("jdbc",
Map("url" -> _ORACLEserver,
"dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
"user" -> _username,
"password" -> _password))

val t = HiveContext.load("jdbc",
Map("url" -> _ORACLEserver,
"dbtable" -> "(SELECT TIME_ID AS TIME_ID, CALENDAR_MONTH_DESC FROM sh.times)",
"user" -> _username,
"password" -> _password))

// Register the three DataFrames as temporary tables using the registerTempTable() call

s.registerTempTable("t_s")
c.registerTempTable("t_c")
t.registerTempTable("t_t")
//
var sqltext : String = ""
sqltext = """
SELECT rs.Month, rs.SalesChannel, round(TotalSales,2)
FROM
(
SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel, 
SUM(t_s.AMOUNT_SOLD) AS TotalSales
FROM t_s, t_t, t_c
WHERE t_s.TIME_ID = t_t.TIME_ID
AND   t_s.CHANNEL_ID = t_c.CHANNEL_ID
GROUP BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
ORDER by t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
) rs
LIMIT 10
"""
HiveContext.sql(sqltext).collect.foreach(println)
println("\nFinished at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)

sys.exit()
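As for reading from the stored procedure itself: the "dbtable" option accepts any valid subquery, so if the procedure's logic can be exposed on the Oracle side as a pipelined table function, its rows can be loaded the same way as the dimension tables above. A sketch only, under that assumption; the function sh.get_sales_fn is a hypothetical name:

```scala
// Sketch: assumes sh.get_sales_fn is an Oracle pipelined table function
// that returns the rows the stored procedure would otherwise produce
val p = HiveContext.load("jdbc",
  Map("url" -> _ORACLEserver,
      // TABLE() turns the pipelined function's output into a row source
      // that the JDBC reader can treat like any other table
      "dbtable" -> "(SELECT * FROM TABLE(sh.get_sales_fn()))",
      "user" -> _username,
      "password" -> _password))
p.registerTempTable("t_p")
```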

 

HTH

--

Dr Mich Talebzadeh

 


On 16/02/2016 09:04, Gaurav Agarwal wrote:

Hi
Can I load the data into Spark from an Oracle stored proc?

Thanks

 

 

 
