You will need the Oracle Database 11g JDBC driver, ojdbc6.jar, installed in
$SQOOP_HOME/lib. You can download it from:
<http://www.oracle.com/technetwork/apps-tech/jdbc-112010-090769.html>
The approach I prefer is to let Sqoop import the data as a text file into a
staging table, and then insert/select from the staging table into an ORC table:
sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb" \
--username scratchpad -P \
--query "select * from scratchpad.dummy where \$CONDITIONS" \
--split-by id \
--hive-import --hive-table "test.dummy_staging" \
--target-dir "/a/b/c/dummy_staging" --create-hive-table
Once the staging table is created, you can insert/select into an ORC table of
your own definition, making sure its schema is defined exactly as you want it.
For example, you have to cater for date fields, or for columns that should be
varchar rather than string.
Case in point:
The source table schema in Oracle is:
CREATE TABLE "SCRATCHPAD"."DUMMY"
( "ID" NUMBER,
"CLUSTERED" NUMBER,
"SCATTERED" NUMBER,
"RANDOMISED" NUMBER,
"RANDOM_STRING" VARCHAR2(50 BYTE),
"SMALL_VC" VARCHAR2(10 BYTE),
"PADDING" VARCHAR2(10 BYTE),
CONSTRAINT "DUMMY_PK" PRIMARY KEY ("ID")
)
The staging table dummy_staging is generated by Sqoop:
desc dummy_staging;
+----------------+------------+----------+--+
|    col_name    | data_type  | comment  |
+----------------+------------+----------+--+
| id             | double     |          |
| clustered      | double     |          |
| scattered      | double     |          |
| randomised     | double     |          |
| random_string  | string     |          |
| small_vc       | string     |          |
| padding        | string     |          |
+----------------+------------+----------+--+
Your ORC table may look like:
desc dummy;
+----------------+--------------+----------+--+
|    col_name    |  data_type   | comment  |
+----------------+--------------+----------+--+
| id             | int          |          |
| clustered      | int          |          |
| scattered      | int          |          |
| randomised     | int          |          |
| random_string  | varchar(50)  |          |
| small_vc       | varchar(10)  |          |
| padding        | varchar(10)  |          |
+----------------+--------------+----------+--+
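The insert/select step for the two tables above could be sketched roughly as
follows. This is only an illustration under some assumptions that are not in
the original post: the ORC table lives in the same database as the staging
table (test.dummy), the script file is called dummy_orc_load.hql, the weekly
job reloads the whole table (hence INSERT OVERWRITE), and the hive CLI is on
the PATH.

```shell
#!/bin/sh
# Sketch only: build the ORC table from the Sqoop staging table.
# Assumed names (not from the original post): test.dummy, dummy_orc_load.hql.
set -eu

HQL_FILE="dummy_orc_load.hql"

# Write the HiveQL to a file so that the weekly job can rerun it unchanged.
cat > "$HQL_FILE" <<'EOF'
CREATE TABLE IF NOT EXISTS test.dummy (
  id            INT,
  clustered     INT,
  scattered     INT,
  randomised    INT,
  random_string VARCHAR(50),
  small_vc      VARCHAR(10),
  padding       VARCHAR(10)
)
STORED AS ORC;

-- Overwrite the previous load and cast the staging columns to the target types
INSERT OVERWRITE TABLE test.dummy
SELECT
  CAST(id            AS INT),
  CAST(clustered     AS INT),
  CAST(scattered     AS INT),
  CAST(randomised    AS INT),
  CAST(random_string AS VARCHAR(50)),
  CAST(small_vc      AS VARCHAR(10)),
  CAST(padding       AS VARCHAR(10))
FROM test.dummy_staging;
EOF

# Run the script with the Hive CLI when it is available.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found on PATH, wrote $HQL_FILE only" >&2
fi
```

For a periodic job, note that Sqoop's --create-hive-table option makes the
import fail if the staging table already exists, so the staging table would
need to be dropped before each run, or the option omitted after the first load.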
This also fits the Extract, Load, Transform (ELT) methodology, which I prefer.
HTH
Dr Mich Talebzadeh
From: Ashok Kumar [mailto:[email protected]]
Sent: 31 January 2016 13:15
To: User <[email protected]>
Subject: Re: Importing Oracle data into Hive
Thanks,
Can sqoop create this table as ORC in Hive?
On Sunday, 31 January 2016, 13:11, Nitin Pawar <[email protected]> wrote:
check sqoop
On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar <[email protected]> wrote:
Hi,
What is the easiest method of importing data from an Oracle 11g table to Hive
please? This will be a weekly periodic job. The source table has 20 million
rows.
I am running Hive 1.2.1
regards
--
Nitin Pawar