Hello
I am trying to run a Sqoop import from MySQL and compress the files stored in
the Hive warehouse on HDFS, but I have not succeeded.
Any ideas?
Software version: Sqoop 1.4.5.2.2.0.0-2041 on HDP 2.2.
Thanks,
Phil
---------------------------------------------------------------------------------------------------------------------------------------------------------------
The command:
sqoop import -Dmapreduce.output.fileoutputformat.compress=true \
  -Dmapreduce.output.fileoutputformat.compress.type=BLOCK \
  -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  --verbose --connect jdbc:mysql://xxxxx/my_db --username sqoop --password sqoop \
  --table indicators --hcatalog-database my_db --hcatalog-table indicators \
  --hcatalog-storage-stanza "STORED AS ORC TBLPROPERTIES ('orc.compress'='ZLIB')" \
  -m 4 --create-hcatalog-table
The files appear to remain uncompressed on HDFS:
hadoop fs -ls /apps/hive/warehouse/my_db.db/indicators
Found 4 items
-rw-r--r-- 3 hive hdfs 1032 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00000
-rw-r--r-- 3 hive hdfs 848 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00001
-rw-r--r-- 3 hive hdfs 1192 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00002
-rw-r--r-- 3 hive hdfs 999 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00003
I have checked carefully, and they do not look compressed.
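One way to make such a check more concrete is to look at the leading magic bytes of a part file rather than at its name or size. ORC applies orc.compress inside the file, so a compressed ORC file keeps a plain name such as part-m-00000 with no .gz suffix. The sketch below is only an illustration: detect_format is a hypothetical helper name, not part of Sqoop or Hive, and it assumes head, od and tr are available on the client machine.

```shell
#!/bin/sh
# Hypothetical helper: classify a file by its leading magic bytes.
# Reads the file content on stdin and prints "orc", "gzip", or "unknown".
detect_format() {
    # Hex-encode the first three bytes, e.g. "4f5243" for the ASCII string "ORC".
    magic=$(head -c 3 | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        4f5243) echo "orc" ;;      # ORC files begin with the ASCII bytes "ORC"
        1f8b*)  echo "gzip" ;;     # gzip streams begin with the bytes 0x1f 0x8b
        *)      echo "unknown" ;;  # anything else, including plain text
    esac
}

# Example against one of the part files listed above (needs a Hadoop client):
# hadoop fs -cat /apps/hive/warehouse/my_db.db/indicators/part-m-00000 | detect_format
```

If this prints "orc", the file is an ORC container and any ZLIB compression would be inside it. On clusters where the Hive CLI offers the orcfiledump service, running hive --orcfiledump on the same path should report the compression kind recorded in the file.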
The Hive table description looks correct for ZLIB compression:
hive> desc formatted indicators;
# col_name data_type comment
id_indicators bigint
cd_indicators varchar(32)
lb_indicators varchar(64)
descrip_indicators varchar(255)
# Detailed Table Information
Database: my_db
Owner: hive
CreateTime: Wed Apr 22 09:08:22 UTC 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://xxxxx:8020/apps/hive/warehouse/my_db.db/indicators
Table Type: MANAGED_TABLE
Table Parameters:
orc.compress ZLIB
transient_lastDdlTime 1429693702
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.742 seconds, Fetched: 30 row(s)
Philippe Gibert
R&D Engineer