Pretty simple.

--1 Move the gz file or files into HDFS. Multiple files can be in that staging directory:

hdfs dfs -copyFromLocal <local_dir>/*.gz hdfs://rhes564:9000/data/stg/

--2 Create an external table. Just one will do:

CREATE EXTERNAL TABLE stg_t2 ... STORED AS TEXTFILE.... LOCATION '/data/stg/'

--3 Create the internal Hive table:

CREATE TABLE t2 ( .... STORED AS ORC TBLPROPERTIES ( "orc.compress"="SNAPPY" )

--4 Insert the data from the external table into the Hive table:

INSERT INTO TABLE t2 SELECT...FROM stg_t2

--5 Remove the gz files, if needed, once processed:

hdfs dfs -rm hdfs://rhes564:9000/data/stg/*.gz
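As a rough sketch of steps 2 to 4 with the elided parts filled in: the column names, types and the comma delimiter below are purely hypothetical placeholders, not the real layout of the files, so adjust them to match the data. Hive reads .gz text files in a TEXTFILE location transparently, so the staging table needs no compression settings. If Snappy is not available for the ORC table either, "orc.compress" also accepts ZLIB or NONE.

```sql
-- Step 2: external staging table over the gz files.
-- Columns and delimiter are hypothetical; replace them with the real layout.
CREATE EXTERNAL TABLE stg_t2 (
  id     INT,
  name   STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/stg/';

-- Step 3: internal (managed) ORC table with the same hypothetical columns.
CREATE TABLE t2 (
  id     INT,
  name   STRING,
  amount DOUBLE
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Step 4: copy the rows from the staging table into the ORC table.
INSERT INTO TABLE t2
SELECT id, name, amount
FROM stg_t2;
```

Dropping stg_t2 afterwards removes only the table definition, since it is external; the gz files stay in /data/stg/ until removed as in step 5.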
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 19 July 2016 at 12:03, Amatucci, Mario, Vodafone Group wrote:

> Hi, I have huge gzip files on HDFS and I'd like to create an external table on
> top of them.
>
> Any code example? Cheers
>
> PS
>
> I cannot use snappy or lzo for some constraints
>
> --
>
> Kind regards
>
> Mario Amatucci
> CG TB PS GDC PRAGUE THINK BIG
