Pretty simple:

--1 Move gz file or files into HDFS. Multiple files can be in that staging
directory:
    hdfs dfs -copyFromLocal <local_dir>/*.gz hdfs://rhes564:9000/data/stg/
--2 Create an external table. Just one will do:
    CREATE EXTERNAL TABLE stg_t2 ... STORED AS TEXTFILE.... LOCATION '/data/stg/'
--3 Create the internal Hive table:
    CREATE TABLE t2 ( .... STORED AS ORC TBLPROPERTIES ( "orc.compress"="SNAPPY" )
--4 Insert the data from the external table into the Hive table:
    INSERT INTO TABLE t2 SELECT...FROM stg_t2
--5 Remove the gz files, if needed, once processed:
    hdfs dfs -rm hdfs://rhes564:9000/data/stg/*.gz
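
For what it's worth, the five steps above could be strung together roughly as
follows. This is only a sketch under assumptions: the column list (id,
payload), the ',' field delimiter, and the LOCAL_DIR path are hypothetical
placeholders, not taken from the real tables, and the script defaults to a dry
run that just prints the commands it would issue.

```shell
#!/bin/sh
# Sketch of steps 1 to 5. Dry run by default: commands are printed, not run.
# Set DRY_RUN=0 to execute against a real cluster.
# Hypothetical: LOCAL_DIR, the columns (id, payload) and the ',' delimiter.
set -eu

DRY_RUN="${DRY_RUN:-1}"
LOCAL_DIR="${LOCAL_DIR:-/tmp/incoming}"
STG_URI="hdfs://rhes564:9000/data/stg"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Emit the HiveQL for steps 2 to 4.
hive_script() {
  cat <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS stg_t2 (id INT, payload STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/stg/';
CREATE TABLE IF NOT EXISTS t2 (id INT, payload STRING)
  STORED AS ORC
  TBLPROPERTIES ("orc.compress"="SNAPPY");
INSERT INTO TABLE t2 SELECT id, payload FROM stg_t2;
EOF
}

# Step 1: copy the local gz files into the staging directory.
run hdfs dfs -copyFromLocal "$LOCAL_DIR"/*.gz "$STG_URI/"
# Steps 2 to 4: create both tables and load the ORC table.
run hive -e "$(hive_script)"
# Step 5: clear the staging directory (glob quoted so HDFS expands it).
run hdfs dfs -rm "$STG_URI/*.gz"
```

Hive reads .gz files in a TEXTFILE table directly, but gzip is not splittable,
so each file is handled by a single mapper; copying into ORC is what makes
later queries reasonable. If Snappy is not allowed for the ORC table either,
"orc.compress" also accepts ZLIB or NONE.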

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 19 July 2016 at 12:03, Amatucci, Mario, Vodafone Group <
[email protected]> wrote:

>
>
> Hi, I have huge gzip files on HDFS and I’d like to create an external table
> on top of them.
>
> Any code example? Cheers
>
> PS
>
> I cannot use Snappy or LZO because of some constraints.
>
>
>
> --
>
> Kind regards
>
> Mario Amatucci
> CG TB PS GDC PRAGUE THINK BIG
>
>
>
