As far as I understand, it is not based on file extensions, since compressed
files can be renamed to anything, even a name without an extension.

The first step would be to check whether the file is compressed. If it is
not, read it directly; if it is, find the compression codec and use it. You
can see this kind of detection by running the file command on any compressed
file on Linux: it reports the details.
I am really not sure what happens when the compression codecs are not
available.
Maybe someone from the mapred or hdfs dev forum can explain in detail how
this is handled.
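
To illustrate the kind of content-based check that the file command performs,
here is a minimal Python sketch. It is only an illustration and not a claim
about how Hadoop or Hive select a codec. The function name sniff_compression
and the signature table are made up for this example; the byte sequences are
the usual leading bytes of lzop, gzip and bzip2 files.

```python
import sys

# Hypothetical signature table for this sketch: leading "magic" bytes that
# identify a few common compressed formats.
MAGIC_SIGNATURES = [
    (b"\x89LZO\x00\r\n\x1a\n", "lzop"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
]


def sniff_compression(path):
    """Return a format name based on the file's first bytes, or None."""
    with open(path, "rb") as fh:
        head = fh.read(16)  # long enough for every signature above
    for magic, name in MAGIC_SIGNATURES:
        if head.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    # Usage: python sniff.py FILE [FILE ...]
    for arg in sys.argv[1:]:
        print(arg, sniff_compression(arg) or "no known signature")
```

Because the check reads the first bytes of the file, renaming the file or
dropping its extension does not change the result.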


On Mon, Aug 19, 2013 at 1:36 PM, w00t w00t <w00...@yahoo.de> wrote:

> My scenario is a bit different - I am using external tables.
>
> So I uploaded some lzo compressed files into HDFS, generated the lzo-index
> files and finally created the external table without the specific STORED
> AS clause.
> A SELECT statement on the table still works.
>
> Does it work transparently? So, Hadoop sees the lzo extension of my files
> and knows how to decompress it?
>
>
>
>   ------------------------------
>  *From:* Nitin Pawar <nitinpawar...@gmail.com>
> *To:* "user@hive.apache.org" <user@hive.apache.org>
> *Sent:* 19:54 Wednesday, 14 August 2013
>
> *Subject:* Re: Hive and Lzo Compression
>
> Please correct me if I have misunderstood the question.
>
> You created a table definition without a STORED AS clause,
> then loaded data into the table from a compressed file,
> then ran a SELECT query, and it still works.
> But how did it figure out which compression codec to use?
>
> Am I stating it correctly?
>
>
>
> On Wed, Aug 14, 2013 at 11:11 PM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
>  That is really interesting…let me try and think of a reason…meanwhile
> any other LZO Hive Samurais out there ? Please help with some guidance
>
>  sanjay
>
>   From: w00t w00t <w00...@yahoo.de>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <
> w00...@yahoo.de>
> Date: Wednesday, August 14, 2013 1:15 AM
>
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: Hive and Lzo Compression
>
>
>  Thanks for your reply.
>
>  The interesting thing I experience is that the SELECT query still works
> - even when I do not specify the STORED AS clause... that puzzles me a bit.
>
>   ------------------------------
> *From:* Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
> *To:* "user@hive.apache.org" <user@hive.apache.org>; w00t w00t <
> w00...@yahoo.de>
> *Sent:* 3:44 Wednesday, 14 August 2013
> *Subject:* Re: Hive and Lzo Compression
>
>  Hi
>
>  I think the CREATE TABLE without the STORED AS clause will not give any
> errors while creating the table.
> However, when you query that table, since it contains .lzo files, you
> would get errors.
> With external tables, you are separating the table creation (definition)
> from the data. So only at query time might Hive report errors.
>
>  LZO compression rocks ! I am so glad I used it in our projects here.
>
>  Regards
>
>  sanjay
>
>   From: w00t w00t <w00...@yahoo.de>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <
> w00...@yahoo.de>
> Date: Tuesday, August 13, 2013 12:13 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: Hive and Lzo Compression
>
>   Thanks for your replies and the link.
>
>  I could get it working, but wondered why the CREATE TABLE statement
> worked without the STORED AS Clause as well...that's what puzzles me a
> bit...
>
>  But I will use the STORED AS Clause to be on the safe side.
>
>
>   ------------------------------
> *From:* Lefty Leverenz <leftylever...@gmail.com>
> *To:* user@hive.apache.org
> *CC:* w00t w00t <w00...@yahoo.de>
> *Sent:* 19:06 Saturday, 10 August 2013
> *Subject:* Re: Hive and Lzo Compression
>
>  I'm not seeing any documentation link in Sanjay's message, so here it is
> again (in the Hive wiki's language manual):
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
>
>
> On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
>  Please refer to the documentation here.
> Let me know if you need more clarification so that we can make this
> document better and more complete.
>
>  Thanks
>
>  sanjay
>
>   From: w00t w00t <w00...@yahoo.de>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <
> w00...@yahoo.de>
> Date: Thursday, August 8, 2013 2:02 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Hive and Lzo Compression
>
>
>    Hello,
>
> I have started running Hive with Lzo compression on Hortonworks 1.2.
>
> I have managed to install/configure Lzo and  hive -e "set
> io.compression.codecs" shows me the Lzo Codecs:
> io.compression.codecs=
> org.apache.hadoop.io.compress.GzipCodec,
> org.apache.hadoop.io.compress.DefaultCodec,
> com.hadoop.compression.lzo.LzoCodec,
> com.hadoop.compression.lzo.LzopCodec,
> org.apache.hadoop.io.compress.BZip2Codec
>
> However, I have some questions where I would be happy if you could help me.
>
> (1) CREATE TABLE statement
>
>  I read in different postings, that in the CREATE TABLE statement, I have
> to use the following STORAGE clause:
>
>  CREATE EXTERNAL TABLE txt_table_lzo (
>     txt_line STRING
>  )
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>  STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>  LOCATION '/user/myuser/data/in/lzo_compressed';
>
>  SELECT statements on this table with Lzo data now run without any
> problems.
>
>  However I also created a table on the same data without this STORAGE
> clause:
>
>  CREATE EXTERNAL TABLE txt_table_lzo_tst (
>     txt_line STRING
>  )
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>  LOCATION '/user/myuser/data/in/lzo_compressed';
>
>  The interesting thing is that it works as well when I execute a SELECT
> statement on this table.
>
>  Can you help explain why the second CREATE TABLE statement works as well?
>  What should I use in DDLs?
>  Is it best practice to use the STORED AS clause with
> "DeprecatedLzoTextInputFormat"? Or should I remove it?
>
>
> (2) Output and Intermediate Compression Settings
>
>  I want to use output compression.
>
>  In "Programming Hive" from Capriolo, Wampler, Rutherglen the following
> commands are recommended:
>  SET hive.exec.compress.output=true;
>  SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
>           However, in some other places in forums, I found the following
> recommended settings:
>  SET hive.exec.compress.output=true;
>  SET mapreduce.output.fileoutputformat.compress=true;
>  SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
>
>  Am I right that the first settings are for Hadoop versions prior to 0.23?
>  Or is there any other reason why the settings are different?
>
>  I am using Hadoop 1.1.2 with Hive 0.10.0.
>  Which settings would you recommend to use?
>
>  --------------
>  I also want to compress intermediate results.
>
>  Again, in "Programming Hive" the following settings are recommended:
>  SET hive.exec.compress.intermediate=true;
>  SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
>  Is this the right setting?
>
>  Or should I again use the settings (which look more valid for
> Hadoop 0.23 and greater)?
>  SET hive.exec.compress.intermediate=true;
>  SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
> Thanks
>
>
>
>
> CONFIDENTIALITY NOTICE
> ======================
> This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
>
>
>
>
>  -- Lefty
>
>
>
>
>
>
>
> --
> Nitin Pawar
>
>
>


-- 
Nitin Pawar
