Re: External partition table question

2014-07-17 Thread Satish Mittal
'ALTER TABLE .. ADD PARTITION ..' would just add a partition entry for the table in the Hive metastore. It doesn't perform any data loading; instead, it expects the data to already be present at the path given by LOCATION. On Tue, Jul 15, 2014 at 5:39 AM, Raymond Lau r...@ooyala.com wrote: I've
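A minimal Python sketch of what that means in practice, built around a hypothetical helper `add_partition_ddl`. The generated statement only records the partition spec and its LOCATION in the metastore, so the files (the Parquet files from the original question) have to be copied to that path beforehand. Per the reply, no data is read or written when the statement runs, so no row passes through the SerDe at that point. `IF NOT EXISTS` is part of Hive's documented `ADD PARTITION` syntax; the helper and its quoting are illustrative only (values are not escaped).

```python
def add_partition_ddl(table: str, spec: dict, location: str) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement (hypothetical helper).

    Running the statement only registers the partition and its LOCATION in
    the metastore; the data files must already exist under `location`.
    """
    # Partition values are written as string literals, e.g. pcode='123'.
    # Note: values are not escaped; this is a sketch, not a safe SQL builder.
    parts = ", ".join(f"{key}='{value}'" for key, value in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({parts}) LOCATION '{location}'"
    )


print(add_partition_ddl("partitioned_table_test", {"pcode": "123"}, "/path/to/parquet/files"))
```

Because the statement is pure metadata, re-running it after new files land in an already registered directory is unnecessary; Hive reads whatever files are under the partition's path at query time.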

Re: External partition table question

2014-07-17 Thread Lefty Leverenz
Thanks for this clarification. I've revised the Add Partitions section https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions in the wiki accordingly. -- Lefty On Fri, Jul 18, 2014 at 12:45 AM, Satish Mittal satish.mit...@inmobi.com wrote: 'ALTER

External partition table question

2014-07-14 Thread Raymond Lau
I've created an external table partitioned by a field and am attempting to load the data via the command "ALTER TABLE partitioned_table_test ADD PARTITION (pcode = '123') LOCATION '/path/to/parquet/files';" using a custom Parquet SerDe. Does loading the data this way call the serializer()

Compression for an HDFS text file - Hive External Partition Table

2013-11-13 Thread Raj Hadoop
Hi, 1) My requirement is to load a file (a tar.gz file containing multiple tab-separated values files, one of which is the main file with huge data, about 10 GB per day) into an externally partitioned Hive table. 2) I have automated the process by extracting
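One way such a staging step could look, sketched in Python under stated assumptions. Hive can read gzip-compressed text files in a text-format table directly (Hadoop picks the codec from the `.gz` extension), but it cannot look inside a tar archive, so the main file is pulled out of the `.tar.gz` and re-compressed on its own into a partition-style directory. The helper `stage_main_file`, the member name, and the `dt=` partition column are hypothetical; copying the result to HDFS and registering the partition are left out. A single gzip file is not splittable, so each one is read by a single map task.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def stage_main_file(archive: Path, member_name: str, staging_root: Path, day: str) -> Path:
    """Copy one member of a .tar.gz into its own .gz file under dt=<day>/.

    Hypothetical helper: the partition column name "dt" and the layout
    are illustrative assumptions, not part of the original thread.
    """
    partition_dir = staging_root / f"dt={day}"
    partition_dir.mkdir(parents=True, exist_ok=True)
    target = partition_dir / (Path(member_name).name + ".gz")

    with tarfile.open(archive, mode="r:gz") as tar:
        # extractfile raises KeyError if the member is missing and
        # returns None for non-regular members such as directories.
        source = tar.extractfile(member_name)
        if source is None:
            raise ValueError(f"{member_name} is not a regular file")
        # Stream the member straight into a gzip file, no temp extraction.
        with source, gzip.open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    return target
```

Keeping the other files of the archive out of the partition directory matters here: every file under a partition's path is read as table data.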

External Partition Table

2013-10-31 Thread Raj Hadoop
Hi, I am planning a Hive external partitioned table based on a date. Which of the options below yields better performance, or do both perform the same? 1) Partition based on one folder per day, like date INT 2) Partition based on one folder per year / month / day (so it has three folders

Re: External Partition Table

2013-10-31 Thread Brad Ruderman
number of files is typically preferred, but partitions will help when restricting by date. Thx, Brad On Thu, Oct 31, 2013 at 3:34 PM, Raj Hadoop hadoop...@yahoo.com wrote: Hi, I am planning for a Hive External Partition Table based on a date. Which one of the below yields a better performance
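The benefit of restricting by date is partition pruning: when a query filters on the partition column, only the matching partition directories are scanned. A toy Python model of that selection, assuming a single `dt=YYYYMMDD` partition column and a hypothetical root `/warehouse/events`; Hive itself prunes using the partition list in the metastore, and the function `prune` only mimics the outcome.

```python
from datetime import date, datetime


def prune(partitions: list, start: date, end: date) -> list:
    """Keep only partition directories whose dt=YYYYMMDD value is in [start, end]."""
    selected = []
    for path in partitions:
        # Directory names are assumed to end in ".../dt=20131031".
        value = path.rsplit("dt=", 1)[1]
        day = datetime.strptime(value, "%Y%m%d").date()
        if start <= day <= end:
            selected.append(path)
    return selected


paths = [f"/warehouse/events/dt=201310{d:02d}" for d in range(1, 32)]
print(prune(paths, date(2013, 10, 30), date(2013, 10, 31)))
```

Of the 31 daily directories, only the two for October 30 and 31 survive; the other 29 are never opened.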

Re: External Partition Table

2013-10-31 Thread Brad Ruderman
PM, Raj Hadoop hadoop...@yahoo.com wrote: Hi, I am planning for a Hive External Partition Table based on a date. Which one of the below yields a better performance or both have the same performance? 1) Partition based on one folder per day LIKE date INT 2) Partition based on one folder per

Re: External Partition Table

2013-10-31 Thread Timothy Potter
because Hive is still selecting the same number of input paths in both scenarios; one just happens to be a little deeper. Cheers, Tim On Thu, Oct 31, 2013 at 4:34 PM, Raj Hadoop hadoop...@yahoo.com wrote: Hi, I am planning for a Hive External Partition Table based on a date. Which one
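Tim's point can be checked with a small Python sketch. The root `/warehouse/events`, the partition names `dt`, `year`, `month`, `day`, and the zero-padding are assumptions chosen for illustration. For any date range, both layouts yield exactly one leaf directory per day, so the number of input paths is the same; only the directory depth differs.

```python
from datetime import date, timedelta


def partition_paths(root: str, start: date, end: date, nested: bool) -> list:
    """List the leaf partition directories covering [start, end].

    nested=False: one level, e.g. root/dt=20131031
    nested=True:  three levels, e.g. root/year=2013/month=10/day=31
    """
    paths = []
    day = start
    while day <= end:
        if nested:
            paths.append(f"{root}/year={day.year}/month={day.month:02d}/day={day.day:02d}")
        else:
            paths.append(f"{root}/dt={day:%Y%m%d}")
        day += timedelta(days=1)
    return paths


flat = partition_paths("/warehouse/events", date(2013, 1, 1), date(2013, 12, 31), nested=False)
deep = partition_paths("/warehouse/events", date(2013, 1, 1), date(2013, 12, 31), nested=True)
print(len(flat), len(deep))  # 365 365
```

The nested layout mainly changes how predicates are written (three columns instead of one), not how many directories a date-restricted query reads.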

Re: External Partition Table

2013-10-31 Thread Raj Hadoop
On Thu, Oct 31, 2013 at 4:34 PM, Raj Hadoop hadoop...@yahoo.com wrote: Hi, I am planning for a Hive External Partition Table based on a date. Which one of the below yields a better performance or both have the same performance? 1) Partition based on one folder per day LIKE date INT 2