Never use string when you can use int - the performance will be much better - especially for tables in Orc / parquet format
> On 04 Jun 2016, at 22:31, Igor Kravzov <igork.ine...@gmail.com> wrote: > > Thanks Dudu. > So if I need actual date I will use view. > Regarding partition column: I can create 2 external tables based on the same > data with integer or string column partition and see which one is more > convenient for our use. > >> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dmarkov...@paypal.com> >> wrote: >> I’m not aware of an option to do what you request in the external table >> definition but you might want to that using a view. >> >> >> >> P.s. >> >> I seems to me that defining the partition column as a string would be more >> user friendly than integer, e.g. – >> >> >> >> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016; >> >> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016; >> >> select * from threads_test where yyyymmdd like ‘______01’ -- first of every >> month; >> >> >> >> >> >> >> >> >> >> >> >> $ hdfs dfs -ls -R /tmp/threads_test >> >> drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45 >> /tmp/threads_test/20160604 >> >> -rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45 >> /tmp/threads_test/20160604/data.txt >> >> >> >> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt >> >> {"url":"www.blablabla.com","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"} >> >> >> >> ---------------------------------------------------------------------------------------------------- >> >> >> >> >> >> hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar; >> >> >> >> hive> >> >> create external table threads_test >> >> ( >> >> url string >> >> ,pagetype string >> >> ,adddate string >> >> ,postdate string >> >> ,posttext string >> >> ) >> >> partitioned by (yyyymmdd string) >> >> row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' >> >> location '/tmp/threads_test' >> >> ; >> >> >> >> hive> alter table threads_test add partition (yyyymmdd=20160604) location >> '/tmp/threads_test/20160604'; >> >> >> >> hive> select * from threads_test; >> >> >> >> www.blablabla.com pg1 2016-05-17T02:10:44.527 2016-05-16T02:08:55 >> YadaYada 20160604 >> >> >> >> hive> >> >> create view threads_test_v >> >> as >> >> select url >> >> ,pagetype >> >> ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) >> as timestamp) as adddate >> >> ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) >> as timestamp) as postdate >> >> ,posttext >> >> >> >> from threads_test >> >> ; >> >> >> >> hive> select * from threads_test_v; >> >> >> >> www.blablabla.com pg1 2016-05-17 02:10:44.527 2016-05-16 02:08:55 >> YadaYada >> >> >> >> >> >> From: Igor Kravzov [mailto:igork.ine...@gmail.com] >> Sent: Saturday, June 04, 2016 8:13 PM >> To: user@hive.apache.org >> Subject: Convert date in string format to timestamp in table definition >> >> >> >> Hi, >> >> >> >> I have 2 dates in Json file defined like this >> >> "addDate": "2016-05-17T02:10:44.527", >> >> "postDate": "2016-05-16T02:08:55", >> >> >> >> Right now I define external table based on this file like this: >> >> CREATE external TABLE threads_test >> >> (url string, >> >> pagetype string, >> >> adddate string, >> >> postdate string, >> >> posttext string) >> >> partitioned by (yyyymmdd int) >> >> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' >> >> location 'my location'; >> >> >> >> is it possible to define these 2 dates as timestamp? >> >> Do I need to change date format in the file? is it possible to specify date >> format in table definition? >> >> Or I better off with string? >> >> >> >> Thanks in advance. >> >