Hi Prabhu,

Be careful when going into the direction of calendar dimensions. While strictly 
speaking this is a cleaner dwh design you will for sure run into issues you 
might not expect. Consider this is probably what you would want to do (roughly) 
to query a day:

select count(*)
from fact f
  join dim_date d on (d.date_id = f.date_id)
where ddate = '2020-12-22'

That won't trigger partition pruning and the query will walk over all records 
in the fact table (I doubt that's what you would want). Pruning happens during 
the creation of the query plan and at that time it doesn't know how many 
records the dim_date table will return so can't do any partition pruning for 
you. If you would want partitioning to work in this case you would have to do:

select count(*)
from fact f
where f.dateid =7662

Which kind of defeats the purpose of the dim_date table :( At this point in 
time I would simply point the date in the fact table and use functions to get 
things like month. It's annoying but it works so:

select count(*)
from fact f
where date = '2020-12-22'

Bennie.


From: prabhu k [mailto:prabhu.h...@gmail.com]
Sent: Wednesday, July 25, 2012 1:59 PM
To: user@hive.apache.org; bejoy...@yahoo.com
Subject: Re: Loading data into data_dim table

Thanks for your help :)

it's data has been loaded fine now,

select * from dim_date;

7662    2020-12-22 00:00:00.000 2020    4       12      3       52      13      
4       357     83      22      3       December        Dec     Tuesday Tue
7663    2020-12-23 00:00:00.000 2020    4       12      3       52      13      
4       358     84      23      4       December        Dec     Wednesday       
Wed
7664    2020-12-24 00:00:00.000 2020    4       12      3       52      13      
4       359     85      24      5       December        Dec     Thursday        
Thu
7665    2020-12-25 00:00:00.000 2020    4       12      3       52      13      
4       360     86      25      6       December        Dec     Friday  Fri
7666    2020-12-26 00:00:00.000 2020    4       12      3       52      13      
4       361     87      26      7       December        Dec     Saturday        
Sat
7667    2020-12-27 00:00:00.000 2020    4       12      3       53      14      
5       362     88      27      1       December        Dec     Sunday  Sun
7668    2020-12-28 00:00:00.000 2020    4       12      3       53      14      
5       363     89      28      2       December        Dec     Monday  Mon
7669    2020-12-29 00:00:00.000 2020    4       12      3       53      14      
5       364     90      29      3       December        Dec     Tuesday Tue
7670    2020-12-30 00:00:00.000 2020    4       12      3       53      14      
5       365     91      30      4       December        Dec     Wednesday       
Wed
7671    2020-12-31 00:00:00.000 2020    4       12      3       53      14      
5       366     92      31      5       December        Dec     Thursday        
Thu
Time taken: 0.401 seconds
Thanks,
Prabhu.
On Wed, Jul 25, 2012 at 5:20 PM, Bejoy KS 
<bejoy...@yahoo.com<mailto:bejoy...@yahoo.com>> wrote:
Hi Prabhu

Your data is tab delimited use /t as the delimiter while creating table.

fields terminated by '/t'

Not sure this is the right / or not. If this doesn't work try the other one.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: prabhu k <prabhu.h...@gmail.com<mailto:prabhu.h...@gmail.com>>
Date: Wed, 25 Jul 2012 17:10:09 +0530
To: <user@hive.apache.org<mailto:user@hive.apache.org>>
ReplyTo: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Loading data into data_dim table

Thanks for the reply.

I have tried the with delimited fields terminated by '|'  and delimited fields 
terminated by ','  while selecting the table both Im getting null .

when i see the HDFS file looks like below.
bin/hadoop fs -cat /user/hive/warehoure/time.txt

7666 2020-12-26 00:00:00.000    2020    4   12  3       52 13   4      361      
87 26   7       December        Dec     Saturday        Sat     20201226        
2020/12/26  Dec 26 2020 2020-12-26
7667 2020-12-27 00:00:00.000    2020    4   12  3       53 14   5      362      
88 27   1       December        Dec     Sunday  Sun     20201227        
2020/12/27     Dec 27 2020  2020-12-27
7668 2020-12-28 00:00:00.000    2020    4   12  3       53 14   5      363      
89 28   2       December        Dec     Monday  Mon     20201228        
2020/12/28     Dec 28 2020  2020-12-28
7669 2020-12-29 00:00:00.000    2020    4   12  3       53 14   5      364      
90 29   3       December        Dec     Tuesday Tue     20201229        
2020/12/29     Dec 29 2020  2020-12-29
7670 2020-12-30 00:00:00.000    2020    4   12  3       53 14   5      365      
91 30   4       December        Dec     Wednesday       Wed     20201230        
2020/12/30  Dec 30 2020 2020-12-30
7671 2020-12-31 00:00:00.000    2020    4   12  3       53 14   5      366      
92 31   5       December        Dec     Thursday        Thu     20201231        
2020/12/31  Dec 31 2020 2020-12-31

Please suggest and help me.
Thanks,
Prabhu.
On Wed, Jul 25, 2012 at 4:58 PM, Bertrand Dechoux 
<decho...@gmail.com<mailto:decho...@gmail.com>> wrote:
What Bejoy is saying implicitly, is that the format is not verified by the load 
command. If it does not match, you will get NULL.
And it would be curious that your comma separated value (csv) file is using 
pipe (|) but why not.

Bertrand

On Wed, Jul 25, 2012 at 12:45 PM, Bejoy KS 
<bejoy...@yahoo.com<mailto:bejoy...@yahoo.com>> wrote:
Hi Prabhu

Can you cat the file in hdfs and ensure that the fields are delimited by '|' 
character.

hadoop fs -text user/hive/warehouse/dim_date/time.csv
Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: prabhu k <prabhu.h...@gmail.com<mailto:prabhu.h...@gmail.com>>
Date: Wed, 25 Jul 2012 16:05:42 +0530
To: <user@hive.apache.org<mailto:user@hive.apache.org>>
ReplyTo: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Loading data into data_dim table


Hi Users,

I have created dim_date table like below. table created successfully and i then 
load the data into the dim_date table

while i am selecting the table, getting null values.my input file is time.csv 
file

hive> create table dim_date(DateId int,ddate string,Year int,Quarter 
int,Month_Number_Year int,Month_Number_Of_Quarter int,Week_Number_Of_Year 
int,Week_Number_Of_Quarter int,Week_Number_Of_Month int,Day_Number_Of_Year 
int,Day_Number_Of_Quarter int,Day_Number_Of_Month int,Day_Number_Of_Week 
int,Month_Name string,Month_Name_Abbreviation string,Day_Name 
string,Day_Name_Abbreviation string)row format delimited fields terminated by 
'|' stored as textfile;
OK
Time taken: 0.123 seconds

hive> load data local inpath '/usr/local/time.csv' overwrite into table 
dim_date;
Copying data from file:/usr/local/time.csv
Copying file: file:/usr/local/time.csv
Loading data to table default.dim_date
Deleted hdfs://md-aaa1:54310/user/hive/warehouse/dim_date
OK
Time taken: 0.234 seconds

hive> select * from dim_date;
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    
NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    
NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    
NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    
NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    
NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    
NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    
NULL    NULL    NULL    NULL    NULL    NULL    NULL
Please help me on this issue, I have missed anything wrong?

Thanks,
Prabhu.


--
Bertrand Dechoux


Reply via email to