The query looks a little bit too complex from what it is supposed to do. Can you reformulate and restrict the data in a where clause (highest restriction first). Another hint would be to use the Orc format (with indexes and optionally bloom filters) with snappy compression as well as sorting the data on the column you choose to restrict in the where part.
> On 23 Jun 2016, at 02:42, @Sanjiv Singh <sanjiv.is...@gmail.com> wrote: > > Hi All, > > I am running performance issue with below query. Its took 2-3 hours to > complete in hive. > > Try tried to partition and bucketing changes on this tables, but without luck. > > Please help me in optimizing this query. > > what schema level changes can be done ? > other parameters recommendations ? > > > Below are complete details : > > Hive Table DDL : > >> CREATE TABLE `tuning_dd_key`( >> m_d_key smallint, >> sb_gu_key bigint, >> t_ev_st_dt date, >> a_z_key int, >> c_dt date, >> e_p_dt date, >> sq_nbr int); > > Total data size : >> 250 GB > > Long running query : > >> SELECT >> sb_gu_key, m_d_key, t_ev_st_dt, >> LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY >> t_ev_st_dt ) AS LAG_START_DT, >> a_z_key, >> c_dt, >> e_p_dt, >> sq_nbr, >> CASE WHEN LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER >> BY t_ev_st_dt ) IS NULL OR a_z_key <> LAG( a_z_key , 1 , -999 ) OVER ( >> PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt ) THEN 'S' ELSE >> NULL END AS ST_FLAG >> FROM `PRDDB`.tuning_dd_key ; > > > More info : > >> number of distinct value in column m_d_key : 29 >> number of distinct value in column sb_gu_key : 15434343 > > > > > Regards > Sanjiv Singh > Mob : +091 9990-447-339