Hi All, I have the following skewed table "addresses_1"
select id, count(*) c from addresses_1 group by id order by c desc limit 10; 142624653 1554806 198477395 958492 102641838 220181 138947865 211331 156483436 193429 96411677 179771 210082076 168033 800174765 152421 139116901 141207 704352025 137263 I was able to create the following table with the skew information. And I was able to load the data into the table as well. CREATE TABLE skew_addresses_1( id bigint, address_id bigint, address_lines string, city string, state string, postal_code string, country string, latitude string, longitude string, ) PARTITIONED BY (dateTS string) SKEWED BY (id) ON (142624653, 198477395, 102641838, 138947865, 156483436, 96411677, 210082076, 800174765, 139116901, 704352025) stored as rcfile; select id,count(*) c from skew_addresses_1 where id=142624653 group by id order by c limit 10; *However, at the time of running select query, entire dataset is scanned. *I thought only the relevant dataset (with skew information will be scanned). Am I missing anything here? Any help will be appreciated. I am using Hive 10.x I have enabled hive.optimize.skewjoin.compiletime=true and I can see the skew information populated in SKEWED_COL_NAMES in metadata. But there is no information in SKEWED_COL_VALUE_LOC_MAP table. -- ~Rajesh.B