Thank you for the responses.

On Sunday, August 21, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi Rahul,
>
> I don't believe you can drop a particular bucket in Hive.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On 20 August 2016 at 23:53, Rahul Channe <drah...@googlemail.com> wrote:
>
>> Hi Mich,
>>
>> I want to know if we can drop the data of a particular bucket in Hive.
>>
>> On Friday, August 19, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hash partitioning (bucketing) does not make much sense for YYYY/MM/DD/32, as pointed out.
>>>
>>> So it is clear that with (mod 32) the maximum number of offsets is going to be 32, i.e. in the range 0-31. With YYYY/MM/DD you also have to account for hash collisions: the set of inputs is potentially large (certainly not known until we encounter them all), and if you want to spread the rows evenly (which, after all, is what hash partitioning is about), then I think day of the month makes more sense.
>>>
>>> HTH
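Mich's point above (bucketing on day of the month gives an even, predictable spread) can be sketched in a few lines of Python. This is a minimal illustration, not Hive itself; it assumes Hive assigns an int value `v` to bucket `(v & Integer.MAX_VALUE) % numBuckets`, with an int's hash being the value itself:

```python
# Minimal sketch (assumption: Hive buckets an int column by
# (value & Integer.MAX_VALUE) % numBuckets, the int's hash being itself).
NUM_BUCKETS = 32

def bucket_for_int(value, num_buckets=NUM_BUCKETS):
    # Mask the sign bit as Java's (hash & Integer.MAX_VALUE) would,
    # then take the modulus to pick a bucket.
    return (value & 0x7FFFFFFF) % num_buckets

# Bucketing on day_of_month (1-31): every day lands in its own bucket,
# and the layout is identical regardless of month.
buckets = {day: bucket_for_int(day) for day in range(1, 32)}
print(buckets)  # day N -> bucket N; bucket 0 simply stays empty
```

Under these assumptions the mapping is stable across months, which is exactly the property the `yyyymmdd` schemes discussed below lack.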
>>> On 19 August 2016 at 23:15, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>>>
>>>> > We are bucketing by date so we will have max 32 buckets
>>>>
>>>> If you do want to look up specifically by date, you could just create day partitions and never partition by month.
>>>>
>>>> FYI, in a modern version of Hive,
>>>>
>>>> select count(1) from table where YEAR(dt) = 2016 and MONTH(dt) = 12
>>>>
>>>> does prune it on the client side.
>>>>
>>>> On a different note, 31 buckets is a bad idea (32 is ok), because for String hashes 31 (i.e. 32 - 1) is the magic multiplier in the hash function, which hurts "yyyymmdd" keys: 50% of your buckets end up with 0 data.
>>>>
>>>> http://www.slideshare.net/t3rmin4t0r/data-organization-hive-meetup/6
>>>>
>>>> Use yyyymmdd as a number and you'll get the number itself back as the hashcode, so the bucket assignment won't be stable as months change (20160816 % 32 == 16, while 20160716 % 32 == 12).
>>>>
>>>> The only way to have buckets correspond to days is to store day_of_month as an int and bucket on it with 32 - then bucket 1 == day 1, bucket 2 == day 2, etc., and bucket 0 stays empty.
>>>>
>>>> Cheers,
>>>> Gopal
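Gopal's two observations can be checked with a short sketch. The assumptions here (not verified against the Hive source) are that Hive hashes strings with Java's `String.hashCode` (multiplier 31) and assigns bucket `(hash & Integer.MAX_VALUE) % numBuckets`; the exact emptiness percentage depends on the data, so this only demonstrates the clumping:

```python
# Sketch of Gopal's two observations. Assumptions (not verified against
# the Hive source): strings are hashed with Java's String.hashCode
# (h = 31 * h + char), and the bucket is (hash & Integer.MAX_VALUE) % n.

def java_string_hash(s):
    """Java String.hashCode: 32-bit signed polynomial hash, multiplier 31."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket(s, num_buckets):
    return (java_string_hash(s) & 0x7FFFFFFF) % num_buckets

# Observation 1: treating yyyymmdd as a number is unstable across months.
print(20160816 % 32, 20160716 % 32)  # 16 12 -- same day, different bucket

# Observation 2: with 31 buckets, the raw hash mod 31 collapses to the
# last character mod 31 (since 31**k % 31 == 0 for k >= 1), so a month of
# "yyyymmdd" keys clumps into a small fraction of the 31 buckets.
days = ["201608%02d" % d for d in range(1, 32)]
used_31 = {bucket(s, 31) for s in days}
used_32 = {bucket(s, 32) for s in days}
print("buckets used with 31:", len(used_31))
print("buckets used with 32:", len(used_32))
```

Since the last character of a `yyyymmdd` key is one of only ten digits, at most a small subset of the 31 buckets can ever be reached, which is the clumping Gopal describes.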