@Puneet Khatod : I found that out. And that's why I am asking here. I guess non-AWS users might have the same problem, and a way to solve it.
@Ruslan Al-fakikh : It seems great. Is there any documentation for msck? I will figure it out from the diff file, but is there a wiki page or a blog post about it? That would be best. I could not find any.

@Edward Capriolo : I now feel silly. This is clearly a better approach than my proposed hacks. The performance impact should be negligible, even more so when partition pruning is ensured. I am using Hive to 'piggyback' on an external way of writing data. So in my case, I could indeed tell Hive in advance where the data will be written. (Same as you say, but the logic is reversed.)

I guess I skipped over alter table touch. But it would not help me. The partitions are external. And if I add partitions, I will do it with cron and a shell script.

Bertrand

On Tue, Jul 24, 2012 at 7:24 PM, Edward Capriolo <[email protected]> wrote:

> Alter table touch will create partitions even if they have no data,
> You can also just create partitions ahead of time and have your code
> "know" where to write data.
>
>
> On Tue, Jul 24, 2012 at 12:35 PM, Ruslan Al-fakikh
> <[email protected]> wrote:
> > If you are not using Amazon take a look at this:
> >
> > https://issues.apache.org/jira/browse/HIVE-874
> >
> > Ruslan
> >
> > From: Puneet Khatod [mailto:[email protected]]
> > Sent: Tuesday, July 24, 2012 8:32 PM
> > To: [email protected]
> > Subject: RE: Continuous log analysis requires 'dynamic' partitions, is that
> > possible?
> >
> > If you are using Amazon (AWS), you can use ‘recover partitions’ to enable
> > all top level partitions.
> >
> > This will add required dynamicity.
> >
> > Regards,
> >
> > Puneet Khatod
> >
> > From: Bertrand Dechoux [mailto:[email protected]]
> > Sent: 24 July 2012 21:15
> > To: [email protected]
> > Subject: Continuous log analysis requires 'dynamic' partitions, is that
> > possible?
> >
> > Hi,
> >
> > Let's say logs are stored inside hdfs using the following file tree
> > /<logtype>/<month>/<day>.
> > So for apache, that would be :
> > /apache/01/01
> > /apache/01/02
> > ...
> > /apache/02/01
> > ...
> >
> > I would like to know how to define a table for this information. I found out
> > that the table should be external and should be using partitions.
> > However, I did not find any way to dynamically create the partitions. Is
> > there no automatic way to define them?
> > In that case, the partition 'template' would be <month>/<day> with the root
> > being apache.
> >
> > I know how to 'hack a fix' : create a script which would generate all the
> > "add partition" statements and run the resulting statements without caring
> > about the results because partitions may not exist or may already have been
> > added. Better, I could parse the result of 'show partitions' for the table
> > and run only the relevant statements but it still feels like a hack.
> >
> > Is there any clean way to do it?
> >
> > Regards,
> >
> > Bertrand Dechoux

--
Bertrand Dechoux
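
For reference, the 'hack a fix' from the original question (a script that generates one "add partition" statement per directory) could look roughly like the sketch below. It is only an illustration under assumptions not stated in the thread: the table name `apache_logs` and the partition columns `month` and `day` are hypothetical, and the directory list is assumed to come from elsewhere (for example a listing of the HDFS tree). Using `ADD IF NOT EXISTS` means that statements for partitions which were already added can be re-run safely, so there is no need to parse 'show partitions' first.

```python
def add_partition_statements(table, root, directories):
    """Build one ADD PARTITION statement per <root>/<month>/<day> directory.

    `table`, and the partition column names `month` and `day`, are
    hypothetical names chosen for this sketch.
    """
    prefix = root.rstrip("/") + "/"
    statements = []
    for path in sorted(directories):
        path = path.rstrip("/")
        if not path.startswith(prefix):
            continue  # directory belongs to another log type
        parts = path[len(prefix):].split("/")
        if len(parts) != 2:
            continue  # keep only directories shaped like <month>/<day>
        month, day = parts
        statements.append(
            "ALTER TABLE {0} ADD IF NOT EXISTS "
            "PARTITION (month='{1}', day='{2}') LOCATION '{3}';".format(
                table, month, day, path
            )
        )
    return statements


if __name__ == "__main__":
    # Example directory listing, following the layout from the question.
    listing = ["/apache/01/01", "/apache/01/02", "/apache/02/01"]
    for statement in add_partition_statements("apache_logs", "/apache", listing):
        print(statement)
```

The printed statements can be saved to a file and run with hive -f from cron, which matches the "cron and a shell script" setup mentioned above.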
