Usage of msck:

    msck table <table>
    msck repair table <table>

But that won't help me.
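For context, here is a minimal sketch of the pattern described in this message: an external table whose partitions are added one by one with absolute locations, so that msck (which only discovers directories following the Hive `key=value` naming convention) cannot find them. The table name, column, and paths are hypothetical illustrations, not taken from the thread:

```sql
-- Hypothetical schema; only the partitioning pattern matters here.
CREATE EXTERNAL TABLE apache_logs (
  line STRING
)
PARTITIONED BY (month STRING, day STRING);

-- Each partition points at an absolute directory that does not follow
-- the Hive month=01/day=02 naming convention, so 'msck repair table'
-- cannot discover it and each partition must be declared explicitly.
ALTER TABLE apache_logs ADD IF NOT EXISTS
  PARTITION (month='01', day='02')
  LOCATION '/apache/01/02';
```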
I am using an external table with 'external' partitions (which do not follow the Hive naming conventions). So I first create an external table without a location and then I specify every partition with an absolute location. I don't think there is another way given my constraints, but if there is, I will gladly read it.

So, with the current implementation and with regard to the parameters that can be used with the current Hive commands:
1) Hive has no way to list the table directory.
2) Hive has no way to understand which variable should be used for each partitioning level.

Conclusion: the only solution at the moment is to declare partitions for Hive in advance (thanks Edward). But that means I do have to handle the 'synchronisation' of two 'pseudo' file trees: the HDFS directories and the Hive partitions.

Bertrand

On Wed, Jul 25, 2012 at 10:51 AM, Bertrand Dechoux <decho...@gmail.com> wrote:

> @Puneet Khatod: I found that out, and that's why I am asking here. I
> guess non-AWS users might have the same problem and a way to solve it.
>
> @Ruslan Al-fakikh: It seems great. Is there any documentation for msck?
> I will find out with the diff file, but is there a wiki page or a blog
> post about it? That would be best. I could not find any.
>
> @Edward Capriolo: I now feel silly. This is clearly a better approach
> than my proposed hacks. The performance impact should be negligible, even
> more so when ensuring partition pruning. I am using Hive to 'piggy back'
> on an external way of writing data. So in my case, I could indeed tell
> Hive in advance where the data will be written. (Same as you say, but the
> logic is reversed.) I guess I skipped over 'alter table touch', but it
> would not help me: the partitions are external. And if I add partitions,
> I will do it with cron and a shell file.
>
> Bertrand
>
> On Tue, Jul 24, 2012 at 7:24 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> Alter table touch will create partitions even if they have no data.
>> You can also just create partitions ahead of time and have your code
>> "know" where to write data.
>>
>> On Tue, Jul 24, 2012 at 12:35 PM, Ruslan Al-fakikh
>> <ruslan.al-fak...@jalent.ru> wrote:
>> > If you are not using Amazon, take a look at this:
>> >
>> > https://issues.apache.org/jira/browse/HIVE-874
>> >
>> > Ruslan
>> >
>> > From: Puneet Khatod [mailto:puneet.kha...@tavant.com]
>> > Sent: Tuesday, July 24, 2012 8:32 PM
>> > To: user@hive.apache.org
>> > Subject: RE: Continuous log analysis requires 'dynamic' partitions,
>> > is that possible?
>> >
>> > If you are using Amazon (AWS), you can use 'recover partitions' to
>> > enable all top-level partitions. This will add the required dynamicity.
>> >
>> > Regards,
>> > Puneet Khatod
>> >
>> > From: Bertrand Dechoux [mailto:decho...@gmail.com]
>> > Sent: 24 July 2012 21:15
>> > To: user@hive.apache.org
>> > Subject: Continuous log analysis requires 'dynamic' partitions,
>> > is that possible?
>> >
>> > Hi,
>> >
>> > Let's say logs are stored inside HDFS using the following file tree:
>> > /<logtype>/<month>/<day>
>> > So for apache, that would be:
>> > /apache/01/01
>> > /apache/01/02
>> > ...
>> > /apache/02/01
>> > ...
>> >
>> > I would like to know how to define a table for this information. I
>> > found out that the table should be external and should be using
>> > partitions. However, I did not find any way to dynamically create the
>> > partitions. Is there no automatic way to define them?
>> > In that case, the partition 'template' would be <month>/<day>, with
>> > the root being apache.
>> >
>> > I know how to 'hack a fix': create a script which would generate all
>> > the "add partition" statements and run the resulting statements
>> > without caring about the results, because partitions may not exist or
>> > may already have been added. Better, I could parse the result of
>> > 'show partitions' for the table and run only the relevant statements,
>> > but it still feels like a hack.
>> >
>> > Is there any clean way to do it?
>> >
>> > Regards,
>> >
>> > Bertrand Dechoux
>
> --
> Bertrand Dechoux

--
Bertrand Dechoux
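The 'parse show partitions' variant mentioned in the thread could be sketched roughly as follows. Everything here (the table name, the root path, the way the Hive output is obtained) is a hypothetical illustration under the /apache/<month>/<day> layout from the thread, not code anyone posted:

```python
def missing_partition_statements(hdfs_dirs, show_partitions_output,
                                 table="apache_logs", root="/apache"):
    """Emit ADD PARTITION statements only for directories Hive does not
    know about yet.

    hdfs_dirs: list of (month, day) pairs found by listing HDFS.
    show_partitions_output: raw text of 'SHOW PARTITIONS <table>',
    one partition per line in the form 'month=01/day=02'.
    """
    known = set()
    for line in show_partitions_output.splitlines():
        line = line.strip()
        if line:
            # 'month=01/day=02' -> {'month': '01', 'day': '02'}
            parts = dict(kv.split("=", 1) for kv in line.split("/"))
            known.add((parts["month"], parts["day"]))

    statements = []
    for month, day in hdfs_dirs:
        if (month, day) not in known:
            statements.append(
                "ALTER TABLE {t} ADD PARTITION (month='{m}', day='{d}') "
                "LOCATION '{r}/{m}/{d}';".format(t=table, m=month,
                                                 d=day, r=root))
    return statements

# Example: Hive already knows 01/01, so only 01/02 needs a statement.
stmts = missing_partition_statements(
    hdfs_dirs=[("01", "01"), ("01", "02")],
    show_partitions_output="month=01/day=01\n")
for s in stmts:
    print(s)
```

Run from cron (as Bertrand suggests), the emitted statements could then be piped to `hive -f`; the diff against SHOW PARTITIONS avoids re-issuing statements for partitions Hive already knows.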