Usage of msck:
msck table <table>
msck repair table <table>

But that won't help me.

I am using an external table with 'external' partitions (which do not
follow Hive conventions).
So I first create an external table without a location, and then I declare
every partition with an absolute location.
I don't think there is another way given my constraints. But if there is, I
will gladly read about it.
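
To make that setup concrete, here is a rough sketch of the statements involved, wrapped in a small shell function so they can be written to a file. The table name (logs), the single column (line) and the partition keys (month, day) are made up for illustration; only the /apache/01/01 path comes from the layout discussed in this thread.

```shell
#!/bin/sh
# Print HiveQL for an external, partitioned table declared without a LOCATION,
# followed by one partition bound to an absolute path.
# Assumptions: table "logs", column "line", partition keys "month" and "day".
print_ddl() {
  cat <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS logs (line STRING)
PARTITIONED BY (month STRING, day STRING);

ALTER TABLE logs ADD IF NOT EXISTS
PARTITION (month = '01', day = '01') LOCATION '/apache/01/01';
EOF
}

print_ddl
```

Redirecting the output to a file and running it with hive -f should be enough to try it; the IF NOT EXISTS clauses are there so that running it twice does no harm.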

So, with the current implementation and the parameters accepted by the
current Hive commands:
1) Hive has no way to list the table directory
2) Hive has no way to know which variable should be used for each
partitioning level

Conclusion: the only solution at the moment is to declare partitions to
Hive in advance (thanks Edward). But that means that I have to handle
the 'synchronisation' of two 'pseudo' file trees: HDFS and Hive partitions.
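
For the cron side of that synchronisation, a minimal sketch could look like the function below. It reads absolute paths of the form /<logtype>/<month>/<day> on stdin and prints one idempotent statement per path. The function name, the table name passed as first argument and the partition keys month/day are assumptions, not something Hive provides.

```shell
#!/bin/sh
# Turn paths like /apache/01/02 (one per line on stdin) into
# ALTER TABLE ... ADD IF NOT EXISTS PARTITION statements on stdout.
# $1 is the (hypothetical) table name; partition keys month/day are assumed.
paths_to_partitions() {
  table="$1"
  # Splitting on "/" leaves an empty first field for the leading slash.
  while IFS=/ read -r lead logtype month day rest; do
    # Keep only paths that are exactly three levels deep.
    [ -n "$day" ] && [ -z "$rest" ] || continue
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (month='%s', day='%s') LOCATION '/%s/%s/%s';\n" \
      "$table" "$month" "$day" "$logtype" "$month" "$day"
  done
}
```

The path list would presumably come from something like a hadoop fs -ls listing reduced to its last column, and the generated statements can be run with hive -f. Because of IF NOT EXISTS, the job can be re-run without checking which partitions are already declared.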

Bertrand

On Wed, Jul 25, 2012 at 10:51 AM, Bertrand Dechoux <decho...@gmail.com> wrote:

> @Puneet Khatod : I found that out. And that's why I am asking here. I
> guess non AWS users might have the same problems and a way to solve it.
>
> @Ruslan Al-fakikh : It seems great. Is there any documentation for msck?
> I will find out with the diff file but is there a wiki page or a blog post
> about it? It would be best. I could not find any.
>
> @Edward Capriolo : I now feel silly. This is clearly a better approach
> than my proposed hacks. The performance impact should be negligible, even
> more so when ensuring partition pruning. I am using hive to 'piggy back' on
> an external way of writing data. So in my case, I could indeed tell hive in
> advance where the data will be written. (Same as you say but the logic is
> reversed.) I guess I skipped over alter table touch. But it would not help
> me. The partitions are external. And if I add partitions, I will do it with
> cron and a shell file.
>
> Bertrand
>
>
>
> On Tue, Jul 24, 2012 at 7:24 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> Alter table touch will create partitions even if they have no data.
>> You can also just create partitions ahead of time and have your code
>> "know" where to write data.
>>
>>
>> On Tue, Jul 24, 2012 at 12:35 PM, Ruslan Al-fakikh
>> <ruslan.al-fak...@jalent.ru> wrote:
>> > If you are not using Amazon take a look at this:
>> >
>> > https://issues.apache.org/jira/browse/HIVE-874
>> >
>> >
>> >
>> > Ruslan
>> >
>> >
>> >
>> > From: Puneet Khatod [mailto:puneet.kha...@tavant.com]
>> > Sent: Tuesday, July 24, 2012 8:32 PM
>> > To: user@hive.apache.org
>> > Subject: RE: Continuous log analysis requires 'dynamic' partitions, is that
>> > possible?
>> >
>> >
>> >
>> > If you are using Amazon (AWS), you can use ‘recover partitions’ to enable
>> > all top level partitions.
>> >
>> > This will add required dynamicity.
>> >
>> >
>> >
>> > Regards,
>> >
>> > Puneet Khatod
>> >
>> >
>> >
>> > From: Bertrand Dechoux [mailto:decho...@gmail.com]
>> > Sent: 24 July 2012 21:15
>> > To: user@hive.apache.org
>> > Subject: Continuous log analysis requires 'dynamic' partitions, is that
>> > possible?
>> >
>> >
>> >
>> > Hi,
>> >
>> > Let's say logs are stored inside hdfs using the following file tree
>> > /<logtype>/<month>/<day>.
>> > So for apache, that would be :
>> > /apache/01/01
>> > /apache/01/02
>> > ...
>> > /apache/02/01
>> > ...
>> >
>> > I would like to know how to define a table for this information. I found
>> > out that the table should be external and should use partitions.
>> > However, I did not find any way to create the partitions dynamically. Is
>> > there no automatic way to define them?
>> > In that case, the partition 'template' would be <month>/<day> with the
>> > root being apache.
>> >
>> > I know how to 'hack a fix': create a script which would generate all the
>> > "add partition" statements and run them without caring about the results,
>> > because partitions may not exist or may already have been added. Better,
>> > I could parse the result of 'show partitions' for the table and run only
>> > the relevant statements, but it still feels like a hack.
>> >
>> > Is there any clean way to do it?
>> >
>> > Regards,
>> >
>> > Bertrand Dechoux
>> >
>>
>
>
>
> --
> Bertrand Dechoux
>



-- 
Bertrand Dechoux
