You may want to reach out to the HDFS dev list for the format of the editlog. There is a lot of information there, and I am not sure how accurate my memory is.
In one of my previous jobs, we converted the daily editlog into a partitioned Hive table and did exactly what you want to do. Sadly, we could not open-source that product.

On Tue, Oct 8, 2013 at 12:26 AM, demian rosas <demia...@gmail.com> wrote:
> Edward,
>
> Thanks a lot for this info!!!
>
> This gives me a clearer picture of the problem and how I can approach it.
>
> Cheers.
>
>
> On 7 October 2013 11:52, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> There is no direct API.
>>
>> What I do is this, from Java/Thrift:
>>
>> Table t = client.getTable("name_of_table");
>> Path p = new Path(t.getSd().getLocation());
>> FileSystem fs = FileSystem.get(conf);
>> FileStatus[] f = fs.listStatus(p);
>> // your logic here
>>
>>
>>
>>
>> On Mon, Oct 7, 2013 at 2:01 PM, demian rosas <demia...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I want to track the changes made to the files of a Hive table.
>>>
>>> I wonder whether there is any API that I can use to find out the
>>> following:
>>>
>>> 1. Which files in HDFS constitute a Hive table.
>>> 2. What the size of each of these files is.
>>> 3. The timestamp of the creation/last update of each of these files.
>>>
>>>
>>> Also, in a wider view, is there any API that can do the above for
>>> HDFS files in general (not only Hive-specific ones)?
>>>
>>> Thanks a lot in advance.
>>>
>>> Cheers.
>>>
>>>
>>>
>>
>

--
Nitin Pawar
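To flesh out the snippet quoted in the thread: a minimal sketch that answers all three of the original questions (which files, their sizes, their timestamps) by resolving the table's storage location through the Hive metastore client and then listing it with the HDFS FileSystem API. This is an untested sketch against the Hive/Hadoop APIs of that era, not a drop-in implementation; the database name `default` and table name `name_of_table` are placeholders, and it assumes `hive-site.xml` and the Hadoop configuration are on the classpath so `HiveConf` can find both the metastore and the cluster.

```java
import java.util.Date;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

public class TableFileAudit {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            // 1. Which files constitute the table: ask the metastore
            //    for the table's storage descriptor and its location.
            Table t = client.getTable("default", "name_of_table");
            Path location = new Path(t.getSd().getLocation());

            FileSystem fs = location.getFileSystem(conf);
            for (FileStatus status : fs.listStatus(location)) {
                // 2. Size in bytes; 3. last modification time.
                System.out.printf("%s\t%d bytes\tmodified %s%n",
                        status.getPath(),
                        status.getLen(),
                        new Date(status.getModificationTime()));
            }
        } finally {
            client.close();
        }
    }
}
```

For a partitioned table, each partition has its own storage descriptor, so you would list each partition's location rather than just the table root. Also note that HDFS exposes only a modification time via `FileStatus`, not a separate creation time, so "creation vs. last update" cannot be distinguished this way.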