sqlContext.table("...").inputFiles (this is best effort, but should work for Hive tables).
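A minimal sketch of putting the two together, assuming a Hive-backed parquet table (the table name "my_table" and the protobuf class MyProto are hypothetical, and the ProtoParquetRDD constructor signature is taken from the snippet quoted below):

```scala
// Sketch: list the files backing a Hive table, then read them back as
// protocol buffers via sparksql-protobuf's ProtoParquetRDD.
// Assumes a Spark 1.x HiveContext (sqlContext), a SparkContext (sc),
// and the sparksql-protobuf library on the classpath.

// Best-effort list of the files underneath the table.
val files: Array[String] = sqlContext.table("my_table").inputFiles

// Build one ProtoParquetRDD per file and union them into a single RDD
// of MyProto messages (one RDD per path avoids assuming the constructor
// accepts a comma-separated path list).
val perFile = files.map(f => new ProtoParquetRDD(sc, f, classOf[MyProto]))
val protobufsRdd = sc.union(perFile)
```

Note that inputFiles is resolved at the time of the call, so if the table is rewritten (e.g. by INSERT OVERWRITE) the list can go stale.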
Michael

On Tue, Dec 1, 2015 at 10:55 AM, Krzysztof Zarzycki <k.zarzy...@gmail.com> wrote:
> Hi there,
> Do you know how easily I can get a list of all the files of a Hive table?
>
> What I want to achieve is to get all the files that are underneath a parquet
> table so that, using the sparksql-protobuf library [1] (really handy library!)
> and its helper class ProtoParquetRDD:
>
>   val protobufsRdd = new ProtoParquetRDD(sc, "files", classOf[MyProto])
>
> I can access the underlying parquet files as normal protocol buffers. But I
> don't know how to get them. When I pointed the call above at one file by
> hand, it worked well.
> The parquet table was created with the same library and its implicit
> hiveContext extension createDataFrame, which creates a DataFrame based on a
> protocol buffer class.
>
> (The reverse read operation is needed to support legacy code which, after
> the protocol buffers have been converted to parquet, still needs to access
> the parquet data as normal protocol buffers.)
>
> Maybe someone knows another way to get an RDD of protocol buffers from a
> Parquet-stored table.
>
> [1] https://github.com/saurfang/sparksql-protobuf
>
> Thanks!
> Krzysztof