sqlContext.table("...").inputFiles

(This is best-effort, but it should work for Hive tables.)
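A minimal sketch combining this with the ProtoParquetRDD from the question below. Assumptions: a running SparkContext `sc` and HiveContext `sqlContext`, a table name `"my_table"`, and the question's protobuf class `MyProto` — all placeholders. Whether ProtoParquetRDD accepts a comma-separated path list is not confirmed by its docs, so the conservative route is one RDD per file, unioned:

```scala
// Best-effort: list the files backing the Hive table...
val files: Array[String] = sqlContext.table("my_table").inputFiles

// ...then read each Parquet file back as protocol buffers via
// sparksql-protobuf's ProtoParquetRDD and union the per-file RDDs.
// (If ProtoParquetRDD takes Hadoop-style comma-separated paths,
// files.mkString(",") in a single call may work too.)
val protobufsRdd = sc.union(
  files.map(f => new ProtoParquetRDD(sc, f, classOf[MyProto])).toSeq
)
```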

Michael

On Tue, Dec 1, 2015 at 10:55 AM, Krzysztof Zarzycki <k.zarzy...@gmail.com>
wrote:

> Hi there,
> Do you know how I can easily get a list of all files of a Hive table?
>
> What I want to achieve is to get all files underneath a Parquet
> table and, using the sparksql-protobuf[1] library (a really handy library!)
> and its helper class ProtoParquetRDD:
>
> val protobufsRdd = new ProtoParquetRDD(sc, "files", classOf[MyProto])
>
> access the underlying Parquet files as normal protocol buffers. But I
> don't know how to get the file list. When I pointed the call above at one
> file by hand, it worked well.
> The Parquet table was created with the same library and its implicit
> hiveContext extension createDataFrame, which creates a DataFrame from a
> protocol buffer class.
>
> (The reverse read operation is needed to support legacy code: after
> converting protocol buffers to Parquet, I still want some code to access
> the Parquet data as normal protocol buffers.)
>
> Maybe someone knows another way to get an RDD of protocol buffers from a
> Parquet-stored table.
>
> [1] https://github.com/saurfang/sparksql-protobuf
>
> Thanks!
> Krzysztof
>
