Hi,

We have a DataSource V2 implementation for one of our custom data sources. It includes a step that is almost completely analogous to scanning Parquet files: essentially a heavyweight metadata operation that, although not actually a file scan, is best done in parallel and ideally cached for the lifetime of the DataFrame.
In the case of Parquet files, Spark solves this via FileScanRDD. For DataSource V2 it is not obvious how to solve a similar problem. Does anyone have any ideas or prior art here?

Thanks,
Chris