Hi, we made the change because the partition discovery logic was too
flexible, and that flexibility introduced problems that were very confusing
to users. To make your case work, we have introduced a new data source
option called basePath. You can use:

DataFrame df = hiveContext.read()
    .format("orc")
    .option("basePath", "path/to/table/")
    .load("path/to/table/entity=xyz");

This way, the partition discovery logic will understand that the base path
is path/to/table/, and your DataFrame will have the column "entity".
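
The same option also lets you load more than one partition at a time while
keeping the partition column. A minimal sketch, assuming a second
hypothetical partition entity=abc exists and using the varargs
load(String...) overload of DataFrameReader:

    // Hypothetical layout under the base path:
    //   path/to/table/entity=xyz/...
    //   path/to/table/entity=abc/...   (assumed for illustration)
    DataFrame df = hiveContext.read()
        .format("orc")
        .option("basePath", "path/to/table/")
        .load("path/to/table/entity=xyz", "path/to/table/entity=abc");

    df.printSchema();                   // schema includes the "entity" column
    df.filter("entity = 'xyz'").show(); // only rows from the xyz partition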

You can find the doc at the end of the Partition Discovery section of the
SQL programming guide
(http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery).
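
For contrast, the 1.5-style workaround you describe below, loading the table
root and filtering afterwards, has to discover every partition under the
root first. A minimal sketch of that pattern:

    // Partition discovery lists every entity=... directory under the root
    // before any filter is applied; with thousands of partitions that
    // listing is itself the expensive step.
    DataFrame all = hiveContext.read().format("orc").load("path/to/table/");
    DataFrame xyz = all.filter("entity = 'xyz'");

Pointing load() at the partition directory with basePath set, as above, only
scans the paths you pass in.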

Thanks,

Yin

On Thu, Jan 7, 2016 at 7:34 AM, unk1102 <umesh.ka...@gmail.com> wrote:

> Hi, from Spark 1.6 onwards, as per this doc
> (http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery),
> we can't add specific Hive partitions to a DataFrame.
>
> In Spark 1.5 the following used to work, and the resulting DataFrame had
> the entity column:
>
> DataFrame df =
>     hiveContext.read().format("orc").load("path/to/table/entity=xyz");
>
> But in Spark 1.6 the above does not work, and I have to give the base path
> like the following, but then the DataFrame does not contain the entity
> column which I want:
>
> DataFrame df = hiveContext.read().format("orc").load("path/to/table/");
>
> How do I load a specific Hive partition into a DataFrame? What was the
> driver behind removing this feature, which I believe was efficient? Now the
> above Spark 1.6 code loads all partitions, and if I filter for specific
> partitions it is not efficient: it hits memory and throws GC errors because
> thousands of partitions get loaded into memory instead of the specific one.
> Please guide.
