Re: How to load specific Hive partition in DataFrame Spark 1.6?

Yin Huai Thu, 07 Jan 2016 12:19:33 -0800

No problem! Glad it helped!

On Thu, Jan 7, 2016 at 12:05 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:


> Hi Yin, thanks very much. Your answer solved my problem. Really appreciate it!
>
> Regards
>
>
> On Fri, Jan 8, 2016 at 1:26 AM, Yin Huai <yh...@databricks.com> wrote:
>
>> Hi, we made the change because the partitioning discovery logic was too
>> flexible and it introduced problems that were very confusing to users. To
>> make your case work, we have introduced a new data source option called
>> basePath. You can use
>>
>> DataFrame df = hiveContext.read().format("orc")
>>     .option("basePath", "path/to/table/")
>>     .load("path/to/table/entity=xyz");
>>
>> So, the partition discovery logic will understand that the base path
>> is path/to/table/ and your DataFrame will have the column "entity".
>>
>> You can find the doc at the end of the Partition Discovery section of the
>> SQL programming guide (
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>> ).
>>
>> Thanks,
>>
>> Yin
>>
>> On Thu, Jan 7, 2016 at 7:34 AM, unk1102 <umesh.ka...@gmail.com> wrote:
>>
>>> Hi, from Spark 1.6 onwards, as per this doc
>>> <http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery>
>>> we can't load specific Hive partitions into a DataFrame.
>>>
>>> In Spark 1.5 the following used to work, and the resulting DataFrame
>>> had the entity column:
>>>
>>> DataFrame df =
>>> hiveContext.read().format("orc").load("path/to/table/entity=xyz")
>>>
>>> But in Spark 1.6 the above does not work, and I have to give the base
>>> path as in the following, but it does not contain the entity column,
>>> which I want in the DataFrame:
>>>
>>> DataFrame df = hiveContext.read().format("orc").load("path/to/table/")
>>>
>>> How do I load a specific Hive partition into a DataFrame? What was the
>>> driver behind removing this feature? It was efficient. I believe the
>>> Spark 1.6 code above now loads all partitions, and filtering for
>>> specific partitions is not efficient: it hits memory limits and throws
>>> a GC error because thousands of partitions get loaded into memory
>>> instead of the specific one. Please guide.
>>
>
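
The basePath behaviour described in the quoted reply can be illustrated
without a Spark installation. The sketch below is only a simplified model,
not Spark's actual implementation: the class BasePathSketch and the helpers
partitionColumns and stripSlashes are hypothetical names made up for this
example. It assumes that partition columns are taken from the "name=value"
directory segments that lie below the base path, which is why loading
path/to/table/entity=xyz on its own yields no "entity" column in Spark 1.6
(the loaded path is itself the base path by default), while setting basePath
to path/to/table/ brings the column back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BasePathSketch {

    // Hypothetical helper (NOT a Spark API): collects partition columns from
    // the "name=value" directory segments of loadPath that lie below basePath.
    static Map<String, String> partitionColumns(String basePath, String loadPath) {
        String base = stripSlashes(basePath);
        String load = stripSlashes(loadPath);
        if (!load.equals(base) && !load.startsWith(base + "/")) {
            throw new IllegalArgumentException("loadPath must be under basePath");
        }
        Map<String, String> columns = new LinkedHashMap<>();
        // Only the part of the path below the base path is inspected.
        String relative = load.substring(base.length());
        for (String segment : relative.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                columns.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return columns;
    }

    // Removes trailing slashes so "path/to/table/" and "path/to/table" match.
    static String stripSlashes(String path) {
        return path.replaceAll("/+$", "");
    }

    public static void main(String[] args) {
        // Models the Spark 1.6 default: the loaded directory is the base path,
        // so "entity=xyz" is not below it and no column is found.
        System.out.println(
            partitionColumns("path/to/table/entity=xyz", "path/to/table/entity=xyz"));

        // Models option("basePath", "path/to/table/"): "entity=xyz" now lies
        // below the base path and becomes a column.
        System.out.println(
            partitionColumns("path/to/table/", "path/to/table/entity=xyz"));
    }
}
```

In Spark itself the corresponding call is the one given in the reply:
option("basePath", "path/to/table/") followed by
load("path/to/table/entity=xyz").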
