No problem! Glad it helped! On Thu, Jan 7, 2016 at 12:05 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
> Hi Yin, thanks much your answer solved my problem. Really appreciate it! > > Regards > > > On Fri, Jan 8, 2016 at 1:26 AM, Yin Huai <yh...@databricks.com> wrote: > >> Hi, we made the change because the partitioning discovery logic was too >> flexible and it introduced problems that were very confusing to users. To >> make your case work, we have introduced a new data source option called >> basePath. You can use >> >> DataFrame df = hiveContext.read().format("orc").option("basePath", " >> path/to/table/").load("path/to/table/entity=xyz") >> >> So, the partitioning discovery logic will understand that the base path >> is path/to/table/ and your dataframe will has the column "entity". >> >> You can find the doc at the end of partitioning discovery section of the >> sql programming guide ( >> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery >> ). >> >> Thanks, >> >> Yin >> >> On Thu, Jan 7, 2016 at 7:34 AM, unk1102 <umesh.ka...@gmail.com> wrote: >> >>> Hi from Spark 1.6 onwards as per this doc >>> < >>> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery >>> > >>> We cant add specific hive partitions to DataFrame >>> >>> spark 1.5 the following used to work and the following dataframe will >>> have >>> entity column >>> >>> DataFrame df = >>> hiveContext.read().format("orc").load("path/to/table/entity=xyz") >>> >>> But in Spark 1.6 above does not work and I have to give base path like >>> the >>> following but it does not contain entity column which I want in DataFrame >>> >>> DataFrame df = hiveContext.read().format("orc").load("path/to/table/") >>> >>> How do I load specific hive partition in a dataframe? What was the driver >>> behind removing this feature which was efficient I believe now above >>> Spark >>> 1.6 code load all partitions and if I filter for specific partitions it >>> is >>> not efficient it hits memory and throws GC error because of thousands of >>> partitions get loaded into memory and not the specific one please guide. >>> >>> >>> >>> -- >>> View this message in context: >>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-load-specific-Hive-partition-in-DataFrame-Spark-1-6-tp25904.html >>> Sent from the Apache Spark User List mailing list archive at Nabble.com. >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>> For additional commands, e-mail: user-h...@spark.apache.org >>> >>> >> >