I believe that this has been optimized in Spark 1.3: <https://github.com/apache/spark/commit/2a36292534a1e9f7a501e88f69bfc3a09fb62cb3>
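For anyone who wants to check this on their own build, a rough verification sketch (assuming Spark 1.3+, an existing `hiveContext`, and the table/column names from the thread, which are hypothetical here) is to compare the physical plans of the two queries:

```scala
// Sketch: compare the plans of the plain query and the LATERAL VIEW
// query to see which Parquet columns each scan requests.
// Assumes Spark 1.3+, a HiveContext bound to `hiveContext`, and a
// registered Parquet-backed table `pef` (names taken from the thread).

val plain = hiveContext.sql(
  "SELECT userid FROM pef WHERE observeddays == 20140509")

val lateral = hiveContext.sql(
  """SELECT userid FROM pef
    |LATERAL VIEW explode(observeddays) od AS observed
    |WHERE observed == 20140509""".stripMargin)

// explain(true) prints the analyzed, optimized, and physical plans.
// With the column-pruning change the linked commit refers to, the
// Parquet scan in both plans should list only the columns the query
// actually touches, rather than the full schema.
plain.explain(true)
lateral.explain(true)
```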
On Tue, Mar 3, 2015 at 4:36 AM, matthes <matthias.diekst...@web.de> wrote:
> I use "LATERAL VIEW explode(...)" to read data from a Parquet file, but
> Parquet requests the full schema instead of just the columns I use. When I
> don't use LATERAL VIEW, the requested schema contains just the two columns
> that I use. Is this correct, is there room for an optimization, or am I
> misunderstanding something?
>
> Here are my examples:
>
> 1) hiveContext.sql("SELECT userid FROM pef WHERE observeddays==20140509")
>
> The requested schema is:
>
>   optional group observedDays (LIST) {
>     repeated int32 array;
>   }
>   required int64 userid;
> }
>
> This is what I expect (although the result does not work, but that is not
> the problem here!).
>
> 2) hiveContext.sql("SELECT userid FROM pef LATERAL VIEW
> explode(observeddays) od AS observed WHERE observed==20140509")
>
> The requested schema is:
>
>   required int64 userid;
>   optional int32 source;
>   optional group observedDays (LIST) {
>     repeated int32 array;
>   }
>   optional group placetobe (LIST) {
>     repeated group bag {
>       optional group array {
>         optional binary palces (UTF8);
>         optional group dates (LIST) {
>           repeated int32 array;
>         }
>       }
>     }
>   }
> }
>
> Why does Parquet request the full schema? I use only two fields of the
> table.
>
> Can somebody please explain to me why this happens?
>
> Thanks!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/LATERAL-VIEW-explode-requests-the-full-schema-tp21893.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.