I believe this has been optimized in Spark 1.3:
<https://github.com/apache/spark/commit/2a36292534a1e9f7a501e88f69bfc3a09fb62cb3>
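
For anyone stuck on an earlier release, here is a rough sketch (untested, assuming the `pef` Parquet table from the thread below): project only the needed columns in a subquery before the LATERAL VIEW, so the restricted column set is explicit in the logical plan, and verify what the scan actually requests with `explain`:

```scala
// Sketch only -- assumes an existing HiveContext `hiveContext` and the
// `pef` Parquet table from the thread below; not tested on a real cluster.

// Possible workaround for pre-1.3 versions: restrict the columns in a
// subquery before the LATERAL VIEW, so only userid and observeddays
// appear below the explode in the plan.
val df = hiveContext.sql("""
  SELECT userid
  FROM (SELECT userid, observeddays FROM pef) t
  LATERAL VIEW explode(observeddays) od AS observed
  WHERE observed == 20140509
""")

// On 1.3+ the optimizer should prune through the explode by itself; the
// extended physical plan shows which columns the Parquet scan requests.
df.explain(true)
```

Whether the subquery actually helps depends on the version's column-pruning rules, so it is worth checking the plan either way.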

On Tue, Mar 3, 2015 at 4:36 AM, matthes <matthias.diekst...@web.de> wrote:

> I use "LATERAL VIEW explode(...)" to read data from a Parquet file, but
> Parquet requests the full schema instead of just the used columns. When I
> don't use LATERAL VIEW, the requested schema has just the two columns that
> I use. Is this correct, is there room for an optimization, or am I
> misunderstanding something?
>
> Here are my examples:
>
> 1) hiveContext.sql("SELECT userid FROM pef WHERE observeddays==20140509")
>
> The requested schema is:
>
>   optional group observedDays (LIST) {
>     repeated int32 array;
>   }
>   required int64 userid;
> }
>
> This is what I expect (although the query itself does not return what I
> want, that is not the problem here!)
>
> 2) hiveContext.sql("SELECT userid FROM pef LATERAL VIEW
> explode(observeddays) od AS observed WHERE observed==20140509")
>
> the requested schema is:
>
>   required int64 userid;
>   optional int32 source;
>   optional group observedDays (LIST) {
>     repeated int32 array;
>   }
>   optional group placetobe (LIST) {
>     repeated group bag {
>       optional group array {
>         optional binary palces (UTF8);
>         optional group dates (LIST) {
>           repeated int32 array;
>         }
>       }
>     }
>   }
> }
>
> Why does Parquet request the full schema? I only use two fields of the
> table.
>
> Can somebody please explain why this happens?
>
> Thanks!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/LATERAL-VIEW-explode-requests-the-full-schema-tp21893.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
