Why not use two tables and join them? That would be the standard approach. 
It depends on your full use case: what volumes / order counts you expect on 
average, and what your aggregations and filters look like. The example below 
just does a select over the whole table.
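As a rough illustration of what I mean by two tables, here is a minimal sketch
(table and column names are just assumptions derived from your schema): keep one
row per user with its language, and one row per order keyed by user, then
aggregate the narrow orders table and join the much smaller result back to
users, so user and language are never repeated per order row.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("two-table-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical flat tables instead of the nested array<struct> "orders" column.
    val users = Seq(
      ("alice", "en"),
      ("bob",   "de")
    ).toDF("user", "language")

    val orders = Seq(
      ("alice", 10.5, "o1"),
      ("alice", 20.0, "o2"),
      ("bob",    5.0, "o3")
    ).toDF("user", "amount", "id")

    // Aggregate on the narrow orders table first, then join the small
    // per-user result back to users, so user/language are never duplicated
    // once per order row.
    val totals = orders.groupBy($"user").agg(sum($"amount").as("total_amount"))
    val result = totals.join(users, Seq("user"))

    result.show()

Whether this beats exploding the nested column depends on the access pattern:
if you mostly read whole users together with all their orders, the nested
layout may still be fine.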

> On 19.01.2020 at 01:50, V0lleyBallJunki3 <venkatda...@gmail.com> wrote:
> 
> I am using a dataframe that has a structure like this:
> 
> root
> |-- orders: array (nullable = true)
> |    |-- element: struct (containsNull = true) 
> |    |    |-- amount: double (nullable = true)
> |    |    |-- id: string (nullable = true)
> |-- user: string (nullable = true)
> |-- language: string (nullable = true)
> 
> Each user has multiple orders. Now if I explode orders like this:
> 
> df.select($"user", explode($"orders").as("order")) . Each order element will
> become a row with a duplicated user and language. Was wondering if spark
> actually converts each order element into a single row in memory or it just
> logical. Because if a single user has 1000 orders  then wouldn't it lead to
> a lot more memory consumption since it is duplicating user and language a
> 1000 times (once for each order) in memory?
> 
> 
> 

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
