Why not use two tables and then join them? That would be the standard way. It depends on what your full use case is, what volumes of orders you expect on average, and what your aggregations and filters look like. The example below only does a select of everything on the table.
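For illustration, here is a minimal sketch of the two-table approach (the DataFrame names and sample rows are made up, not taken from your data): keep users and orders as separate DataFrames keyed by user, and join only when a query actually needs both sides, instead of exploding a nested array.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("orders-join").getOrCreate()
import spark.implicits._

// users: one row per user with the per-user attributes
val users = Seq(
  ("alice", "en"),
  ("bob", "de")
).toDF("user", "language")

// orders: one row per order, keyed by user
val orders = Seq(
  ("alice", "o1", 10.5),
  ("alice", "o2", 3.2),
  ("bob", "o3", 7.0)
).toDF("user", "id", "amount")

// Aggregate or filter on the orders side first, then join,
// so user/language are not carried along for every order row.
val perUserTotals = orders
  .groupBy($"user")
  .agg(sum($"amount").as("total_amount"))
  .join(users, Seq("user"))

perUserTotals.show()

Whether this beats the nested-array layout depends on how often you need the per-order detail together with the per-user columns.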
> On 19.01.2020 at 01:50, V0lleyBallJunki3 <venkatda...@gmail.com> wrote:
>
> I am using a dataframe that has a structure like this:
>
> root
>  |-- orders: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- amount: double (nullable = true)
>  |    |    |-- id: string (nullable = true)
>  |-- user: string (nullable = true)
>  |-- language: string (nullable = true)
>
> Each user has multiple orders. Now if I explode orders like this:
>
> df.select($"user", explode($"orders").as("order"))
>
> each order element will become a row with a duplicated user and language. I was wondering whether Spark actually converts each order element into a single row in memory, or whether it is just logical. Because if a single user has 1000 orders, wouldn't it lead to a lot more memory consumption, since user and language are duplicated 1000 times (once for each order) in memory?