I am using a dataframe and has structure like this : root |-- orders: array (nullable = true) | |-- element: struct (containsNull = true) | | |-- amount: double (nullable = true) | | |-- id: string (nullable = true) |-- user: string (nullable = true) |-- language: string (nullable = true)
Each user has multiple orders. Now if I explode orders like this: df.select($"user", explode($"orders").as("order")) . Each order element will become a row with a duplicated user and language. Was wondering if spark actually converts each order element into a single row in memory or it just logical. Because if a single user has 1000 orders then wouldn't it lead to a lot more memory consumption since it is duplicating user and language a 1000 times (once for each order) in memory? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org