I am using a DataFrame whose schema looks like this:

root
 |-- orders: array (nullable = true)
 |    |-- element: struct (containsNull = true) 
 |    |    |-- amount: double (nullable = true)
 |    |    |-- id: string (nullable = true)
 |-- user: string (nullable = true)
 |-- language: string (nullable = true)

Each user has multiple orders. If I explode orders like this:

df.select($"user", $"language", explode($"orders").as("order"))

each order element becomes its own row, with user and language duplicated across those rows. I was wondering whether Spark actually materializes each order element as a separate row in memory, or whether the duplication is only logical. If a single user has 1000 orders, wouldn't this lead to much higher memory consumption, since user and language would be duplicated 1000 times (once per order) in memory?
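For context, here is a minimal, self-contained sketch of what I am doing (the data, case class names, and local session setup are illustrative, not my real job):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Illustrative types with the same shape as the schema above.
case class Order(amount: Double, id: String)
case class Account(orders: Seq[Order], user: String, language: String)

object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExplodeSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      Account(Seq(Order(10.0, "o1"), Order(20.0, "o2"), Order(30.0, "o3")),
              "alice", "en")
    ).toDF()

    df.printSchema() // same shape as the schema shown above

    // One output row per array element; user and language repeat on each row.
    val exploded = df.select($"user", $"language", explode($"orders").as("order"))
    exploded.show(truncate = false)
    // Approximate output (struct rendering varies by Spark version):
    // alice | en | {10.0, o1}
    // alice | en | {20.0, o2}
    // alice | en | {30.0, o3}

    spark.stop()
  }
}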


