Re: Does explode lead to more usage of memory

2020-01-19 Thread Chris Teoh
It depends on the use case. If you would otherwise have to join, having the data already in an array saves you a join and a shuffle. If you explode, at least sort within partitions so that you get predicate pushdown when you read the data next time. On Sun, 19 Jan 2020, 1:19 pm Jörn Franke wrote: > Why not two
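To illustrate why sorting before writing helps a later filtered read, here is a minimal plain-Python sketch. It assumes the data lands in a columnar format that keeps min/max statistics per row group (Parquet does this; the thread does not name the format), and the values, group size, and function names are hypothetical. In Spark the sort itself would be done with `sortWithinPartitions`; this sketch only models the skipping effect.

```python
from typing import List, Tuple

def row_group_stats(values: List[int], group_size: int) -> List[Tuple[int, int]]:
    """Split values into fixed-size row groups and record (min, max) for each."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [(min(g), max(g)) for g in groups]

def groups_to_read(stats: List[Tuple[int, int]], target: int) -> int:
    """Count the row groups whose min/max range could contain the target value."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)

# Hypothetical column values in arrival order (unsorted).
amounts = [9, 1, 7, 3, 8, 2, 6, 4]

# A filter like "amount = 6" can skip a row group only if 6 is outside its range.
unsorted_hits = groups_to_read(row_group_stats(amounts, 2), 6)
sorted_hits = groups_to_read(row_group_stats(sorted(amounts), 2), 6)

print(unsorted_hits, sorted_hits)
```

With unsorted data every row group has a wide range, so none can be skipped; after sorting, the ranges are narrow and disjoint, so most groups are ruled out by their statistics alone.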

Re: Does explode lead to more usage of memory

2020-01-18 Thread Jörn Franke
Why not two tables that you can then join? This would be the standard way. It depends on what your full use case is, what volumes / orders you expect on average, and what the aggregations and filters look like. The example below states that you do a select all on the table. > On 19.01.2020 at 01:50
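As a rough illustration of the two-table layout, the sketch below keeps users and orders in separate plain-Python lists and joins them on the `user` key. The sample rows and the `hash_join` helper are hypothetical and only stand in for what a Spark join would do; they are not Spark API.

```python
from typing import Dict, List

# Hypothetical normalized tables: one row per user, one row per order.
users = [
    {"user": "alice", "language": "en"},
    {"user": "bob", "language": "de"},
]
orders = [
    {"user": "alice", "id": "o1", "amount": 10.0},
    {"user": "alice", "id": "o2", "amount": 5.5},
    {"user": "bob", "id": "o3", "amount": 7.25},
]

def hash_join(left: List[Dict], right: List[Dict], key: str) -> List[Dict]:
    """Inner join: index the left table by key, then probe it with each right row."""
    index: Dict[object, List[Dict]] = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]

joined = hash_join(users, orders, "user")
print(len(joined), joined[0])
```

Each user attribute is stored once in `users` and is only repeated in the join result, which is built when a query actually needs both sides.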

Re: Does explode lead to more usage of memory

2020-01-18 Thread Chris Teoh
I think it does mean more memory usage, but consider how big your arrays are. Think about your use case requirements and whether it makes sense to use arrays. Also, it may be preferable to explode if the arrays are very large. I'd say exploding arrays will make the data more splittable, having the
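The extra memory comes from the parent columns being copied onto every exploded row. A minimal plain-Python sketch of that effect follows; the sample rows and the `explode_orders` helper are hypothetical and only mimic the behaviour of Spark's `explode` (which, like this sketch, drops rows whose array is null or empty).

```python
from typing import Dict, Iterator, List

# Hypothetical rows shaped like the schema in the original question.
rows = [
    {"user": "alice", "language": "en",
     "orders": [{"id": "o1", "amount": 10.0}, {"id": "o2", "amount": 5.5}]},
    {"user": "bob", "language": "de",
     "orders": [{"id": "o3", "amount": 7.25}]},
]

def explode_orders(records: List[Dict]) -> Iterator[Dict]:
    """Yield one flat record per array element, copying the parent columns."""
    for record in records:
        for order in record["orders"] or []:  # null or empty arrays yield nothing
            yield {"user": record["user"], "language": record["language"],
                   "order_id": order["id"], "amount": order["amount"]}

exploded = list(explode_orders(rows))

# user and language are stored once per nested row, but once per order after exploding.
nested_scalar_cells = 2 * len(rows)
exploded_scalar_cells = 2 * len(exploded)

print(len(exploded), nested_scalar_cells, exploded_scalar_cells)
```

The longer the arrays, the more often `user` and `language` are repeated, which is the trade-off against the better splittability mentioned above.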

Does explode lead to more usage of memory

2020-01-18 Thread V0lleyBallJunki3
I am using a DataFrame that has a structure like this:
root
 |-- orders: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- amount: double (nullable = true)
 |    |    |-- id: string (nullable = true)
 |-- user: string (nullable = true)
 |-- language: string