Hi all, I have one large dataset. When I check the number of partitions, it shows 43. We can't collect() the whole dataset into memory, so I am thinking of calling collect() on each partition separately, so that each result is small in size.
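A minimal sketch of the per-partition idea, assuming the PySpark RDD API (the post does not say which language binding is in use). The helper name `collect_by_partition` is hypothetical; `getNumPartitions()`, `RDD.context` and `SparkContext.runJob` are existing PySpark calls.

```python
def collect_by_partition(rdd):
    """Yield (partition_index, rows), fetching one partition at a time.

    Only one partition's rows are held on the driver at any moment,
    provided the caller does not keep references to earlier results.
    """
    sc = rdd.context  # the SparkContext that owns this RDD
    for index in range(rdd.getNumPartitions()):
        # runJob evaluates the function only on the listed partitions,
        # so each iteration brings back a single partition's rows.
        rows = sc.runJob(rdd, lambda part: list(part), partitions=[index])
        yield index, rows


# Hypothetical usage (needs a working Spark installation):
#   from pyspark import SparkContext
#   sc = SparkContext.getOrCreate()
#   rdd = sc.parallelize(range(1000), 43)
#   for index, rows in collect_by_partition(rdd):
#       print(index, len(rows))
```

Each partition still has to fit in driver memory on its own, so this only helps if the 43 partitions are reasonably balanced. `RDD.toLocalIterator()` is a built-in alternative that also pulls data to the driver one partition at a time.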
Any thoughts?