Hi all,
I want to generate some test data containing about one hundred million rows. I created a Dataset with ten rows and call df.union on it repeatedly in a for loop, but this causes the operation to run only on the driver node. How can I make it run across the whole cluster?
2018-12-14 lk_spark