How to generate a large dataset in parallel

lk_spark Thu, 13 Dec 2018 18:40:03 -0800

Hi all,
    I want to generate some test data containing about one hundred 
million rows.
    I created a dataset with ten rows and call df.union on it repeatedly in a 
'for' loop, but this causes the work to happen only on the driver node.
    How can I do it across the whole cluster?
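
    For reference, one direction that might work is sketched below in PySpark. It is only a sketch under assumptions, not a confirmed answer: the function names make_row and build_test_data, the column names, and the partition count are all hypothetical. The idea is to start from spark.range, which produces the row ids lazily and already split into partitions, and to derive every other column from the id on the executors, rather than growing a DataFrame with union on the driver.

```python
import random


def make_row(i):
    """Build one synthetic row from its index (hypothetical schema).

    This is a plain function, so it can be shipped to the executors
    and called once per id there.
    """
    rng = random.Random(i)  # seeded by the index, so a row is reproducible
    return (i, "user_%d" % (i % 1000), rng.random())


def build_test_data(spark, num_rows=100_000_000, num_partitions=200):
    """Return a DataFrame of num_rows synthetic rows.

    `spark` is an existing SparkSession. The defaults are assumptions
    for illustration, not tuned values.
    """
    # spark.range(start, end, step, numPartitions) yields a single "id"
    # column whose values are generated per partition, not collected
    # on the driver.
    ids = spark.range(0, num_rows, 1, num_partitions)
    # Each executor maps its own slice of ids to full rows.
    return ids.rdd.map(lambda r: make_row(r.id)).toDF(["id", "name", "score"])
```

    With a session obtained from SparkSession.builder.getOrCreate(), calling build_test_data(spark).write.parquet(path) would then run the generation as ordinary distributed tasks. If the columns can be expressed with built-in functions such as pyspark.sql.functions.rand, applying them to spark.range with withColumn is likely faster, since the rows never pass through Python.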


2018-12-14


lk_spark
