Hi Team,
I have two questions regarding Arrow and Spark integration:
1. I am joining two huge tables (1 PB each). Will performance improve significantly
if I use the Arrow format before shuffling? Will the
serialization/deserialization cost be significantly lower?
2. Can we store the final data in
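To make question 1 concrete: in a shuffle-based equi-join, both inputs are hash-partitioned on the join key so that matching keys land in the same partition. That repartitioning is where rows are serialized, sent over the network, and deserialized. The Java sketch below is a single-process toy (no Spark, no Arrow, no network), and `ShuffleJoinSketch`, `Row`, `hashPartition`, and `shuffleHashJoin` are hypothetical names chosen for illustration. It only shows where the serialization cost the question asks about would arise. It does not show how Spark or Arrow actually implement it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShuffleJoinSketch {
    // Hypothetical stand-in for a table row: a join key plus a payload.
    record Row(long key, String value) {}

    // "Shuffle" step: route each row to a partition by hashing its join key.
    // On a real cluster, this is where every row is serialized and moved.
    static List<List<Row>> hashPartition(List<Row> rows, int numPartitions) {
        List<List<Row>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) parts.add(new ArrayList<>());
        for (Row r : rows) {
            int p = Math.floorMod(Long.hashCode(r.key()), numPartitions);
            parts.get(p).add(r);
        }
        return parts;
    }

    // Join step: after partitioning, matching keys are co-located, so each
    // partition pair is joined independently with a local hash table.
    static List<String> shuffleHashJoin(List<Row> left, List<Row> right, int numPartitions) {
        List<List<Row>> lp = hashPartition(left, numPartitions);
        List<List<Row>> rp = hashPartition(right, numPartitions);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            // Build a hash table from the left side of this partition.
            Map<Long, List<String>> build = new HashMap<>();
            for (Row r : lp.get(i)) {
                build.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
            }
            // Probe it with the right side of the same partition.
            for (Row r : rp.get(i)) {
                for (String lv : build.getOrDefault(r.key(), List.of())) {
                    out.add(r.key() + ":" + lv + "," + r.value());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        List<Row> right = List.of(new Row(2, "x"), new Row(3, "y"), new Row(4, "z"));
        List<String> joined = shuffleHashJoin(left, right, 4);
        joined.sort(null); // natural string order, for stable output
        System.out.println(joined); // [2:b,x, 3:c,y]
    }
}
```

For the sample data, only keys 2 and 3 match. Every row of both inputs is moved exactly once during partitioning, so per-row serialization cost scales with the full size of both tables.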
Hi All,
I'm trying to understand why the GraphFrames connected components algorithm runs much slower
than the GraphX equivalent.
The GraphX code creates 16 stages.
GraphFrame graphFrame = GraphFrame.fromEdges(edges);
Dataset<Row> connectedComponents =
graphFrame.connectedComponents().setAlgorithm("graphx").run();
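For comparison, GraphX's `connectedComponents` is built on its Pregel API. Every vertex starts with its own ID as a label, and the smallest ID is propagated along edges one superstep at a time until nothing changes, so the number of iterations grows with the graph's diameter. The sketch below is a single-machine, stdlib-only illustration of that min-label propagation idea. It is not the GraphX or GraphFrames source, and `MinLabelConnectedComponents`, `Edge`, `Result`, and `connectedComponents` are hypothetical names. GraphFrames' default (non-`"graphx"`) algorithm works differently, so this sketch by itself does not explain why one runs faster on a given cluster.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MinLabelConnectedComponents {
    // Hypothetical undirected edge between two vertex IDs.
    record Edge(long src, long dst) {}

    // Component label per vertex, plus the number of rounds executed.
    record Result(Map<Long, Long> labels, int rounds) {}

    // Min-label propagation: each vertex starts labeled with its own ID; in each
    // synchronous round, every edge pushes the smaller endpoint label to both ends.
    // Stops after a round with no changes, so rounds grow with graph diameter.
    static Result connectedComponents(List<Edge> edges) {
        Map<Long, Long> label = new HashMap<>();
        for (Edge e : edges) {
            label.putIfAbsent(e.src(), e.src());
            label.putIfAbsent(e.dst(), e.dst());
        }
        int rounds = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            rounds++;
            // Next labels are computed only from this round's labels (Pregel-style superstep).
            Map<Long, Long> next = new HashMap<>(label);
            for (Edge e : edges) {
                long m = Math.min(label.get(e.src()), label.get(e.dst()));
                if (m < next.get(e.src())) { next.put(e.src(), m); changed = true; }
                if (m < next.get(e.dst())) { next.put(e.dst(), m); changed = true; }
            }
            label = next;
        }
        return new Result(label, rounds);
    }

    public static void main(String[] args) {
        // Path 1-2-3-4 plus a separate edge 10-11.
        List<Edge> edges = List.of(new Edge(1, 2), new Edge(2, 3), new Edge(3, 4), new Edge(10, 11));
        Result r = connectedComponents(edges);
        System.out.println(new TreeMap<>(r.labels())); // {1=1, 2=1, 3=1, 4=1, 10=10, 11=10}
        System.out.println("rounds = " + r.rounds()); // rounds = 4
    }
}
```

Here the path of diameter 3 needs three label-changing rounds plus one more round to detect convergence. Each such round corresponds roughly to one distributed superstep.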
and the