Martin Junghanns created SPARK-28513:
----------------------------------------
Summary: Compute distinct label sets instead of subsets
Key: SPARK-28513
URL: https://issues.apache.org/jira/browse/SPARK-28513
Project: Spark
Issue Type: Improvement
Components: Graph
Affects Versions: 3.0.0
Reporter: Martin Junghanns

{code:scala}
CypherSession::createDataFrame(nodes: DataFrame, rels: DataFrame)
{code}
currently computes NodeFrames by filtering label columns, computing all possible subsets of those columns, and creating one NodeFrame for each subset. For n label columns this results in 2^n sets / NodeFrames. Instead, we should compute only the distinct label sets that actually occur in the nodes DataFrame.
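
The difference between the two strategies can be sketched with plain Scala collections. This is only an illustration under stated assumptions, not the Spark Graph implementation: NodeRow, LabelSets, allSubsets, distinctLabelSets and the sample data are hypothetical names made up for this sketch, and each node is assumed to carry one boolean flag per label column.

```scala
// Hypothetical stand-in for one row of the nodes DataFrame:
// an id plus one boolean flag per label column (an assumption of this sketch).
final case class NodeRow(id: Long, labels: Map[String, Boolean])

object LabelSets {
  // Behaviour described in the issue: enumerate every subset of the
  // label columns, which yields 2^n candidate sets for n columns.
  def allSubsets(labelColumns: Set[String]): Set[Set[String]] =
    labelColumns.subsets().toSet

  // Proposed behaviour: keep only the label combinations that
  // actually occur on at least one node.
  def distinctLabelSets(nodes: Seq[NodeRow]): Set[Set[String]] =
    nodes.map(row => row.labels.collect { case (label, true) => label }.toSet).toSet

  // Made-up sample data, used only to show the difference in set counts.
  val labelColumns: Set[String] = Set("Person", "Student", "Teacher")
  val sampleNodes: Seq[NodeRow] = Seq(
    NodeRow(0L, Map("Person" -> true, "Student" -> true, "Teacher" -> false)),
    NodeRow(1L, Map("Person" -> true, "Student" -> false, "Teacher" -> true)),
    NodeRow(2L, Map("Person" -> true, "Student" -> true, "Teacher" -> false))
  )
}
```

With these three label columns, allSubsets produces 8 candidate sets, while distinctLabelSets finds only the 2 combinations present in the sample data. On an actual DataFrame, selecting the label columns and calling distinct() would presumably play the same role, though the issue does not say how the change is to be implemented.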