Dennis Gove created SOLR-9096: --------------------------------- Summary: Add PartitionStream to Streaming Expressions Key: SOLR-9096 URL: https://issues.apache.org/jira/browse/SOLR-9096 Project: Solr Issue Type: New Feature Reporter: Dennis Gove
The basic idea of a PartitionStream is to take a one or more input streams of tuples, partition them out to a set of workers such that each worker can work with a subset of the tuples, and then bring them all back into a single stream. This differs from a ParallelStream because in ParallelStream the data is partitioned at the source whereas with a PartitionStream one can take an existing stream and spread it out across workers. {code} /--- sort ----\ / \ /--- Collection A / ---- sort ----- \ / Client <--- rollup <----< <----- innerJoin < \ ---- sort ----- / \ \ / \--- Collection B \--- sort ----/ {code} {code} /--- sort -- rollup ----\ / \ /--- Collection A / ---- sort -- rollup ----- \ / Client <-- innerJoin <---< <----- innerJoin < \ \ ---- sort -- rollup ----- / \ \ \ / \--- Collection B \ \--- sort -- rollup ----/ \ \ \ <--- jdbc source {code} {code} /--- sort -- innerJoin ----\ / \ / ---- sort -- innerJoin ----- \ <--- jdbc source Client <-- innerJoin <---< | \ \ ---- sort -- innerJoin ----- / <--- rollup <---- Collection A \ \ / \ \--- sort -- innerJoin ----/ \ \ \ <--- jdbc source {code} I imagine partition expression would look something like this {code} partition( inputA=<source stream A>, inputB=<source stream B>, work=<stream for the workers>, over="fieldA,fieldB", workers=6, zkHost=<zk connection string> ) {code} for example {code} innerJoin( partition( inputA=jdbc(database1), inputB=rollup( search(collectionA, ...), ... ), work=sort( innerJoin( inputA, inputB, on="fieldA,fieldB" ), by="jdbcFieldC asc, collectionAFieldB desc" ), workers=6, zkHost=localhost:12345 ), jdbc(database2), on="fieldZ" ) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org