Adam Binford created SPARK-31376:
------------------------------------

             Summary: Non-global sort support for structured streaming
                 Key: SPARK-31376
                 URL: https://issues.apache.org/jira/browse/SPARK-31376
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Adam Binford


Currently, all sorting is disallowed in structured streaming queries. 
Disallowing global sorting makes sense, but could non-global sorting (i.e. 
sortWithinPartitions) be allowed? I'm running into this with an external source 
I'm using, but I'm not sure whether this would be useful for file sources as 
well. As a workaround, I have to use foreachBatch so that I can call 
sortWithinPartitions on each batch.
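
For reference, the foreachBatch workaround looks roughly like the PySpark 
sketch below. It is only an illustration: the rate source, the "key" column 
name, the parquet sink and the /tmp paths are placeholders I picked for the 
example, not the external source I'm actually using.

```python
def sort_batch_within_partitions(batch_df, batch_id,
                                 sort_col="key",
                                 out_path="/tmp/sorted_out"):
    """foreachBatch callback: each micro-batch arrives as a regular (batch)
    DataFrame, so the streaming restriction on sorts no longer applies."""
    (batch_df
        .sortWithinPartitions(sort_col)   # local sort, no shuffle to a total order
        .write
        .mode("append")
        .parquet(out_path))               # placeholder sink for the example


def start_query(spark):
    """Wire the callback into a streaming query.

    `spark` is an existing SparkSession; the rate source is a stand-in for
    the real source and its `value` column is renamed to the assumed `key`.
    """
    stream_df = (spark.readStream
                 .format("rate")
                 .option("rowsPerSecond", 10)
                 .load()
                 .withColumnRenamed("value", "key"))
    return (stream_df.writeStream
            .foreachBatch(sort_batch_within_partitions)
            .option("checkpointLocation", "/tmp/sorted_ckpt")  # placeholder path
            .start())
```

Calling start_query(spark).awaitTermination() runs it. The downside is that 
the write has to be done by hand inside the callback instead of through the 
normal streaming sink.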

Two main questions:
 * Does a local sort cause issues with any of the exactly-once guarantees that 
streaming queries provide? I don't fully understand how these semantics work. 
Or are there other issues this would cause that I haven't thought of?
 * Is the change as simple as changing the unsupported operations check to only 
look for global sorts instead of all sorts?

I have built a version that simply changes the unsupported operations check to 
only disallow global sorts, and it seems to be working. Am I missing anything, 
or is it this simple?
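
To make the proposed change concrete, here is a toy model of the check in 
plain Python. This is not Spark's code: Node, is_streaming and 
check_supported are names invented for the illustration, and the model ignores 
other rules the real check has (for example, sorting after an aggregation in 
Complete output mode). It only shows the idea of rejecting a sort on a 
streaming plan when the sort is global, rather than rejecting every sort.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One operator in a (linear) toy query plan."""
    op: str                          # e.g. "StreamSource", "BatchSource", "Sort"
    child: Optional["Node"] = None
    global_sort: bool = False        # only meaningful when op == "Sort"


def is_streaming(node: Node) -> bool:
    """A plan is streaming if its leaf is a streaming source."""
    while node.child is not None:
        node = node.child
    return node.op == "StreamSource"


def check_supported(plan: Node, allow_local_sort: bool) -> None:
    """Raise ValueError if the plan contains a disallowed sort.

    allow_local_sort=False models the current behaviour (every sort on a
    streaming plan is rejected); True models the proposed behaviour (only
    global sorts are rejected).
    """
    node: Optional[Node] = plan
    while node is not None:
        if node.op == "Sort" and is_streaming(node):
            if node.global_sort or not allow_local_sort:
                raise ValueError("Sorting is not supported on streaming plans")
        node = node.child


# A sortWithinPartitions-style plan: a non-global sort over a streaming source.
local_sort_plan = Node("Sort", Node("StreamSource"), global_sort=False)
# An orderBy-style plan: a global sort over the same source.
global_sort_plan = Node("Sort", Node("StreamSource"), global_sort=True)

check_supported(local_sort_plan, allow_local_sort=True)   # accepted when local sorts are allowed
```

In this model, global_sort_plan is rejected under either setting, and 
local_sort_plan is rejected only when allow_local_sort is False.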



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
