[ https://issues.apache.org/jira/browse/SPARK-48330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim resolved SPARK-48330.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46651
[https://github.com/apache/spark/pull/46651]

> Fix the python streaming data source timeout issue for large trigger interval
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-48330
>                 URL: https://issues.apache.org/jira/browse/SPARK-48330
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark, SS
>    Affects Versions: 4.0.0
>            Reporter: Chaoqin Li
>            Assignee: Chaoqin Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Currently we run a long-running Python worker process for the Python
> streaming source and sink to perform planning, commit, and abort on the
> driver side. Testing indicates that the current implementation causes a
> connection timeout error when the streaming query has a large trigger
> interval.
>
> For the Python streaming source, keep the long-running worker architecture
> but set the socket timeout to infinity to avoid the timeout error.
>
> For the Python streaming sink, since a StreamingWrite is also created per
> microbatch on the Scala side, a long-running worker cannot be attached to a
> StreamingWrite instance. Therefore we abandon the long-running worker
> architecture: the worker simply calls commit() or abort() and exits,
> allowing Spark to reuse the worker for us.
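The two changes described in the issue can be illustrated with a small, self-contained Python simulation. This is a sketch only, not Spark's actual worker protocol. The socket pair, the `plan_next_batch` request, `ExampleWriter`, `simulate_source`, and `run_sink_worker` are all hypothetical names chosen for illustration. On the source side, a finite socket timeout shorter than the trigger interval makes the waiting worker's `recv()` fail. By contrast, `settimeout(None)`, Python's blocking mode with no timeout, waits as long as needed. On the sink side, the worker handles exactly one `commit()` or `abort()` call with a fresh writer and then returns, so no writer state survives between micro-batches.

```python
import socket
import threading
import time


# --- Source side: long-running worker waiting for the driver's next request ---

def wait_for_request(worker_end: socket.socket, timeout):
    """Block until the (simulated) driver sends its next request.

    timeout=None puts the socket in blocking mode with no timeout, so any
    trigger interval is tolerated; a float makes recv() raise socket.timeout.
    """
    worker_end.settimeout(timeout)
    return worker_end.recv(1024)


def simulate_source(trigger_interval: float, timeout) -> str:
    """Simulate one planning round trip; returns the request or 'timed out'."""
    driver_end, worker_end = socket.socketpair()
    try:
        # Hypothetical driver: sends the next planning request only after
        # the trigger interval has elapsed.
        def driver():
            time.sleep(trigger_interval)
            driver_end.sendall(b"plan_next_batch")

        t = threading.Thread(target=driver)
        t.start()
        try:
            reply = wait_for_request(worker_end, timeout).decode()
        except socket.timeout:
            reply = "timed out"
        t.join()
        return reply
    finally:
        driver_end.close()
        worker_end.close()


# --- Sink side: short-lived worker per commit/abort call ---

class ExampleWriter:
    """Hypothetical stand-in for a user's stream writer (not the PySpark class)."""

    def commit(self, messages, batch_id):
        return f"committed batch {batch_id} ({len(messages)} messages)"

    def abort(self, messages, batch_id):
        return f"aborted batch {batch_id}"


def run_sink_worker(messages, batch_id: int, succeeded: bool) -> str:
    """Create a fresh writer, make exactly one call, and return.

    Nothing is kept after the call, mirroring a worker that exits after
    commit() or abort() instead of staying attached to one writer.
    """
    writer = ExampleWriter()
    if succeeded:
        return writer.commit(messages, batch_id)
    return writer.abort(messages, batch_id)
```

For example, `simulate_source(0.5, 0.1)` returns `"timed out"` because the request arrives after the 0.1 s timeout, while `simulate_source(0.2, None)` returns `"plan_next_batch"` because the worker keeps waiting. `run_sink_worker(["m1", "m2"], 7, True)` returns `"committed batch 7 (2 messages)"`.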