Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Wenchen Fan
great job! thanks a lot! On Thu, Dec 6, 2018 at 9:39 AM Hyukjin Kwon wrote: > It's merged now and on the developer tools page - > http://spark.apache.org/developer-tools.html#individual-tests > Have some fun with PySpark testing! > > On Wed, Dec 5, 2018 at 4:30 PM, Hyukjin Kwon wrote: > >> Hey all, I

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
It's merged now and on the developer tools page - http://spark.apache.org/developer-tools.html#individual-tests Have some fun with PySpark testing! On Wed, Dec 5, 2018 at 4:30 PM, Hyukjin Kwon wrote: > Hey all, I more or less met the goal with a minimal fix that keeps the > existing framework and options.

Re: Implementation for exactly-once streaming sink

2018-12-05 Thread Arun Mahadevan
I guess that's roughly it. As of now there's no built-in support to coordinate the commits across the executors atomically, so you need to commit the batch (global commit) at the driver. And when the batch is replayed, if any of the intermediate operations are not idempotent or can cause
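A minimal sketch of the driver-side global commit described above, in plain Python rather than the actual DataSourceV2 API. The class `IdempotentSink`, its `commit` method, and the in-memory `committed` map are hypothetical names used only for illustration; the assumption is that each batch carries a stable id, so a replayed batch can be detected and skipped.

```python
class IdempotentSink:
    """Hypothetical in-memory sink; stands in for an external store."""

    def __init__(self):
        # Maps batch id -> rows written for that batch (assumed durable in practice).
        self.committed = {}

    def commit(self, batch_id, task_outputs):
        """Global commit, called once per batch on the driver.

        task_outputs is a list of per-task row lists collected from executors.
        Returns True if the batch was written, False if it was a replay.
        """
        if batch_id in self.committed:
            # Batch already committed: a replay must not write again.
            return False
        # Flatten the per-task outputs and record them under the batch id
        # in a single step, so the batch becomes visible all at once.
        rows = [row for part in task_outputs for row in part]
        self.committed[batch_id] = rows
        return True


sink = IdempotentSink()
first = sink.commit(0, [["a", "b"], ["c"]])
replayed = sink.commit(0, [["a", "b"], ["c"]])  # same batch delivered again
```

Keying the commit on the batch id is what makes the replay harmless here; a real sink would need the id check and the write to happen atomically in the external store.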

Implementation for exactly-once streaming sink

2018-12-05 Thread Eric Wohlstadter
Hi all, We are working on implementing a streaming sink on 2.3.1 with the DataSourceV2 APIs. Can anyone help check if my understanding is correct, with respect to the failure modes which need to be covered? We are assuming that a Reliable Receiver (such as Kafka) is used as the stream source.

Re: [SPARK-26160] Make assertNotBucketed call in DataFrameWriter::save optional

2018-12-05 Thread Wenchen Fan
The bucket feature is designed to work only with data sources that have table support, and table support is not public yet, which means no external data sources can access bucketing information right now. The bucket feature only works with Spark native file source tables. We are working

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
Hey all, I more or less met the goal with a minimal fix that keeps the existing framework and options. See https://github.com/apache/spark/pull/23203 https://github.com/apache/spark-website/pull/161 I know it's not perfect and other Python testing frameworks provide many other good features but