[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-07 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-655088167 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-06 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-654584036 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-06 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-654529866 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-06 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-654456696 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-06 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-654439146 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-06 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-654262269 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-06 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-654091567 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-05 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-653958193 ``` Test build #124973 has finished for PR 28904 at commit 11cb693. This patch fails Spark unit tests. This patch merges cleanly. This patch adds no public cla

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-05 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-653947986 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-02 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-653254475 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-02 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-652829555 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-07-01 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-652802682 retest this, please This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-30 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-652182305 org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.(It is not a test it is a sbt.testing.SuiteSelector) This seems to be failing frequently. I'll see other bu

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-28 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-650889370 UPDATE: SPARK-30946 + SPARK-30462 with lower down driver memory to 1.5G now writes batch 9039 which RES is around 1.3g. I guess the process uses up available memory if possi

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-27 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-650690135 UPDATE: now SPARK-30946 + SPARK-30462 writes 11879 which RES is still less than 2G (around 1.7G). I'll stop the sustaining test for enough heap and run the another sustainin

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-26 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-650460286 I've been running sustain tests (still running) and here's some observation: - SPARK-30946 + SPARK-30462 just wrote compact batch 7309 (73,100,000 entries) which size

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-23 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-648455941 Combining efforts together: * master branch + this patch (SPARK-30462) = batch 919 in 2 hours (120 mins) * SPARK-30946 + this patch (SPARK-30462) = batch 919 in 11 m

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-23 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-648380748 Just to be sure, the test app with patch now writes the version 1589, which the log file size is 2.9G, with RES 1.025g.

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-23 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-647995916 also cc. @xuanyuanking @uncleGen as they have been reviewed sibling PRs. This is an automated message from t

[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-23 Thread GitBox
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-647995522 cc. @tdas @zsxwing @jose-torres @gaborgsomogyi This is an automated message from the Apache Git Service. To