Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21700#discussion_r201477277
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/streaming/state/BoundedSortedMap.java
---
@@ -0,0 +1,145 @@
+/*
+ * Licensed
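The diff above is truncated at the license header, so the actual class body is not shown. As a rough illustration of what a bounded sorted map does (only the class name is taken from the diff; the eviction policy and constructor here are assumptions), a TreeMap capped at a fixed number of entries might look like:

```java
import java.util.TreeMap;

// Illustrative sketch only: a sorted map that keeps at most `limit` entries,
// evicting the entry with the greatest key once the bound is exceeded.
// The eviction policy is an assumption, not taken from the actual PR code.
class BoundedSortedMap<K, V> extends TreeMap<K, V> {
  private final int limit;

  BoundedSortedMap(int limit) {
    this.limit = limit;
  }

  @Override
  public V put(K key, V value) {
    V old = super.put(key, value);
    if (size() > limit) {
      remove(lastKey()); // keep only the smallest `limit` keys
    }
    return old;
  }
}
```

Note this toy version only bounds `put`; a real implementation would also need to cover `putAll` and the constructor taking a comparator.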
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21733
retest this, please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21733
retest this, please
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21733#discussion_r200934841
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
---
@@ -53,7 +54,30 @@ class
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21733#discussion_r200923851
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
---
@@ -53,7 +54,30 @@ class
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21733
cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21733
[SPARK-24763][SS] Remove redundant key data from value in streaming
aggregation
* add an option to enable the new feature: remove redundant key data
from value
* modify code
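The idea in the PR title can be sketched as follows. This is a toy illustration in plain Java (the class and method names are made up, and Spark's actual implementation works on UnsafeRow projections, not Lists): the state value stores only the non-key columns, and the full row is rebuilt on read by prepending the key, so key data is not duplicated in storage.

```java
import java.util.*;

// Illustrative sketch only, not the actual Spark code:
// state maps key columns -> non-key columns; the full row
// (key columns followed by value columns) is reconstructed on read.
class KeyValueState {
  private final Map<List<Object>, List<Object>> state = new HashMap<>();

  // row: full row laid out as key columns followed by value columns
  void put(List<Object> row, int numKeyCols) {
    List<Object> key = new ArrayList<>(row.subList(0, numKeyCols));
    List<Object> value = new ArrayList<>(row.subList(numKeyCols, row.size()));
    state.put(key, value); // key columns are not stored again in the value
  }

  List<Object> get(List<Object> key) {
    List<Object> value = state.get(key);
    if (value == null) return null;
    List<Object> full = new ArrayList<>(key); // rebuild full row: key + value
    full.addAll(value);
    return full;
  }
}
```

The trade-off is extra work on the read path (row reconstruction) in exchange for a smaller state store footprint.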
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21622#discussion_r200554917
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala
---
@@ -39,6 +42,23 @@ class MetricsReporter
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
@tedyu Thanks for the suggestion. Published the result to the mail thread.
https://lists.apache.org/thread.html/323ab22fea87c14a2f92e58e7a810aa37cbdf00b9ab81448ee967976
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21721
Though I haven't taken a look yet, I would like to see this feature
(mentioned in
https://github.com/apache/spark/pull/21622#issuecomment-399677099) and would be
happy to see it implemented
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
I would like to add numbers to show how much this patch helps
end users of Apache Spark.
I crafted and published a project which implements some stateful use cases
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21718
I'm aware of this issue and have it in my backlog, but for now it doesn't
look easy to address in an efficient way. I'll propose an approach for
rescaling state once I have one
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21718
It has been fairly easy to rescale partitions before stateful operators
came into play. For Structured Streaming it is no longer trivial, because
rescaling partitions should also handle
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
@tedyu Thanks for the detailed review comments. Addressed.
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21700#discussion_r200249847
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -240,7 +244,11 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21700#discussion_r200249732
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/streaming/state/BoundedSortedMap.java
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21700#discussion_r200249261
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/streaming/state/BoundedSortedMap.java
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21673
@arunmahadevan We'd better follow the style guide for pull requests:
please change the title to wrap the JIRA issue number in `[]` and
also add `[SS]`.
http
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
Missing newline at EOF in two new Java files. Just addressed.
Jenkins, retest this please.
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
Pasting the JIRA issue description to explain why this patch is needed:
The default value of "spark.sql.streaming.minBatchesToRetain" is set
high (100), which doesn't requir
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21700
SPARK-24717 Split out min retain version of state for memory in
HDFSBackedStateStoreProvider
## What changes were proposed in this pull request?
This patch proposes breaking down
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
Rebased to fix a conflict, and added a new commit (last one: c9aada5) to
report cache hit / miss counts in the HDFS state provider. This is actually
helpful for SPARK-24717 to determine a proper value
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21622#discussion_r198300792
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala
---
@@ -39,6 +42,23 @@ class MetricsReporter
Github user HeartSaVioR closed the pull request at:
https://github.com/apache/spark/pull/21617
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21617
Abandoning the patch. While I think the JIRA issue is still valid, it looks
like we should address the watermark issue to get a correct count of late events.
Thanks for reviewing @jose-torres
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21617#discussion_r197986093
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -48,12 +49,13 @@ class StateOperatorProgress private[sql
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21617#discussion_r197981651
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -48,12 +49,13 @@ class StateOperatorProgress private[sql
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21617
@jose-torres
Yes, you're right. They would be the rows after other
transformations and filtering are applied, not the original input rows. I just
haven't found a proper alternative word to "inpu
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21622
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21622
I think we may want to add metrics regarding sources and sinks as well, but
the format of offset information or other metadata can differ
between sources and sinks
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21622
cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21622
[SPARK-24637][SS] Add metrics regarding state and watermark to dropwizard
metrics
## What changes were proposed in this pull request?
The patch adds metrics regarding state
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21617
cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21617
[SPARK-24634][SS] Add a new metric regarding number of rows later than
watermark
## What changes were proposed in this pull request?
This adds a new metric to count the number of rows
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21560#discussion_r197004935
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDD.scala
---
@@ -98,6 +98,10 @@ class
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21560#discussion_r197000483
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
---
@@ -349,6 +349,17 @@ object
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21560#discussion_r196999896
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala
---
@@ -61,12 +63,14
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21560#discussion_r196999687
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala
---
@@ -61,12 +63,14
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21560#discussion_r196999745
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala
---
@@ -0,0 +1,108
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21222
adding cc. to @zsxwing since he has been reviewing PRs for SS so far.
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21357
adding cc. to @zsxwing since he has been reviewing PRs for SS so far.
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
adding cc. to @zsxwing since he has been reviewing PRs for SS so far.
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21595
@HyukjinKwon @hvanhovell Thanks for reviewing and merging!
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21388
I just opened a new patch to remove the comment, as it looks like it is no
longer the preferred option.
https://github.com/apache/spark/pull/21595
Closing this one
Github user HeartSaVioR closed the pull request at:
https://github.com/apache/spark/pull/21388
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21595
[MINOR][SQL] Remove invalid comment from SparkStrategies
## What changes were proposed in this pull request?
This patch removes an invalid comment from SparkStrategies, given
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21357
retest this, please
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r195632189
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +843,170 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r195625921
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +843,170 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21388
@hvanhovell
To be honest, I found the rationale for the issue in a comment in
the Spark code:
https://github.com/apache/spark/blob/4c388bccf1bcac8f833fd9214096dd164c3ea065/sql
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21357
retest this, please
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r194613720
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
---
@@ -112,14 +122,19 @@ trait StateStoreWriter
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21357
Kindly ping again to @tdas
And cc. to @jose-torres @jerryshao @HyukjinKwon @arunmahadevan for
reviewing
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21222
Kindly ping again to @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
Kindly ping again to @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
Kindly ping again to @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
Kindly ping again to @tdas
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r194585044
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
---
@@ -112,14 +122,19 @@ trait StateStoreWriter
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
I would just like to see the benefit of unloading the version of state
which is expected to be read by the next batch. I totally agree the current
cache mechanism is excessive
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r194563959
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -247,6 +253,14 @@ private
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
After enabling the option, I've observed a small additional latency whenever
starting a batch, per partition. Median/average was 4~50 ms
for my case, but max latency was a bit higher
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
retest this please
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21506#discussion_r194295068
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -280,38 +278,49 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21506#discussion_r194293481
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -280,38 +278,49 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21506#discussion_r194293251
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -280,38 +278,49 @@ private
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
cc. @tdas @jose-torres @jerryshao @arunmahadevan @HyukjinKwon
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
@jose-torres No problem. I expect there would be some inactive period in the
Spark community during Spark Summit. Addressed the comment regarding renaming
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
When a batch starts, the latest version of state is read to start a new
version of state. If the state has to be restored from a snapshot as well as
delta files, it will incur huge
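The restore cost described above comes from replaying files. A toy sketch of the snapshot-plus-deltas recovery model (the names and the null-as-tombstone convention here are assumptions for illustration, not Spark's actual on-disk format): the state for a version is rebuilt by loading the nearest snapshot and applying each delta in order.

```java
import java.util.*;

// Illustrative sketch only: rebuild state by starting from a snapshot and
// replaying delta maps in version order. A null value in a delta is treated
// as a tombstone meaning the key was removed in that version (an assumption
// made for this sketch, not Spark's actual delta encoding).
class StateRestore {
  static Map<String, Integer> restore(
      Map<String, Integer> snapshot,
      List<Map<String, Integer>> deltas) {
    Map<String, Integer> state = new HashMap<>(snapshot);
    for (Map<String, Integer> delta : deltas) {
      for (Map.Entry<String, Integer> e : delta.entrySet()) {
        if (e.getValue() == null) {
          state.remove(e.getKey()); // tombstone: key deleted in this version
        } else {
          state.put(e.getKey(), e.getValue()); // upsert from this version
        }
      }
    }
    return state;
  }
}
```

The more delta files sit between the snapshot and the target version, the more replay work this loop does, which is why restoring far from a snapshot is expensive.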
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21504
Test failures were from Kafka.
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21504
retest this, please
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21504#discussion_r193945288
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
---
@@ -55,6 +57,19 @@ class StreamingQueryManager
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
One thing you may want to be aware of is that, from the executor's point of
view, it must load at least 1 version of state into memory regardless of how
many versions are cached. I guess you may
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
You can also merge #21506 (maybe changing the log level, or modifying the
patch to log the message at INFO level) and see latencies for loading state,
snapshotting, and cleaning up
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
retest this please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
There are plenty of other debug messages which might hide the log messages
added by this patch. Would we want to log them at INFO instead of DEBUG
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21506
[SPARK-24485][SS] Measure and log elapsed time for filesystem operations in
HDFSBackedStateStoreProvider
## What changes were proposed in this pull request?
This patch measures
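The measure-and-log pattern the PR title describes can be sketched like this. The helper name `timeTakenMs` mirrors a common Spark utility, but this version is illustrative, not the PR's actual code:

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Illustrative sketch only: run a (filesystem) operation, return its result,
// and hand the elapsed wall-clock milliseconds to a callback, which in the
// PR's context would be a logging call.
class Timing {
  static <T> T timeTakenMs(Supplier<T> body, LongConsumer report) {
    long start = System.nanoTime();
    T result = body.get();
    report.accept((System.nanoTime() - start) / 1_000_000);
    return result;
  }
}
```

A hypothetical call site (names invented for illustration) would look like `timeTakenMs(() -> loadMap(version), ms -> log.debug("loaded state in " + ms + " ms"))`, keeping the timing concern out of the operation itself.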
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193740695
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
Retaining versions of state is also relevant to snapshotting the last
version to files: HDFSBackedStateStoreProvider doesn't snapshot a version if it
doesn't exist in loadedMaps. So we may
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@TomaszGaweda @aalobaidi
Please correct me if I'm missing here.
At every start of a batch, the state store loads the previous version of
state so that it can be read and written. If we
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r193622940
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala
---
@@ -231,7 +231,7 @@ class
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
@arunmahadevan
Added state store custom metrics to the streaming query status as well. You
can see `providerLoadedMapSize` added to `stateOperators.customMetrics` in
the output below
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21497#discussion_r193374662
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala
---
@@ -256,6 +246,66 @@ class
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21497#discussion_r193372564
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala
---
@@ -256,6 +246,66 @@ class
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193304316
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala
---
@@ -20,10 +20,48 @@ package org.apache.spark.sql
import
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193285667
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193286066
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193286932
--- Diff: python/pyspark/sql/tests.py ---
@@ -1884,7 +1885,164 @@ def test_query_manager_await_termination(self):
finally
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193284839
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193289099
--- Diff: python/pyspark/sql/tests.py ---
@@ -1884,7 +1885,164 @@ def test_query_manager_await_termination(self):
finally
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193284293
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193289567
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala
---
@@ -20,10 +20,48 @@ package org.apache.spark.sql
import
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193291809
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonForeachWriter.scala
---
@@ -0,0 +1,161 @@
+/*
+ * Licensed
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21497#discussion_r193277616
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala
---
@@ -35,10 +34,11 @@ import
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
I agree that the current cache approach may consume excessive memory
unnecessarily, and that matches my finding in #21469.
The issue is not that simple, however, because in micro
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
@arunmahadevan
Yes, before the patch Spark connects to the socket server twice: once for
getting the schema, and another for reading data.
And the `-k` flag is only supported for specific