Github user liyintang commented on the pull request:
https://github.com/apache/spark/pull/12026#issuecomment-203010916
I thought the back pressure/flow control handles how many messages to fetch,
not when to start generating the job. IMHO, adding the jitter in the start
time is more
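The jitter idea under discussion can be sketched in plain Scala. This is a hypothetical illustration, not the PR's actual API: the function name, parameters, and the use of `scala.util.Random` are assumptions; only the normalization formula mirrors what the comments describe.

```scala
import scala.util.Random

// Hypothetical sketch: keep the normalized start time, but offset it by a
// bounded per-application random delay so that many streaming apps sharing
// a cluster do not all submit jobs at the same wall-clock instant.
def startTimeWithJitter(currentTimeMs: Long, periodMs: Long,
                        maxJitterMs: Int, rng: Random): Long = {
  // Same normalization the timer does today: next multiple of the period...
  val normalized = (math.floor(currentTimeMs.toDouble / periodMs) + 1).toLong * periodMs
  // ...plus a bounded jitter in [0, maxJitterMs].
  normalized + rng.nextInt(maxJitterMs + 1)
}
```

With a 10 s period and up to 500 ms of jitter, an app started at t = 12 345 ms would first fire somewhere in [20 000 ms, 20 500 ms] instead of exactly on the boundary.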
GitHub user liyintang opened a pull request:
https://github.com/apache/spark/pull/12026
[Spark-14230][STREAMING] Config the start time (jitter) for streaming…
## What changes were proposed in this pull request?
Currently, RecurringTimer will normalize the start time
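The normalization being referred to can be shown as a small standalone function. This is a simplified sketch of the behavior of `org.apache.spark.util.RecurringTimer` as described here (signature and names simplified for illustration):

```scala
// Sketch of how the start time is normalized: the first trigger lands on
// the next multiple of the period, so every streaming app on a cluster
// with the same batch interval aligns to the same wall-clock boundaries.
def getStartTime(currentTimeMs: Long, periodMs: Long): Long =
  (math.floor(currentTimeMs.toDouble / periodMs) + 1).toLong * periodMs
```

For example, with a 10 s batch interval, an app started at t = 12 345 ms first fires at 20 000 ms; that alignment across apps is what motivates adding jitter.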
Github user liyintang closed the pull request at:
https://github.com/apache/spark/pull/11921
Github user liyintang commented on the pull request:
https://github.com/apache/spark/pull/11921#issuecomment-201347094
Thanks @koeninger for the review and insightful comments! It has been
a great first experience contributing to the community.
For this PR, I think it makes
Github user liyintang commented on the pull request:
https://github.com/apache/spark/pull/11921#issuecomment-201073631
I have verified the issue is caused by serializing MessageAndMetadata.
Previously:
```
val messageHandler = (mmd: MessageAndMetadata[String, String
```
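The truncated snippet above defines a `messageHandler`. The fix implied by the discussion is to have the handler project out plain values (key, message) immediately, rather than returning the `MessageAndMetadata` itself and dragging its non-serializable internals into the RDD. A pure-Scala stand-in of that idea (the case class below is a mock for illustration, not Kafka's real class):

```scala
// Mock of Kafka's MessageAndMetadata, for illustration only.
final case class MessageAndMetadata[K, V](key: K, message: V,
                                          partition: Int, offset: Long)

// Projecting out plain, serializable values up front avoids serializing
// the MessageAndMetadata object itself later.
val messageHandler: MessageAndMetadata[String, String] => (String, String) =
  mmd => (mmd.key, mmd.message)

val sample = MessageAndMetadata("k", "v", partition = 0, offset = 42L)
val projected = messageHandler(sample)
// projected is the plain tuple ("k", "v")
```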
Github user liyintang commented on the pull request:
https://github.com/apache/spark/pull/11921#issuecomment-200984928
@koeninger, that example is just for demonstrating the bug. The actual
code I run does more than a count :)
I need to convert the Kafka messages to a DataFrame
Github user liyintang commented on the pull request:
https://github.com/apache/spark/pull/11921#issuecomment-200960606
@koeninger:
here is the minimal code example to reproduce this issue:
```
// Create context
val ssc = new StreamingCont
```
Github user liyintang commented on the pull request:
https://github.com/apache/spark/pull/11921#issuecomment-200896260
@koeninger ,
Just curious, do we have a JIRA to discuss how exactly to cache the result
of a transformation of a KafkaRDD? If there are multiple
transformations
Github user liyintang commented on the pull request:
https://github.com/apache/spark/pull/11921#issuecomment-200885500
@koeninger ,
This bug is only triggered if there is a window function on the
KafkaRDD, which forces it to be serialized into blocks. Serializing
GitHub user liyintang opened a pull request:
https://github.com/apache/spark/pull/11921
[SPARK-14105] Deep copy each kafka message for serialization
## What changes were proposed in this pull request?
Due to serialization issue described in SPARK-14105, this PR is to deep
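Why a deep copy helps can be illustrated without Kafka. Kafka's fetch path hands out messages as slices over a shared, reused buffer; if a record only keeps a reference into that buffer, serializing it later (e.g. when a window forces the RDD into blocks) captures whatever the buffer holds at that moment. The `Record` class and names below are illustrative, not the PR's actual code:

```scala
import java.util.Arrays

// A record that owns its payload bytes. Deep-copying the relevant slice
// out of the shared buffer means later mutation (buffer reuse by the
// next fetch) cannot corrupt this record.
final case class Record(payload: Array[Byte])

def deepCopyRecord(shared: Array[Byte], from: Int, until: Int): Record =
  Record(Arrays.copyOfRange(shared, from, until))

val buffer = "hello".getBytes("UTF-8")
val safe = deepCopyRecord(buffer, 0, buffer.length)
buffer(0) = 'X'.toByte // simulate the buffer being reused for the next fetch
// safe.payload still reads "hello"; a record that merely referenced
// `buffer` would now see "Xello"
```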