[GitHub] spark pull request: [Spark-14230][STREAMING] Config the start time...

2016-03-29 Thread liyintang
Github user liyintang commented on the pull request: https://github.com/apache/spark/pull/12026#issuecomment-203010916 I thought the back pressure/flow control handles how many messages to fetch, not when to start generating the job. IMHO, adding the jitter in the start time is more…
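A sketch of the two knobs being contrasted here. The back-pressure flag is a real Spark setting; the jitter property name is purely hypothetical, since the PR's actual config key is not shown in this archive:

```scala
import org.apache.spark.SparkConf

// Back pressure bounds *how many* records each batch fetches;
// the proposed jitter would shift *when* job generation starts.
val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true") // real config: flow control
  .set("spark.streaming.startTimeJitter", "500")       // hypothetical key for the PR's jitter (ms)
```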

[GitHub] spark pull request: [Spark-14230][STREAMING] Config the start time...

2016-03-28 Thread liyintang
GitHub user liyintang opened a pull request: https://github.com/apache/spark/pull/12026 [Spark-14230][STREAMING] Config the start time (jitter) for streaming… ## What changes were proposed in this pull request? Currently, RecurringTimer will normalize the start time…
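For context, the normalization being referred to can be sketched as follows. This is a simplified reconstruction, not the actual code in org.apache.spark.streaming.util.RecurringTimer, and the jitter parameter is the PR's proposed addition, not existing behavior:

```scala
// Simplified sketch of RecurringTimer's start-time normalization
// (reconstructed from the discussion; the real Spark code may differ).
class RecurringTimerSketch(periodMs: Long, jitterMs: Long = 0L) {
  // Round the current time up to the next period boundary, so every
  // app with the same batch interval generates jobs at the same instants.
  def getStartTime(nowMs: Long): Long = {
    val normalized = (nowMs / periodMs + 1) * periodMs
    // Proposed change: shift the boundary by a per-app jitter so that
    // co-located streaming apps do not all kick off jobs simultaneously.
    normalized + (jitterMs % periodMs)
  }
}
```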

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-25 Thread liyintang
Github user liyintang closed the pull request at: https://github.com/apache/spark/pull/11921

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-25 Thread liyintang
Github user liyintang commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-201347094 Thanks @koeninger for the review and the insightful comments! It has been a great first experience contributing to the community. For this PR, I think it makes…

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread liyintang
Github user liyintang commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-201073631 I have verified that the issue is caused by serializing MessageAndMetadata. Previously: `val messageHandler = (mmd: MessageAndMetadata[String, String…`
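The snippet above is cut off in the archive; here is a hedged reconstruction of the two handler shapes under discussion (the variable names are illustrative, not the original code):

```scala
import kafka.message.MessageAndMetadata

// Unsafe: hands back the MessageAndMetadata itself, so a later
// serialization into blocks captures Kafka's internal structures.
val unsafeHandler = (mmd: MessageAndMetadata[String, String]) => mmd

// Safe: decodes key and message into fresh objects up front, so the
// returned tuple owns its data and serializes cleanly.
val safeHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
```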

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread liyintang
Github user liyintang commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200984928 @koeninger, that example is just to demonstrate the bug. The actual code I run does more than count :) I need to convert the Kafka messages to a DataFrame…
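A hedged sketch of the Kafka-message-to-DataFrame step mentioned above; the function name, column names, and tuple shape are illustrative, not the poster's real pipeline:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Convert an RDD of (key, message) pairs, e.g. from foreachRDD on the
// Kafka stream, into a DataFrame for further processing.
def toDataFrame(rdd: RDD[(String, String)]): Unit = {
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._
  val df = rdd.toDF("key", "message")
  df.show() // stand-in for the real processing, which is "more than count"
}
```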

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread liyintang
Github user liyintang commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200960606 @koeninger: here is the minimal code example to reproduce this issue: `// Create context val ssc = new StreamingCont…`
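Since the archived snippet is cut off, here is a hedged reconstruction of what a minimal repro likely looks like; the broker address, topic, offsets, and intervals are illustrative, not the original poster's values:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object Spark14105Repro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SPARK-14105-repro")
    // Create context
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val fromOffsets = Map(TopicAndPartition("test-topic", 0) -> 0L)
    // The handler returns the raw record, which is the problematic pattern.
    val messageHandler = (mmd: MessageAndMetadata[String, String]) => mmd

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder,
      MessageAndMetadata[String, String]](ssc, kafkaParams, fromOffsets, messageHandler)

    // The window forces the KafkaRDD to be serialized into blocks,
    // which is what exposes the bug.
    stream.window(Seconds(30)).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```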

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread liyintang
Github user liyintang commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200896260 @koeninger , Just curious, do we have a JIRA to discuss how exactly to cache the result of a transformation of a KafkaRDD? If there are multiple transformations…

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread liyintang
Github user liyintang commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200885500 @koeninger , This bug is only triggered if there is a window function on the KafkaRDD, which forces it to be serialized into blocks. Serializing…
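A short illustration of why the window matters, continuing from the repro sketch above (with `stream` as defined there); the claim about persistence reflects WindowedDStream persisting its parent with a serialized storage level:

```scala
// Without a window, KafkaRDD partitions are consumed iterator-style,
// so the records never need to be serialized:
stream.count().print()

// With a window, the parent DStream is persisted with a serialized
// storage level, forcing each record into block storage:
stream.window(Seconds(30)).count().print()
```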

[GitHub] spark pull request: [SPARK-14105] Deep copy each kafka message for...

2016-03-23 Thread liyintang
GitHub user liyintang opened a pull request: https://github.com/apache/spark/pull/11921 [SPARK-14105] Deep copy each kafka message for serialization ## What changes were proposed in this pull request? Due to the serialization issue described in SPARK-14105, this PR is to deep copy each Kafka message…
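The description is truncated, but based on the title, the deep-copy idea can be sketched as follows; the helper name and tuple shape are illustrative, not the PR's actual diff:

```scala
import kafka.message.MessageAndMetadata

// Decode key and message into fresh objects at receive time, so nothing
// in the returned value references Kafka's internal buffers when the
// RDD is later serialized into blocks.
def deepCopy[K, V](mmd: MessageAndMetadata[K, V]): (String, Int, Long, K, V) =
  (mmd.topic, mmd.partition, mmd.offset, mmd.key, mmd.message)
```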