spark git commit: [SPARK-5147][Streaming] Delete the received data WAL log periodically

2015-01-21 Thread tdas
Closes #4149 from tdas/SPARK-5147 and squashes the following commits: 730798b [Tathagata Das] Added comments. c4cf067 [Tathagata Das] Minor fixes 2579b27 [Tathagata Das] Refactored the fix to make sure that the cleanup respects the remember duration of all the receiver streams 2736fd1 [jerryshao

spark git commit: [HOTFIX] Fixed compilation error due to missing SparkContext._ implicit conversions.

2015-01-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 cab410c52 -> 5d07488ad [HOTFIX] Fixed compilation error due to missing SparkContext._ implicit conversions. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d07488

spark git commit: [SPARK-5233][Streaming] Fix error replaying of WAL introduced bug

2015-01-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 820ce0359 -> 3c3fa632e [SPARK-5233][Streaming] Fix error replaying of WAL introduced bug Because of the lack of `BlockAllocationEvent` in WAL recovery, the dangling event will mix into the new batch, which will lead to the wrong result. Deta

spark git commit: [SPARK-5233][Streaming] Fix error replaying of WAL introduced bug

2015-01-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 5d07488ad -> 5aaf0e0ff [SPARK-5233][Streaming] Fix error replaying of WAL introduced bug Because of the lack of `BlockAllocationEvent` in WAL recovery, the dangling event will mix into the new batch, which will lead to the wrong result.

spark git commit: [SPARK-5315][Streaming] Fix reduceByWindow Java API not work bug

2015-01-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3c3fa632e -> e0f7fb7f9 [SPARK-5315][Streaming] Fix reduceByWindow Java API not work bug `reduceByWindow` in the Java API is actually not Java compatible; this change makes it Java compatible. The current solution is to deprecate the old one and add
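For context on the operation whose Java wrapper is being fixed, here is a minimal Scala sketch of `reduceByWindow`; the socket source, host/port, window sizes, and checkpoint path are illustrative assumptions, not taken from the commit.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReduceByWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ReduceByWindowSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/reduce-by-window-checkpoint")  // required by the inverse-function variant

    // Total characters seen over a sliding 30-second window, recomputed every 10 seconds.
    val lineLengths = ssc.socketTextStream("localhost", 9999).map(_.length)
    val windowedTotal = lineLengths.reduceByWindow(_ + _, _ - _, Seconds(30), Seconds(10))
    windowedTotal.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```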

spark git commit: [SPARK-4631][streaming][FIX] Wait for a receiver to start before publishing test data.

2015-02-02 Thread tdas
Repository: spark Updated Branches: refs/heads/master 683e93824 -> e908322cd [SPARK-4631][streaming][FIX] Wait for a receiver to start before publishing test data. This fixes two sources of non-deterministic failures in this test: - wait for a receiver to be up before pushing data through MQ

spark git commit: [SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python

2015-02-02 Thread tdas
f github.com:apache/spark into kafka adeeb38 [Davies Liu] Merge pull request #3 from tdas/kafka-python-api aea8953 [Tathagata Das] Kafka-assembly for Python API eea16a7 [Davies Liu] refactor f6ce899 [Davies Liu] add example and fix bugs 98c8d17 [Davies Liu] fix python style 5697a01 [Davies Liu] bypass

spark git commit: [SPARK-5153][Streaming][Test] Increased timeout to deal with flaky KafkaStreamSuite

2015-02-03 Thread tdas
rom tdas/SPARK-5153 and squashes the following commits: dc42762 [Tathagata Das] Increased timeout to deal with delays in overloaded Jenkins. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/681f9df4 Tree: http://git-wip-us.apache.

spark git commit: [SPARK-5153][Streaming][Test] Increased timeout to deal with flaky KafkaStreamSuite

2015-02-03 Thread tdas
rom tdas/SPARK-5153 and squashes the following commits: dc42762 [Tathagata Das] Increased timeout to deal with delays in overloaded Jenkins. (cherry picked from commit 681f9df47ff40f7b0d9175d835e9758d33a13a06) Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repos/asf/spark/r

spark git commit: [SPARK-5153][Streaming][Test] Increased timeout to deal with flaky KafkaStreamSuite

2015-02-03 Thread tdas
rom tdas/SPARK-5153 and squashes the following commits: dc42762 [Tathagata Das] Increased timeout to deal with delays in overloaded Jenkins. (cherry picked from commit 681f9df47ff40f7b0d9175d835e9758d33a13a06) Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repos/asf/spark/r

spark git commit: [STREAMING] SPARK-4986 Wait for receivers to deregister and receiver job to terminate

2015-02-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master 681f9df47 -> 1e8b5394b [STREAMING] SPARK-4986 Wait for receivers to deregister and receiver job to terminate A slow receiver might not have enough time to shut down cleanly even when graceful shutdown is used. This PR extends graceful wait

spark git commit: [STREAMING] SPARK-4986 Wait for receivers to deregister and receiver job to terminate

2015-02-03 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 d644bd96a -> 092d4ba57 [STREAMING] SPARK-4986 Wait for receivers to deregister and receiver job to terminate A slow receiver might not have enough time to shut down cleanly even when graceful shutdown is used. This PR extends graceful

spark git commit: [STREAMING] SPARK-4986 Wait for receivers to deregister and receiver job to terminate

2015-02-03 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 36c299430 -> 62c758753 [STREAMING] SPARK-4986 Wait for receivers to deregister and receiver job to terminate A slow receiver might not have enough time to shut down cleanly even when graceful shutdown is used. This PR extends graceful

spark git commit: [SPARK-4969][STREAMING][PYTHON] Add binaryRecords to streaming

2015-02-03 Thread tdas
ark Streaming - new unit tests in Scala and Python This required adding an optional Hadoop configuration param to `fileStream` and `FileInputStream`, but was otherwise straightforward. tdas davies Author: freeman Closes #3803 from freeman-lab/streaming-binary-records and squashes the follow

spark git commit: [SPARK-4969][STREAMING][PYTHON] Add binaryRecords to streaming

2015-02-03 Thread tdas
new PySpark Streaming - new unit tests in Scala and Python This required adding an optional Hadoop configuration param to `fileStream` and `FileInputStream`, but was otherwise straightforward. tdas davies Author: freeman Closes #3803 from freeman-lab/streaming-binary-records and squashes
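A minimal Scala sketch of the fixed-length binary record stream that this change exposes, assuming the Scala entry point is `ssc.binaryRecordsStream(directory, recordLength)`; the directory path and record length are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BinaryRecordsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("BinaryRecordsSketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Each new file that appears in the directory is split into fixed-length
    // byte-array records (512 bytes each in this sketch).
    val records = ssc.binaryRecordsStream("hdfs:///data/incoming", 512)
    records.map(_.length).print()  // print the record sizes seen in each batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```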

spark git commit: [SPARK-5379][Streaming] Add awaitTerminationOrTimeout

2015-02-04 Thread tdas
Repository: spark Updated Branches: refs/heads/master 6aed719e5 -> 4cf4cba08 [SPARK-5379][Streaming] Add awaitTerminationOrTimeout Added `awaitTerminationOrTimeout` to return if the waiting time elapsed: * `true` if it's stopped. * `false` if the waiting time elapsed before returning from the

spark git commit: [SPARK-5379][Streaming] Add awaitTerminationOrTimeout

2015-02-04 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 3b7acd22a -> 4d3dbfda3 [SPARK-5379][Streaming] Add awaitTerminationOrTimeout Added `awaitTerminationOrTimeout` to return if the waiting time elapsed: * `true` if it's stopped. * `false` if the waiting time elapsed before returning from
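A minimal Scala sketch of how the new `awaitTerminationOrTimeout` can be used; the application name, socket source, and 10-second timeout are illustrative assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object AwaitTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("AwaitTimeoutSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc.start()

    // Block for at most 10 seconds: returns true if the context stopped in time,
    // false if the timeout elapsed first.
    val stopped = ssc.awaitTerminationOrTimeout(10000L)
    if (!stopped) {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }
  }
}
```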

spark git commit: [Minor] Fix incorrect warning log

2015-02-04 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4cf4cba08 -> a74cbbf12 [Minor] Fix incorrect warning log The warning log looks incorrect. Just fix it. Author: Liang-Chi Hsieh Closes #4360 from viirya/fixing_typo and squashes the following commits: 48fbe4f [Liang-Chi Hsieh] Fix incorr

spark git commit: [Minor] Fix incorrect warning log

2015-02-04 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 4d3dbfda3 -> 316a4bb54 [Minor] Fix incorrect warning log The warning log looks incorrect. Just fix it. Author: Liang-Chi Hsieh Closes #4360 from viirya/fixing_typo and squashes the following commits: 48fbe4f [Liang-Chi Hsieh] Fix in

spark git commit: [Minor] Fix incorrect warning log

2015-02-04 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 379976320 -> f318af0fd [Minor] Fix incorrect warning log The warning log looks incorrect. Just fix it. Author: Liang-Chi Hsieh Closes #4360 from viirya/fixing_typo and squashes the following commits: 48fbe4f [Liang-Chi Hsieh] Fix in

spark git commit: [SPARK-4964] [Streaming] Exactly-once semantics for Kafka

2015-02-04 Thread tdas
a adf99a6 [cody koeninger] [SPARK-4964] fix serialization issues for checkpointing 1d50749 [cody koeninger] [SPARK-4964] code cleanup per tdas 8bfd6c0 [cody koeninger] [SPARK-4964] configure rate limiting via spark.streaming.receiver.maxRate e09045b [cody koeninger] [SPARK-4964] add foreachPa

spark git commit: [SPARK-4964] [Streaming] Exactly-once semantics for Kafka

2015-02-04 Thread tdas
a adf99a6 [cody koeninger] [SPARK-4964] fix serialization issues for checkpointing 1d50749 [cody koeninger] [SPARK-4964] code cleanup per tdas 8bfd6c0 [cody koeninger] [SPARK-4964] configure rate limiting via spark.streaming.receiver.maxRate e09045b [cody koeninger] [SPARK-4964] add foreachPa
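A minimal Scala sketch of the direct (receiver-less) Kafka stream that underpins this exactly-once work, assuming the 1.3-era `spark-streaming-kafka` artifact is on the classpath; the broker address and topic name are hypothetical.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectKafkaSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DirectKafkaSketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    // The direct stream tracks Kafka offsets itself instead of relying on a
    // receiver and ZooKeeper, which is what makes end-to-end exactly-once
    // output feasible when combined with idempotent or transactional writes.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("pageviews"))

    stream.map { case (_, value) => value }.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```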

spark git commit: [SPARK-4707][STREAMING] Reliable Kafka Receiver can lose data if the blo...

2015-02-04 Thread tdas
Repository: spark Updated Branches: refs/heads/master b0c002195 -> f0500f9fa [SPARK-4707][STREAMING] Reliable Kafka Receiver can lose data if the block generator fails to store data. The Reliable Kafka Receiver commits offsets only when events are actually stored, which ensures that o

spark git commit: [SPARK-4707][STREAMING] Reliable Kafka Receiver can lose data if the blo...

2015-02-04 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 a119cae48 -> 14c9f32d8 [SPARK-4707][STREAMING] Reliable Kafka Receiver can lose data if the block generator fails to store data. The Reliable Kafka Receiver commits offsets only when events are actually stored, which ensures th

git commit: SPARK-2932 [STREAMING] Move MasterFailureTest out of "main" source directory

2014-09-25 Thread tdas
Repository: spark Updated Branches: refs/heads/master b8487713d -> c3f2a8588 SPARK-2932 [STREAMING] Move MasterFailureTest out of "main" source directory (HT @vanzin) Whatever the reason was for having this test class in `main`, if there is one, appears to be moot. This may have been a result

git commit: SPARK-3744 [STREAMING] FlumeStreamSuite will fail during port contention

2014-09-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master d3a3840e0 -> 8764fe368 SPARK-3744 [STREAMING] FlumeStreamSuite will fail during port contention Since it looked quite easy, I took the liberty of making a quick PR that just uses `Utils.startServiceOnPort` to fix this. It works locally for

git commit: Typo error in KafkaWordCount example

2014-10-01 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 b4f690d36 -> 24ee61625 Typo error in KafkaWordCount example topicpMap to topicMap Author: Gaspar Munoz Closes #2614 from gasparms/patch-1 and squashes the following commits: 00aab2c [Gaspar Munoz] Typo error in KafkaWordCount exampl

git commit: Typo error in KafkaWordCount example

2014-10-01 Thread tdas
Repository: spark Updated Branches: refs/heads/master 8cc70e7e1 -> b81ee0b46 Typo error in KafkaWordCount example topicpMap to topicMap Author: Gaspar Munoz Closes #2614 from gasparms/patch-1 and squashes the following commits: 00aab2c [Gaspar Munoz] Typo error in KafkaWordCount example

[1/2] [SPARK-2377] Python API for Streaming

2014-10-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7a3f589ef -> 69c67abaa http://git-wip-us.apache.org/repos/asf/spark/blob/69c67aba/python/pyspark/streaming/tests.py -- diff --git a/python/pyspark/streaming/tests.py b/pyth

[2/2] git commit: [SPARK-2377] Python API for Streaming

2014-10-12 Thread tdas
kagiwa] fied input of socketTextDStream dd6de81 [Ken Takagiwa] initial commit for socketTextStream 247fd74 [Ken Takagiwa] modified the code base on comment in https://github.com/tdas/spark/pull/10 4bcb318 [Ken Takagiwa] implementing transform function in Python 38adf95 [Ken Takagiwa] added reduce

git commit: [SPARK-3912][Streaming] Fixed flakyFlumeStreamSuite

2014-10-13 Thread tdas
out batches. Author: Tathagata Das Closes #2773 from tdas/flume-test-fix and squashes the following commits: 93cd7f6 [Tathagata Das] Reimplimented FlumeStreamSuite to be more robust. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/sp

git commit: [SPARK-4026][Streaming] Write ahead log management

2014-10-24 Thread tdas
ing HDFS mini cluster. Author: Hari Shreedharan Author: Tathagata Das Closes #2882 from tdas/driver-ha-wal and squashes the following commits: e4bee20 [Tathagata Das] Removed synchronized, Path.getFileSystem is threadsafe 55514e2 [Tathagata Das] Minor changes based on PR comments. d29f

git commit: [SPARK-4028][Streaming] ReceivedBlockHandler interface to abstract the functionality of storage of received data

2014-10-30 Thread tdas
s a write ahead log. Author: Tathagata Das Closes #2940 from tdas/driver-ha-rbh and squashes the following commits: 78a4aaa [Tathagata Das] Fixed bug causing test failures. f192f47 [Tathagata Das] Fixed import order. df5f320 [Tathagata Das] Updated code to use ReceivedBlockStoreResult as the ret

git commit: [SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received either from BlockManager or WAL in HDFS

2014-10-30 Thread tdas
s a BlockRDD that is backed by HDFS. This BlockRDD can either read data from the Spark's BlockManager, or read the data from file-segments in write ahead log in HDFS. Most of this code has been written by @harishreedharan Author: Tathagata Das Author: Hari Shreedharan Closes #2931 f

git commit: [SPARK-4029][Streaming] Update streaming driver to reliably save and recover received block metadata on driver failures

2014-11-05 Thread tdas
hat tests the driver recovery, by killing and restarting the streaming context, and verifying all the input data gets processed. This has been implemented but not included in this PR yet. A sneak peek of that DriverFailureSuite can be found in this PR (on my personal repo): https://github.com/t

git commit: [SPARK-4029][Streaming] Update streaming driver to reliably save and recover received block metadata on driver failures

2014-11-05 Thread tdas
com/tdas/spark/pull/25 I can either include it in this PR, or submit that as a separate PR after this gets in. - *WAL cleanup:* Cleaning up the received data write ahead log, by calling `ReceivedBlockHandler.cleanupOldBlocks`. This is being worked on. Author: Tathagata Das Closes #3026 from t

spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master d6e555244 -> 7c9ec529a Update JavaCustomReceiver.java (fix array index out of bounds) Author: xiao321 <1042460...@qq.com> Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java Project

spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 47bd8f302 -> 8cefb63c1 Update JavaCustomReceiver.java (fix array index out of bounds) Author: xiao321 <1042460...@qq.com> Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java (che

spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 0a40eac25 -> 4fb26df87 Update JavaCustomReceiver.java (fix array index out of bounds) Author: xiao321 <1042460...@qq.com> Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java (che

spark git commit: Update JavaCustomReceiver.java

2014-11-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.0 76c20cac9 -> 18c8c3833 Update JavaCustomReceiver.java (fix array index out of bounds) Author: xiao321 <1042460...@qq.com> Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java (che

spark git commit: [SPARK-4301] StreamingContext should not allow start() to be called after calling stop()

2014-11-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4af5c7e24 -> 7b41b17f3 [SPARK-4301] StreamingContext should not allow start() to be called after calling stop() In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. This a

spark git commit: [SPARK-4301] StreamingContext should not allow start() to be called after calling stop()

2014-11-08 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 05bffcc02 -> 21b9ac062 [SPARK-4301] StreamingContext should not allow start() to be called after calling stop() In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. Th

spark git commit: [SPARK-4301] StreamingContext should not allow start() to be called after calling stop()

2014-11-08 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 4895f6544 -> 78cd3ab88 [SPARK-4301] StreamingContext should not allow start() to be called after calling stop() In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. Th

spark git commit: [SPARK-4301] StreamingContext should not allow start() to be called after calling stop()

2014-11-08 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.0 d4aed266d -> 395656c8e [SPARK-4301] StreamingContext should not allow start() to be called after calling stop() In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. Th
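A minimal Scala sketch of the lifecycle rule this change enforces; the socket source and batch interval are illustrative, and the exact exception raised by a restart attempt may differ by version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LifecycleSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("LifecycleSketch"))

    // Calling stop() on a context that was never started is a harmless no-op
    // and leaves the underlying SparkContext alive.
    val neverStarted = new StreamingContext(sc, Seconds(1))
    neverStarted.stop(stopSparkContext = false)

    // Once a context has been started and stopped, it cannot be restarted;
    // after this change, start() fails fast instead of silently misbehaving.
    val ssc = new StreamingContext(sc, Seconds(1))
    ssc.socketTextStream("localhost", 9999).print()
    ssc.start()
    ssc.stop(stopSparkContext = false)
    // ssc.start()  // would now throw an exception rather than be allowed

    sc.stop()
  }
}
```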

spark git commit: SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/master ed8bf1eac -> 3a02d416c SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatting

spark git commit: SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 69dd2997f -> ca3fe8c12 SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatt

spark git commit: SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 dc38defd2 -> cdcf5467a SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatt

spark git commit: Update RecoverableNetworkWordCount.scala

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3a02d416c -> 0340c56a9 Update RecoverableNetworkWordCount.scala Trying this example, I missed the moment when the checkpoint was initiated Author: comcmipi Closes #2735 from comcmipi/patch-1 and squashes the following commits: b6d8001 [

spark git commit: Update RecoverableNetworkWordCount.scala

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 cdcf5467a -> 254b13570 Update RecoverableNetworkWordCount.scala Trying this example, I missed the moment when the checkpoint was initiated Author: comcmipi Closes #2735 from comcmipi/patch-1 and squashes the following commits: b6d80

spark git commit: Update RecoverableNetworkWordCount.scala

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.0 395656c8e -> 66d9070bd Update RecoverableNetworkWordCount.scala Trying this example, I missed the moment when the checkpoint was initiated Author: comcmipi Closes #2735 from comcmipi/patch-1 and squashes the following commits: b6d80

spark git commit: [SPARK-2548][HOTFIX][Streaming] Removed use of o.a.s.streaming.Durations in branch 1.1

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 254b13570 -> 86b1bd031 [SPARK-2548][HOTFIX][Streaming] Removed use of o.a.s.streaming.Durations in branch 1.1 Author: Tathagata Das Closes #3188 from tdas/branch-1.1 and squashes the following commits: f1996d3 [Tathagata

spark git commit: [SPARK-3954][Streaming] Optimization to FileInputDStream

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 f0eb0a79c -> 07ba50f7e [SPARK-3954][Streaming] Optimization to FileInputDStream: when converting files to RDDs, there are 3 loops over the files sequence in the Spark source: 1. files.map(...) 2. files.zip(fileRDDs) 3. files-siz

spark git commit: [SPARK-3954][Streaming] Optimization to FileInputDStream

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/master a1fc059b6 -> ce6ed2abd [SPARK-3954][Streaming] Optimization to FileInputDStream: when converting files to RDDs, there are 3 loops over the files sequence in the Spark source: 1. files.map(...) 2. files.zip(fileRDDs) 3. files-size.fo

spark git commit: [SPARK-3954][Streaming] Optimization to FileInputDStream

2014-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 64945f868 -> 3d889dfc1 [SPARK-3954][Streaming] Optimization to FileInputDStream: when converting files to RDDs, there are 3 loops over the files sequence in the Spark source: 1. files.map(...) 2. files.zip(fileRDDs) 3. files-siz

spark git commit: [SPARK-3495][SPARK-3496] Backporting block replication fixes made in master to branch 1.1

2014-11-10 Thread tdas
of NioBlockTransferService, which required slight modification to unit tests. Other than that the code is exactly same as in the original PR. Please refer to discussion in the original PR if you have any thoughts. Author: Tathagata Das Closes #3191 from tdas/replication-fix-branch-1.1-backport and squas

spark git commit: [SPARK-4295][External]Fix exception in SparkSinkSuite

2014-11-11 Thread tdas
Repository: spark Updated Branches: refs/heads/master ef29a9a9a -> f8811a569 [SPARK-4295][External]Fix exception in SparkSinkSuite Handle exception in SparkSinkSuite, please refer to [SPARK-4295] Author: maji2014 Closes #3177 from maji2014/spark-4295 and squashes the following commits: 312

spark git commit: [SPARK-4295][External]Fix exception in SparkSinkSuite

2014-11-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 e9d009dc3 -> fe8a1cd29 [SPARK-4295][External]Fix exception in SparkSinkSuite Handle exception in SparkSinkSuite, please refer to [SPARK-4295] Author: maji2014 Closes #3177 from maji2014/spark-4295 and squashes the following commits:

spark git commit: [SPARK-4295][External]Fix exception in SparkSinkSuite

2014-11-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.1 b2cb357d7 -> bf867c317 [SPARK-4295][External]Fix exception in SparkSinkSuite Handle exception in SparkSinkSuite, please refer to [SPARK-4295] Author: maji2014 Closes #3177 from maji2014/spark-4295 and squashes the following commits:

spark git commit: [SPARK-2492][Streaming] kafkaReceiver minor changes to align with Kafka 0.8

2014-11-11 Thread tdas
launched will introduce an issue as mentioned in [SPARK-2383](https://issues.apache.org/jira/browse/SPARK-2383). So here we change to offer the user an API to explicitly reset the offset before creating the Kafka stream, while keeping the same behavior as Kafka 0.8 for the parameter `auto.offset.reset`

spark git commit: [SPARK-2492][Streaming] kafkaReceiver minor changes to align with Kafka 0.8

2014-11-11 Thread tdas
hed will introduce an issue as mentioned in [SPARK-2383](https://issues.apache.org/jira/browse/SPARK-2383). So here we change to offer the user an API to explicitly reset the offset before creating the Kafka stream, while keeping the same behavior as Kafka 0.8 for the parameter `auto.offset.reset`. @tda

spark git commit: [Streaming][Minor]Replace some 'if-else' in Clock

2014-11-11 Thread tdas
Repository: spark Updated Branches: refs/heads/master c8850a3d6 -> 6e03de304 [Streaming][Minor]Replace some 'if-else' in Clock Replace some 'if-else' statement by math.min and math.max in Clock.scala Author: huangzhaowei Closes #3088 from SaintBacchus/StreamingClock and squashes the followi

spark git commit: [Streaming][Minor]Replace some 'if-else' in Clock

2014-11-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 7710b7156 -> cc1f3a0d6 [Streaming][Minor]Replace some 'if-else' in Clock Replace some 'if-else' statement by math.min and math.max in Clock.scala Author: huangzhaowei Closes #3088 from SaintBacchus/StreamingClock and squashes the fol
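An illustrative Scala sketch (not the actual Clock.scala code) of the kind of simplification described: replacing an if-else over two timestamps with math.max.

```scala
// Before: explicit branching to pick the later of two times.
def nextWakeupTimeBranching(targetTime: Long, currentTime: Long): Long =
  if (targetTime < currentTime) currentTime else targetTime

// After: the same result expressed with math.max.
def nextWakeupTime(targetTime: Long, currentTime: Long): Long =
  math.max(targetTime, currentTime)
```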

spark git commit: [SPARK-3660][STREAMING] Initial RDD for updateStateByKey transformation

2014-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4b736dbab -> 36ddeb7bf [SPARK-3660][STREAMING] Initial RDD for updateStateByKey transformation SPARK-3660 : Initial RDD for updateStateByKey transformation I have added a sample StatefulNetworkWordCountWithInitial inspired by StatefulNetw
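A minimal Scala sketch of seeding `updateStateByKey` with an initial RDD, in the spirit of the sample mentioned above; the seed pairs, socket source, checkpoint path, and the assumption that the new overload takes (updateFunc, partitioner, initialRDD) are illustrative.

```scala
import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object InitialStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("InitialStateSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/initial-state-checkpoint")  // stateful operations require checkpointing

    // State carried over from a previous run (hypothetical seed values).
    val initialRDD = ssc.sparkContext.parallelize(Seq(("hello", 1), ("world", 1)))

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map((_, 1))

    // Running word count whose state starts from initialRDD instead of empty.
    val updateFunc = (newValues: Seq[Int], state: Option[Int]) =>
      Some(newValues.sum + state.getOrElse(0))
    val counts = words.updateStateByKey[Int](
      updateFunc,
      new HashPartitioner(ssc.sparkContext.defaultParallelism),
      initialRDD)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```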

spark git commit: [SPARK-4062][Streaming]Add ReliableKafkaReceiver in Spark Streaming Kafka connector

2014-11-14 Thread tdas
doc can be seen in [SPARK-4062](https://issues.apache.org/jira/browse/SPARK-4062). Author: jerryshao Author: Tathagata Das Author: Saisai Shao Closes #2991 from jerryshao/kafka-refactor and squashes the following commits: 5461f1c [Saisai Shao] Merge pull request #8 from tdas/kafka-refact

spark git commit: [SPARK-4062][Streaming]Add ReliableKafkaReceiver in Spark Streaming Kafka connector

2014-11-14 Thread tdas
can be seen in [SPARK-4062](https://issues.apache.org/jira/browse/SPARK-4062). Author: jerryshao Author: Tathagata Das Author: Saisai Shao Closes #2991 from jerryshao/kafka-refactor and squashes the following commits: 5461f1c [Saisai Shao] Merge pull request #8 from tdas/kafka-refactor3 eae4

spark git commit: [SPARK-6766][Streaming] Fix issue about StreamingListenerBatchSubmitted and StreamingListenerBatchStarted (backport to branch 1.3)

2015-04-14 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 db2154d7d -> 1ab423f6e [SPARK-6766][Streaming] Fix issue about StreamingListenerBatchSubmitted and StreamingListenerBatchStarted (backport to branch 1.3) Backport SPARK-6766 #5414 to branch 1.3 Conflicts: streaming/src/main/

spark git commit: [SPARK-6796][Streaming][WebUI] Add "Active Batches" and "Completed Batches" lists to StreamingPage

2015-04-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master a76b921a9 -> 6de282e2d [SPARK-6796][Streaming][WebUI] Add "Active Batches" and "Completed Batches" lists to StreamingPage This PR adds two lists, `Active Batches` and `Completed Batches`. Here is the screenshot: ![batch_list](https://clo

spark git commit: [HOTFIX][SQL] Fix broken cached test

2015-04-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 03e85b4a1 -> d9e70f331 [HOTFIX][SQL] Fix broken cached test Added in #5475. Pointed as broken in #5639. /cc marmbrus Author: Liang-Chi Hsieh Closes #5640 from viirya/fix_cached_test and squashes the following commits: c0cf69a [Liang-Chi

spark git commit: [SPARK-6752][Streaming] Allow StreamingContext to be recreated from checkpoint and existing SparkContext

2015-04-23 Thread tdas
create StreamingContext using the provided createFunction TODO: the corresponding Java and Python API has to be added as well. Author: Tathagata Das Closes #5428 from tdas/SPARK-6752 and squashes the following commits: 94db63c [Tathagata Das] Fix long line. 524f519 [Tathagata Das] Many changes
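For context, a minimal Scala sketch of the checkpoint-recovery pattern this change extends; the variant that reuses an existing SparkContext is the new part and is not shown here, and the checkpoint path, source, and batch interval are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRecoverySketch {
  val checkpointDir = "hdfs:///checkpoints/word-count"  // hypothetical path

  // Invoked only when no checkpoint exists yet; otherwise the context is
  // reconstructed from the checkpoint data.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("CheckpointRecoverySketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint(checkpointDir)
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```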

spark git commit: [SPARK-5946] [STREAMING] Add Python API for direct Kafka stream

2015-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 29576e786 -> 9e4e82b7b [SPARK-5946] [STREAMING] Add Python API for direct Kafka stream Currently only the `createDirectStream` API is added; I'm not sure if `createRDD` is also needed, since some Java objects need to be wrapped in Python. Please

spark git commit: [SPARK-7138] [STREAMING] Add method to BlockGenerator to add multiple records to BlockGenerator with single callback

2015-04-28 Thread tdas
to add them but want the callback function to be called only once. This is for internal use only for improvement to Kinesis Receiver that we are planning to do. Author: Tathagata Das Closes #5695 from tdas/SPARK-7138 and squashes the following commits: a35cf7d [Tathagata Das] Fixed style. a7a4

[2/2] spark git commit: [SPARK-7056] [STREAMING] Make the Write Ahead Log pluggable

2015-04-29 Thread tdas
/1A2XaOLRFzvIZSi18i_luNw5Rmm9j2j4AigktXxIYxmY/edit?usp=sharing Things to add. * Unit tests for WriteAheadLogUtils Author: Tathagata Das Closes #5645 from tdas/wal-pluggable and squashes the following commits: 2c431fd [Tathagata Das] Minor fixes. c2bc7384 [Tathagata Das] More changes based on PR comments

[1/2] spark git commit: [SPARK-7056] [STREAMING] Make the Write Ahead Log pluggable

2015-04-29 Thread tdas
Repository: spark Updated Branches: refs/heads/master c0c0ba6d2 -> 1868bd40d http://git-wip-us.apache.org/repos/asf/spark/blob/1868bd40/streaming/src/test/scala/org/apache/spark/streaming/ReceivedBlockTrackerSuite.scala -- diff

spark git commit: [SPARK-6752] [STREAMING] [REOPENED] Allow StreamingContext to be recreated from checkpoint and existing SparkContext

2015-04-29 Thread tdas
IRA). This replaces MutableBoolean with AtomicBoolean. srowen pwendell Author: Tathagata Das Closes #5773 from tdas/SPARK-6752 and squashes the following commits: a0c0ead [Tathagata Das] Fix for hadoop 1.0.4 70ae85b [Tathagata Das] Merge remote-tracking branch 'apache-github/master' in

spark git commit: [SPARK-6629] cancelJobGroup() may not work for jobs whose job groups are inherited from parent threads

2015-04-29 Thread tdas
Repository: spark Updated Branches: refs/heads/master a9c4e2995 -> 3a180c19a [SPARK-6629] cancelJobGroup() may not work for jobs whose job groups are inherited from parent threads When a job is submitted with a job group and that job group is inherited from a parent thread, there are multipl

spark git commit: [SPARK-6862] [STREAMING] [WEBUI] Add BatchPage to display details of a batch

2015-04-29 Thread tdas
Repository: spark Updated Branches: refs/heads/master 114bad606 -> 1b7106b86 [SPARK-6862] [STREAMING] [WEBUI] Add BatchPage to display details of a batch This is an initial commit for SPARK-6862. Once SPARK-6796 is merged, I will add the links to StreamingPage so that the user can jump to Bat

spark git commit: [SPARK-7282] [STREAMING] Fix the race conditions in StreamingListenerSuite

2015-04-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master beeafcfd6 -> 69a739c7f [SPARK-7282] [STREAMING] Fix the race conditions in StreamingListenerSuite Fixed the following flaky test ```Scala [info] StreamingListenerSuite: [info] - batch info reporting (782 milliseconds) [info] - receiver info

spark git commit: [SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler

2015-05-01 Thread tdas
Repository: spark Updated Branches: refs/heads/master 98e704580 -> ebc25a4dd [SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler when stopping them. Author: zsxwing Closes #58

spark git commit: [SPARK-7112][Streaming][WIP] Add a InputInfoTracker to track all the input streams

2015-05-01 Thread tdas
Repository: spark Updated Branches: refs/heads/master ebc25a4dd -> b88c275e6 [SPARK-7112][Streaming][WIP] Add a InputInfoTracker to track all the input streams Author: jerryshao Author: Saisai Shao Closes #5680 from jerryshao/SPARK-7111 and squashes the following commits: 339f854 [Saisai

spark git commit: [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2

2015-05-01 Thread tdas
RK-2808][Streaming][Kafka] add more logging to python test, see why its timing out in jenkins 2b92d3f [cody koeninger] [SPARK-2808][Streaming][Kafka] wait for leader offsets in the java test as well 3824ce3 [cody koeninger] [SPARK-2808][Streaming][Kafka] naming / comments per tdas 61b3464 [cody koen

spark git commit: [SPARK-7315] [STREAMING] [TEST] Fix flaky WALBackedBlockRDDSuite

2015-05-02 Thread tdas
rom tdas/SPARK-7315 and squashes the following commits: 141afd5 [Tathagata Das] Removed use of FileUtils b08d4f1 [Tathagata Das] Fix flaky WALBackedBlockRDDSuite Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ecc6eb50 Tree: h

spark git commit: [SPARK-7139] [STREAMING] Allow received block metadata to be saved to WAL and recovered on driver failure

2015-05-05 Thread tdas
zed WALBackedBlockRDD by skipping block fetch when the block is known to not exist in Spark Author: Tathagata Das Closes #5732 from tdas/SPARK-7139 and squashes the following commits: 575476e [Tathagata Das] Added more tests to get 100% coverage of the WALBackedBlockRDD 19668ba [Tathagata Das] Merge rem

spark git commit: [SPARK-7139] [STREAMING] Allow received block metadata to be saved to WAL and recovered on driver failure

2015-05-05 Thread tdas
zed WALBackedBlockRDD by skipping block fetch when the block is known to not exist in Spark Author: Tathagata Das Closes #5732 from tdas/SPARK-7139 and squashes the following commits: 575476e [Tathagata Das] Added more tests to get 100% coverage of the WALBackedBlockRDD 19668ba [Tathagata Das] Merge rem

spark git commit: [HOTFIX] [TEST] Ignoring flaky tests

2015-05-05 Thread tdas
edu/jenkins/job/Spark-Master-SBT/2269/ Author: Tathagata Das Closes #5901 from tdas/ignore-flaky-tests and squashes the following commits: 9cd8667 [Tathagata Das] Ignoring tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/com

spark git commit: [HOTFIX] [TEST] Ignoring flaky tests

2015-05-05 Thread tdas
tps://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2269/ Author: Tathagata Das Closes #5901 from tdas/ignore-flaky-tests and squashes the following commits: 9cd8667 [Tathagata Das] Ignoring tests. (cherry picked from commit 8776fe0b93b6e6d718738bcaf9838a2196e12c8a) Signed-off-by: Tathagata Das Project: h

spark git commit: [SPARK-7113] [STREAMING] Support input information reporting for Direct Kafka stream

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 e8f847a41 -> becdb811d [SPARK-7113] [STREAMING] Support input information reporting for Direct Kafka stream Author: jerryshao Closes #5879 from jerryshao/SPARK-7113 and squashes the following commits: b0b506c [jerryshao] Address the

spark git commit: [SPARK-7113] [STREAMING] Support input information reporting for Direct Kafka stream

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 8776fe0b9 -> 8436f7e98 [SPARK-7113] [STREAMING] Support input information reporting for Direct Kafka stream Author: jerryshao Closes #5879 from jerryshao/SPARK-7113 and squashes the following commits: b0b506c [jerryshao] Address the com

spark git commit: [SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark.stre...

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 8436f7e98 -> 4d29867ed [SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark.streaming.InputStreamsSuite.socket input stream Remove non-deterministic "Thread.sleep" and use deterministic strategies to fix the flaky

spark git commit: [SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark.stre...

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 becdb811d -> 063451068 [SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark.streaming.InputStreamsSuite.socket input stream Remove non-deterministic "Thread.sleep" and use deterministic strategies to fix the fl

spark git commit: [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 47728db7c -> 489700c80 [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics This is the initial work of SPARK-6939. Not yet ready for code review. Here are the screenshots: ![graph1](https://cloud.g

spark git commit: [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 94ac9eba2 -> 8109c9e10 [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics This is the initial work of SPARK-6939. Not yet ready for code review. Here are the screenshots: ![graph1](https://clo

spark git commit: [SPARK-7351] [STREAMING] [DOCS] Add spark.streaming.ui.retainedBatches to docs

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 735bc3d04 -> fec7b29f5 [SPARK-7351] [STREAMING] [DOCS] Add spark.streaming.ui.retainedBatches to docs The default value will be changed to `1000` in #5533. So here I just used `1000`. Author: zsxwing Closes #5899 from zsxwing/SPARK-7351

spark git commit: [SPARK-7351] [STREAMING] [DOCS] Add spark.streaming.ui.retainedBatches to docs

2015-05-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 c68d0e235 -> 4c95fe5ff [SPARK-7351] [STREAMING] [DOCS] Add spark.streaming.ui.retainedBatches to docs The default value will be changed to `1000` in #5533. So here I just used `1000`. Author: zsxwing Closes #5899 from zsxwing/SPARK-
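A minimal Scala sketch of setting the documented property; the app name and the choice to keep the documented default of 1000 are illustrative.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("RetainedBatchesSketch")  // hypothetical app name
  // How many finished batches the streaming UI keeps around for display.
  .set("spark.streaming.ui.retainedBatches", "1000")

val ssc = new StreamingContext(conf, Seconds(1))
```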

spark git commit: [SPARK-8882] [STREAMING] Add a new Receiver scheduling mechanism

2015-07-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master ce89ff477 -> daa1964b6 [SPARK-8882] [STREAMING] Add a new Receiver scheduling mechanism The design doc: https://docs.google.com/document/d/1ZsoRvHjpISPrDmSjsGzuSu8UjwgbtmoCTzmhgTurHJw/edit?usp=sharing Author: zsxwing Closes #7276 from z

spark git commit: [SPARK-9335] [STREAMING] [TESTS] Make sure the test stream is deleted in KinesisBackedBlockRDDSuite

2015-07-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9c5612f4e -> d93ab93d6 [SPARK-9335] [STREAMING] [TESTS] Make sure the test stream is deleted in KinesisBackedBlockRDDSuite KinesisBackedBlockRDDSuite should make sure delete the stream. Author: zsxwing Closes #7663 from zsxwing/fix-SPAR

spark git commit: [STREAMING] [HOTFIX] Ignore ReceiverTrackerSuite flaky test

2015-07-28 Thread tdas
Repository: spark Updated Branches: refs/heads/master 59b92add7 -> c5ed36953 [STREAMING] [HOTFIX] Ignore ReceiverTrackerSuite flaky test Author: Tathagata Das Closes #7738 from tdas/ReceiverTrackerSuite-hotfix and squashes the following commits: 00f0ee1 [Tathagata Das] ignore flaky t

spark git commit: [SPARK-8977] [STREAMING] Defines the RateEstimator interface, and implements the RateController

2015-07-29 Thread tdas
Repository: spark Updated Branches: refs/heads/master 069a4c414 -> 819be46e5 [SPARK-8977] [STREAMING] Defines the RateEstimator interface, and implements the RateController Based on #7471. - [x] add a test that exercises the publish path from driver to receiver - [ ] remove Serializable from

spark git commit: [SPARK-9335] [TESTS] Enable Kinesis tests only when files in extras/kinesis-asl are changed

2015-07-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master 1221849f9 -> 76f2e393a [SPARK-9335] [TESTS] Enable Kinesis tests only when files in extras/kinesis-asl are changed Author: zsxwing Closes #7711 from zsxwing/SPARK-9335-test and squashes the following commits: c13ec2f [zsxwing] environs

spark git commit: [SPARK-9479] [STREAMING] [TESTS] Fix ReceiverTrackerSuite failure for maven build and other potential test failures in Streaming

2015-07-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master 89cda69ec -> 0dbd6963d [SPARK-9479] [STREAMING] [TESTS] Fix ReceiverTrackerSuite failure for maven build and other potential test failures in Streaming See https://issues.apache.org/jira/browse/SPARK-9479 for the failure cause. The PR inc

spark git commit: [STREAMING] [TEST] [HOTFIX] Fixed Kinesis test to not throw weird errors when Kinesis tests are enabled without AWS keys

2015-07-30 Thread tdas
at org.apache.spark.streaming.kinesis.KinesisStreamSuite$$anonfun$3.apply(KinesisStreamSuite.scala:86) ``` This is because attempting to delete a non-existent Kinesis stream throws an uncaught exception. This PR fixes it. Author: Tathagata Das Closes #7809 from tdas/kinesis-test-hotfix and squashes
