GitHub user liu549676915 opened a pull request: https://github.com/apache/spark/pull/13882
Branch 1.6

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13882.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13882

----

commit 0afad6678431846a6eebda8d5891da9115884915
Author: RJ Nowling <rnowl...@gmail.com>
Date: 2016-01-05T23:05:04Z

    [SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans

    SPARK-12450. Un-persist broadcasted variables in KMeans.

    Author: RJ Nowling <rnowl...@gmail.com>

    Closes #10415 from rnowling/spark-12450.

    (cherry picked from commit 78015a8b7cc316343e302eeed6fe30af9f2961e8)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit bf3dca2df4dd3be264691be1321e0c700d4f4e32
Author: BrianLondon <br...@seatgeek.com>
Date: 2016-01-05T23:15:07Z

    [SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk

    Successfully ran the Kinesis demo on a live, AWS-hosted Kinesis stream against the master and 1.6 branches. For reasons I don't entirely understand, it required a manual merge to 1.5, which I did as shown here:
    https://github.com/BrianLondon/spark/commit/075c22e89bc99d5e99be21f40e0d72154a1e23a2

    The demo ran successfully on the 1.5 branch as well. According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the Kinesis regression in 1.5.2.

    Author: BrianLondon <br...@seatgeek.com>

    Closes #10492 from BrianLondon/remove-only.
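The SPARK-12450 change above is about releasing broadcast data once an iteration no longer needs it. A minimal, pure-Python sketch of the pattern (this `Broadcast` class is a stand-in I made up for illustration, not Spark's API):

```python
class Broadcast:
    """Stand-in for Spark's Broadcast: caches a value until unpersisted."""
    def __init__(self, value):
        self.value = value

    def unpersist(self):
        # Release the cached copy so it can be garbage-collected.
        self.value = None

def kmeans_step(points, centers):
    """Assign each 1-D point to its nearest center, then free the broadcast."""
    bc = Broadcast(centers)  # ship the centers to the "executors"
    assignments = [
        min(range(len(bc.value)), key=lambda i, p=p: abs(p - bc.value[i]))
        for p in points
    ]
    bc.unpersist()  # the fix: release the broadcast once the step is done
    return assignments
```

In the actual patch the per-iteration broadcasts of cluster centers are unpersisted once the iteration has used them, rather than being left around until the job ends.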
    (cherry picked from commit ff89975543b153d0d235c0cac615d45b34aa8fe7)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit c3135d02176cdd679b4a0e4883895b9e9f001a55
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-01-06T06:35:41Z

    [SPARK-12393][SPARKR] Add read.text and write.text for SparkR

    Add `read.text` and `write.text` for SparkR.
    cc sun-rui felixcheung shivaram

    Author: Yanbo Liang <yblia...@gmail.com>

    Closes #10348 from yanboliang/spark-12393.

    (cherry picked from commit d1fea41363c175a67b97cb7b3fe89f9043708739)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 175681914af953b7ce1b2971fef83a2445de1f94
Author: zero323 <matthew.szymkiew...@gmail.com>
Date: 2016-01-06T19:58:33Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

    If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to a `list`.

    Author: zero323 <matthew.szymkiew...@gmail.com>

    Closes #9986 from zero323/SPARK-12006.

    (cherry picked from commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit d821fae0ecca6393d3632977797d72ba594d26a9
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-06T20:03:01Z

    [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming

    Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext.

    Author: Shixiong Zhu <shixi...@databricks.com>

    Closes #10621 from zsxwing/SPARK-12617-2.

    (cherry picked from commit 1e6648d62fb82b708ea54c51cd23bfe4f542856e)
    Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386
Author: huangzhaowei <carlmartin...@gmail.com>
Date: 2016-01-06T20:48:57Z

    [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url.
    Author: huangzhaowei <carlmartin...@gmail.com>

    Closes #10617 from SaintBacchus/SPARK-12672.

commit 39b0a348008b6ab532768b90fd578b77711af98c
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-06T21:53:25Z

    Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url."

    This reverts commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386. Will merge #10618 instead.

commit 11b901b22b1cdaa6d19b1b73885627ac601be275
Author: Liang-Chi Hsieh <vii...@appier.com>
Date: 2015-12-14T17:59:42Z

    [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark

    JIRA: https://issues.apache.org/jira/browse/SPARK-12016

    We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark.

    Author: Liang-Chi Hsieh <vii...@appier.com>

    Closes #10100 from viirya/fix-load-py-wordvecmodel.

    (cherry picked from commit b51a4cdff3a7e640a8a66f7a9c17021f3056fd34)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 94af69c9be70b9d2cd95c26288e2af9599d61e5c
Author: jerryshao <ss...@hortonworks.com>
Date: 2016-01-07T05:28:29Z

    [SPARK-12673][UI] Add missing uri prepending for job description

    Otherwise the URL will fail to proxy to the right one in YARN mode. Here is the screenshot:

    ![screen shot 2016-01-06 at 5 28 26 pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png)

    Author: jerryshao <ss...@hortonworks.com>

    Closes #10618 from jerryshao/SPARK-12673.

    (cherry picked from commit 174e72ceca41a6ac17ad05d50832ee9c561918c0)
    Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit d061b852274c12784f3feb96c0cdcab39989f8e7
Author: Guillaume Poulin <poulin.guilla...@gmail.com>
Date: 2016-01-07T05:34:46Z

    [SPARK-12678][CORE] MapPartitionsRDD clearDependencies

    MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies`, which could lead to a memory leak.
    Author: Guillaume Poulin <poulin.guilla...@gmail.com>

    Closes #10623 from gpoulin/map_partition_deps.

    (cherry picked from commit b6738520374637347ab5ae6c801730cdb6b35daa)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 34effc46cd54735cc660d8b43f0a190e91747a06
Author: Yin Huai <yh...@databricks.com>
Date: 2016-01-07T06:03:31Z

    Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None"

    This reverts commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04.

    Author: Yin Huai <yh...@databricks.com>

    Closes #10632 from yhuai/pythonStyle.

    (cherry picked from commit e5cde7ab11a43334fa01b1bb8904da5c0774bc62)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 47a58c799206d011587e03178a259974be47d3bc
Author: zzcclp <xm_...@sina.com>
Date: 2016-01-07T07:06:21Z

    [DOC] fix 'spark.memory.offHeap.enabled' default value to false

    Modify the documented 'spark.memory.offHeap.enabled' default value to false.

    Author: zzcclp <xm_...@sina.com>

    Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.

    (cherry picked from commit 84e77a15df18ba3f1cc871a3c52c783b46e52369)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 69a885a71cfe7c62179e784e7d9eee023d3bb6eb
Author: zero323 <matthew.szymkiew...@gmail.com>
Date: 2016-01-07T18:32:56Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

    If the initial model passed to GMM is not empty, it causes net.razorvine.pickle.PickleException. It can be fixed by converting initialModel.weights to a list.

    Author: zero323 <matthew.szymkiew...@gmail.com>

    Closes #10644 from zero323/SPARK-12006.

    (cherry picked from commit 592f64985d0d58b4f6a0366bf975e04ca496bdbe)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 017b73e69693cd151516f92640a95a4a66e02dff
Author: Sameer Agarwal <sam...@databricks.com>
Date: 2016-01-07T18:37:15Z

    [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping splits

    https://issues.apache.org/jira/browse/SPARK-12662

    cc yhuai

    Author: Sameer Agarwal <sam...@databricks.com>

    Closes #10626 from sameeragarwal/randomsplit.

    (cherry picked from commit f194d9911a93fc3a78be820096d4836f22d09976)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 6ef823544dfbc8c9843bdedccfda06147a1a74fe
Author: Darek Blasiak <darek.blas...@640labs.com>
Date: 2016-01-07T21:15:40Z

    [SPARK-12598][CORE] bug in setMinPartitions

    There is a bug in the calculation of `maxSplitSize`. The `totalLen` should be divided by `minPartitions` and not by `files.size`.

    Author: Darek Blasiak <darek.blas...@640labs.com>

    Closes #10546 from datafarmer/setminpartitionsbug.

    (cherry picked from commit 8346518357f4a3565ae41e9a5ccd7e2c3ed6c468)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit a7c36362fb9532183b7b6a0ad5020f02b816a9b3
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-08T01:37:46Z

    [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming

    /cc tdas brkyvz

    Author: Shixiong Zhu <shixi...@databricks.com>

    Closes #10453 from zsxwing/streaming-conf.

    (cherry picked from commit c94199e977279d9b4658297e8108b46bdf30157b)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit 0d96c54534d8bfca191c892b98397a176bc46152
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-08T10:02:06Z

    [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6)

    Backport #10609 to branch 1.6.

    Author: Shixiong Zhu <shixi...@databricks.com>

    Closes #10656 from zsxwing/SPARK-12591-branch-1.6.
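The SPARK-12598 one-liner above is easy to get backwards, so here is a pure-Python sketch of the arithmetic (my own simplification, not Spark's actual `setMinPartitions` code, which operates on Hadoop input splits): dividing `totalLen` by the number of files instead of by `minPartitions` means a request for more partitions than files can never be honored.

```python
def max_split_size(file_sizes, min_partitions):
    """Corrected formula: total length divided by the requested partitions."""
    total_len = sum(file_sizes)
    # The buggy code divided by len(file_sizes) here instead of min_partitions.
    return max(1, total_len // min_partitions)

def num_splits(file_sizes, min_partitions):
    """How many splits the chosen max split size actually produces."""
    split = max_split_size(file_sizes, min_partitions)
    # Each file is cut into ceil(size / split) pieces.
    return sum(-(-size // split) for size in file_sizes)
```

With two 100-byte files and `min_partitions=4`, the corrected formula yields a 50-byte split size and four splits; dividing by the file count instead yields 100 bytes and only two splits.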
commit fe2cf342e2eddd7414bacf9f5702042a20c6d50f
Author: Jeff Zhang <zjf...@apache.org>
Date: 2016-01-08T19:38:46Z

    [DOCUMENTATION] doc fix of job scheduling

    `spark.shuffle.service.enabled` is a Spark application-related configuration; it is not necessary to set it in yarn-site.xml.

    Author: Jeff Zhang <zjf...@apache.org>

    Closes #10657 from zjffdu/doc-fix.

    (cherry picked from commit 00d9261724feb48d358679efbae6889833e893e0)
    Signed-off-by: Marcelo Vanzin <van...@cloudera.com>

commit e4227cb3e19afafe3a7b5a2847478681db2f2044
Author: Udo Klein <g...@blinkenlight.net>
Date: 2016-01-08T20:32:37Z

    fixed numVertices in transitive closure example

    Author: Udo Klein <g...@blinkenlight.net>

    Closes #10642 from udoklein/patch-2.

    (cherry picked from commit 8c70cb4c62a353bea99f37965dfc829c4accc391)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit faf094c7c35baf0e73290596d4ca66b7d083ed5b
Author: Thomas Graves <tgra...@apache.org>
Date: 2016-01-08T20:38:19Z

    [SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

    https://issues.apache.org/jira/browse/SPARK-12654

    The bug here is that `WholeTextFileRDD.getPartitions` has `val conf = getConf`. In `getConf`, if `cloneConf=true`, it creates a new Hadoop `Configuration`, then uses that to create a new `newJobContext`. The `newJobContext` will copy credentials around, but credentials are only present in a `JobConf`, not in a Hadoop `Configuration`. So when it clones the Hadoop configuration, it changes it from a `JobConf` to a `Configuration` and drops the credentials that were there. `NewHadoopRDD` just uses the conf passed in for `getPartitions` (not `getConf`), which is why it works.

    Author: Thomas Graves <tgra...@staydecay.corp.gq1.yahoo.com>

    Closes #10651 from tgravescs/SPARK-12654.
    (cherry picked from commit 553fd7b912a32476b481fd3f80c1d0664b6c6484)
    Signed-off-by: Tom Graves <tgra...@yahoo-inc.com>

commit a6190508b20673952303eff32b3a559f0a264d03
Author: Michael Armbrust <mich...@databricks.com>
Date: 2016-01-08T23:43:11Z

    [SPARK-12696] Backport Dataset Bug fixes to 1.6

    We've fixed a lot of bugs in master, and since this is experimental in 1.6 we should consider backporting the fixes. The only thing that is obviously risky to me is 0e07ed3; we might try to remove that.

    Author: Wenchen Fan <wenc...@databricks.com>
    Author: gatorsmile <gatorsm...@gmail.com>
    Author: Liang-Chi Hsieh <vii...@gmail.com>
    Author: Cheng Lian <l...@databricks.com>
    Author: Nong Li <n...@databricks.com>

    Closes #10650 from marmbrus/dataset-backports.

commit 8b5f23043322254c725c703c618ba3d3cc4a4240
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-01-09T06:59:51Z

    [SPARK-12645][SPARKR] SparkR support hash function

    Add a `hash` function for the SparkR `DataFrame`.

    Author: Yanbo Liang <yblia...@gmail.com>

    Closes #10597 from yanboliang/spark-12645.

    (cherry picked from commit 3d77cffec093bed4d330969f1a996f3358b9a772)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 7903b0610283a91c47f5df1aab069cf8930b4f27
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-01-10T22:49:45Z

    [SPARK-10359][PROJECT-INFRA] Backport dev/test-dependencies script to branch-1.6

    This patch backports the `dev/test-dependencies` script (from #10461) to branch-1.6.

    Author: Josh Rosen <joshro...@databricks.com>

    Closes #10680 from JoshRosen/test-deps-16-backport.

commit 43b72d83e1d0c426d00d29e54ab7d14579700330
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-01-11T08:36:52Z

    [SPARK-12734][BUILD] Backport Netty exclusion + Maven enforcer fixes to branch-1.6

    This patch backports the Netty exclusion fixes from #10672 to branch-1.6.

    Author: Josh Rosen <joshro...@databricks.com>

    Closes #10691 from JoshRosen/netty-exclude-16-backport.
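The SPARK-12654 description above can be modelled in a few lines. This is a pure-Python caricature (the class names mirror Hadoop's, but the code and fields are my own invention): credentials live only on the `JobConf` subclass, so "cloning" through the base `Configuration` copy constructor silently drops them.

```python
class Configuration:
    """Stand-in for Hadoop's Configuration: plain key/value properties."""
    def __init__(self, other=None):
        self.props = dict(other.props) if other is not None else {}

class JobConf(Configuration):
    """Stand-in for JobConf: a Configuration that also carries credentials."""
    def __init__(self, other=None):
        super().__init__(other)
        self.credentials = (
            dict(getattr(other, "credentials", {})) if other is not None else {}
        )

job_conf = JobConf()
job_conf.props["fs.defaultFS"] = "hdfs://nn:8020"
job_conf.credentials["delegation-token"] = "secret"

clone = Configuration(job_conf)  # what cloneConf=true effectively did
# clone keeps the properties, but the credentials are gone entirely
```

Cloning as a `JobConf` (the analogue of keeping the original conf) would have preserved the token; that is the asymmetry the commit message describes.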
commit d4cfd2acd62f2b0638a12bbbb48a38263c04eaf8
Author: Udo Klein <g...@blinkenlight.net>
Date: 2016-01-11T09:30:08Z

    removed lambda from sortByKey()

    According to the documentation, the sortByKey method does not take a lambda as an argument; thus the example is flawed. Removed the argument completely, as this will default to an ascending sort.

    Author: Udo Klein <g...@blinkenlight.net>

    Closes #10640 from udoklein/patch-1.

    (cherry picked from commit bd723bd53d9a28239b60939a248a4ea13340aad8)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit ce906b33de64f55653b52376316aa2625fd86b47
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2016-01-11T19:29:15Z

    [STREAMING][MINOR] Typo fixes

    Author: Jacek Laskowski <ja...@japila.pl>

    Closes #10698 from jaceklaskowski/streaming-kafka-typo-fixes.

    (cherry picked from commit b313badaa049f847f33663c61cd70ee2f2cbebac)
    Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 3b32aa9e29506606d4ca2407aa65a1aab8794805
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-01-11T20:56:43Z

    [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests

    This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.

    First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.

    I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.

    /cc zsxwing

    Author: Josh Rosen <joshro...@databricks.com>

    Closes #10704 from JoshRosen/fix-build-test-problems.
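The MapPartitionsRDD leak fixed earlier in this log (SPARK-12678) is easy to demonstrate in miniature. A pure-Python stand-in (not Spark's Scala internals): the class kept a second cached reference, `prev`, that `clearDependencies` forgot to null out, so the parent object could never be garbage-collected.

```python
import gc
import weakref

class MiniRDD:
    """Toy RDD: tracks dependencies plus a cached `prev` reference."""
    def __init__(self, parent=None):
        self.dependencies = [parent] if parent is not None else []
        self.prev = parent  # the extra reference that caused the leak

    def clear_dependencies(self):
        self.dependencies = []
        self.prev = None    # the fix: drop the cached parent as well

parent = MiniRDD()
child = MiniRDD(parent)
parent_ref = weakref.ref(parent)

child.clear_dependencies()
del parent
gc.collect()
# With the fix applied, nothing holds the parent: parent_ref() is None
```

Without the `self.prev = None` line, `child` would keep the parent alive indefinitely, which is exactly the retained-lineage leak the commit describes.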
    (cherry picked from commit a44991453a43615028083ba9546f5cd93112f6bd)
    Signed-off-by: Josh Rosen <joshro...@databricks.com>

commit dd2cf64f300ec42802dbea38b95047842de81870
Author: Brandon Bradley <bradleytas...@gmail.com>
Date: 2016-01-11T22:21:50Z

    [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting

    Warns users about casting changes.

    Author: Brandon Bradley <bradleytas...@gmail.com>

    Closes #10708 from blbradley/spark-12758.

    (cherry picked from commit a767ee8a0599f5482717493a3298413c65d8ff89)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit a6c9c68d8855e3a8bfc92f26b3877b92367087a4
Author: Yin Huai <yh...@databricks.com>
Date: 2016-01-12T03:59:15Z

    [SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel

    https://issues.apache.org/jira/browse/SPARK-11823

    This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test later.

    Author: Yin Huai <yh...@databricks.com>

    Closes #10715 from yhuai/SPARK-11823-ignore.

    (cherry picked from commit aaa2c3b628319178ca1f3f68966ff253c2de49cb)
    Signed-off-by: Josh Rosen <joshro...@databricks.com>

commit 46fc7a12a30b82cf1bcaab0e987a98b4dace37fe
Author: Tommy YU <tumm...@163.com>
Date: 2016-01-12T13:20:04Z

    [SPARK-12638][API DOC] Parameter explanation not very accurate for rdd function "aggregate"

    Currently, the parameter documentation for the RDD function aggregate doesn't explain things well, especially the parameter "zeroValue". It's helpful to let junior Scala users know that "zeroValue" takes part in both the "seqOp" and "combOp" phases.

    Author: Tommy YU <tumm...@163.com>

    Closes #10587 from Wenpei/rdd_aggregate_doc.

    (cherry picked from commit 9f0995bb0d0bbe5d9b15a1ca9fa18e246ff90d66)
    Signed-off-by: Sean Owen <so...@cloudera.com>

----
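The documentation point in SPARK-12638 above deserves a concrete model. A hedged, pure-Python imitation of `RDD.aggregate`'s contract (the real implementation is distributed; this only mirrors the semantics): `zeroValue` seeds the accumulator once per partition for `seqOp`, and once more when `combOp` merges the partition results.

```python
from functools import reduce

def aggregate(partitions, zero_value, seq_op, comb_op):
    """Pure-Python model of RDD.aggregate over a list of partitions."""
    # Phase 1: fold each partition, starting from its own copy of zero_value.
    per_partition = [reduce(seq_op, part, zero_value) for part in partitions]
    # Phase 2: merge the partition results, again starting from zero_value.
    return reduce(comb_op, per_partition, zero_value)
```

With two partitions `[[1, 2], [3, 4]]` and a non-neutral `zero_value` of 1 under addition, the result is 13 rather than 10: the zero is folded in once per partition by `seqOp` and once more by `combOp`, which is exactly the behavior the improved docs spell out.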