GitHub user liu549676915 opened a pull request: https://github.com/apache/spark/pull/13882
Branch 1.6

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13882.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13882

----

commit 0afad6678431846a6eebda8d5891da9115884915
Author: RJ Nowling <rnowl...@gmail.com>
Date: 2016-01-05T23:05:04Z

    [SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans

    SPARK-12450. Un-persist broadcasted variables in KMeans.

    Author: RJ Nowling <rnowl...@gmail.com>

    Closes #10415 from rnowling/spark-12450.

    (cherry picked from commit 78015a8b7cc316343e302eeed6fe30af9f2961e8)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit bf3dca2df4dd3be264691be1321e0c700d4f4e32
Author: BrianLondon <br...@seatgeek.com>
Date: 2016-01-05T23:15:07Z

    [SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk

    Successfully ran the Kinesis demo on a live, AWS-hosted Kinesis stream against the master and 1.6 branches. For reasons I don't entirely understand, it required a manual merge to 1.5, which I did as shown here:
    https://github.com/BrianLondon/spark/commit/075c22e89bc99d5e99be21f40e0d72154a1e23a2

    The demo ran successfully on the 1.5 branch as well. According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the Kinesis regression in 1.5.2.

    Author: BrianLondon <br...@seatgeek.com>

    Closes #10492 from BrianLondon/remove-only.
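The SPARK-12450 change above is about releasing broadcast data once an iteration no longer needs it. A minimal, pure-Python sketch of the pattern (this `Broadcast` class is a stand-in I made up for illustration, not Spark's API):

```python
class Broadcast:
    """Stand-in for Spark's Broadcast: caches a value until unpersisted."""
    def __init__(self, value):
        self.value = value

    def unpersist(self):
        # Release the cached copy so it can be garbage-collected.
        self.value = None

def kmeans_step(points, centers):
    """Assign each 1-D point to its nearest center, then free the broadcast."""
    bc = Broadcast(centers)  # ship the centers to the "executors"
    assignments = [
        min(range(len(bc.value)), key=lambda i, p=p: abs(p - bc.value[i]))
        for p in points
    ]
    bc.unpersist()  # the fix: release the broadcast once the step is done
    return assignments
```

In the actual patch the per-iteration broadcasts of cluster centers are unpersisted once the iteration has used them, rather than being left around until the job ends.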
    (cherry picked from commit ff89975543b153d0d235c0cac615d45b34aa8fe7)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit c3135d02176cdd679b4a0e4883895b9e9f001a55
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-01-06T06:35:41Z

    [SPARK-12393][SPARKR] Add read.text and write.text for SparkR

    Add `read.text` and `write.text` for SparkR.
    cc sun-rui felixcheung shivaram

    Author: Yanbo Liang <yblia...@gmail.com>

    Closes #10348 from yanboliang/spark-12393.

    (cherry picked from commit d1fea41363c175a67b97cb7b3fe89f9043708739)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 175681914af953b7ce1b2971fef83a2445de1f94
Author: zero323 <matthew.szymkiew...@gmail.com>
Date: 2016-01-06T19:58:33Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

    If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to a `list`.

    Author: zero323 <matthew.szymkiew...@gmail.com>

    Closes #9986 from zero323/SPARK-12006.

    (cherry picked from commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit d821fae0ecca6393d3632977797d72ba594d26a9
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-06T20:03:01Z

    [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming

    Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext.

    Author: Shixiong Zhu <shixi...@databricks.com>

    Closes #10621 from zsxwing/SPARK-12617-2.

    (cherry picked from commit 1e6648d62fb82b708ea54c51cd23bfe4f542856e)
    Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386
Author: huangzhaowei <carlmartin...@gmail.com>
Date: 2016-01-06T20:48:57Z

    [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url.
    Author: huangzhaowei <carlmartin...@gmail.com>

    Closes #10617 from SaintBacchus/SPARK-12672.

commit 39b0a348008b6ab532768b90fd578b77711af98c
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-06T21:53:25Z

    Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url."

    This reverts commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386. Will merge #10618 instead.

commit 11b901b22b1cdaa6d19b1b73885627ac601be275
Author: Liang-Chi Hsieh <vii...@appier.com>
Date: 2015-12-14T17:59:42Z

    [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark

    JIRA: https://issues.apache.org/jira/browse/SPARK-12016

    We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark.

    Author: Liang-Chi Hsieh <vii...@appier.com>

    Closes #10100 from viirya/fix-load-py-wordvecmodel.

    (cherry picked from commit b51a4cdff3a7e640a8a66f7a9c17021f3056fd34)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 94af69c9be70b9d2cd95c26288e2af9599d61e5c
Author: jerryshao <ss...@hortonworks.com>
Date: 2016-01-07T05:28:29Z

    [SPARK-12673][UI] Add missing uri prepending for job description

    Otherwise the URL will fail to proxy to the right one in YARN mode. Here is the screenshot:

    ![screen shot 2016-01-06 at 5 28 26 pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png)

    Author: jerryshao <ss...@hortonworks.com>

    Closes #10618 from jerryshao/SPARK-12673.

    (cherry picked from commit 174e72ceca41a6ac17ad05d50832ee9c561918c0)
    Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit d061b852274c12784f3feb96c0cdcab39989f8e7
Author: Guillaume Poulin <poulin.guilla...@gmail.com>
Date: 2016-01-07T05:34:46Z

    [SPARK-12678][CORE] MapPartitionsRDD clearDependencies

    MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies`, which could lead to a memory leak.
    Author: Guillaume Poulin <poulin.guilla...@gmail.com>

    Closes #10623 from gpoulin/map_partition_deps.

    (cherry picked from commit b6738520374637347ab5ae6c801730cdb6b35daa)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 34effc46cd54735cc660d8b43f0a190e91747a06
Author: Yin Huai <yh...@databricks.com>
Date: 2016-01-07T06:03:31Z

    Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None"

    This reverts commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04.

    Author: Yin Huai <yh...@databricks.com>

    Closes #10632 from yhuai/pythonStyle.

    (cherry picked from commit e5cde7ab11a43334fa01b1bb8904da5c0774bc62)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 47a58c799206d011587e03178a259974be47d3bc
Author: zzcclp <xm_...@sina.com>
Date: 2016-01-07T07:06:21Z

    [DOC] fix 'spark.memory.offHeap.enabled' default value to false

    Modify the documented 'spark.memory.offHeap.enabled' default value to false.

    Author: zzcclp <xm_...@sina.com>

    Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.

    (cherry picked from commit 84e77a15df18ba3f1cc871a3c52c783b46e52369)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 69a885a71cfe7c62179e784e7d9eee023d3bb6eb
Author: zero323 <matthew.szymkiew...@gmail.com>
Date: 2016-01-07T18:32:56Z

    [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

    If the initial model passed to GMM is not empty, it causes net.razorvine.pickle.PickleException. It can be fixed by converting initialModel.weights to a list.

    Author: zero323 <matthew.szymkiew...@gmail.com>

    Closes #10644 from zero323/SPARK-12006.

    (cherry picked from commit 592f64985d0d58b4f6a0366bf975e04ca496bdbe)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 017b73e69693cd151516f92640a95a4a66e02dff
Author: Sameer Agarwal <sam...@databricks.com>
Date: 2016-01-07T18:37:15Z

    [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping splits

    https://issues.apache.org/jira/browse/SPARK-12662

    cc yhuai

    Author: Sameer Agarwal <sam...@databricks.com>

    Closes #10626 from sameeragarwal/randomsplit.

    (cherry picked from commit f194d9911a93fc3a78be820096d4836f22d09976)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 6ef823544dfbc8c9843bdedccfda06147a1a74fe
Author: Darek Blasiak <darek.blas...@640labs.com>
Date: 2016-01-07T21:15:40Z

    [SPARK-12598][CORE] bug in setMinPartitions

    There is a bug in the calculation of `maxSplitSize`. The `totalLen` should be divided by `minPartitions` and not by `files.size`.

    Author: Darek Blasiak <darek.blas...@640labs.com>

    Closes #10546 from datafarmer/setminpartitionsbug.

    (cherry picked from commit 8346518357f4a3565ae41e9a5ccd7e2c3ed6c468)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit a7c36362fb9532183b7b6a0ad5020f02b816a9b3
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-08T01:37:46Z

    [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming

    /cc tdas brkyvz

    Author: Shixiong Zhu <shixi...@databricks.com>

    Closes #10453 from zsxwing/streaming-conf.

    (cherry picked from commit c94199e977279d9b4658297e8108b46bdf30157b)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit 0d96c54534d8bfca191c892b98397a176bc46152
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2016-01-08T10:02:06Z

    [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6)

    Backport #10609 to branch 1.6.

    Author: Shixiong Zhu <shixi...@databricks.com>

    Closes #10656 from zsxwing/SPARK-12591-branch-1.6.
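The SPARK-12598 one-liner above is easy to get backwards, so here is a pure-Python sketch of the arithmetic (my own simplification, not Spark's actual `setMinPartitions` code, which operates on Hadoop input splits): dividing `totalLen` by the number of files instead of by `minPartitions` means a request for more partitions than files can never be honored.

```python
def max_split_size(file_sizes, min_partitions):
    """Corrected formula: total length divided by the requested partitions."""
    total_len = sum(file_sizes)
    # The buggy code divided by len(file_sizes) here instead of min_partitions.
    return max(1, total_len // min_partitions)

def num_splits(file_sizes, min_partitions):
    """How many splits the chosen max split size actually produces."""
    split = max_split_size(file_sizes, min_partitions)
    # Each file is cut into ceil(size / split) pieces.
    return sum(-(-size // split) for size in file_sizes)
```

With two 100-byte files and `min_partitions=4`, the corrected formula yields a 50-byte split size and four splits; dividing by the file count instead yields 100 bytes and only two splits.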
commit fe2cf342e2eddd7414bacf9f5702042a20c6d50f
Author: Jeff Zhang <zjf...@apache.org>
Date: 2016-01-08T19:38:46Z

    [DOCUMENTATION] doc fix of job scheduling

    `spark.shuffle.service.enabled` is a Spark application-related configuration; it is not necessary to set it in yarn-site.xml.

    Author: Jeff Zhang <zjf...@apache.org>

    Closes #10657 from zjffdu/doc-fix.

    (cherry picked from commit 00d9261724feb48d358679efbae6889833e893e0)
    Signed-off-by: Marcelo Vanzin <van...@cloudera.com>

commit e4227cb3e19afafe3a7b5a2847478681db2f2044
Author: Udo Klein <g...@blinkenlight.net>
Date: 2016-01-08T20:32:37Z

    fixed numVertices in transitive closure example

    Author: Udo Klein <g...@blinkenlight.net>

    Closes #10642 from udoklein/patch-2.

    (cherry picked from commit 8c70cb4c62a353bea99f37965dfc829c4accc391)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit faf094c7c35baf0e73290596d4ca66b7d083ed5b
Author: Thomas Graves <tgra...@apache.org>
Date: 2016-01-08T20:38:19Z

    [SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

    https://issues.apache.org/jira/browse/SPARK-12654

    The bug here is that `WholeTextFileRDD.getPartitions` has `val conf = getConf`. In `getConf`, if `cloneConf=true`, it creates a new Hadoop `Configuration`, then uses that to create a new `newJobContext`. The `newJobContext` will copy credentials around, but credentials are only present in a `JobConf`, not in a Hadoop `Configuration`. So when it clones the Hadoop configuration, it changes it from a `JobConf` to a `Configuration` and drops the credentials that were there. `NewHadoopRDD` just uses the conf passed in for `getPartitions` (not `getConf`), which is why it works.

    Author: Thomas Graves <tgra...@staydecay.corp.gq1.yahoo.com>

    Closes #10651 from tgravescs/SPARK-12654.
    (cherry picked from commit 553fd7b912a32476b481fd3f80c1d0664b6c6484)
    Signed-off-by: Tom Graves <tgra...@yahoo-inc.com>

commit a6190508b20673952303eff32b3a559f0a264d03
Author: Michael Armbrust <mich...@databricks.com>
Date: 2016-01-08T23:43:11Z

    [SPARK-12696] Backport Dataset Bug fixes to 1.6

    We've fixed a lot of bugs in master, and since this is experimental in 1.6 we should consider backporting the fixes. The only thing that is obviously risky to me is 0e07ed3; we might try to remove that.

    Author: Wenchen Fan <wenc...@databricks.com>
    Author: gatorsmile <gatorsm...@gmail.com>
    Author: Liang-Chi Hsieh <vii...@gmail.com>
    Author: Cheng Lian <l...@databricks.com>
    Author: Nong Li <n...@databricks.com>

    Closes #10650 from marmbrus/dataset-backports.

commit 8b5f23043322254c725c703c618ba3d3cc4a4240
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-01-09T06:59:51Z

    [SPARK-12645][SPARKR] SparkR support hash function

    Add a `hash` function for the SparkR `DataFrame`.

    Author: Yanbo Liang <yblia...@gmail.com>

    Closes #10597 from yanboliang/spark-12645.

    (cherry picked from commit 3d77cffec093bed4d330969f1a996f3358b9a772)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 7903b0610283a91c47f5df1aab069cf8930b4f27
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-01-10T22:49:45Z

    [SPARK-10359][PROJECT-INFRA] Backport dev/test-dependencies script to branch-1.6

    This patch backports the `dev/test-dependencies` script (from #10461) to branch-1.6.

    Author: Josh Rosen <joshro...@databricks.com>

    Closes #10680 from JoshRosen/test-deps-16-backport.

commit 43b72d83e1d0c426d00d29e54ab7d14579700330
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-01-11T08:36:52Z

    [SPARK-12734][BUILD] Backport Netty exclusion + Maven enforcer fixes to branch-1.6

    This patch backports the Netty exclusion fixes from #10672 to branch-1.6.

    Author: Josh Rosen <joshro...@databricks.com>

    Closes #10691 from JoshRosen/netty-exclude-16-backport.
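The SPARK-12654 description above can be modelled in a few lines. This is a pure-Python caricature (the class names mirror Hadoop's, but the code and fields are my own invention): credentials live only on the `JobConf` subclass, so "cloning" through the base `Configuration` copy constructor silently drops them.

```python
class Configuration:
    """Stand-in for Hadoop's Configuration: plain key/value properties."""
    def __init__(self, other=None):
        self.props = dict(other.props) if other is not None else {}

class JobConf(Configuration):
    """Stand-in for JobConf: a Configuration that also carries credentials."""
    def __init__(self, other=None):
        super().__init__(other)
        self.credentials = (
            dict(getattr(other, "credentials", {})) if other is not None else {}
        )

job_conf = JobConf()
job_conf.props["fs.defaultFS"] = "hdfs://nn:8020"
job_conf.credentials["delegation-token"] = "secret"

clone = Configuration(job_conf)  # what cloneConf=true effectively did
# clone keeps the properties, but the credentials are gone entirely
```

Cloning as a `JobConf` (the analogue of keeping the original conf) would have preserved the token; that is the asymmetry the commit message describes.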
commit d4cfd2acd62f2b0638a12bbbb48a38263c04eaf8
Author: Udo Klein <g...@blinkenlight.net>
Date: 2016-01-11T09:30:08Z

    removed lambda from sortByKey()

    According to the documentation, the sortByKey method does not take a lambda as an argument; thus the example is flawed. Removed the argument completely, as this will default to an ascending sort.

    Author: Udo Klein <g...@blinkenlight.net>

    Closes #10640 from udoklein/patch-1.

    (cherry picked from commit bd723bd53d9a28239b60939a248a4ea13340aad8)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit ce906b33de64f55653b52376316aa2625fd86b47
Author: Jacek Laskowski <ja...@japila.pl>
Date: 2016-01-11T19:29:15Z

    [STREAMING][MINOR] Typo fixes

    Author: Jacek Laskowski <ja...@japila.pl>

    Closes #10698 from jaceklaskowski/streaming-kafka-typo-fixes.

    (cherry picked from commit b313badaa049f847f33663c61cd70ee2f2cbebac)
    Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 3b32aa9e29506606d4ca2407aa65a1aab8794805
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-01-11T20:56:43Z

    [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests

    This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.

    First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.

    I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.

    /cc zsxwing

    Author: Josh Rosen <joshro...@databricks.com>

    Closes #10704 from JoshRosen/fix-build-test-problems.
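The MapPartitionsRDD leak fixed earlier in this log (SPARK-12678) is easy to demonstrate in miniature. A pure-Python stand-in (not Spark's Scala internals): the class kept a second cached reference, `prev`, that `clearDependencies` forgot to null out, so the parent object could never be garbage-collected.

```python
import gc
import weakref

class MiniRDD:
    """Toy RDD: tracks dependencies plus a cached `prev` reference."""
    def __init__(self, parent=None):
        self.dependencies = [parent] if parent is not None else []
        self.prev = parent  # the extra reference that caused the leak

    def clear_dependencies(self):
        self.dependencies = []
        self.prev = None    # the fix: drop the cached parent as well

parent = MiniRDD()
child = MiniRDD(parent)
parent_ref = weakref.ref(parent)

child.clear_dependencies()
del parent
gc.collect()
# With the fix applied, nothing holds the parent: parent_ref() is None
```

Without the `self.prev = None` line, `child` would keep the parent alive indefinitely, which is exactly the retained-lineage leak the commit describes.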
    (cherry picked from commit a44991453a43615028083ba9546f5cd93112f6bd)
    Signed-off-by: Josh Rosen <joshro...@databricks.com>

commit dd2cf64f300ec42802dbea38b95047842de81870
Author: Brandon Bradley <bradleytas...@gmail.com>
Date: 2016-01-11T22:21:50Z

    [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting

    Warns users about casting changes.

    Author: Brandon Bradley <bradleytas...@gmail.com>

    Closes #10708 from blbradley/spark-12758.

    (cherry picked from commit a767ee8a0599f5482717493a3298413c65d8ff89)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit a6c9c68d8855e3a8bfc92f26b3877b92367087a4
Author: Yin Huai <yh...@databricks.com>
Date: 2016-01-12T03:59:15Z

    [SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel

    https://issues.apache.org/jira/browse/SPARK-11823

    This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test later.

    Author: Yin Huai <yh...@databricks.com>

    Closes #10715 from yhuai/SPARK-11823-ignore.

    (cherry picked from commit aaa2c3b628319178ca1f3f68966ff253c2de49cb)
    Signed-off-by: Josh Rosen <joshro...@databricks.com>

commit 46fc7a12a30b82cf1bcaab0e987a98b4dace37fe
Author: Tommy YU <tumm...@163.com>
Date: 2016-01-12T13:20:04Z

    [SPARK-12638][API DOC] Parameter explanation not very accurate for rdd function "aggregate"

    Currently, the parameter documentation for the RDD function aggregate doesn't explain things well, especially the parameter "zeroValue". It's helpful to let junior Scala users know that "zeroValue" takes part in both the "seqOp" and "combOp" phases.

    Author: Tommy YU <tumm...@163.com>

    Closes #10587 from Wenpei/rdd_aggregate_doc.

    (cherry picked from commit 9f0995bb0d0bbe5d9b15a1ca9fa18e246ff90d66)
    Signed-off-by: Sean Owen <so...@cloudera.com>

----
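The documentation point in SPARK-12638 above deserves a concrete model. A hedged, pure-Python imitation of `RDD.aggregate`'s contract (the real implementation is distributed; this only mirrors the semantics): `zeroValue` seeds the accumulator once per partition for `seqOp`, and once more when `combOp` merges the partition results.

```python
from functools import reduce

def aggregate(partitions, zero_value, seq_op, comb_op):
    """Pure-Python model of RDD.aggregate over a list of partitions."""
    # Phase 1: fold each partition, starting from its own copy of zero_value.
    per_partition = [reduce(seq_op, part, zero_value) for part in partitions]
    # Phase 2: merge the partition results, again starting from zero_value.
    return reduce(comb_op, per_partition, zero_value)
```

With two partitions `[[1, 2], [3, 4]]` and a non-neutral `zero_value` of 1 under addition, the result is 13 rather than 10: the zero is folded in once per partition by `seqOp` and once more by `combOp`, which is exactly the behavior the improved docs spell out.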