spark git commit: [SPARK-19748][SQL] refresh function has a wrong order to do cache invalidate and regenerate the inmemory var for InMemoryFileIndex with FileStatusCache

2017-02-28 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 735303835 -> a350bc16d [SPARK-19748][SQL] refresh function has a wrong order to do cache invalidate and regenerate the inmemory var for InMemoryFileIndex with FileStatusCache ## What changes were proposed in this pull request? If we refre

spark git commit: [SPARK-19748][SQL] refresh function has a wrong order to do cache invalidate and regenerate the inmemory var for InMemoryFileIndex with FileStatusCache

2017-02-28 Thread wenchen
Repository: spark Updated Branches: refs/heads/branch-2.1 04fbb9e09 -> 4b4c3bf3f [SPARK-19748][SQL] refresh function has a wrong order to do cache invalidate and regenerate the inmemory var for InMemoryFileIndex with FileStatusCache ## What changes were proposed in this pull request? If we r

[1/2] spark git commit: [SPARK-19660][CORE][SQL] Replace the configuration property names that are deprecated in the version of Hadoop 2.6

2017-02-28 Thread srowen
Repository: spark Updated Branches: refs/heads/master a350bc16d -> 9b8eca65d http://git-wip-us.apache.org/repos/asf/spark/blob/9b8eca65/sql/hive/src/test/resources/golden/merge2-3-10266e3d5dd4c841c0d65030b1edba7c -- diff --git

[2/2] spark git commit: [SPARK-19660][CORE][SQL] Replace the configuration property names that are deprecated in the version of Hadoop 2.6

2017-02-28 Thread srowen
[SPARK-19660][CORE][SQL] Replace the configuration property names that are deprecated in the version of Hadoop 2.6 ## What changes were proposed in this pull request? Replace all the Hadoop deprecated configuration property names according to [DeprecatedProperties](https://hadoop.apache.org/doc

spark git commit: [SPARK-14489][ML][PYSPARK] ALS unknown user/item prediction strategy

2017-02-28 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 9b8eca65d -> b40546651 [SPARK-14489][ML][PYSPARK] ALS unknown user/item prediction strategy This PR adds a param to `ALS`/`ALSModel` to set the strategy used when encountering unknown users or items at prediction time in `transform`. This

[2/2] spark git commit: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-28 Thread lixiao
[SPARK-19678][SQL] remove MetastoreRelation ## What changes were proposed in this pull request? `MetastoreRelation` is used to represent table relation for hive tables, and provides some hive related information. We will resolve `SimpleCatalogRelation` to `MetastoreRelation` for hive tables, wh

[1/2] spark git commit: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-28 Thread lixiao
Repository: spark Updated Branches: refs/heads/master b40546651 -> 7c7fc30b4 http://git-wip-us.apache.org/repos/asf/spark/blob/7c7fc30b/sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala -- diff --git a/sq

spark git commit: [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7c7fc30b4 -> 9734a928a [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS ## What changes were proposed in this pull request? HDFSBackedStateStoreProvider fails to rename files on HDFS but not on the lo

spark git commit: [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 4b4c3bf3f -> 947c0cd90 [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS ## What changes were proposed in this pull request? HDFSBackedStateStoreProvider fails to rename files on HDFS but not on th

spark git commit: [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 a6af60f25 -> dcfb05c86 [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS ## What changes were proposed in this pull request? HDFSBackedStateStoreProvider fails to rename files on HDFS but not on th

spark git commit: [SPARK-19463][SQL] refresh cache after the InsertIntoHadoopFsRelationCommand

2017-02-28 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 9734a928a -> ce233f18e [SPARK-19463][SQL] refresh cache after the InsertIntoHadoopFsRelationCommand ## What changes were proposed in this pull request? If we first cache a DataSource table, then we insert some data into the table, we shou

spark git commit: [SPARK-19610][SQL] Support parsing multiline CSV files

2017-02-28 Thread wenchen
Repository: spark Updated Branches: refs/heads/master ce233f18e -> 7e5359be5 [SPARK-19610][SQL] Support parsing multiline CSV files ## What changes were proposed in this pull request? This PR proposes support for multi-line records in CSV, modeled on the multiline support in JSON datasou

spark git commit: [MINOR][DOC] Update GLM doc to include tweedie distribution

2017-02-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 7e5359be5 -> d743ea4c7 [MINOR][DOC] Update GLM doc to include tweedie distribution Update GLM documentation to include the Tweedie distribution. #16344 jkbradley yanboliang Author: actuaryzhang Closes #17103 from actuaryzhang/doc. Pro

spark git commit: [SPARK-19769][DOCS] Update quickstart instructions

2017-02-28 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.1 947c0cd90 -> d887f7581 [SPARK-19769][DOCS] Update quickstart instructions ## What changes were proposed in this pull request? This change addresses the renaming of the `simple.sbt` build file to `build.sbt`. Newer versions of the sbt t

spark git commit: [SPARK-19769][DOCS] Update quickstart instructions

2017-02-28 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 dcfb05c86 -> c9c45d97b [SPARK-19769][DOCS] Update quickstart instructions ## What changes were proposed in this pull request? This change addresses the renaming of the `simple.sbt` build file to `build.sbt`. Newer versions of the sbt t

spark git commit: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on registered cores rather than accepted cores

2017-02-28 Thread srowen
Repository: spark Updated Branches: refs/heads/master bf5987cbe -> ca3864d6e [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on registered cores rather than accepted cores ## What changes were proposed in this pull request? See JIRA ## How was this patch tested? Unit t

spark git commit: [SPARK-19769][DOCS] Update quickstart instructions

2017-02-28 Thread srowen
Repository: spark Updated Branches: refs/heads/master d743ea4c7 -> bf5987cbe [SPARK-19769][DOCS] Update quickstart instructions ## What changes were proposed in this pull request? This change addresses the renaming of the `simple.sbt` build file to `build.sbt`. Newer versions of the sbt tool

spark git commit: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master ca3864d6e -> 0fe8020f3 [SPARK-14503][ML] spark.ml API for FPGrowth ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-14503 Function parity: Add FPGrowth and AssociationRules to ML. desig

spark git commit: [SPARK-19572][SPARKR] Allow to disable hive in sparkR shell

2017-02-28 Thread felixcheung
Repository: spark Updated Branches: refs/heads/branch-2.1 d887f7581 -> f719cccdc [SPARK-19572][SPARKR] Allow to disable hive in sparkR shell ## What changes were proposed in this pull request? SPARK-15236 does this for the Scala shell; this ticket is for the sparkR shell. This is not only for sparkR it

spark git commit: [SPARK-19572][SPARKR] Allow to disable hive in sparkR shell

2017-02-28 Thread felixcheung
Repository: spark Updated Branches: refs/heads/master 0fe8020f3 -> 731588056 [SPARK-19572][SPARKR] Allow to disable hive in sparkR shell ## What changes were proposed in this pull request? SPARK-15236 does this for the Scala shell; this ticket is for the sparkR shell. This is not only for sparkR itself

spark git commit: [SPARK-19460][SPARKR] Update dataset used in R documentation, examples to reduce warning noise and confusions

2017-02-28 Thread felixcheung
Repository: spark Updated Branches: refs/heads/master 731588056 -> 89cd3845b [SPARK-19460][SPARKR] Update dataset used in R documentation, examples to reduce warning noise and confusions ## What changes were proposed in this pull request? Replace `iris` dataset with `Titanic` or other datase

spark git commit: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 89cd3845b -> 4913c92c2 [SPARK-19633][SS] FileSource read from FileSink ## What changes were proposed in this pull request? Right now file source always uses `InMemoryFileIndex` to scan files from a given path. But when reading the output