spark git commit: [SPARK-20392][SQL] Set barrier to prevent re-entering a tree

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/master f47700c9c -> 8ce0d8ffb [SPARK-20392][SQL] Set barrier to prevent re-entering a tree ## What changes were proposed in this pull request? It is reported that there is a performance regression when applying an ML pipeline to a dataset with many

spark git commit: [SPARK-14659][ML] RFormula consistent with R when handling strings

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2dbe0c528 -> f47700c9c [SPARK-14659][ML] RFormula consistent with R when handling strings ## What changes were proposed in this pull request? When handling strings, the category dropped by RFormula differs from the one dropped by R: - RFormula drops the

spark git commit: [SPARK-20775][SQL] Added Scala support for from_json

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/master c1e7989c4 -> 2dbe0c528 [SPARK-20775][SQL] Added Scala support for from_json ## What changes were proposed in this pull request? The from_json function previously required a java.util.HashMap. For other functions, a Java wrapper is provided which

spark git commit: [SPARK-20888][SQL][DOCS] Document change of default setting of spark.sql.hive.caseSensitiveInferenceMode

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/branch-2.2 7a21de9e2 -> 289dd170c [SPARK-20888][SQL][DOCS] Document change of default setting of spark.sql.hive.caseSensitiveInferenceMode (Link to Jira: https://issues.apache.org/jira/browse/SPARK-20888) ## What changes were proposed in this

spark git commit: [SPARK-20888][SQL][DOCS] Document change of default setting of spark.sql.hive.caseSensitiveInferenceMode

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 98c385298 -> c1e7989c4 [SPARK-20888][SQL][DOCS] Document change of default setting of spark.sql.hive.caseSensitiveInferenceMode (Link to Jira: https://issues.apache.org/jira/browse/SPARK-20888) ## What changes were proposed in this pull
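For reference, the setting being documented, as a spark-defaults.conf fragment; per the 2.2 migration notes, INFER_AND_SAVE became the default, with INFER_ONLY and NEVER_INFER as the other valid modes:

```
# spark-defaults.conf (Spark 2.2+)
# Valid values: INFER_AND_SAVE (new default), INFER_ONLY, NEVER_INFER
spark.sql.hive.caseSensitiveInferenceMode  INFER_AND_SAVE
```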

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 7fc2347b5 -> 4f6fccf15 [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project ## What changes were proposed in this pull request? Add Structured Streaming Kafka Source to the `examples` project so that

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 5ae1c6521 -> 7a21de9e2 [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project ## What changes were proposed in this pull request? Add Structured Streaming Kafka Source to the `examples` project so that

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e9f983df2 -> 98c385298 [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project ## What changes were proposed in this pull request? Add Structured Streaming Kafka Source to the `examples` project so that people

spark git commit: [SPARK-19707][SPARK-18922][TESTS][SQL][CORE] Fix test failures/the invalid path check for sc.addJar on Windows

2017-05-25 Thread srowen
Repository: spark Updated Branches: refs/heads/master 7306d5569 -> e9f983df2 [SPARK-19707][SPARK-18922][TESTS][SQL][CORE] Fix test failures/the invalid path check for sc.addJar on Windows ## What changes were proposed in this pull request? This PR proposes two things: - A follow up for

spark git commit: [SPARK-19707][SPARK-18922][TESTS][SQL][CORE] Fix test failures/the invalid path check for sc.addJar on Windows

2017-05-25 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.2 022a4957d -> 5ae1c6521 [SPARK-19707][SPARK-18922][TESTS][SQL][CORE] Fix test failures/the invalid path check for sc.addJar on Windows ## What changes were proposed in this pull request? This PR proposes two things: - A follow up for

spark git commit: [SPARK-20741][SPARK SUBMIT] Added cleanup of JARs archive generated by SparkSubmit

2017-05-25 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.2 e01f1f222 -> 022a4957d [SPARK-20741][SPARK SUBMIT] Added cleanup of JARs archive generated by SparkSubmit ## What changes were proposed in this pull request? Deleted the generated JARs archive after distribution to HDFS ## How was this

spark git commit: [SPARK-20741][SPARK SUBMIT] Added cleanup of JARs archive generated by SparkSubmit

2017-05-25 Thread srowen
Repository: spark Updated Branches: refs/heads/master 139da116f -> 7306d5569 [SPARK-20741][SPARK SUBMIT] Added cleanup of JARs archive generated by SparkSubmit ## What changes were proposed in this pull request? Deleted the generated JARs archive after distribution to HDFS ## How was this

spark git commit: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 9cbf39f1c -> e01f1f222 [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth. ## What changes were proposed in this pull request? Expose numPartitions (expert) param of PySpark FPGrowth. ## How was this

spark git commit: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 913a6bfe4 -> 139da116f [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth. ## What changes were proposed in this pull request? Expose numPartitions (expert) param of PySpark FPGrowth. ## How was this

spark git commit: [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 8896c4ee9 -> 9cbf39f1c [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth. ## What changes were proposed in this pull request? Follow-up for #17218, some minor fixes for PySpark `FPGrowth`. ## How was this patch tested?

spark git commit: [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 3f94e64aa -> 913a6bfe4 [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth. ## What changes were proposed in this pull request? Follow-up for #17218, some minor fixes for PySpark `FPGrowth`. ## How was this patch tested? Existing

spark git commit: [SPARK-19659] Fetch big blocks to disk when shuffle-read.

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/branch-2.2 b52a06d70 -> 8896c4ee9 [SPARK-19659] Fetch big blocks to disk when shuffle-read. ## What changes were proposed in this pull request? Currently the whole block is fetched into memory (off-heap by default) during a shuffle read. A block is

spark git commit: [SPARK-19659] Fetch big blocks to disk when shuffle-read.

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 731462a04 -> 3f94e64aa [SPARK-19659] Fetch big blocks to disk when shuffle-read. ## What changes were proposed in this pull request? Currently the whole block is fetched into memory (off-heap by default) during a shuffle read. A block is

spark git commit: [SPARK-20250][CORE] Improper OOM error when a task is killed while spilling data

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/branch-2.0 79fbfbbc7 -> ef0ebdde0 [SPARK-20250][CORE] Improper OOM error when a task is killed while spilling data Currently, when a task calls spill() but receives a kill request from the driver (e.g., a speculative task), the

spark git commit: [SPARK-20250][CORE] Improper OOM error when a task is killed while spilling data

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/branch-2.1 7015f6f0e -> 7fc2347b5 [SPARK-20250][CORE] Improper OOM error when a task is killed while spilling data Currently, when a task calls spill() but receives a kill request from the driver (e.g., a speculative task), the

spark git commit: [SPARK-20250][CORE] Improper OOM error when a task is killed while spilling data

2017-05-25 Thread wenchen
Repository: spark Updated Branches: refs/heads/branch-2.2 e0aa23939 -> b52a06d70 [SPARK-20250][CORE] Improper OOM error when a task is killed while spilling data ## What changes were proposed in this pull request? Currently, when a task calls spill() but receives a kill request