spark git commit: [SPARK-9335] [TESTS] Enable Kinesis tests only when files in extras/kinesis-asl are changed

2015-07-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master 1221849f9 -> 76f2e393a [SPARK-9335] [TESTS] Enable Kinesis tests only when files in extras/kinesis-asl are changed Author: zsxwing Closes #7711 from zsxwing/SPARK-9335-test and squashes the following commits: c13ec2f [zsxwing] environs

spark git commit: Revert "[SPARK-9458] Avoid object allocation in prefix generation."

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 76f2e393a -> 4a8bb9d00 Revert "[SPARK-9458] Avoid object allocation in prefix generation." This reverts commit 9514d874f0cf61f1eb4ec4f5f66e053119f769c9. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.

spark git commit: Fix flaky HashedRelationSuite

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4a8bb9d00 -> 5ba2d4406 Fix flaky HashedRelationSuite SparkEnv might not have been set in local unit tests. Author: Reynold Xin Closes #7784 from rxin/HashedRelationSuite and squashes the following commits: 435d64b [Reynold Xin] Fix flak

spark git commit: [SPARK-8838] [SQL] Add config to enable/disable merging part-files when merging parquet schema

2015-07-30 Thread lian
Repository: spark Updated Branches: refs/heads/master 5ba2d4406 -> 6175d6cfe [SPARK-8838] [SQL] Add config to enable/disable merging part-files when merging parquet schema JIRA: https://issues.apache.org/jira/browse/SPARK-8838 Currently all part-files are merged when merging parquet schema.

spark git commit: [SPARK-7368] [MLLIB] Add QR decomposition for RowMatrix

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 6175d6cfe -> d31c618e3 [SPARK-7368] [MLLIB] Add QR decomposition for RowMatrix jira: https://issues.apache.org/jira/browse/SPARK-7368 Add QR decomposition for RowMatrix. I'm not sure what's the blueprint about the distributed Matrix from c

spark git commit: [SPARK-5561] [MLLIB] Generalized PeriodicCheckpointer for RDDs and Graphs

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master d31c618e3 -> c5815930b [SPARK-5561] [MLLIB] Generalized PeriodicCheckpointer for RDDs and Graphs PeriodicGraphCheckpointer was introduced for Latent Dirichlet Allocation (LDA), but it was meant to be generalized to work with Graphs, RDDs,

spark git commit: [SPARK-8998] [MLLIB] Distribute PrefixSpan computation for large projected databases

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master c5815930b -> d212a3142 [SPARK-8998] [MLLIB] Distribute PrefixSpan computation for large projected databases Continuation of work by zhangjiajin Closes #7412 Author: zhangjiajin Author: Feynman Liang Author: zhang jiajin Closes #7783

spark git commit: [SPARK-] [MLLIB] minor fix on tokenizer doc

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master d212a3142 -> 9c0501c5d [SPARK-] [MLLIB] minor fix on tokenizer doc A trivial fix for the comments of RegexTokenizer. Maybe this is too small, yet I just noticed it and think it can be quite misleading. I can create a jira if necessary. A

spark git commit: [SPARK-] [MLLIB] minor fix on tokenizer doc

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 8dfdca46d -> 020dd30e5 [SPARK-] [MLLIB] minor fix on tokenizer doc A trivial fix for the comments of RegexTokenizer. Maybe this is too small, yet I just noticed it and think it can be quite misleading. I can create a jira if necessary

spark git commit: [SPARK-9225] [MLLIB] LDASuite needs unit tests for empty documents

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 9c0501c5d -> a6e53a9c8 [SPARK-9225] [MLLIB] LDASuite needs unit tests for empty documents Add unit tests for running LDA with empty documents. Both EMLDAOptimizer and OnlineLDAOptimizer are tested. feynmanliang Author: Meihua Wu Closes

spark git commit: [SPARK-9277] [MLLIB] SparseVector constructor must throw an error when declared number of elements less than array length

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master a6e53a9c8 -> ed3cb1d21 [SPARK-9277] [MLLIB] SparseVector constructor must throw an error when declared number of elements less than array length Check that SparseVector size is at least as big as the number of indices/values provided. And
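The size check described in SPARK-9277 can be illustrated with a minimal pure-Python sketch. `SimpleSparseVector` is a hypothetical stand-in, not Spark's `SparseVector`. It enforces only the invariant stated in the commit, that the declared size must be at least the number of indices/values provided, plus the usual requirement that indices and values have equal length.

```python
class SimpleSparseVector:
    """Hypothetical sparse vector illustrating the SPARK-9277 size check.

    Not Spark's implementation: it only rejects a declared size smaller than
    the number of stored entries. Per-index range checks are omitted here.
    """

    def __init__(self, size, indices, values):
        if len(indices) != len(values):
            raise ValueError(
                f"indices ({len(indices)}) and values ({len(values)}) differ in length")
        if size < len(indices):
            raise ValueError(
                f"declared size {size} is smaller than the {len(indices)} entries provided")
        self.size = size
        self.indices = list(indices)
        self.values = list(values)

    def to_dense(self):
        # Expand to a dense list, filling unspecified positions with 0.0.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense
```

For example, `SimpleSparseVector(4, [0, 3], [1.0, 2.0]).to_dense()` returns `[1.0, 0.0, 0.0, 2.0]`, while `SimpleSparseVector(2, [0, 1, 2], [1.0, 2.0, 3.0])` raises `ValueError` at construction instead of failing later.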

spark git commit: [MINOR] [MLLIB] fix doc for RegexTokenizer

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master ed3cb1d21 -> 81464f2a8 [MINOR] [MLLIB] fix doc for RegexTokenizer This is #7791 for Python. hhbyyh Author: Xiangrui Meng Closes #7798 from mengxr/regex-tok-py and squashes the following commits: baa2dcd [Xiangrui Meng] fix doc for Regex

spark git commit: [MINOR] [MLLIB] fix doc for RegexTokenizer

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 020dd30e5 -> 6e85064f4 [MINOR] [MLLIB] fix doc for RegexTokenizer This is #7791 for Python. hhbyyh Author: Xiangrui Meng Closes #7798 from mengxr/regex-tok-py and squashes the following commits: baa2dcd [Xiangrui Meng] fix doc for R

spark git commit: [SPARK-9248] [SPARKR] Closing curly-braces should always be on their own line

2015-07-30 Thread shivaram
Repository: spark Updated Branches: refs/heads/master 81464f2a8 -> 7492a33fd [SPARK-9248] [SPARKR] Closing curly-braces should always be on their own line ### JIRA [[SPARK-9248] Closing curly-braces should always be on their own line - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-924

spark git commit: [SPARK-9390][SQL] create a wrapper for array type

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7492a33fd -> c0cc0eaec [SPARK-9390][SQL] create a wrapper for array type Author: Wenchen Fan Closes #7724 from cloud-fan/array-data and squashes the following commits: d0408a1 [Wenchen Fan] fix python 661e608 [Wenchen Fan] rebase f39256c

spark git commit: [SPARK-9267] [CORE] Retire stringify(Partial)?Value from Accumulators

2015-07-30 Thread srowen
Repository: spark Updated Branches: refs/heads/master c0cc0eaec -> 7bbf02f0b [SPARK-9267] [CORE] Retire stringify(Partial)?Value from Accumulators cc srowen Author: François Garillot Closes #7678 from huitseeker/master and squashes the following commits: 5e99f57 [François Garillot] [SPAR

spark git commit: [SPARK-9361] [SQL] Refactor new aggregation code to reduce the times of checking compatibility

2015-07-30 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7bbf02f0b -> 5363ed715 [SPARK-9361] [SQL] Refactor new aggregation code to reduce the times of checking compatibility JIRA: https://issues.apache.org/jira/browse/SPARK-9361 Currently, we call `aggregate.Utils.tryConvert` in many places to

spark git commit: [SPARK-8297] [YARN] Scheduler backend is not notified in case node fails in YARN

2015-07-30 Thread vanzin
Repository: spark Updated Branches: refs/heads/master 5363ed715 -> e53534655 [SPARK-8297] [YARN] Scheduler backend is not notified in case node fails in YARN This change adds code to notify the scheduler backend when a container dies in YARN. Author: Mridul Muralidharan Author: Marcelo Vanz

spark git commit: [SPARK-9388] [YARN] Make executor info log messages easier to read.

2015-07-30 Thread vanzin
Repository: spark Updated Branches: refs/heads/master e53534655 -> ab78b1d2a [SPARK-9388] [YARN] Make executor info log messages easier to read. Author: Marcelo Vanzin Closes #7706 from vanzin/SPARK-9388 and squashes the following commits: 028b990 [Marcelo Vanzin] Single log statement. 3c5f

spark git commit: [SPARK-8850] [SQL] Enable Unsafe mode by default

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master ab78b1d2a -> 520ec0ff9 [SPARK-8850] [SQL] Enable Unsafe mode by default This pull request enables Unsafe mode by default in Spark SQL. In order to do this, we had to fix a number of small issues: **List of fixed blockers**: - [x] Make so

spark git commit: [SPARK-9437] [CORE] avoid overflow in SizeEstimator

2015-07-30 Thread shivaram
Repository: spark Updated Branches: refs/heads/master 520ec0ff9 -> 06b6a074f [SPARK-9437] [CORE] avoid overflow in SizeEstimator https://issues.apache.org/jira/browse/SPARK-9437 Author: Imran Rashid Closes #7750 from squito/SPARK-9437_size_estimator_overflow and squashes the following comm

spark git commit: [SPARK-8174] [SPARK-8175] [SQL] function unix_timestamp, from_unixtime

2015-07-30 Thread davies
Repository: spark Updated Branches: refs/heads/master 06b6a074f -> 6d94bf6ac [SPARK-8174] [SPARK-8175] [SQL] function unix_timestamp, from_unixtime unix_timestamp(): long Gets current Unix timestamp in seconds. unix_timestamp(string|date): long Converts time string in format -MM-dd HH:mm:
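The behavior summarized for `unix_timestamp` and `from_unixtime` (seconds since the Unix epoch, converted to and from a formatted time string) can be sketched in plain Python. This is not Spark's implementation: the helper names only mirror the SQL functions, the pattern uses Python `strptime`/`strftime` codes, and the time zone is fixed to UTC as an assumption, whereas the real functions may apply a local time zone.

```python
from datetime import datetime, timezone

# Assumed default pattern, written with Python strptime/strftime codes.
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"


def unix_timestamp(text=None, fmt=DEFAULT_FMT):
    """Seconds since 1970-01-01 00:00:00 UTC; the current time if text is None."""
    if text is None:
        return int(datetime.now(tz=timezone.utc).timestamp())
    # Parse as a naive datetime, then interpret it as UTC (assumption).
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def from_unixtime(seconds, fmt=DEFAULT_FMT):
    """Format epoch seconds as a string, interpreting them in UTC (assumption)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)
```

Under these assumptions `from_unixtime(unix_timestamp("2015-07-30 12:34:56"))` returns the original string, and `from_unixtime(0)` is `"1970-01-01 00:00:00"`.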

spark git commit: [SPARK-9460] Fix prefix generation for UTF8String.

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6d94bf6ac -> a20e743fb [SPARK-9460] Fix prefix generation for UTF8String. Previously we could be getting garbage data if the number of bytes is 0, or on JVMs that are 4 byte aligned, or when compressedoops is on. Author: Reynold Xin Clo
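SPARK-9460 concerns the 8-byte sort prefix taken from a UTF8String, including the zero-byte case. A plain-Python sketch of the general technique, assuming the prefix is the first 8 bytes read in big-endian order and explicitly zero-padded when the string is shorter, shows why the padding matters: nothing is read past the end of the data, and integer order of the prefixes never contradicts byte-wise order of the strings. `string_prefix` is a hypothetical name, not Spark's API.

```python
def string_prefix(data: bytes) -> int:
    """Return an unsigned 64-bit sort prefix for a UTF-8 byte string.

    Takes at most the first 8 bytes, pads shorter inputs with zero bytes
    rather than reading past the end, and interprets the result as a
    big-endian unsigned integer so integer order follows byte order.
    """
    head = data[:8].ljust(8, b"\x00")  # explicit zero padding for short strings
    return int.from_bytes(head, byteorder="big", signed=False)
```

Equal prefixes (for example two strings sharing their first 8 bytes, or `b"a"` and `b"a\x00"`) still need a full comparison to break the tie; the prefix only speeds up the common case.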

spark git commit: [SPARK-5567] [MLLIB] Add predict method to LocalLDAModel

2015-07-30 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master a20e743fb -> d8cfd531c [SPARK-5567] [MLLIB] Add predict method to LocalLDAModel jkbradley hhbyyh Adds `topicDistributions` to LocalLDAModel. Please review after #7757 is merged. Author: Feynman Liang Closes #7760 from feynmanliang/SPARK

spark git commit: [SPARK-8186] [SPARK-8187] [SPARK-8194] [SPARK-8198] [SPARK-9133] [SPARK-9290] [SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation

2015-07-30 Thread davies
Repository: spark Updated Branches: refs/heads/master d8cfd531c -> 1abf7dc16 [SPARK-8186] [SPARK-8187] [SPARK-8194] [SPARK-8198] [SPARK-9133] [SPARK-9290] [SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation This PR is based on #7589 , thanks to adrian-w
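Month arithmetic such as `add_months` needs a rule for start dates near the end of a month. The sketch below assumes Hive's documented convention: if the start date is the last day of its month, or the target month has fewer days than the start day, the result is the last day of the target month. It is a plain-Python illustration of that assumed rule, not Spark's code, and `add_months` here is a hypothetical helper.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Hive-style add_months sketch (assumed semantics, not Spark's code)."""
    # Count months from year 0 so negative offsets cross year boundaries cleanly.
    total = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day_target = calendar.monthrange(year, month)[1]
    last_day_start = calendar.monthrange(start.year, start.month)[1]
    # Last day of a month maps to last day of the target month; otherwise clamp.
    if start.day == last_day_start or start.day > last_day_target:
        day = last_day_target
    else:
        day = start.day
    return date(year, month, day)
```

With this rule, `add_months(date(2015, 1, 31), 1)` gives `date(2015, 2, 28)` and `add_months(date(2015, 2, 28), 1)` gives `date(2015, 3, 31)`; other engines may keep the day number instead, so check the function reference before relying on the end-of-month case.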

spark git commit: [SPARK-9454] Change LDASuite tests to use vector comparisons

2015-07-30 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 1abf7dc16 -> 89cda69ec [SPARK-9454] Change LDASuite tests to use vector comparisons jkbradley Changes the current hacky string-comparison for vector compares. Author: Feynman Liang Closes #7775 from feynmanliang/SPARK-9454-ldasuite-vecto

spark git commit: [SPARK-9479] [STREAMING] [TESTS] Fix ReceiverTrackerSuite failure for maven build and other potential test failures in Streaming

2015-07-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master 89cda69ec -> 0dbd6963d [SPARK-9479] [STREAMING] [TESTS] Fix ReceiverTrackerSuite failure for maven build and other potential test failures in Streaming See https://issues.apache.org/jira/browse/SPARK-9479 for the failure cause. The PR inc

spark git commit: [SPARK-8671] [ML] Added isotonic regression to the pipeline API.

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 0dbd6963d -> 7f7a319c4 [SPARK-8671] [ML] Added isotonic regression to the pipeline API. Author: martinzapletal Closes #7517 from zapletal-martin/SPARK-8671-isotonic-regression-api and squashes the following commits: 8c435c1 [martinzaple

spark git commit: [SPARK-6684] [MLLIB] [ML] Add checkpointing to GBTs

2015-07-30 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 7f7a319c4 -> be7be6d4c [SPARK-6684] [MLLIB] [ML] Add checkpointing to GBTs Add checkpointing to GradientBoostedTrees, GBTClassifier, GBTRegressor CC: mengxr Author: Joseph K. Bradley Closes #7804 from jkbradley/gbt-checkpoint3 and squas

spark git commit: [SPARK-8742] [SPARKR] Improve SparkR error messages for DataFrame API

2015-07-30 Thread shivaram
Repository: spark Updated Branches: refs/heads/master e7905a939 -> 157840d1b [SPARK-8742] [SPARKR] Improve SparkR error messages for DataFrame API This patch improves SparkR error message reporting, especially with DataFrame API. When there is a user error (e.g., malformed SQL query), the mes

spark git commit: [SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormula

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master be7be6d4c -> e7905a939 [SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormula Preview:
```
> summary(m)
      features coefficients
1  (Intercept)    1.6765001
2 Sepal_Length    0.3498801
3 Species.vers
```

spark git commit: [SPARK-9199] [CORE] Update Tachyon dependency from 0.6.4 -> 0.7.0

2015-07-30 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 157840d1b -> 04c840910 [SPARK-9199] [CORE] Update Tachyon dependency from 0.6.4 -> 0.7.0 No new dependencies are added. The exclusion changes are due to the change in tachyon-client 0.7.0's project structure. There is no client side API c

spark git commit: [STREAMING] [TEST] [HOTFIX] Fixed Kinesis test to not throw weird errors when Kinesis tests are enabled without AWS keys

2015-07-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master 04c840910 -> 1afdeb7b4 [STREAMING] [TEST] [HOTFIX] Fixed Kinesis test to not throw weird errors when Kinesis tests are enabled without AWS keys If Kinesis tests are enabled by env ENABLE_KINESIS_TESTS = 1 but no AWS credentials are found,

spark git commit: [SPARK-9408] [PYSPARK] [MLLIB] Refactor linalg.py to /linalg

2015-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 1afdeb7b4 -> ca71cc8c8 [SPARK-9408] [PYSPARK] [MLLIB] Refactor linalg.py to /linalg This is based on MechCoder 's PR https://github.com/apache/spark/pull/7731. Hopefully it could pass tests. MechCoder I tried to make minimal changes. If t

spark git commit: [SPARK-7157][SQL] add sampleBy to DataFrame

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master ca71cc8c8 -> df3266951 [SPARK-7157][SQL] add sampleBy to DataFrame This was previously committed but then reverted due to test failures (see #6769). Author: Xiangrui Meng Closes #7755 from rxin/SPARK-7157 and squashes the following comm
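`sampleBy` performs stratified sampling: each row is kept with a probability that depends on its value in a stratum column. A pure-Python sketch on a list of dicts, assuming independent per-row (Bernoulli) sampling and treating strata missing from `fractions` as fraction 0 (the convention documented for PySpark's `DataFrame.sampleBy(col, fractions, seed)`), illustrates the idea. `sample_by` is a hypothetical helper, not the DataFrame method.

```python
import random


def sample_by(rows, col, fractions, seed=None):
    """Keep each row with probability fractions[row[col]] (0.0 if unspecified)."""
    for key, frac in fractions.items():
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"fraction for {key!r} must be in [0, 1], got {frac}")
    rng = random.Random(seed)  # seeded generator gives reproducible samples
    # rng.random() is in [0, 1), so fraction 1.0 keeps every row and 0.0 keeps none.
    return [row for row in rows if rng.random() < fractions.get(row[col], 0.0)]
```

Because each row is sampled independently, the sample size per stratum is only approximately `fraction * count`; the seed makes a given run repeatable.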

spark git commit: [SPARK-9458][SPARK-9469][SQL] Code generate prefix computation in sorting & moves unsafe conversion out of TungstenSort.

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master df3266951 -> e7a0976e9 [SPARK-9458][SPARK-9469][SQL] Code generate prefix computation in sorting & moves unsafe conversion out of TungstenSort. Author: Reynold Xin Closes #7803 from rxin/SPARK-9458 and squashes the following commits: 5b

spark git commit: [SPARK-9425] [SQL] support DecimalType in UnsafeRow

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master e7a0976e9 -> 0b1a464b6 [SPARK-9425] [SQL] support DecimalType in UnsafeRow This PR brings the support of DecimalType in UnsafeRow, for precision <= 18, it's settable, otherwise it's not settable. Author: Davies Liu Closes #7758 from dav
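The excerpt does not say why precision 18 is the cutoff. One likely reason, offered here as an assumption, is that every unscaled decimal value with at most 18 digits fits in a signed 64-bit integer (a Java `long`), while 19-digit values can exceed it. The arithmetic can be checked directly:

```python
INT64_MAX = 2**63 - 1  # 9223372036854775807, the largest signed 64-bit value


def max_unscaled(precision: int) -> int:
    """Largest unscaled magnitude representable with `precision` decimal digits."""
    return 10**precision - 1


def fits_in_int64(precision: int) -> bool:
    """True if every unscaled value of this precision fits in a signed 64-bit long."""
    return max_unscaled(precision) <= INT64_MAX
```

Here `max_unscaled(18)` is 999,999,999,999,999,999, below `INT64_MAX`, while `max_unscaled(19)` is 9,999,999,999,999,999,999, above it, so 18 is the largest precision for which a fixed 8-byte slot always suffices.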

spark git commit: [SPARK-6319][SQL] Throw AnalysisException when using BinaryType on Join and Aggregate

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0b1a464b6 -> 351eda0e2 [SPARK-6319][SQL] Throw AnalysisException when using BinaryType on Join and Aggregate JIRA: https://issues.apache.org/jira/browse/SPARK-6319 Spark SQL uses plain byte arrays to represent binary values. However, the

spark git commit: [SPARK-9077] [MLLIB] Improve error message for decision trees when numExamples < maxCategoriesPerFeature

2015-07-30 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 351eda0e2 -> 65fa4181c [SPARK-9077] [MLLIB] Improve error message for decision trees when numExamples < maxCategoriesPerFeature Improve error message when number of examples is less than arity of high-arity categorical feature CC jkbradl

spark git commit: [SPARK-9489] Remove unnecessary compatibility and requirements checks from Exchange

2015-07-30 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 65fa4181c -> 3c66ff727 [SPARK-9489] Remove unnecessary compatibility and requirements checks from Exchange While reviewing yhuai's patch for SPARK-2205 (#7773), I noticed that Exchange's `compatible` check may be incorrectly returning `fa

spark git commit: [SPARK-9472] [STREAMING] consistent hadoop configuration, streaming only

2015-07-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3c66ff727 -> 9307f5653 [SPARK-9472] [STREAMING] consistent hadoop configuration, streaming only Author: cody koeninger Closes #7772 from koeninger/streaming-hadoop-config and squashes the following commits: 5267284 [cody koeninger] [SPA

spark git commit: [SPARK-8176] [SPARK-8197] [SQL] function to_date/ trunc

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9307f5653 -> 83670fc9e [SPARK-8176] [SPARK-8197] [SQL] function to_date/ trunc This PR is based on #6988 , thanks to adrian-wang . This brings two SQL functions: to_date() and trunc(). Closes #6988 Author: Daoyuan Wang Author: Davies Li

spark git commit: [SPARK-7690] [ML] Multiclass classification Evaluator

2015-07-30 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 83670fc9e -> 4e5919bfb [SPARK-7690] [ML] Multiclass classification Evaluator Multiclass Classification Evaluator for ML Pipelines. F1 score, precision, recall, weighted precision and weighted recall are supported as available metrics. Au
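The weighted metrics listed above average per-class scores using each true label's share of the examples. A plain-Python sketch of that computation, assuming the label-frequency weighting used by MLlib's `MulticlassMetrics`, is shown below; `multiclass_metrics` and its result keys are hypothetical names, not the evaluator's API.

```python
from collections import Counter


def multiclass_metrics(labels, predictions):
    """Label-frequency-weighted precision, recall and F1 (illustrative sketch)."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal length")
    total = len(labels)
    label_counts = Counter(labels)
    pred_counts = Counter(predictions)
    true_pos = Counter(l for l, p in zip(labels, predictions) if l == p)

    def precision(c):
        # Classes that were never predicted get precision 0.
        return true_pos[c] / pred_counts[c] if pred_counts[c] else 0.0

    def recall(c):
        return true_pos[c] / label_counts[c]

    def f1(c):
        p, r = precision(c), recall(c)
        return 2 * p * r / (p + r) if p + r else 0.0

    # Weight of each class = fraction of examples whose true label is that class.
    weights = {c: n / total for c, n in label_counts.items()}
    return {
        "weighted_precision": sum(w * precision(c) for c, w in weights.items()),
        "weighted_recall": sum(w * recall(c) for c, w in weights.items()),
        "weighted_f1": sum(w * f1(c) for c, w in weights.items()),
    }
```

For labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, weighted recall is 0.75 (equal to accuracy under this weighting), weighted precision is 0.5 * 1 + 0.5 * 2/3 ≈ 0.833, and weighted F1 is 0.5 * 2/3 + 0.5 * 0.8 ≈ 0.733.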

spark git commit: [SPARK-9214] [ML] [PySpark] support ml.NaiveBayes for Python

2015-07-30 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 4e5919bfb -> 69b62f76f [SPARK-9214] [ML] [PySpark] support ml.NaiveBayes for Python support ml.NaiveBayes for Python Author: Yanbo Liang Closes #7568 from yanboliang/spark-9214 and squashes the following commits: 5ee3fd6 [Yanbo Liang] f

spark git commit: [SPARK-9152][SQL] Implement code generation for Like and RLike

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 69b62f76f -> 0244170b6 [SPARK-9152][SQL] Implement code generation for Like and RLike JIRA: https://issues.apache.org/jira/browse/SPARK-9152 This PR implements code generation for `Like` and `RLike`. Author: Liang-Chi Hsieh Closes #7561

spark git commit: [SPARK-9496][SQL]do not print the password in config

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0244170b6 -> a3a85d73d [SPARK-9496][SQL]do not print the password in config https://issues.apache.org/jira/browse/SPARK-9496 We better do not print the password in log. Author: WangTaoTheTonic Closes #7815 from WangTaoTheTonic/master an

spark git commit: [SPARK-9496][SQL]do not print the password in config

2015-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 6e85064f4 -> 3d6a9214e [SPARK-9496][SQL]do not print the password in config https://issues.apache.org/jira/browse/SPARK-9496 We better do not print the password in log. Author: WangTaoTheTonic Closes #7815 from WangTaoTheTonic/maste
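SPARK-9496 keeps passwords out of logged configuration. A minimal sketch of that kind of redaction, assuming a simple rule that any key containing `password` (case-insensitive) is masked before logging; the helper name, the matching rule, the mask string and the example key are hypothetical, not what the patch does:

```python
def redact_config(conf, mask="*********"):
    """Return a log-safe copy of conf with password-like values masked.

    Assumed rule: mask any entry whose key contains "password", ignoring case.
    The input mapping is left unchanged.
    """
    return {
        key: (mask if "password" in key.lower() else value)
        for key, value in conf.items()
    }
```

For example, `redact_config({"example.db.password": "secret", "spark.app.name": "demo"})` keeps `spark.app.name` as `"demo"` but replaces the password value with the mask, so the result can be printed safely.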