spark git commit: [SPARK-8005][SQL] Input file name

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master e127ec34d -> 1221849f9 [SPARK-8005][SQL] Input file name Users can now get the file name of the partition being read in. A thread local variable is in `SQLNewHadoopRDD` and is set when the partition is computed. `SQLNewHadoopRDD` is moved
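
The thread-local mechanism described in this commit can be illustrated with a minimal Python sketch. It is not Spark's Scala code: `_current_input`, `input_file_name`, `compute_partition`, and `run_in_threads` are hypothetical names chosen for the example, and the set-then-clear pattern is an assumption about how such a variable might be managed.

```python
import threading

# Hypothetical thread-local holder; each worker thread sees only its own value.
_current_input = threading.local()


def input_file_name():
    """Return the file name set by the current thread, or "" if none is set."""
    return getattr(_current_input, "file_name", "")


def compute_partition(file_name, rows):
    """Set the file name before reading rows, and clear it when done."""
    _current_input.file_name = file_name
    try:
        # Tag each row with the file it came from.
        return [(input_file_name(), row) for row in rows]
    finally:
        _current_input.file_name = ""


def run_in_threads(partitions):
    """Compute each (file_name, rows) partition in its own thread."""
    results = [None] * len(partitions)

    def work(i, name, rows):
        results[i] = compute_partition(name, rows)

    threads = [threading.Thread(target=work, args=(i, n, r))
               for i, (n, r) in enumerate(partitions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the value lives in thread-local storage, partitions computed concurrently on different threads do not overwrite each other's file name.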

spark git commit: [SPARK-9428] [SQL] Add test cases for null inputs for expression unit tests

2015-07-29 Thread davies
Repository: spark Updated Branches: refs/heads/master 712465b68 -> e127ec34d [SPARK-9428] [SQL] Add test cases for null inputs for expression unit tests JIRA: https://issues.apache.org/jira/browse/SPARK-9428 Author: Yijie Shen Closes #7748 from yjshen/string_cleanup and squashes the followi

spark git commit: HOTFIX: disable HashedRelationSuite.

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master e044705b4 -> 712465b68 HOTFIX: disable HashedRelationSuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/712465b6 Tree: http://git-wip-us.apache.org/repos/asf/spark/

spark git commit: [SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__

2015-07-29 Thread davies
Repository: spark Updated Branches: refs/heads/master f5dd11339 -> e044705b4 [SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__ Also, we can create a Python UDT without having a Scala one, which is important for Python users. cc mengxr JoshRosen Author: Davies Liu Closes #7453

spark git commit: Fix reference to self.names in StructType

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 27850af52 -> f5dd11339 Fix reference to self.names in StructType `names` is not defined in this context, I think you meant `self.names`. davies Author: Alex Angelini Closes #7766 from angelini/fix_struct_type_names and squashes the foll

spark git commit: [SPARK-9462][SQL] Initialize nondeterministic expressions in code gen fallback mode.

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 07fd7d364 -> 27850af52 [SPARK-9462][SQL] Initialize nondeterministic expressions in code gen fallback mode. Author: Reynold Xin Closes #7767 from rxin/SPARK-9462 and squashes the following commits: ef3e2d9 [Reynold Xin] Removed println

spark git commit: [SPARK-9460] Avoid byte array allocation in StringPrefixComparator.

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9514d874f -> 07fd7d364 [SPARK-9460] Avoid byte array allocation in StringPrefixComparator. As of today, StringPrefixComparator converts the long values back to byte arrays in order to compare them. This patch optimizes this to compare the
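
The idea of comparing packed prefixes directly, rather than converting them back to byte arrays, can be sketched in Python as follows. This is an illustration under stated assumptions, not the Java code in the patch: `pack_prefix` and `compare_prefixes` are hypothetical helpers, and the 8-byte big-endian packing is assumed for the example.

```python
def pack_prefix(data: bytes) -> int:
    """Pack the first 8 bytes (zero-padded on the right) into one unsigned integer."""
    return int.from_bytes(data[:8].ljust(8, b"\x00"), byteorder="big")


def compare_prefixes(a: int, b: int) -> int:
    """Compare two packed prefixes directly; returns -1, 0, or 1."""
    return (a > b) - (a < b)
```

Big-endian packing makes integer order match lexicographic byte order, so no byte array is needed at comparison time. Python integers are unbounded, so the comparison here is naturally unsigned; a language with only signed 64-bit integers would need an unsigned comparison to get the same ordering. Equal prefixes only indicate a tie on the first 8 bytes, which a full comparison would still have to resolve.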

spark git commit: [SPARK-9458] Avoid object allocation in prefix generation.

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master a200e6456 -> 9514d874f [SPARK-9458] Avoid object allocation in prefix generation. In our existing sort prefix generation code, we use expression's eval method to generate the prefix, which results in object allocation for every prefix. We

spark git commit: [SPARK-9440] [MLLIB] Add hyperparameters to LocalLDAModel save/load

2015-07-29 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 2a9fe4a4e -> a200e6456 [SPARK-9440] [MLLIB] Add hyperparameters to LocalLDAModel save/load jkbradley MechCoder Resolves blocking issue for SPARK-6793. Please review after #7705 is merged. Author: Feynman Liang Closes #7757 from feynmanl

spark git commit: [SPARK-6129] [MLLIB] [DOCS] Added user guide for evaluation metrics

2015-07-29 Thread meng
Repository: spark Updated Branches: refs/heads/master 37c2d1927 -> 2a9fe4a4e [SPARK-6129] [MLLIB] [DOCS] Added user guide for evaluation metrics Author: sethah Closes #7655 from sethah/Working_on_6129 and squashes the following commits: 253db2d [sethah] removed number formatting from exampl

spark git commit: [SPARK-9016] [ML] make random forest classifiers implement classification trait

2015-07-29 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 103d8cce7 -> 37c2d1927 [SPARK-9016] [ML] make random forest classifiers implement classification trait Implement the classification trait for RandomForestClassifiers. The plan is to use this in the future to provide thresholding for Rand

spark git commit: [SPARK-8921] [MLLIB] Add @since tags to mllib.stat

2015-07-29 Thread meng
Repository: spark Updated Branches: refs/heads/master 86505962e -> 103d8cce7 [SPARK-8921] [MLLIB] Add @since tags to mllib.stat Author: Bimal Tandel Closes #7730 from BimalTandel/branch_spark_8921 and squashes the following commits: 3ea230a [Bimal Tandel] Spark 8921 add @since tags Proje

spark git commit: [SPARK-9448][SQL] GenerateUnsafeProjection should not share expressions across instances.

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2cc212d56 -> 86505962e [SPARK-9448][SQL] GenerateUnsafeProjection should not share expressions across instances. We accidentally moved the list of expressions from the generated code instance to the class wrapper, and as a result, differe
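
The general hazard behind this fix, state held at class level being shared by every instance, can be shown with a small Python analogy. The snippet above is truncated, so the sketch does not reproduce the actual consequence in Spark's generated code; `SharedProjection` and `IsolatedProjection` are hypothetical classes invented for illustration.

```python
class SharedProjection:
    # Class-level list: every instance appends to the same object.
    expressions = []

    def __init__(self, exprs):
        self.expressions.extend(exprs)


class IsolatedProjection:
    def __init__(self, exprs):
        # Instance-level list: each instance keeps its own copy.
        self.expressions = list(exprs)
```

With `SharedProjection`, constructing a second instance changes what the first one sees; with `IsolatedProjection`, each instance keeps only the expressions it was built with.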

spark git commit: [SPARK-6793] [MLLIB] OnlineLDAOptimizer LDA perplexity

2015-07-29 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 1b0099fc6 -> 2cc212d56 [SPARK-6793] [MLLIB] OnlineLDAOptimizer LDA perplexity Implements `logPerplexity` in `OnlineLDAOptimizer`. Also refactors inference code into companion object to enable future reuse (e.g. `predict` method). Author:

spark git commit: [SPARK-9411] [SQL] Make Tungsten page sizes configurable

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master b715933fc -> 1b0099fc6 [SPARK-9411] [SQL] Make Tungsten page sizes configurable We need to make page sizes configurable so we can reduce them in unit tests and increase them in real production workloads. These sizes are now controlled by

spark git commit: [SPARK-9436] [GRAPHX] Pregel simplification patch

2015-07-29 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 5340dfaf9 -> b715933fc [SPARK-9436] [GRAPHX] Pregel simplification patch Pregel code contains two consecutive joins: ``` g.vertices.innerJoin(messages)(vprog) ... g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) => newOpt.getOrElse(ol

spark git commit: [SPARK-9430][SQL] Rename IntervalType to CalendarIntervalType.

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 819be46e5 -> 5340dfaf9 [SPARK-9430][SQL] Rename IntervalType to CalendarIntervalType. We want to introduce a new IntervalType in 1.6 that is based on only the number of microseconds, so intervals can be compared. Renaming the existing Inte

spark git commit: [SPARK-8977] [STREAMING] Defines the RateEstimator interface, and implements the RateController

2015-07-29 Thread tdas
Repository: spark Updated Branches: refs/heads/master 069a4c414 -> 819be46e5 [SPARK-8977] [STREAMING] Defines the RateEstimator interface, and implements the RateController Based on #7471. - [x] add a test that exercises the publish path from driver to receiver - [ ] remove Serializable from

spark git commit: [SPARK-746] [CORE] Added Avro Serialization to Kryo

2015-07-29 Thread irashid
Repository: spark Updated Branches: refs/heads/master 97906944e -> 069a4c414 [SPARK-746] [CORE] Added Avro Serialization to Kryo Added a custom Kryo serializer for generic Avro records to reduce the network IO involved during a shuffle. This compresses the schema and allows for users to regist

spark git commit: [SPARK-9127][SQL] Rand/Randn codegen fails with long seed.

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 708794e8a -> 97906944e [SPARK-9127][SQL] Rand/Randn codegen fails with long seed. Author: Reynold Xin Closes #7747 from rxin/SPARK-9127 and squashes the following commits: e851418 [Reynold Xin] [SPARK-9127][SQL] Rand/Randn codegen fails

spark git commit: [SPARK-9251][SQL] do not order by expressions which still need evaluation

2015-07-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 15667a0af -> 708794e8a [SPARK-9251][SQL] do not order by expressions which still need evaluation Per an offline discussion with rxin, it's weird to be computing stuff while doing sorting; we should only order by bound reference during exec