Repository: spark
Updated Branches:
refs/heads/master 4c08e2c08 -> b39e80d39
[SPARK-13761][ML] Remove remaining uses of validateParams
## What changes were proposed in this pull request?
Cleanups from [https://github.com/apache/spark/pull/11620]: remove remaining
uses of validateParams, and
Repository: spark
Updated Branches:
refs/heads/master ca9ef86c8 -> 917f4000b
[SPARK-13719][SQL] Parse JSON rows having an array type and a struct type in
the same field
## What changes were proposed in this pull request?
PR https://github.com/apache/spark/pull/2400 added support to
Repository: spark
Updated Branches:
refs/heads/master 750ed64cd -> bb1fda01f
[SPARK-13826][SQL] Addendum: update documentation for Datasets
## What changes were proposed in this pull request?
This patch updates documentation for Datasets. I also updated some internal
documentation for
Repository: spark
Updated Branches:
refs/heads/master 1d1de28a3 -> c4bd57602
[SPARK-12721][SQL] SQL Generation for Script Transformation
## What changes were proposed in this pull request?
This PR converts analyzed logical plans containing the `ScriptTransformation`
operator to SQL.
Repository: spark
Updated Branches:
refs/heads/master b39e80d39 -> 1614485fd
[SPARK-10788][MLLIB][ML] Remove duplicate bins for decision trees
Decision trees in spark.ml (RandomForest.scala) communicate twice as much data
as needed for unordered categorical features. Here's an example.
Say
Repository: spark
Updated Branches:
refs/heads/master c4bd57602 -> ae6c677c8
[SPARK-13038][PYSPARK] Add load/save to pipeline
## What changes were proposed in this pull request?
JIRA issue: https://issues.apache.org/jira/browse/SPARK-13038
1. Add load/save to PySpark Pipeline and
Repository: spark
Updated Branches:
refs/heads/master 637a78f1d -> 5f3bda6fe
[SPARK-13838] [SQL] Clear variable code to prevent it from being re-evaluated in
BoundAttribute
JIRA: https://issues.apache.org/jira/browse/SPARK-13838
## What changes were proposed in this pull request?
We should also
Repository: spark
Updated Branches:
refs/heads/master 238fb485b -> 54794113a
[SPARK-13989] [SQL] Remove non-vectorized/unsafe-row parquet record reader
## What changes were proposed in this pull request?
This PR cleans up the new parquet record reader with the following changes:
1. Removes
Repository: spark
Updated Branches:
refs/heads/master 357d82d84 -> ea9ca6f04
[SPARK-13901][CORE] correct the logDebug information when jump to the next
locality level
JIRA Issue: https://issues.apache.org/jira/browse/SPARK-13901
In the getAllowedLocalityLevel method of TaskSetManager, we get wrong
Repository: spark
Updated Branches:
refs/heads/master 77ba3021c -> d4d84936f
[SPARK-11011][SQL] Narrow type of UDT serialization
## What changes were proposed in this pull request?
Narrow down the parameter type of `UserDefinedType#serialize()`. Currently, the
parameter type is `Any`,
Repository: spark
Updated Branches:
refs/heads/master 5f6bdf97c -> eacd9d8ed
[SPARK-13360][PYSPARK][YARN] PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON is not
picked up in yarn-cluster mode
Author: Jeff Zhang
Closes #11238 from zjffdu/SPARK-13360.
Project:
Repository: spark
Updated Branches:
refs/heads/master 82066a166 -> 30c18841e
Revert "[SPARK-13840][SQL] Split Optimizer Rule ColumnPruning to ColumnPruning
and EliminateOperator"
This reverts commit 99bd2f0e94657687834c5c59c4270c1484c9f595.
Project:
http://git-wip-us.apache.org/repos/asf/spark/blob/8ef3399a/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala
--
diff --git
Repository: spark
Updated Branches:
refs/heads/master 14c7236dc -> 9c23c818c
[SPARK-13977] [SQL] Brings back Shuffled hash join
## What changes were proposed in this pull request?
ShuffledHashJoin (also outer join) was removed in 1.6, in favor of
SortMergeJoin, which is more robust and also
Repository: spark
Updated Branches:
refs/heads/master ea9ca6f04 -> 8ef3399af
http://git-wip-us.apache.org/repos/asf/spark/blob/8ef3399a/sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala
--
diff --git
Repository: spark
Updated Branches:
refs/heads/master 8ef3399af -> 1974d1d34
[SPARK-12719][SQL] SQL generation support for Generate
## What changes were proposed in this pull request?
This PR adds SQL generation support for `Generate` operator. It always converts
`Generate` operator into
Repository: spark
Updated Branches:
refs/heads/master 1974d1d34 -> 65b75e66e
[SPARK-13776][WEBUI] Limit the max number of acceptors and selectors for Jetty
## What changes were proposed in this pull request?
As each acceptor/selector in Jetty will use one thread, the number of threads
Repository: spark
Updated Branches:
refs/heads/master 0f1015ffd -> 53f32a22d
[MINOR][DOC] Fix nits in JavaStreamingTestExample
## What changes were proposed in this pull request?
Fix some nits discussed in
https://github.com/apache/spark/pull/11776#issuecomment-198207419
use !rdd.isEmpty
Repository: spark
Updated Branches:
refs/heads/master dcaa01661 -> d630a203d
[SPARK-10680][TESTS] Increase 'connectionTimeout' to make
RequestTimeoutIntegrationSuite more stable
## What changes were proposed in this pull request?
Increase 'connectionTimeout' to make
Repository: spark
Updated Branches:
refs/heads/master 453455c47 -> 6037ed0a1
[SPARK-13976][SQL] do not remove sub-queries added by user when generate SQL
## What changes were proposed in this pull request?
We haven't figured out the correct logic to add sub-queries yet, so we
should not
Repository: spark
Updated Branches:
refs/heads/master 2082a4956 -> dcaa01661
[SPARK-13897][SQL] RelationalGroupedDataset and KeyValueGroupedDataset
## What changes were proposed in this pull request?
Previously, Dataset.groupBy returned a GroupedData, and Dataset.groupByKey
returned a
Repository: spark
Updated Branches:
refs/heads/master 90a1d8db7 -> 10ef4f3e7
[SPARK-13826][SQL] Revises Dataset ScalaDoc
## What changes were proposed in this pull request?
This PR revises Dataset API ScalaDoc. All public methods are divided into the
following groups
* `groupname basic`:
Repository: spark
Updated Branches:
refs/heads/master ae6c677c8 -> 3f06eb72c
[SPARK-13613][ML] Provide ignored tests to export test dataset into CSV format
## What changes were proposed in this pull request?
Provide ignored test cases to export the test dataset into CSV format in
Repository: spark
Updated Branches:
refs/heads/master 7eef2463a -> c890c359b
[MINOR][SQL][BUILD] Remove duplicated lines
## What changes were proposed in this pull request?
This PR removes three minor duplicated lines. The first one causes the following
unreachable code warning.
```
Repository: spark
Updated Branches:
refs/heads/master c100d31dd -> 7eef2463a
[SPARK-13118][SQL] Expression encoding for optional synthetic classes
## What changes were proposed in this pull request?
Fix expression generation for optional types.
Standard Java reflection causes issues when
Repository: spark
Updated Branches:
refs/heads/master 496d2a2b4 -> 9412547e7
[SPARK-13823][HOTFIX] Increase tryAcquire timeout and assert it succeeds to fix
failure on slow machines
## What changes were proposed in this pull request?
I'm seeing several PR builder builds fail after
Repository: spark
Updated Branches:
refs/heads/master 4ce2d24e2 -> b90c0206f
[SPARK-13922][SQL] Filter rows with null attributes in vectorized parquet reader
# What changes were proposed in this pull request?
It's common for many SQL operators to not care about reading `null` values for
Repository: spark
Updated Branches:
refs/heads/master 828213d4c -> edf8b8775
[SPARK-11891] Model export/import for RFormula and RFormulaModel
https://issues.apache.org/jira/browse/SPARK-11891
Author: Xusen Yin
Closes #9884 from yinxusen/SPARK-11891.
Project:
Repository: spark
Updated Branches:
refs/heads/master 5faba9fac -> 82066a166
[SPARK-13948] MiMa check should catch if the visibility changes to private
MiMa excludes are currently generated using both the current Spark version's
classes and Spark 1.2.0's classes, but this doesn't make sense:
Repository: spark
Updated Branches:
refs/heads/master d9e8f26d0 -> d9670f847
[SPARK-13894][SQL] SqlContext.range return type from DataFrame to DataSet
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-13894
Change the return type of the
Repository: spark
Updated Branches:
refs/heads/master 10ef4f3e7 -> 750ed64cd
[SPARK-13930] [SQL] Apply fast serialization on collect limit operator
## What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-13930
Recently the fast serialization has
Repository: spark
Updated Branches:
refs/heads/master d9670f847 -> 91984978e
[SPARK-13816][GRAPHX] Add parameter checks for algorithms in Graphx
JIRA: https://issues.apache.org/jira/browse/SPARK-13816
## What changes were proposed in this pull request?
Add parameter checks for algorithms in
Repository: spark
Updated Branches:
refs/heads/master 1970d911d -> 2082a4956
[MINOR][DOCS] Use `spark-submit` instead of `sparkR` to submit R script.
## What changes were proposed in this pull request?
Since `sparkR` is not used for submitting R Scripts from Spark 2.0, a user
faces the
Repository: spark
Updated Branches:
refs/heads/master b90c0206f -> f96997ba2
[SPARK-13871][SQL] Support for inferring filters from data constraints
## What changes were proposed in this pull request?
This PR generalizes the `NullFiltering` optimizer rule in catalyst to
Repository: spark
Updated Branches:
refs/heads/master f96997ba2 -> 77ba3021c
[SPARK-13869][SQL] Remove redundant conditions while combining filters
## What changes were proposed in this pull request?
**[I'll link it to the JIRA once ASF JIRA is back online]**
This PR modifies the existing
Repository: spark
Updated Branches:
refs/heads/master 0acb32a3f -> 14c7236dc
[SPARK-14004][SQL][MINOR] AttributeReference and Alias should only use the
first qualifier to generate SQL strings
## What changes were proposed in this pull request?
Current implementations of
Repository: spark
Updated Branches:
refs/heads/master 91984978e -> 1d1de28a3
[SPARK-13827][SQL] Can't add subquery to an operator with same-name outputs
while generate SQL string
## What changes were proposed in this pull request?
This PR tries to solve a fundamental issue in the
[SPARK-13928] Move org.apache.spark.Logging into
org.apache.spark.internal.Logging
## What changes were proposed in this pull request?
Logging was made private in Spark 2.0. If we move it, then users would be able
to create a Logging trait themselves to avoid changing their own code.
## How
Repository: spark
Updated Branches:
refs/heads/master 92b70576e -> ca9ef86c8
http://git-wip-us.apache.org/repos/asf/spark/blob/ca9ef86c/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
Repository: spark
Updated Branches:
refs/heads/master bb1fda01f -> 7783b6f38
[MINOR][ML] When trainingSummary is None, it should throw RuntimeException.
## What changes were proposed in this pull request?
When trainingSummary is None, it should throw ```RuntimeException```.
cc mengxr
## How
Repository: spark
Updated Branches:
refs/heads/master 6037ed0a1 -> 6c2d894a2
[SPARK-13921] Store serialized blocks as multiple chunks in MemoryStore
This patch modifies the BlockManager, MemoryStore, and several other storage
components so that serialized cached blocks are stored as multiple
Repository: spark
Updated Branches:
refs/heads/master 204c9dec2 -> 357d82d84
[SPARK-13629][ML] Add binary toggle Param to CountVectorizer
## What changes were proposed in this pull request?
It would be handy to add a binary toggle Param to CountVectorizer, as in the
scikit-learn one:
Repository: spark
Updated Branches:
refs/heads/master 2e0c5284f -> 238fb485b
[SPARK-13972][SQL][FOLLOW-UP] When creating the query execution for a converted
SQL query, we eagerly trigger analysis
## What changes were proposed in this pull request?
As part of testing generating SQL query from
http://git-wip-us.apache.org/repos/asf/spark/blob/8ef3399a/core/src/main/scala/org/apache/spark/internal/Logging.scala
--
diff --git a/core/src/main/scala/org/apache/spark/internal/Logging.scala
Repository: spark
Updated Branches:
refs/heads/master 85c42fda9 -> 27e1f3885
[SPARK-13034] PySpark ml.classification support export/import
## What changes were proposed in this pull request?
Add export/import for all estimators and transformers (which have Scala
implementation) under
Repository: spark
Updated Branches:
refs/heads/master 353778216 -> 2e0c5284f
[SPARK-13958] Executor OOM due to unbounded growth of pointer array in…
## What changes were proposed in this pull request?
This change fixes the executor OOM which was recently introduced in PR
Repository: spark
Updated Branches:
refs/heads/master 65b75e66e -> 637a78f1d
[SPARK-13427][SQL] Support USING clause in JOIN.
## What changes were proposed in this pull request?
Support queries that JOIN tables with USING clause.
SELECT * from table1 JOIN table2 USING
USING clause can be
Repository: spark
Updated Branches:
refs/heads/master edf8b8775 -> 4c08e2c08
Revert "[SPARK-12719][HOTFIX] Fix compilation against Scala 2.10"
This reverts commit 3ee7996187bbef008c10681bc4e048c6383f5187.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit:
Repository: spark
Updated Branches:
refs/heads/master 3ee799618 -> 828213d4c
[SPARK-13937][PYSPARK][ML] Change JavaWrapper _java_obj from static to member
variable
## What changes were proposed in this pull request?
In PySpark wrapper.py JavaWrapper change _java_obj from an unused static
Repository: spark
Updated Branches:
refs/heads/master 3f06eb72c -> 6fc2b6541
[SPARK-11888][ML] Decision tree persistence in spark.ml
### What changes were proposed in this pull request?
Made these MLReadable and MLWritable: DecisionTreeClassifier,
DecisionTreeClassificationModel,
Repository: spark
Updated Branches:
refs/heads/master b39594472 -> 1970d911d
[SPARK-14018][SQL] Use 64-bit num records in BenchmarkWholeStageCodegen
## What changes were proposed in this pull request?
500L << 20 is actually pretty close to the 32-bit int limit. I was trying to
increase this to
Repository: spark
Updated Branches:
refs/heads/master 9412547e7 -> 5f6bdf97c
[SPARK-13281][CORE] Switch broadcast of RDD to exception from warning
## What changes were proposed in this pull request?
In SparkContext, throw IllegalArgumentException when trying to broadcast an RDD
directly,
Repository: spark
Updated Branches:
refs/heads/master d1c193a2f -> de1a84e56
[SPARK-13926] Automatically use Kryo serializer when shuffling RDDs with simple
types
Because ClassTags are available when constructing ShuffledRDD we can use them
to automatically use Kryo for shuffle
Repository: spark
Updated Branches:
refs/heads/master eacd9d8ed -> d9e8f26d0
[SPARK-13924][SQL] officially support multi-insert
## What changes were proposed in this pull request?
There is a feature of hive SQL called multi-insert. For example:
```
FROM src
INSERT OVERWRITE TABLE dest1
Repository: spark
Updated Branches:
refs/heads/master 917f4000b -> c100d31dd
[SPARK-13873] [SQL] Avoid copy of UnsafeRow when there is no join in whole
stage codegen
## What changes were proposed in this pull request?
We need to copy the UnsafeRow since a Join could produce multiple rows
Repository: spark
Updated Branches:
refs/heads/master c11ea2e41 -> b39594472
[SPARK-14012][SQL] Extract VectorizedColumnReader from
VectorizedParquetRecordReader
## What changes were proposed in this pull request?
This is a minor followup on https://github.com/apache/spark/pull/11799 that