[jira] [Commented] (SPARK-10670) Link to each language's API in codetabs in ML docs: spark.ml

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905898#comment-14905898 ] Apache Spark commented on SPARK-10670: -- User 'hhbyyh' has created a pull request for

[jira] [Assigned] (SPARK-10670) Link to each language's API in codetabs in ML docs: spark.ml

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10670: Assignee: (was: Apache Spark) > Link to each language's API in codetabs in ML docs: sp

[jira] [Assigned] (SPARK-10670) Link to each language's API in codetabs in ML docs: spark.ml

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10670: Assignee: Apache Spark > Link to each language's API in codetabs in ML docs: spark.ml > --

[jira] [Commented] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2015-09-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905856#comment-14905856 ] Cheng Lian commented on SPARK-10659: This behavior had once been a hacky way to worka

[jira] [Comment Edited] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2015-09-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905856#comment-14905856 ] Cheng Lian edited comment on SPARK-10659 at 9/24/15 5:51 AM: -

[jira] [Updated] (SPARK-10763) Update Java MLLIB/ML tests to use simplified dataframe construction

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10763: -- Assignee: holdenk > Update Java MLLIB/ML tests to use simplified dataframe construction > -

[jira] [Resolved] (SPARK-10763) Update Java MLLIB/ML tests to use simplified dataframe construction

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10763. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8886 [https://gi

[jira] [Updated] (SPARK-10763) Update Java MLLIB/ML tests to use simplified dataframe construction

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10763: -- Affects Version/s: 1.6.0 Target Version/s: 1.6.0 > Update Java MLLIB/ML tests to use simpl

[jira] [Created] (SPARK-10788) Decision Tree duplicates bins for unordered categorical features

2015-09-23 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-10788: - Summary: Decision Tree duplicates bins for unordered categorical features Key: SPARK-10788 URL: https://issues.apache.org/jira/browse/SPARK-10788 Project: S

[jira] [Commented] (SPARK-10770) SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905817#comment-14905817 ] Apache Spark commented on SPARK-10770: -- User 'rxin' has created a pull request for t

[jira] [Assigned] (SPARK-10709) When loading a json dataset as a data frame, if the input path is wrong, the error message is very confusing

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10709: Assignee: Apache Spark > When loading a json dataset as a data frame, if the input path is

[jira] [Assigned] (SPARK-10709) When loading a json dataset as a data frame, if the input path is wrong, the error message is very confusing

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10709: Assignee: (was: Apache Spark) > When loading a json dataset as a data frame, if the in

[jira] [Commented] (SPARK-10709) When loading a json dataset as a data frame, if the input path is wrong, the error message is very confusing

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905803#comment-14905803 ] Apache Spark commented on SPARK-10709: -- User 'navis' has created a pull request for

[jira] [Updated] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-23 Thread Ted Yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10787: --- Description: In the thread, Spark ClosureCleaner or java serializer OOM when trying to grow (http://search-h

[jira] [Assigned] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10787: Assignee: Apache Spark > Reset ObjectOutputStream more often to prevent OOME > ---

[jira] [Assigned] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10787: Assignee: (was: Apache Spark) > Reset ObjectOutputStream more often to prevent OOME >

[jira] [Commented] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905769#comment-14905769 ] Apache Spark commented on SPARK-10787: -- User 'tedyu' has created a pull request for

[jira] [Created] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-23 Thread Ted Yu (JIRA)
Ted Yu created SPARK-10787: -- Summary: Reset ObjectOutputStream more often to prevent OOME Key: SPARK-10787 URL: https://issues.apache.org/jira/browse/SPARK-10787 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2015-09-23 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905731#comment-14905731 ] Tathagata Das commented on SPARK-10086: --- Actually never mind, it's already in eventu

[jira] [Commented] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2015-09-23 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905729#comment-14905729 ] Tathagata Das commented on SPARK-10086: --- You could maintain a counter for the

[jira] [Resolved] (SPARK-10692) Failed batches are never reported through the StreamingListener interface

2015-09-23 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-10692. --- Resolution: Fixed Fix Version/s: 1.6.0 1.5.1 > Failed batches are n

[jira] [Closed] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-09-23 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-10474. - Resolution: Fixed > TungstenAggregation cannot acquire memory for pointer array after switching > to sor

[jira] [Assigned] (SPARK-10786) SparkSQLCLIDriver should take the whole statement to generate the CommandProcessor

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10786: Assignee: Apache Spark > SparkSQLCLIDriver should take the whole statement to generate the

[jira] [Assigned] (SPARK-10786) SparkSQLCLIDriver should take the whole statement to generate the CommandProcessor

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10786: Assignee: (was: Apache Spark) > SparkSQLCLIDriver should take the whole statement to g

[jira] [Commented] (SPARK-10786) SparkSQLCLIDriver should take the whole statement to generate the CommandProcessor

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905692#comment-14905692 ] Apache Spark commented on SPARK-10786: -- User 'SaintBacchus' has created a pull reque

[jira] [Created] (SPARK-10786) SparkSQLCLIDriver should take the whole statement to generate the CommandProcessor

2015-09-23 Thread SaintBacchus (JIRA)
SaintBacchus created SPARK-10786: Summary: SparkSQLCLIDriver should take the whole statement to generate the CommandProcessor Key: SPARK-10786 URL: https://issues.apache.org/jira/browse/SPARK-10786 Pr

[jira] [Updated] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10741: - Assignee: Wenchen Fan > Hive Query Having/OrderBy against Parquet table is not working > ---

[jira] [Resolved] (SPARK-6028) Provide an alternative RPC implementation based on the network transport module

2015-09-23 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-6028. Resolution: Fixed Fix Version/s: 1.6.0 > Provide an alternative RPC implementation based on t

[jira] [Assigned] (SPARK-10724) SQL's floor() returns DOUBLE

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10724: Assignee: (was: Apache Spark) > SQL's floor() returns DOUBLE > ---

[jira] [Assigned] (SPARK-10724) SQL's floor() returns DOUBLE

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10724: Assignee: Apache Spark > SQL's floor() returns DOUBLE > > >

[jira] [Commented] (SPARK-10724) SQL's floor() returns DOUBLE

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905658#comment-14905658 ] Apache Spark commented on SPARK-10724: -- User 'navis' has created a pull request for

[jira] [Commented] (SPARK-10692) Failed batches are never reported through the StreamingListener interface

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905653#comment-14905653 ] Apache Spark commented on SPARK-10692: -- User 'tdas' has created a pull request for t

[jira] [Updated] (SPARK-10043) Add window functions into SparkR

2015-09-23 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10043: Target Version/s: 1.6.0 (was: 1.5.1, 1.6.0) > Add window functions into SparkR > -

[jira] [Updated] (SPARK-10692) Failed batches are never reported through the StreamingListener interface

2015-09-23 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10692: Priority: Critical (was: Blocker) > Failed batches are never reported through the StreamingListene

[jira] [Updated] (SPARK-10538) java.lang.NegativeArraySizeException during join

2015-09-23 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10538: Target Version/s: 1.5.2 (was: 1.5.1) > java.lang.NegativeArraySizeException during join >

[jira] [Updated] (SPARK-8115) Remove TestData

2015-09-23 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8115: --- Target Version/s: 1.6.0 (was: 1.6.0, 1.5.1) > Remove TestData > --- > > K

[jira] [Updated] (SPARK-9841) Params.clear needs to be public

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9841: - Shepherd: Joseph K. Bradley Assignee: holdenk > Params.clear needs to be public >

[jira] [Updated] (SPARK-9798) CrossValidatorModel Documentation Improvements

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9798: - Shepherd: Joseph K. Bradley Target Version/s: 1.6.0 > CrossValidatorModel Docu

[jira] [Commented] (SPARK-9798) CrossValidatorModel Documentation Improvements

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905573#comment-14905573 ] Joseph K. Bradley commented on SPARK-9798: -- This is a very small task, so I don't

[jira] [Created] (SPARK-10785) Scale QuantileDiscretizer using distributed binning

2015-09-23 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-10785: - Summary: Scale QuantileDiscretizer using distributed binning Key: SPARK-10785 URL: https://issues.apache.org/jira/browse/SPARK-10785 Project: Spark

[jira] [Updated] (SPARK-5890) Add QuantileDiscretizer

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-5890: - Description: A `QuantileDiscretizer` takes a column with continuous features and outputs a

[jira] [Updated] (SPARK-5890) Add QuantileDiscretizer

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-5890: - Summary: Add QuantileDiscretizer (was: Add FeatureDiscretizer) > Add QuantileDiscretizer

[jira] [Resolved] (SPARK-10731) The head() implementation of dataframe is very slow

2015-09-23 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-10731. - Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 1.5.1 1.

[jira] [Resolved] (SPARK-10699) Support checkpointInterval can be disabled

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-10699. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8820 [ht

[jira] [Assigned] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10741: Assignee: (was: Apache Spark) > Hive Query Having/OrderBy against Parquet table is not

[jira] [Commented] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905479#comment-14905479 ] Apache Spark commented on SPARK-10741: -- User 'cloud-fan' has created a pull request

[jira] [Assigned] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10741: Assignee: Apache Spark > Hive Query Having/OrderBy against Parquet table is not working >

[jira] [Updated] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-10086: -- Description: Here's a report on investigating test failures in StreamingKMeans in PySpa

[jira] [Updated] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-10086: -- Description: Here's a report on investigating test failures in StreamingKMeans in PySpa

[jira] [Updated] (SPARK-10668) Use WeightedLeastSquares in LinearRegression with L2 regularization if the number of features is small

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10668: -- Shepherd: Xiangrui Meng > Use WeightedLeastSquares in LinearRegression with L2 regularization i

[jira] [Updated] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-10086: -- Description: Here's a report on investigating test failures in StreamingKMeans in PySpa

[jira] [Resolved] (SPARK-10686) Add quantileCol to AFTSurvivalRegression

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10686. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8836 [https://gi

[jira] [Commented] (SPARK-8616) SQLContext doesn't handle tricky column names when loading from JDBC

2015-09-23 Thread Rick Hillegas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905409#comment-14905409 ] Rick Hillegas commented on SPARK-8616: -- The following email thread may be useful for

[jira] [Created] (SPARK-10784) Flaky Streaming ML test umbrella

2015-09-23 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-10784: - Summary: Flaky Streaming ML test umbrella Key: SPARK-10784 URL: https://issues.apache.org/jira/browse/SPARK-10784 Project: Spark Issue Type: Umbrel

[jira] [Resolved] (SPARK-9715) Store numFeatures in all ML PredictionModel types

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-9715. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8675 [https

[jira] [Comment Edited] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Ian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905327#comment-14905327 ] Ian edited comment on SPARK-10741 at 9/23/15 9:45 PM: -- Yes, going th

[jira] [Created] (SPARK-10783) Do track the pointer array in UnsafeInMemorySorter

2015-09-23 Thread Andrew Or (JIRA)
Andrew Or created SPARK-10783: - Summary: Do track the pointer array in UnsafeInMemorySorter Key: SPARK-10783 URL: https://issues.apache.org/jira/browse/SPARK-10783 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-4885) Enable fetched blocks to exceed 2 GB

2015-09-23 Thread Sai Nishanth Parepally (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905338#comment-14905338 ] Sai Nishanth Parepally commented on SPARK-4885: --- I am using spark 1.4.1 and

[jira] [Commented] (SPARK-10767) Make pyspark shared params codegen more consistent

2015-09-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905329#comment-14905329 ] holdenk commented on SPARK-10767: - My plan was to wait for that PR to go in and then do t

[jira] [Comment Edited] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Ian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905327#comment-14905327 ] Ian edited comment on SPARK-10741 at 9/23/15 9:36 PM: -- Yes, going th

[jira] [Commented] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Ian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905327#comment-14905327 ] Ian commented on SPARK-10741: - Yes, going through all rules when resolving Sort on Aggregate i

[jira] [Commented] (SPARK-10767) Make pyspark shared params codegen more consistent

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905322#comment-14905322 ] Joseph K. Bradley commented on SPARK-10767: --- Oh, yeah, that is annoying. + 1

[jira] [Commented] (SPARK-10767) Make pyspark shared params codegen more consistent

2015-09-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905303#comment-14905303 ] holdenk commented on SPARK-10767: - Updated the description, sorry about that. This comes

[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-09-23 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905299#comment-14905299 ] Andrew Or commented on SPARK-10474: --- Alright, I think this should fix it for real: http

[jira] [Updated] (SPARK-10767) Make pyspark shared params codegen more consistent

2015-09-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-10767: Description: Namely "." shows up in some places in the template when using the param docstring and not in o

[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905291#comment-14905291 ] Apache Spark commented on SPARK-10474: -- User 'andrewor14' has created a pull request

[jira] [Assigned] (SPARK-10622) Race condition between scheduler and YARN executor status update

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10622: Assignee: (was: Apache Spark) > Race condition between scheduler and YARN executor sta

[jira] [Commented] (SPARK-10622) Race condition between scheduler and YARN executor status update

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905285#comment-14905285 ] Apache Spark commented on SPARK-10622: -- User 'vanzin' has created a pull request for

[jira] [Assigned] (SPARK-10622) Race condition between scheduler and YARN executor status update

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10622: Assignee: Apache Spark > Race condition between scheduler and YARN executor status update

[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-09-23 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905279#comment-14905279 ] Andrew Or commented on SPARK-10474: --- Re-opening this because I found the real cause for

[jira] [Resolved] (SPARK-10733) TungstenAggregation cannot acquire page after switching to sort-based

2015-09-23 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10733. --- Resolution: Duplicate Looks like this is a duplicate of SPARK-10474 after all. I'm closing this... >

[jira] [Reopened] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-09-23 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-10474: --- > TungstenAggregation cannot acquire memory for pointer array after switching > to sort-based >

[jira] [Commented] (SPARK-10413) Model should support prediction on single instance

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905230#comment-14905230 ] Joseph K. Bradley commented on SPARK-10413: --- For API, I think my main question

[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver

2015-09-23 Thread Mohamed Baddar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904471#comment-14904471 ] Mohamed Baddar edited comment on SPARK-9836 at 9/23/15 8:39 PM:

[jira] [Commented] (SPARK-10767) Make pyspark shared params codegen more consistent

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905196#comment-14905196 ] Joseph K. Bradley commented on SPARK-10767: --- What issues specifically? > Make

[jira] [Commented] (SPARK-10782) Duplicate examples for drop_duplicates and DropDuplicates

2015-09-23 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905180#comment-14905180 ] Sean Owen commented on SPARK-10782: --- Looks like you're right, feel free to make a PR wi

[jira] [Commented] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905166#comment-14905166 ] Yin Huai commented on SPARK-10741: -- The second option sounds better. > Hive Query Havi

[jira] [Created] (SPARK-10782) Duplicate examples for drop_duplicates and DropDuplicates

2015-09-23 Thread Asoka Diggs (JIRA)
Asoka Diggs created SPARK-10782: --- Summary: Duplicate examples for drop_duplicates and DropDuplicates Key: SPARK-10782 URL: https://issues.apache.org/jira/browse/SPARK-10782 Project: Spark Issue

[jira] [Created] (SPARK-10781) Allow certain number of failed tasks and allow job to succeed

2015-09-23 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-10781: - Summary: Allow certain number of failed tasks and allow job to succeed Key: SPARK-10781 URL: https://issues.apache.org/jira/browse/SPARK-10781 Project: Spark

[jira] [Commented] (SPARK-10733) TungstenAggregation cannot acquire page after switching to sort-based

2015-09-23 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905083#comment-14905083 ] Yin Huai commented on SPARK-10733: -- Can you attach your query plan? > TungstenAggregati

[jira] [Comment Edited] (SPARK-10733) TungstenAggregation cannot acquire page after switching to sort-based

2015-09-23 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904964#comment-14904964 ] Yin Huai edited comment on SPARK-10733 at 9/23/15 7:23 PM: --- [~j

[jira] [Commented] (SPARK-10741) Hive Query Having/OrderBy against Parquet table is not working

2015-09-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905069#comment-14905069 ] Wenchen Fan commented on SPARK-10741: - This bug is caused by a conflict between 2 tri

[jira] [Updated] (SPARK-10494) Multiple Python UDFs together with aggregation or sort merge join may cause OOM (failed to acquire memory)

2015-09-23 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10494: Fix Version/s: 1.6.0 > Multiple Python UDFs together with aggregation or sort merge join may cause

[jira] [Created] (SPARK-10780) Set initialModel in KMeans in Pipelines API

2015-09-23 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-10780: - Summary: Set initialModel in KMeans in Pipelines API Key: SPARK-10780 URL: https://issues.apache.org/jira/browse/SPARK-10780 Project: Spark Issue T

[jira] [Updated] (SPARK-10780) Set initialModel in KMeans in Pipelines API

2015-09-23 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-10780: -- Description: This is for the Scala version. After this is merged, create a JIRA for Py

[jira] [Created] (SPARK-10779) Set initialModel for KMeans model in PySpark (spark.mllib)

2015-09-23 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-10779: - Summary: Set initialModel for KMeans model in PySpark (spark.mllib) Key: SPARK-10779 URL: https://issues.apache.org/jira/browse/SPARK-10779 Project: Spark

[jira] [Created] (SPARK-10778) Implement toString for AssociationRules.Rule

2015-09-23 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-10778: - Summary: Implement toString for AssociationRules.Rule Key: SPARK-10778 URL: https://issues.apache.org/jira/browse/SPARK-10778 Project: Spark Issue Type: Ne

[jira] [Updated] (SPARK-10778) Implement toString for AssociationRules.Rule

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10778: -- Description: pretty print for association rules, e.g. {code} {a, b, c} => {d}: 0.8 {code} w

[jira] [Resolved] (SPARK-10403) UnsafeRowSerializer can't work with UnsafeShuffleManager (tungsten-sort)

2015-09-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10403. -- Resolution: Fixed Fix Version/s: 1.5.1 1.6.0 Issue resolved b

[jira] [Created] (SPARK-10777) order by fails when column is aliased and projection includes windowed aggregate

2015-09-23 Thread N Campbell (JIRA)
N Campbell created SPARK-10777: -- Summary: order by fails when column is aliased and projection includes windowed aggregate Key: SPARK-10777 URL: https://issues.apache.org/jira/browse/SPARK-10777 Project:

[jira] [Commented] (SPARK-10728) Failed to set Jenkins Identity header on email.

2015-09-23 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904986#comment-14904986 ] shane knapp commented on SPARK-10728: - well, the spark project doesn't need to fix it

[jira] [Commented] (SPARK-10728) Failed to set Jenkins Identity header on email.

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904979#comment-14904979 ] Xiangrui Meng commented on SPARK-10728: --- This is still an issue, though not high-pr

[jira] [Updated] (SPARK-10728) Failed to set Jenkins Identity header on email.

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10728: -- Affects Version/s: (was: 1.6.0) > Failed to set Jenkins Identity header on email. > ---

[jira] [Assigned] (SPARK-10763) Update Java MLLIB/ML tests to use simplified dataframe construction

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10763: Assignee: Apache Spark > Update Java MLLIB/ML tests to use simplified dataframe constructi

[jira] [Updated] (SPARK-10728) Failed to set Jenkins Identity header on email.

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10728: -- Labels: (was: flaky-test) > Failed to set Jenkins Identity header on email. > ---

[jira] [Updated] (SPARK-10728) Failed to set Jenkins Identity header on email.

2015-09-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10728: -- Target Version/s: (was: 1.6.0) > Failed to set Jenkins Identity header on email. > --

[jira] [Assigned] (SPARK-10763) Update Java MLLIB/ML tests to use simplified dataframe construction

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10763: Assignee: (was: Apache Spark) > Update Java MLLIB/ML tests to use simplified dataframe

[jira] [Commented] (SPARK-10763) Update Java MLLIB/ML tests to use simplified dataframe construction

2015-09-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904978#comment-14904978 ] Apache Spark commented on SPARK-10763: -- User 'holdenk' has created a pull request fo

[jira] [Commented] (SPARK-10733) TungstenAggregation cannot acquire page after switching to sort-based

2015-09-23 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904964#comment-14904964 ] Yin Huai commented on SPARK-10733: -- [~jameszhouyi] Another two places for logging are {

[jira] [Updated] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2015-09-23 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10659: --- Description: DataFrames currently automatically promotes all Parquet schema fields to optional when
