[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16928#discussion_r102363585 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -45,24 +46,41 @@ private[csv] class Univoci

[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation

2017-02-21 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/16998 @viirya please correct me if I'm wrong but scanning through this patch, it appears that the underlying problem is that duplicating and tracking aliased constraints using a `Set` tends to blow

[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16928#discussion_r102363460 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -96,31 +96,44 @@ class CSVFileFormat extends

[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-21 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16928 @cloud-fan okay, so I'll keep this PR pending for now. Then I'll open a new PR to fix the JSON behaviour. --- If your project is set up for it, you can reply to this email and have your reply appea

[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-21 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16928#discussion_r102362652 --- Diff: python/pyspark/sql/readwriter.py --- @@ -303,8 +303,9 @@ def text(self, paths): def csv(self, path, schema=None, sep=None, encoding=None, q

[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17015 Merged build finished. Test FAILed.

[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17015 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73241/ Test FAILed.

[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16928#discussion_r102362562 --- Diff: python/pyspark/sql/readwriter.py --- @@ -303,8 +303,9 @@ def text(self, paths): def csv(self, path, schema=None, sep=None, encoding=None

[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17015 **[Test build #73241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73241/testReport)** for PR 17015 at commit [`247d3df`](https://github.com/apache/spark/commit/2

[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16928 The CSV behavior makes more sense; we should send a new PR to fix the JSON behavior.

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102362260 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl(

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16970 @tdas I created https://issues.apache.org/jira/browse/SPARK-19690 to track the issue when joining a batch DataFrame with a streaming DataFrame. I will fix it in a separate PR to unblock this one as

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73247/testReport)** for PR 16970 at commit [`78dfdfe`](https://github.com/apache/spark/commit/78

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102361738 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl(

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102361677 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,33 @@ private[hive] class HiveClientImpl(

[GitHub] spark issue #17004: [SPARK-19670] [SQL] [TEST] Enable Bucketed Table Reading...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73239/ Test PASSed.

[GitHub] spark issue #17004: [SPARK-19670] [SQL] [TEST] Enable Bucketed Table Reading...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17004 Merged build finished. Test PASSed.

[GitHub] spark issue #17004: [SPARK-19670] [SQL] [TEST] Enable Bucketed Table Reading...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17004 **[Test build #73239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73239/testReport)** for PR 17004 at commit [`3ecf187`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #17011: [SPARK-19676][CORE] Flaky test: FsHistoryProviderSuite.S...

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17011 @srowen @vanzin I will test on some other platforms.

[GitHub] spark pull request #16785: [SPARK-19443][SQL] The function to generate const...

2017-02-21 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/16785#discussion_r102361204 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -314,7 +314,17 @@ abstract class UnaryNode

[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17015#discussion_r102360866 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -349,36 +350,42 @@ object CatalogTypes {

[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17015 I don't think `UnresolvedCatalogRelation` is necessary. A data source table can treat `CatalogRelation` as unresolved, but that's its own business. When we replace `CatalogRelation` with `LogicalRelation

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102360204 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl(

[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...

2017-02-21 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/16977

[GitHub] spark issue #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect should...

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16977 I'm closing this as it breaks checkpointing. We could work around that, but I don't think it's worth a workaround to optimize `ParallelCollectionRDD`, as it's only used for demos or tests.

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102359272 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl(

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102358896 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl(

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102358534 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl(

[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16977#discussion_r102357800 --- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala --- @@ -105,6 +105,17 @@ private[spark] class ParallelCollectionRDD[T: Cla

[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17015 **[Test build #73246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73246/testReport)** for PR 17015 at commit [`b61910e`](https://github.com/apache/spark/commit/b6

[GitHub] spark pull request #16923: [SPARK-19038][Hive][YARN] Correctly figure out ke...

2017-02-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16923#discussion_r102357394 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -106,21 +106,27 @@ private[hive] class HiveClientImpl(

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread budde
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 Updated the PR. Thanks for the work you've done on this! Hopefully I can have a PR for the builder interface up later this week.

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #73245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73245/testReport)** for PR 16744 at commit [`b4bf3a8`](https://github.com/apache/spark/commit/b4

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread budde
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102356422 --- Diff: python/pyspark/streaming/kinesis.py --- @@ -67,6 +68,12 @@ def createStream(ssc, kinesisAppName, streamName, endpointUrl, regionName, :

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16744 Two final comments. Then I'll merge it, pending tests.

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102356017 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SerializableCredentialsProvider.scala --- @@ -0,0 +1,85 @@ +/* + *

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102355832 --- Diff: python/pyspark/streaming/kinesis.py --- @@ -67,6 +68,12 @@ def createStream(ssc, kinesisAppName, streamName, endpointUrl, regionName,

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/15770 @wangmiao1981 yes I had seen the discussions there. I believe that eventually PIC should be moved into graphframes, but we can have a simple API in `spark.ml` for the time being.

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #73244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73244/testReport)** for PR 16744 at commit [`d15affb`](https://github.com/apache/spark/commit/d1

[GitHub] spark issue #17019: [SPARK-19652][UI] Do auth checks for REST API access (br...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17019 **[Test build #73243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73243/testReport)** for PR 17019 at commit [`21006ff`](https://github.com/apache/spark/commit/21

[GitHub] spark issue #17019: [SPARK-19652][UI] Do auth checks for REST API access (br...

2017-02-21 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17019 There was a conflict in the MimaExcludes list, just checking that I didn't break it.

[GitHub] spark pull request #17019: [SPARK-19652][UI] Do auth checks for REST API acc...

2017-02-21 Thread vanzin
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/17019 [SPARK-19652][UI] Do auth checks for REST API access (branch-2.1). The REST API has a security filter that performs auth checks based on the UI root's security manager. That works fine when t

[GitHub] spark pull request #16978: [SPARK-19652][UI] Do auth checks for REST API acc...

2017-02-21 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16978

[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.

2017-02-21 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16978 Merging to master. Will also try 2.1 and 2.0 (where we made SSL fixes, so this might be useful).

[GitHub] spark issue #16946: [SPARK-19554][UI,YARN] Allow SHS URL to be used for trac...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16946 **[Test build #73242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73242/testReport)** for PR 16946 at commit [`5aef8eb`](https://github.com/apache/spark/commit/5a

[GitHub] spark pull request #17013: [SPARK-19666][SQL] Improve error message for Java...

2017-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17013#discussion_r102352788 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -123,7 +123,11 @@ object JavaTypeInference {

[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17015 **[Test build #73241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73241/testReport)** for PR 17015 at commit [`247d3df`](https://github.com/apache/spark/commit/24

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread budde
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 Missed updating a test, my mistake. Fixing now.

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16744 Merged build finished. Test FAILed.

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #73240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73240/testReport)** for PR 16744 at commit [`11b3b64`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16744 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73240/ Test FAILed.

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread budde
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102351716 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SerializableCredentialsProvider.scala --- @@ -0,0 +1,85 @@ +/* + * L

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread budde
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 @brkyvz I've updated the PR per your feedback. ```BasicAWSCredentials``` will raise a ```java.lang.IllegalArgumentException``` if either keypair value is null so I elected to wrap ```BasicCredentialsP

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #73240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73240/testReport)** for PR 16744 at commit [`11b3b64`](https://github.com/apache/spark/commit/11

[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...

2017-02-21 Thread rezasafi
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/16977#discussion_r102350887 --- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala --- @@ -105,6 +105,17 @@ private[spark] class ParallelCollectionRDD[T: Clas

[GitHub] spark issue #16842: [SPARK-19304] [Streaming] [Kinesis] fix kinesis slow che...

2017-02-21 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16842 @srowen Do you know whether, if we make a field of a case class an `Option` and default it to `None`, it would still fail Java deserialization?

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73238/ Test PASSed.

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15770 Merged build finished. Test PASSed.

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15770 **[Test build #73238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73238/testReport)** for PR 15770 at commit [`f53765b`](https://github.com/apache/spark/commit/f

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread budde
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 @brkyvz I actually think that Scaladoc may be outdated. I double-checked the current master branch and it looks like ```KinesisUtils.createStream()``` will still provide Some(SerializableAWSCredenti

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102345983 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SerializableCredentialsProvider.scala --- @@ -0,0 +1,73 @@ +/* + *

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16744 @budde The scaladocs mention ``` * @param awsAccessKeyId AWS AccessKeyId (if null, will use DefaultAWSCredentialsProviderChain) * @param awsSecretKey AWS SecretKey (if null, will use De

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 I am checking ALS out to understand your suggestions. Thanks!

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13440 @thunterdb Can you take a look? Thanks!

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread budde
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 So, if these values are ```null``` we'll still be passing them to construct a ```BasicCredentialsProvider``` to pass as ```STSCredentialsProvider.longLivedCredentialsProvider```. I could add a check

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Yanbo Liang added a comment - 02/Nov/16 09:30 - edited I prefer #1 and #3, but it looks like we can achieve both goals. A graph can be represented by GraphX/GraphFrame or DataFram

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Joseph K. Bradley added a comment - 31/Oct/16 18:14 Miao Wang Sorry for the slow response here. I do want us to add PIC to spark.ml, but we should discuss the design before the PR. Coul

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb Thanks for your response. In the original JIRA, we have discussed why we want it to be a transformer. Let me find it and post it here.

[GitHub] spark pull request #16842: [SPARK-19304] [Streaming] [Kinesis] fix kinesis s...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16842#discussion_r10238 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -36,7 +36,8 @@ import org.apache.spark.u

[GitHub] spark pull request #16842: [SPARK-19304] [Streaming] [Kinesis] fix kinesis s...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16842#discussion_r102343964 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -204,10 +208,11 @@ class KinesisSequence

[GitHub] spark issue #17004: [SPARK-19670] [SQL] [TEST] Enable Bucketed Table Reading...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17004 **[Test build #73239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73239/testReport)** for PR 17004 at commit [`3ecf187`](https://github.com/apache/spark/commit/3e

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16744 Can't they still use `null` to use the `DefaultProviderChain`? It's still supported, right? We're only forcing them to provide a `messageHandler`.

[GitHub] spark issue #17004: [SPARK-19670] [SQL] [TEST] Enable Bucketed Table Reading...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17004 retest this please

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread budde
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 @brkyvz I share your concerns around expanding this API further than necessary. I think I'm okay with this as long as we're fairly confident the builder pattern work can be merged in the same Spark re

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102340032 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -341,6 +601,127 @@ object KinesisUtils {

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102340101 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -404,8 +785,112 @@ object KinesisUtils {

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102340139 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -404,8 +785,112 @@ object KinesisUtils {

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102339883 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -213,9 +347,135 @@ object KinesisUtils {

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102339985 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -341,6 +601,127 @@ object KinesisUtils {

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102339856 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -123,9 +123,143 @@ object KinesisUtils {

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102339912 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -213,9 +347,135 @@ object KinesisUtils {

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102339821 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -123,9 +123,143 @@ object KinesisUtils {

[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16744 @budde I'm just concerned by the exponential blow-up of APIs. Here's my proposal. For both Java and Scala, let's just add the APIs with both STS token and AWS Key pair defined versions. I'm going

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/15770 You are right, I had forgotten that for this algorithm, the input is the edges, and the output is the label for each of the vertices. This is a tricky algorithm to put as a transformer, si

[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17015#discussion_r102338672 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -349,36 +350,42 @@ object CatalogTypes {

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15770 **[Test build #73238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73238/testReport)** for PR 15770 at commit [`f53765b`](https://github.com/apache/spark/commit/f5

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread budde
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102338189 --- Diff: python/pyspark/streaming/kinesis.py --- @@ -37,7 +37,8 @@ class KinesisUtils(object): def createStream(ssc, kinesisAppName, streamName, endp

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread budde
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102338119 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala --- @@ -78,8 +70,9 @@ case class SerializableAWSCreden

[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17015 `CatalogRelation` is always unresolved for data source tables, but already resolved for hive serde tables. Do you think we can have an unresolved `UnresolvedCatalogRelation` for both data source

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102337912 --- Diff: python/pyspark/streaming/kinesis.py --- @@ -37,7 +37,8 @@ class KinesisUtils(object): def createStream(ssc, kinesisAppName, streamName, end

[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

2017-02-21 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16975#discussion_r102337722 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -470,12 +470,25 @@ class SparkContext(config: SparkConf) extends Logging {

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r102337526 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to th

[GitHub] spark pull request #16744: [SPARK-19405][STREAMING] Support for cross-accoun...

2017-02-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16744#discussion_r102337043 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala --- @@ -78,8 +70,9 @@ case class SerializableAWSCrede

[GitHub] spark pull request #17013: [SPARK-19666][SQL] Improve error message for Java...

2017-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17013#discussion_r102336599 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -123,7 +123,11 @@ object JavaTypeInference {

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73236/ Test FAILed. ---

[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...

2017-02-21 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16714#discussion_r102336169 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -64,6 +64,12 @@ private[spark] class EventLoggingListener(

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73236/testReport)** for PR 16970 at commit [`b2e9cb0`](https://github.com/apache/spark/commit/b

[GitHub] spark pull request #17013: [SPARK-19666][SQL] Improve error message for Java...

2017-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17013#discussion_r102335552 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -123,7 +123,11 @@ object JavaTypeInference {

[GitHub] spark pull request #16608: [SPARK-13721][SQL] Support outer generators in Da...

2017-02-21 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16608#discussion_r102335398 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -163,9 +163,11 @@ object FunctionRegistry {
