[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...

2016-06-09 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13531#discussion_r66451629 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -298,6 +309,28 @@ trait FileFormat

[GitHub] spark issue #12173: [SPARK-13792][SQL] Limit logging of bad records in CSVRe...

2016-06-03 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/12173 @falaki ping --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-03 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13444 @yhuai ping

[GitHub] spark pull request #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunct...

2016-06-03 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13413#discussion_r65796528 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -88,14 +106,6 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark issue #13442: [SPARK-15654][SQL] Check if all the input files are spli...

2016-06-03 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13442 @rxin plz check this.

[GitHub] spark pull request #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunct...

2016-06-03 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13413#discussion_r65796510 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -58,15 +59,32 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunct...

2016-06-03 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13413#discussion_r65796505 --- Diff: python/pyspark/sql/tests.py --- @@ -1481,17 +1481,7 @@ def test_list_functions(self): spark.sql("CREATE DATABASE so

[GitHub] spark pull request: [SPARK-13484][SQL] Prevent illegal NULL propag...

2016-05-25 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/11371#discussion_r64522917 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1443,6 +1445,32 @@ class Analyzer

[GitHub] spark pull request: [SPARK-15585][SQL] Fix NULL handling along wit...

2016-05-27 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13372 [SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaivour ## What changes were proposed in this pull request? This pr fixes the behaviour of `format("csv").option("

[GitHub] spark pull request: [SPARK-15585][SQL] Fix NULL handling along wit...

2016-05-27 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13372#issuecomment-91079 @rxin Could you check whether this satisfies your suggestion? If there's no problem, I'll also set default values in json options in a similar way.

[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-14 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13137 @yhuai @liancheng okay, understood. Fixed and plz check again?

[GitHub] spark pull request #13137: [SPARK-15247][SQL] Set the default number of part...

2016-06-14 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13137#discussion_r67003218 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -795,11 +795,16 @@ private[sql] object

[GitHub] spark pull request: [SPARK-15585][SQL] Fix NULL handling along wit...

2016-05-28 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13372#issuecomment-222316908 This is a meaningful suggestion. I think all the option validation should be done in a single place and, if there is an invalid option, spark should throw clear

[GitHub] spark pull request: [SPARK-15585][SQL] Fix NULL handling along wit...

2016-05-28 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13372#issuecomment-222310137 yea, it is a difficult question. We must explicitly define a behaviour for `quote`, but I'm not sure about other options. So, an alternative idea is to define a special

[GitHub] spark pull request: [SPARK-13184][SQL] Add a datasource-specific o...

2016-05-29 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13320#issuecomment-222346181 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-15528][SQL] Fix race condition in Numbe...

2016-05-29 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13391 [SPARK-15528][SQL] Fix race condition in NumberConverter ## What changes were proposed in this pull request? A local variable in NumberConverter is wrongly shared between threads. This pr

[GitHub] spark pull request: [SPARK-15247][SQL] Set the default number of p...

2016-05-30 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13137#issuecomment-222465120 @liancheng ping

[GitHub] spark pull request: [SPARK-15528][SQL] Fix race condition in Numbe...

2016-05-30 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13391#issuecomment-222465573 @rxin @tarekauel This bug seems to be a kind of critical issue, so we need to fix it for the v2.0 release, thoughts?

[GitHub] spark pull request: [SPARK-13184][SQL] Add a datasource-specific o...

2016-05-26 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13320#issuecomment-221915600 @rxin Could you check again whether this matches your intention? Also, I'll make the description up-to-date later.

[GitHub] spark pull request: [SPARK-15463][SQL] support creating dataframe ...

2016-05-25 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221747864 Do we still need the interface for RDD[String]?

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

2016-05-25 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13290#discussion_r64688437 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1448,6 +1450,37 @@ class Analyzer

[GitHub] spark issue #13442: [SPARK-15654][SQL] Check if all the input files are spli...

2016-06-02 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13442 @rxin yea, is the approach okay?

[GitHub] spark issue #13442: [SPARK-15654][SQL] Check if all the input files are spli...

2016-06-02 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13442 yes, not yet. I'll finish in a day.

[GitHub] spark pull request: [SPARK-13184][SQL] Add a datasource-specific o...

2016-05-26 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13320 [SPARK-13184][SQL] Add a datasource-specific option minPartitions in HadoopFsRelation#options ## What changes were proposed in this pull request? This pr adds a new option `minPartitions

[GitHub] spark pull request: [SPARK-13184][SQL] Add a datasource-specific o...

2016-05-26 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13320#issuecomment-221795557 oh, sorry. I'll check and fix it.

[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-01 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13444 @yhuai I looked into the previous commit logs, but I'm also not sure why the option has a "threshold" at the end. However, as you said, I also think we can use this option

[GitHub] spark pull request #13444: [SPARK-15530][SQL] Set #parallelism for file list...

2016-06-01 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13444 [SPARK-15530][SQL] Set #parallelism for file listing in listLeafFilesInParallel ## What changes were proposed in this pull request? This pr sets the degree of parallelism to prevent file

[GitHub] spark issue #13442: [SPARK-15654][SQL] Check if all the input files are spli...

2016-06-01 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13442 oh, good suggestion. I looked around the related code and it seems we can add a `canSplitFile` method in `FileFormat`. In the method, it instantiates `FileInputFormat` as corresponding to formats

[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...

2016-06-01 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13137 @liancheng ping

[GitHub] spark issue #12173: [SPARK-13792][SQL] Limit logging of bad records in CSVRe...

2016-06-01 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/12173 @falaki ping

[GitHub] spark pull request #13442: [SPARK-15654][SQL] Check if all the input files a...

2016-06-01 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13442 [SPARK-15654][SQL] Check if all the input files are splittable in FileSourceStrategy ## What changes were proposed in this pull request? This pr is to check if all the input files

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

2016-06-01 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13290 @yhuai LGTM. Could you add 'Close #113711' to the commit log if this pr is merged into master?

[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-05-31 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10896 Thanks for your comments! I'll check them in a few days.

[GitHub] spark pull request: [SPARK-15528][SQL] Fix race condition in NumberConverter

2016-05-31 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13391 thanks!

[GitHub] spark pull request: [SPARK-15663] SparkSession.catalog.listFunctions shouldn...

2016-05-31 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13413 I'm afraid this fix doesn't satisfy @rxin's intention; it filters out not only built-in, but also user-defined temp funcs?

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 Does the current implementation of `Vector.hashCode` perform well enough? If so, following that impl. is okay with me.

[GitHub] spark issue #13802: [SPARK-16094][SQL] Support HashAggregateExec for non-par...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13802 No, I'd like to fix incorrect comments.

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 At the least, we'd better leave comments for that.

[GitHub] spark issue #13802: [SPARK-16094][SQL] Support HashAggregateExec for non-par...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13802 @hvanhovell As for `UnsafeMapData`, could you check #13847?

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 okay, done.

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 It seems `UnsafeArrayData` already has its own `equals` and `hashCode`. Spark doesn't currently compare unsafe MapData, but I think this might cause implicit bugs.

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 aha, yes. It'd be better to take the same approach in `UnsafeRow`?

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 okay, I'm fixing now.

[GitHub] spark pull request #13847: [SPARK-16135][SQL] Implement hashCode and euqals ...

2016-06-22 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13847 [SPARK-16135][SQL] Implement hashCode and euqals in UnsafeMapData ## What changes were proposed in this pull request? This pr implements `hashCode` and `equals` in `UnsafeMapData` because

[GitHub] spark issue #13802: [SPARK-16094][SQL] Support HashAggregateExec for non-par...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13802 okay

[GitHub] spark pull request #13847: [SPARK-16135][SQL] Implement hashCode and euqals ...

2016-06-22 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13847#discussion_r68100372 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -298,6 +298,10 @@ public UnsafeMapData getMap(int

[GitHub] spark pull request #13852: [SQL][DOC] Update a description for AggregateFunc...

2016-06-22 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13852 [SQL][DOC] Update a description for AggregateFunction#supportsPartial ## What changes were proposed in this pull request? Update a doc because it's stale. This is a trivial fix, so I didn't

[GitHub] spark issue #13802: [SPARK-16094][SQL] Support HashAggregateExec for non-par...

2016-06-21 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13802 @hvanhovell oh, I see. okay, I'll check whether we can implement mutable `ArrayData` and `MapData`. btw, I have some questions: 1. Any reason to use `SortAggregateExec` for all the non-partial

[GitHub] spark issue #13802: [SPARK-16094][SQL] Support HashAggregateExec for non-par...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13802 As for `supportsPartial`, I understand that `collect` and `hive_udaf` have such a limitation, but how about `AggregateWindowFunction`? It seems these functions `RowNumber` and `Rank` work

[GitHub] spark pull request #13852: [SQL][DOC] Update a description for AggregateFunc...

2016-06-22 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13852#discussion_r68104565 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -174,8 +174,8 @@ sealed abstract class

[GitHub] spark issue #13852: [SQL][DOC] Update a description for AggregateFunction#su...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13852 @hvanhovell ping

[GitHub] spark issue #13802: [SPARK-16094][SQL] Support HashAggregateExec for non-par...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13802 Thanks for your explanation!

[GitHub] spark issue #13802: [SPARK-16094][SQL] Support HashAggregateExec for non-par...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13802 Is it okay to make a new pr to fix these?

[GitHub] spark issue #13736: [SPARK-12113][SQL] Add some timing metrics for blocking ...

2016-06-17 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13736 This pr adds a new metric as follows: https://cloud.githubusercontent.com/assets/692303/16151541/81d16cf0-34d8-11e6-9ecb-544c6a27d229.png

[GitHub] spark pull request #13802: [SPARK-16094][SQL] Support HashAggregateExec for ...

2016-06-21 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13802#discussion_r67861472 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -457,6 +457,36 @@ class DataFrameAggregateSuite extends QueryTest

[GitHub] spark pull request #13802: [SPARK-16094][SQL] Support HashAggregateExec for ...

2016-06-21 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13802 [SPARK-16094][SQL] Support HashAggregateExec for non-partial aggregates ## What changes were proposed in this pull request? The current spark cannot use `HashAggregateExec` for non-partial

[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-06-19 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/10896 @hvanhovell okay, ready to review. After the v2.0 release, plz review this?

[GitHub] spark issue #13736: [SPARK-12113][SQL] Add some timing metrics for blocking ...

2016-06-19 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13736 @rxin okay, ready to review. After the v2.0 release, plz check this.

[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-06-19 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/10896 Jenkins, retest this please.

[GitHub] spark issue #13736: [SPARK-12113][SQL] Add some timing metrics for blocking ...

2016-06-19 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13736 Jenkins, retest this please.

[GitHub] spark issue #13736: [SPARK-12113][SQL] Add some timing metrics for blocking ...

2016-06-18 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13736 I'm looking into this to fix bugs...

[GitHub] spark issue #13592: [SPARK-15863][SQL][DOC] Initial SQL programming guide up...

2016-06-17 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13592 @liancheng Is it worth adding the two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` to `Other Configuration Options`? They are kind of internal parameters though

[GitHub] spark pull request #12173: [SPARK-13792][SQL] Limit logging of bad records i...

2016-06-20 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/12173#discussion_r67648173 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvUtils.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #12173: [SPARK-13792][SQL] Limit logging of bad records in CSVRe...

2016-06-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/12173 @rxin yea, the current implementation only holds `malformedLineNum` malformed lines in memory: https://github.com/apache/spark/pull/12173/files#diff-18b09be18156e81f965df293a2781aefR31

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-23 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 I'm now checking failed tests...

[GitHub] spark issue #13847: [SPARK-16135][SQL] Implement hashCode and euqals in Unsa...

2016-06-22 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 Thx, good direction. The current master doesn't throw any exception in an analyzer when map-typed data are passed into `collect_set`/`collect_list`. Probably, should we check the case

[GitHub] spark pull request #13797: Update docs

2016-06-20 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13797 Update docs ## What changes were proposed in this pull request? Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` in Other Configuration

[GitHub] spark issue #13797: Update docs

2016-06-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13797 @liancheng ping

[GitHub] spark pull request #13795: [SPARK-13792][SQL] Limit logging of bad records i...

2016-06-20 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13795#discussion_r67809544 --- Diff: python/pyspark/sql/readwriter.py --- @@ -392,6 +392,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non

[GitHub] spark issue #13592: [SPARK-15863][SQL][DOC] Initial SQL programming guide up...

2016-06-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13592 @liancheng okay, I'll do that.

[GitHub] spark issue #13795: [SPARK-13792][SQL] Limit logging of bad records in CSV d...

2016-06-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13795 @rxin okay, lgtm.

[GitHub] spark issue #13892: [SPARK-16192][SQL] Add type checks in CheckAnalysis

2016-06-24 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13892 @cloud-fan Could you check this?

[GitHub] spark issue #13852: [SQL][DOC] Update a description for AggregateFunction#su...

2016-06-24 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13852 @hvanhovell ping

[GitHub] spark pull request #13892: [SPARK-16192][SQL] Add type checks in CheckAnalys...

2016-06-24 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13892 [SPARK-16192][SQL] Add type checks in CheckAnalysis ## What changes were proposed in this pull request? `CollectSet` cannot have map-typed data because MapTypeData does not implement `equals

[GitHub] spark pull request #13847: [SPARK-16135][SQL] Remove hashCode and euqals in ...

2016-06-24 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13847#discussion_r68386214 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -115,7 +115,7 @@ class CodeGenerationSuite

[GitHub] spark issue #13847: [SPARK-16135][SQL] Remove hashCode and euqals in ArrayBa...

2016-06-24 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13847 @hvanhovell please check again?

[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

2016-06-16 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/10896 @hvanhovell okay, I'll finish the fix soon.

[GitHub] spark pull request #13736: [SPARK-12113][SQL] Add some timing metrics for bl...

2016-06-17 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13736 [SPARK-12113][SQL] Add some timing metrics for blocking pipelines ## What changes were proposed in this pull request? This is rework based on #10116 ## How was this patch tested

[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-13 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-171535976 Sorry that you're confused. The code in the link can be compiled because `lib/` has `/spark-catalyst_2.10-2.0.0-SNAPSHOT.jar` modified by this patch. So, you

[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-13 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10596#issuecomment-171546955 @liancheng @yhuai ping

[GitHub] spark pull request: [SPARK-12476][SQL] Implement JdbcRelation#unha...

2016-01-13 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10427#issuecomment-171544801 @yhuai ping

[GitHub] spark pull request: [SPARK-2827][GraphX] Add collectDegreeDist to ...

2016-01-13 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10521#issuecomment-171544621 @andrewor14 ping

[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-13 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-171471897 @marmbrus checked; the code below compiles in v1.5.2, but it cannot be compiled in v1.6.0. https://github.com/maropu/spark-compat-test/blob

[GitHub] spark pull request: [SPARK-11780][SQL] Add type aliases backwards ...

2016-01-13 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10635#issuecomment-171478073 I found that aggregation functions such as `Max` and `Min` in `org.apache.spark.sql.catalyst.expressions.aggregate` have the same issue because the ticket in SPARK

[GitHub] spark pull request: [SPARK-12686][SQL] Support group-by push down ...

2016-01-14 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10631#issuecomment-171866035 @rxin Could you give me any suggestions on this workaround? https://github.com/apache/spark/pull/10631/files#diff-d99813bd5bbc18277e4090475e4944cfR130

[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...

2016-01-14 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10593#issuecomment-171666309 @nongli Great work. One question: don't we need common APIs to realize batched deserialization for other data sources like ORC? I know we currently have

[GitHub] spark pull request #13847: [SPARK-16135][SQL] Remove hashCode and equals in ...

2016-06-24 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13847#discussion_r6847 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MapData.scala --- @@ -39,4 +39,7 @@ abstract class MapData extends Serializable

[GitHub] spark pull request #13892: [SPARK-16192][SQL] Add type checks in CheckAnalys...

2016-06-24 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13892#discussion_r68474189 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -73,9 +73,12 @@ trait CheckAnalysis extends

[GitHub] spark pull request #13852: [SQL][DOC] Update a description for AggregateFunc...

2016-06-24 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13852#discussion_r68474108 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -174,8 +174,8 @@ sealed abstract class

[GitHub] spark pull request #13892: [SPARK-16192][SQL] Add type checks in CheckAnalys...

2016-06-24 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13892#discussion_r68481381 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -73,9 +73,12 @@ trait CheckAnalysis extends

[GitHub] spark pull request #13892: [SPARK-16192][SQL] Add type checks in CollectSet

2016-06-24 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13892#discussion_r68482138 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala --- @@ -435,6 +435,23 @@ class AnalysisErrorSuite

[GitHub] spark pull request #13847: [SPARK-16135][SQL] Remove hashCode and equals in ...

2016-06-25 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13847#discussion_r68490096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MapData.scala --- @@ -19,6 +19,10 @@ package org.apache.spark.sql.catalyst.util

[GitHub] spark issue #11420: [SPARK-13493][SQL] Enable case sensitiveness in json sch...

2016-06-17 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/11420 @rxin Currently, do we have any policy for handling case sensitivity inside Spark? e.g., `postgresql` always converts names to lower case (https://www.postgresql.org/docs/9.5/static/sql-syntax

[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-02-07 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10918#issuecomment-181030073 Ah, we cannot simply remove some tests because these tests target other tests for graph processing. So, we need to replace `mapReduceTriplets` with newer

[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-02-07 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10918#issuecomment-181026937 Welcome back. We cannot remove `mapReduceTriplets` because some tests in GraphX still use it. So, I'll make another PR to fix the tests, then remove

[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-02-07 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10918#issuecomment-181026984 Fixed the position of the MiMa entry.

[GitHub] spark pull request: [SPARK-12476][SQL] Implement JdbcRelation#unha...

2016-02-07 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/10427#discussion_r52120299 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala --- @@ -90,6 +90,11 @@ private[sql] case class

[GitHub] spark pull request: [SPARK-13057][SQL] Add benchmark codes and the...

2016-02-07 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10965#issuecomment-181048306 @nongli ping

[GitHub] spark pull request: [SPARK-2827][GraphX] Add collectDegreeDist to ...

2016-02-07 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10521#issuecomment-181048261 @ankurdave ping

[GitHub] spark pull request: [SPARK-11691][SQL] Allow to specify compressio...

2016-02-09 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/9657#issuecomment-181786960 @zjffdu ping
