[GitHub] spark pull request: [SPARK-15463][SQL] support creating dataframe ...

2016-05-27 Thread xwu0226
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-46869 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feat

2016-05-26 Thread xwu0226
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-222043096 @rxin Please help double check! Many thanks!!

2016-05-25 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13300#discussion_r64694941 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -142,6 +145,75 @@ object CSVRelation extends Loggi

2016-05-25 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13300#discussion_r64694834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -42,16 +42,23 @@ private[csv] object CSVInferSc

2016-05-25 Thread xwu0226
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221757824 @rxin OK. Thanks! Will do.

2016-05-25 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221757608 Let's not add so many new APIs. You can just add the Dataset[String] one, since RDD[String] can be easily converted into Dataset.
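rxin's point that an `RDD[String]` converts easily into a `Dataset[String]` can be illustrated with a minimal sketch. It assumes Spark 2.2 or later, where `DataFrameReader.csv` has an overload taking a `Dataset[String]`, and a local `SparkSession`; the sample lines and app name are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Assumption: Spark 2.2+ is on the classpath; run locally with one core.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("csv-from-rdd-sketch")
  .getOrCreate()
import spark.implicits._ // brings the String encoder and rdd.toDS() into scope

// Hypothetical sample data: an RDD[String] whose elements are CSV lines, header first.
val lines = spark.sparkContext.parallelize(Seq("name,age", "alice,30", "bob,25"))

// RDD[String] -> Dataset[String]: a one-line conversion using the implicit encoder.
val ds: Dataset[String] = lines.toDS()

// DataFrameReader.csv(Dataset[String]) parses the lines; the header line supplies
// column names, and inferSchema scans the input to pick column types.
val df: DataFrame = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(ds)

df.printSchema()
df.show()

// Capture results before stopping the session.
val rowCount = df.count()
val columnNames = df.columns.toSeq
val ageType = df.schema("age").dataType.typeName
spark.stop()
```

Going the other way is just `ds.rdd`, which is why a single `Dataset[String]` entry point can serve callers holding either type.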

2016-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221755623 @xwu0226 Ah, for example, it seems a new method, `CSVInferSchema.inferSchemaFromRDD(..)`, is introduced here while `CSVInferSchema.infer(...)` is already impleme
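One way to avoid that kind of duplication is to write the inference logic once against a plain iterator of lines, which any source (a local file, one RDD partition, a Dataset's lines) can feed. The sketch below is hypothetical and far simpler than Spark's `CSVInferSchema`: the names `inferSchema`, `inferField`, `widen`, and `ColType` are made up, fields are split naively on the delimiter (no quoting or escaping), and only integer, double, and string types are considered.

```scala
import java.util.regex.Pattern
import scala.util.Try

// Hypothetical column types, from narrowest to widest.
sealed trait ColType
case object IntCol extends ColType
case object DoubleCol extends ColType
case object StringCol extends ColType

// Infer the narrowest type that can represent a single field.
def inferField(field: String): ColType =
  if (Try(field.toInt).isSuccess) IntCol
  else if (Try(field.toDouble).isSuccess) DoubleCol
  else StringCol

// Merge two observations of the same column into a type that fits both.
def widen(a: ColType, b: ColType): ColType = (a, b) match {
  case (x, y) if x == y                          => x
  case (IntCol, DoubleCol) | (DoubleCol, IntCol) => DoubleCol
  case _                                         => StringCol
}

// Infer (columnName, type) pairs from CSV lines; the first line is the header.
// Taking an Iterator lets any source of lines reuse the same routine.
def inferSchema(lines: Iterator[String], delimiter: String = ","): Seq[(String, ColType)] =
  if (!lines.hasNext) Seq.empty
  else {
    val sep = Pattern.quote(delimiter)
    val header = lines.next().split(sep, -1).toSeq
    val init: Seq[Option[ColType]] = header.map(_ => None)
    val merged = lines.foldLeft(init) { (acc, line) =>
      val fields = line.split(sep, -1)
      acc.zipWithIndex.map { case (current, i) =>
        // Missing trailing fields are treated as empty strings.
        val observed = inferField(if (i < fields.length) fields(i) else "")
        Some(current.fold(observed)(widen(_, observed)))
      }
    }
    // Columns with no data rows default to StringCol.
    header.zip(merged.map(_.getOrElse(StringCol)))
  }

val schema = inferSchema(Iterator("name,age,score", "alice,30,1.5", "bob,25,2"))
println(schema)
```

Here `score` ends up as `DoubleCol` because `1.5` and `2` widen to a common type. In a distributed setting the same per-line logic would need to run per partition with the partial results merged, for example via `RDD.aggregate`; that step is omitted here.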

2016-05-25 Thread xwu0226
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221754281 @HyukjinKwon Let me try to understand your question. Right now, we have `csv.DefaultSource` implementing `DataSourceRegister` and `csv.CSVRelation` contains some parsi

2016-05-25 Thread xwu0226
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221748545 @maropu The API that converts Dataset[String] to DataFrame uses the one for RDD[String]. So I am thinking it could be beneficial to provide both?

2016-05-25 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221747864 Do we still need the interface for RDD[String]?

2016-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221744128 cc @rxin (@xwu0226 For me, I think it might be better if the CSV and JSON data sources have the same structure so that we can fix up similar i

2016-05-25 Thread xwu0226
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221703986 Yes. I am also adding the Dataset[String] API. Will push soon.

2016-05-25 Thread pjfanning
Github user pjfanning commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221699965 Takeshi Yamamuro suggested on https://issues.apache.org/jira/browse/SPARK-15463 that the new API should take a Dataset[String] as input instead of an RDD[String]

2016-05-25 Thread xwu0226
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221661115 @HyukjinKwon @falaki Could you review the PR? Thanks!

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13300#issuecomment-221654079 Can one of the admins verify this patch?

2016-05-25 Thread xwu0226
GitHub user xwu0226 opened a pull request: https://github.com/apache/spark/pull/13300

[SPARK-15463][SQL] support creating dataframe out of RDD[String] for csv data

## What changes were proposed in this pull request?

Currently only `DataFrameReader.json(rdd: RDD[String]): DataFra
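For reference, the existing JSON path that the PR description mentions can be exercised with a minimal sketch like the one below. It assumes a Spark 2.x release, where `DataFrameReader.json` accepts an `RDD[String]` (from 2.2 on this overload is deprecated in favor of the `Dataset[String]` one); the JSON records and app name are made up. The PR proposes a CSV counterpart for the same kind of input.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a Spark 2.x build; json(RDD[String]) emits a deprecation warning from 2.2 on.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("json-from-rdd-sketch")
  .getOrCreate()

// Hypothetical sample data: one JSON object per RDD element.
val jsonLines = spark.sparkContext.parallelize(Seq(
  """{"name":"alice","age":30}""",
  """{"name":"bob","age":25}"""
))

// Existing API: the schema is inferred from the JSON records themselves.
val df: DataFrame = spark.read.json(jsonLines)
df.printSchema()

// Capture results before stopping the session.
val rowCount = df.count()
val columnNames = df.columns.toSet
spark.stop()
```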