Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-46869
test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feat
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-222043096
@rxin Please help double check! Many thanks!!
Github user xwu0226 commented on a diff in the pull request:
https://github.com/apache/spark/pull/13300#discussion_r64694941
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
---
@@ -142,6 +145,75 @@ object CSVRelation extends Loggi
Github user xwu0226 commented on a diff in the pull request:
https://github.com/apache/spark/pull/13300#discussion_r64694834
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
---
@@ -42,16 +42,23 @@ private[csv] object CSVInferSc
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221757824
@rxin OK. Thanks! Will do.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221757608
Let's not add so many new APIs. You can just add the Dataset[String] one,
since RDD[String] can be easily converted into Dataset.
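To illustrate the point above that an RDD[String] can be easily converted into a Dataset, here is a minimal sketch. It assumes Spark 2.x with a local `SparkSession`; the object name `RddToDatasetSketch`, the helper `toDataset`, and the sample lines are hypothetical and not taken from the PR.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical sketch, not code from the PR.
object RddToDatasetSketch {

  // Wrap an existing RDD[String] as a Dataset[String].
  // The String encoder is supplied by spark.implicits._
  def toDataset(spark: SparkSession, lines: RDD[String]): Dataset[String] = {
    import spark.implicits._
    spark.createDataset(lines) // lines.toDS() is an equivalent spelling
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rdd-to-dataset-sketch")
      .getOrCreate()

    // Made-up CSV lines standing in for real input.
    val lines: RDD[String] =
      spark.sparkContext.parallelize(Seq("name,age", "alice,30", "bob,25"))

    val ds: Dataset[String] = toDataset(spark, lines)
    ds.show(truncate = false)

    spark.stop()
  }
}
```

Because the conversion is a single call on the caller's side, a reader API that accepts only Dataset[String] would still serve users who start from an RDD[String].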
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221755623
@xwu0226 Ah, for example, it seems a new method,
`CSVInferSchema.inferSchemaFromRDD(..)`, is introduced here while
`CSVInferSchema.infer(...)` is already impleme
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221754281
@HyukjinKwon Let me try to understand your question. Right now, we have
`csv.DefaultSource` implementing `DataSourceRegister` and `csv.CSVRelation`
contains some parsi
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221748545
@maropu The API that converts Dataset[String] to DataFrame is using the one
for RDD[String]. So I am thinking it could be beneficial to provide both there?
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221747864
Do we still need the interface for RDD[String]?
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221744128
cc @rxin
(@xwu0226 For me, I think it might be better if the CSV and
JSON data sources have the same structure so that we can fix up similar i
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221703986
Yes. I am adding the Dataset[String] API also. Will push soon.
Github user pjfanning commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221699965
Takeshi Yamamuro suggested on
https://issues.apache.org/jira/browse/SPARK-15463 that the new API should take
a Dataset[String] as input instead of an RDD[String]
--
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221661115
@HyukjinKwon @falaki Could you review the PR? Thanks!
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13300#issuecomment-221654079
Can one of the admins verify this patch?
GitHub user xwu0226 opened a pull request:
https://github.com/apache/spark/pull/13300
[SPARK-15463][SQL] support creating dataframe out of RDD[String] for csv
data
## What changes were proposed in this pull request?
Currently only `DataFrameReader.json(rdd: RDD[String]): DataFra