[jira] [Updated] (SPARK-10705) Stop converting internal rows to external rows in DataFrame.toJSON

2015-09-22 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10705: --- Assignee: Liang-Chi Hsieh > Stop converting internal rows to external rows in DataFrame.toJ

[jira] [Updated] (SPARK-10310) [Spark SQL] All result records will be populated into ONE line during the script transform due to missing the correct line/field delimiter

2015-09-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10310: --- Assignee: zhichao-li > [Spark SQL] All result records will be populated into ONE line dur

[jira] [Resolved] (SPARK-10495) For json data source, date values are saved as int strings

2015-09-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10495. Resolution: Fixed Fix Version/s: 1.5.1 1.6.0 Issue resolved by pull
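For context, Spark SQL keeps `DateType` values internally as the number of days since 1970-01-01, which is how a writer that skips formatting ends up emitting plain integers. The plain-Scala sketch below (no Spark; `JsonDateRendering` and its methods are hypothetical names, not Spark's JSON writer) contrasts the two renderings of the same internal value:

```scala
import java.time.LocalDate

object JsonDateRendering {
  // Symptom described in SPARK-10495: the internal Int (days since 1970-01-01)
  // is written out as-is, so the JSON value is an "int string".
  def asIntString(daysSinceEpoch: Int): String = daysSinceEpoch.toString

  // Rendering the same internal value as an ISO-8601 calendar date (yyyy-MM-dd).
  def asDateString(daysSinceEpoch: Int): String =
    LocalDate.ofEpochDay(daysSinceEpoch.toLong).toString
}
```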

[jira] [Commented] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0

2015-09-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900941#comment-14900941 ] Cheng Lian commented on SPARK-8118: --- I believe PARQUET-369 is the root cause of this issue

Re: spark + parquet + schema name and metadata

2015-09-21 Thread Cheng Lian
Currently Spark SQL doesn't support customizing schema name and metadata. May I know why these two matter in your use case? Some Parquet data models, like parquet-avro, do support it, while some others don't (e.g. parquet-hive). Cheng On 9/21/15 7:13 AM, Borisa Zivkovic wrote: Hi, I am

[jira] [Updated] (SPARK-10449) StructType.merge shouldn't merge DecimalTypes with different precisions and/or scales

2015-09-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10449: --- Assignee: holdenk > StructType.merge shouldn't merge DecimalTypes with different precisi

[jira] [Resolved] (SPARK-10449) StructType.merge shouldn't merge DecimalTypes with different precisions and/or scales

2015-09-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10449. Resolution: Fixed Fix Version/s: 1.5.1 1.6.0 Issue resolved by pull

[jira] [Created] (SPARK-10705) Stop converting internal rows to external rows in DataFrame.toJSON

2015-09-18 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10705: -- Summary: Stop converting internal rows to external rows in DataFrame.toJSON Key: SPARK-10705 URL: https://issues.apache.org/jira/browse/SPARK-10705 Project: Spark

[jira] [Commented] (SPARK-10681) DateTimeUtils needs a method to parse string to SQL's timestamp value

2015-09-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876155#comment-14876155 ] Cheng Lian commented on SPARK-10681: {{DateTimeUtils}} is also used to read timestamps, which may

Re: parquet error

2015-09-18 Thread Cheng Lian
Not sure what's happening here, but I guess it's probably a dependency version issue. Could you please give vanilla Apache Spark a try to see whether it's a CDH-specific issue or not? Cheng On 9/17/15 11:44 PM, Chengi Liu wrote: Hi, I did some digging.. I believe the error is caused by

[jira] [Updated] (SPARK-10623) NoSuchElementException thrown when ORC predicate push-down is turned on

2015-09-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10623: --- Summary: NoSuchElementException thrown when ORC predicate push-down is turned on (was: turning

[jira] [Updated] (SPARK-10623) NoSuchElementException thrown when ORC predicate push-down is turned on

2015-09-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10623: --- Description: Turning on predicate pushdown for ORC datasources results

[jira] [Updated] (SPARK-10623) NoSuchElementException thrown when ORC predicate push-down is turned on

2015-09-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10623: --- Affects Version/s: 1.4.0 1.4.1 1.5.0

[jira] [Updated] (SPARK-10623) NoSuchElementException thrown when ORC predicate push-down is turned on

2015-09-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10623: --- Priority: Blocker (was: Major) > NoSuchElementException thrown when ORC predicate push-d

[jira] [Commented] (SPARK-10591) False negative in QueryTest.checkAnswer

2015-09-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745200#comment-14745200 ] Cheng Lian commented on SPARK-10591: Updated JIRA description. Actually we are handling {{NaN

[jira] [Updated] (SPARK-10591) False negative in QueryTest.checkAnswer

2015-09-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10591: --- Description: {{checkAnswer}} doesn't handle {{Map\[K, V\]}} properly. For example: {noformat} scala

[jira] [Created] (SPARK-10588) Saving a DataFrame containing only nulls to JSON doesn't work

2015-09-14 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10588: -- Summary: Saving a DataFrame containing only nulls to JSON doesn't work Key: SPARK-10588 URL: https://issues.apache.org/jira/browse/SPARK-10588 Project: Spark

[jira] [Created] (SPARK-10591) False negative in QueryTest.checkAnswer

2015-09-14 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10591: -- Summary: False negative in QueryTest.checkAnswer Key: SPARK-10591 URL: https://issues.apache.org/jira/browse/SPARK-10591 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-10591) False negative in QueryTest.checkAnswer

2015-09-14 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10591: --- Description: # For double and float, {{NaN == NaN}} is always {{false}} # {{checkAnswer}} doesn't
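The first point is standard IEEE 754 behavior: `NaN == NaN` evaluates to `false` for both `Double` and `Float`, so a result checker has to special-case NaN. The sketch below is a hypothetical `sameValue` helper, not Spark's `QueryTest.checkAnswer`, that treats NaN as equal to NaN for doubles, floats and sequences of them:

```scala
object NaNAwareEquality {
  // Hypothetical helper, not Spark's QueryTest API: compares an expected value with an
  // actual one, treating NaN as equal to NaN (plain == returns false for NaN == NaN).
  def sameValue(a: Any, b: Any): Boolean = (a, b) match {
    case (x: Double, y: Double) => (x.isNaN && y.isNaN) || x == y
    case (x: Float, y: Float)   => (x.isNaN && y.isNaN) || x == y
    // Compare sequences element by element using the same NaN-aware rule.
    case (xs: Seq[_], ys: Seq[_]) =>
      xs.length == ys.length && xs.zip(ys).forall { case (x, y) => sameValue(x, y) }
    case _ => a == b
  }
}
```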

[jira] [Updated] (PARQUET-371) Bumps Thrift version to 0.9.0

2015-09-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-371: --- Summary: Bumps Thrift version to 0.9.0 (was: Add thrift9 Maven profile for parquet-format) > Bu

[jira] [Updated] (SPARK-10301) For struct type, if parquet's global schema has fewer fields than a file's schema, data reading will fail

2015-09-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10301: --- Description: We hit this issue when reading a complex Parquet dataset without turning on schema

[jira] [Updated] (PARQUET-371) Bumps Thrift version to 0.9.0

2015-09-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-371: --- Description: Thrift 0.7.0 is too old a version, and it doesn't compile on Mac. Would be nice to bump

[jira] [Updated] (SPARK-9424) Document recent Parquet changes in Spark 1.5

2015-09-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-9424: -- Target Version/s: 1.5.0 (was: 1.5.1) Fix Version/s: 1.5.0 > Document recent Parquet chan

[jira] [Resolved] (SPARK-9424) Document recent Parquet changes in Spark 1.5

2015-09-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-9424. --- Resolution: Fixed Resolved by https://github.com/apache/spark/pull/8467 [~srowen] Actually this one

[jira] [Resolved] (SPARK-10472) UserDefinedType.typeName gives wrong result

2015-09-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10472. Resolution: Fixed Resolved by https://github.com/apache/spark/pull/8640

[jira] [Updated] (SPARK-10472) UserDefinedType.typeName gives wrong result

2015-09-11 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10472: --- Fix Version/s: 1.6.0 > UserDefinedType.typeName gives wrong res

[jira] [Comment Edited] (PARQUET-370) Nested records are not properly read if none of their fields are requested

2015-09-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734568#comment-14734568 ] Cheng Lian edited comment on PARQUET-370 at 9/10/15 11:43 AM: -- A complete

Re: Spark-shell throws Hive error when SQLContext.parquetFile, v1.3

2015-09-10 Thread Cheng Lian
If you don't need to interact with Hive, you may compile Spark without the -Phive flag to eliminate Hive dependencies. That way, the sqlContext instance in the Spark shell will be of type SQLContext instead of HiveContext. The Hive metastore error is probably due to

[jira] [Commented] (PARQUET-371) Add thrift9 Maven profile for parquet-format

2015-09-10 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740068#comment-14740068 ] Cheng Lian commented on PARQUET-371: That would be even nicer. I'll update my PR. > Add thri

Re: How to read compressed parquet file

2015-09-09 Thread Cheng Lian
You need to use "har://" instead of "hdfs://" to read HAR files. Just tested against Spark 1.5, and it works as expected. Cheng On 9/9/15 3:29 PM, 李铖 wrote: I think too many parquet files may affect reading capability, so I use hadoop archive to combine them, but
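The Hadoop Archives guide documents archive URIs of the form `har://scheme-hostname:port/archivepath/fileinarchive`. As a minimal sketch, the helper below rewrites an `hdfs://` location that points into a `.har` archive into that form; `HarPaths.toHarUri` is a hypothetical helper, and the host name and paths in the test are made up:

```scala
import java.net.URI

object HarPaths {
  // Hypothetical helper: turns hdfs://host:port/path/archive.har/file into
  // har://hdfs-host:port/path/archive.har/file, following the documented HAR URI form.
  def toHarUri(hdfsLocation: String): String = {
    val uri = new URI(hdfsLocation)
    require(uri.getScheme == "hdfs", s"expected an hdfs:// URI, got $hdfsLocation")
    require(uri.getPath.contains(".har"), s"path does not point into a .har archive: $hdfsLocation")
    // Keep the port only if the original URI specified one.
    val port = if (uri.getPort == -1) "" else s":${uri.getPort}"
    s"har://${uri.getScheme}-${uri.getHost}$port${uri.getPath}"
  }
}
```

The resulting string can then be passed to `sqlContext.read.parquet(...)` in place of the original `hdfs://` path.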

[jira] [Created] (PARQUET-371) Add thrift9 Maven profile for parquet-format

2015-09-09 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-371: -- Summary: Add thrift9 Maven profile for parquet-format Key: PARQUET-371 URL: https://issues.apache.org/jira/browse/PARQUET-371 Project: Parquet Issue Type

Re: Split content into multiple Parquet files

2015-09-08 Thread Cheng Lian
In Spark 1.4 and 1.5, you can do something like this: df.write.partitionBy("key").parquet("/datasink/output-parquets") BTW, I'm curious how you did it without partitionBy, using saveAsHadoopFile? Cheng On 9/8/15 2:34 PM, Adrien Mogenet wrote: Hi there, We've spent several hours to

[jira] [Commented] (PARQUET-370) Nested records are not properly read if none of their fields are requested

2015-09-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734568#comment-14734568 ] Cheng Lian commented on PARQUET-370: Complete sample code for reproducing this issue against

Re: Parquet Array Support Broken?

2015-09-08 Thread Cheng Lian
f the file is created in Spark On Mon, Sep 7, 2015 at 3:06 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote: Read response from Cheng Lian <lian.cs@gmail.com> on Aug/27th - it looks the same prob

Re: Design docs

2015-09-08 Thread Cheng Lian
Thrift as the good media to do this? Something like: https://github.com/adobe-research/spark-parquet-thrift-example On Tue, Sep 8, 2015 at 10:59 AM, Cheng Lian <lian.cs@gmail.com> wrote: Parquet only provides a limited set of types as building blocks. Although we can add more original

[jira] [Resolved] (SPARK-9170) ORC data source creates a schema with lowercase table names

2015-09-08 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-9170. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 7520 [https://github.com

[jira] [Created] (SPARK-10472) UserDefinedType.typeName gives wrong result

2015-09-07 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10472: -- Summary: UserDefinedType.typeName gives wrong result Key: SPARK-10472 URL: https://issues.apache.org/jira/browse/SPARK-10472 Project: Spark Issue Type: Bug

[jira] [Commented] (PARQUET-369) Shading SLF4J prevents SLF4J locating org.slf4j.impl.StaticLoggerBinder

2015-09-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733506#comment-14733506 ] Cheng Lian commented on PARQUET-369: Here is a more concrete version in another thread https

[jira] [Updated] (PARQUET-369) Shading SLF4J prevents SLF4J locating org.slf4j.impl.StaticLoggerBinder

2015-09-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-369: --- Description: Parquet-format shades SLF4J to {{parquet.org.slf4j}} (see [here|https://github.com

[jira] [Resolved] (SPARK-10434) Parquet compatibility with 1.4 is broken when writing arrays that may contain nulls

2015-09-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10434. Resolution: Fixed Fix Version/s: 1.5.1 1.6.0 Issue resolved by pull

[jira] [Updated] (SPARK-10434) Parquet compatibility with 1.4 is broken when writing arrays that may contain nulls

2015-09-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10434: --- Target Version/s: 1.6.0, 1.5.1 (was: 1.5.0, 1.5.1) > Parquet compatibility with 1.4 is broken w

[jira] [Created] (PARQUET-369) Shading SLF4J prevents SLF4J locating org.slf4j.impl.StaticLoggerBinder

2015-09-05 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-369: -- Summary: Shading SLF4J prevents SLF4J locating org.slf4j.impl.StaticLoggerBinder Key: PARQUET-369 URL: https://issues.apache.org/jira/browse/PARQUET-369 Project: Parquet

[jira] [Commented] (SPARK-10442) select cast('false' as boolean) returns true

2015-09-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730657#comment-14730657 ] Cheng Lian commented on SPARK-10442: The reason is that all non-empty strings are converted to {{true
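The comment explains that, at the time, every non-empty string was cast to `true`, which is why `'false'` became `true`. The plain-Scala sketch below contrasts that rule with a stricter parse. `StringToBoolean` is hypothetical, and the accepted spellings (plus returning `None`, standing in for SQL NULL, for anything else) are assumptions rather than Spark's confirmed behavior:

```scala
object StringToBoolean {
  // Rule described in SPARK-10442: any non-empty string becomes true.
  def castNaive(s: String): Boolean = s.nonEmpty

  // Assumed spellings for a stricter parse (hypothetical, for illustration only).
  private val trueStrings  = Set("t", "true", "y", "yes", "1")
  private val falseStrings = Set("f", "false", "n", "no", "0")

  // Returns None (standing in for SQL NULL) when the string is not a recognized boolean.
  def castStrict(s: String): Option[Boolean] = {
    val normalized = s.trim.toLowerCase
    if (trueStrings(normalized)) Some(true)
    else if (falseStrings(normalized)) Some(false)
    else None
  }
}
```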

[jira] [Updated] (SPARK-10310) [Spark SQL] All result records will be populated into ONE line during the script transform due to missing the correct line/field delimiter

2015-09-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10310: --- Description: There is a real case using a Python streaming script in a Spark SQL query. We found that all
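As a rough illustration of the symptom in the summary (not Spark's script transformation code), the sketch below shows how dropping the line delimiter between records makes a reader see a single record. `ScriptTransformIO` and its methods are hypothetical, and TAB/newline as the field/line delimiters is an assumption:

```scala
object ScriptTransformIO {
  // Assumed delimiters: TAB between fields, newline between records.
  val FieldDelimiter = "\t"
  val LineDelimiter = "\n"

  // Every record is terminated by the line delimiter.
  def serialize(rows: Seq[Seq[String]]): String =
    rows.map(_.mkString(FieldDelimiter) + LineDelimiter).mkString

  // Faulty variant: records are concatenated without any line delimiter.
  def serializeWithoutLineDelimiter(rows: Seq[Seq[String]]): String =
    rows.map(_.mkString(FieldDelimiter)).mkString

  // Splits script output back into records and fields, skipping empty lines.
  def deserialize(output: String): Seq[Seq[String]] =
    output.split(LineDelimiter).toSeq.filter(_.nonEmpty).map(_.split(FieldDelimiter, -1).toSeq)
}
```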

Re: Parquet partitioning for unique identifier

2015-09-04 Thread Cheng Lian
(valueContainsNull = false)
 |-- imp2: map (nullable = true)
 |    |-- key: string
 |    |-- value: double (valueContainsNull = false)
 |-- imp3: map (nullable = true)
 |    |-- key: string
 |    |-- value: double (valueContainsNull = false)
On Thu, Sep 3, 2015 at 11:27 PM, Cheng Lian <lian

[jira] [Created] (SPARK-10448) Parquet schema merging should NOT merge UDT

2015-09-04 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10448: -- Summary: Parquet schema merging should NOT merge UDT Key: SPARK-10448 URL: https://issues.apache.org/jira/browse/SPARK-10448 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-10449) StructType.merge shouldn't merge DecimalTypes with different precisions and/or scales

2015-09-04 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10449: -- Summary: StructType.merge shouldn't merge DecimalTypes with different precisions and/or scales Key: SPARK-10449 URL: https://issues.apache.org/jira/browse/SPARK-10449
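The rule in the summary can be sketched with a toy type hierarchy. `DecimalMerge`, its `DecimalType`/`StringType` and the error messages are hypothetical stand-ins, not Spark's `StructType.merge`; the point is only that two decimals merge when precision and scale both match, and fail otherwise instead of silently picking one:

```scala
object DecimalMerge {
  // Hypothetical stand-ins for Spark SQL data types, for illustration only.
  sealed trait DataType
  final case class DecimalType(precision: Int, scale: Int) extends DataType
  case object StringType extends DataType

  def merge(left: DataType, right: DataType): Either[String, DataType] = (left, right) match {
    // Identical precision and scale: merging is trivially safe.
    case (l: DecimalType, r: DecimalType) if l == r => Right(l)
    // Different precision and/or scale: refuse to merge.
    case (l: DecimalType, r: DecimalType) =>
      Left(s"Cannot merge $l with $r: precision and/or scale differ")
    case (l, r) if l == r => Right(l)
    case (l, r)           => Left(s"Cannot merge incompatible types $l and $r")
  }
}
```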

Re: Parquet partitioning for unique identifier

2015-09-04 Thread Cheng Lian
Could you please provide the full stack trace of the OOM exception? Another common cause of Parquet OOM is very wide tables, say hundreds or thousands of columns. In that case, the number of rows is mostly irrelevant. Cheng On 9/4/15 1:24 AM, Kohki Nishio wrote: let's say I have a data

[jira] [Commented] (SPARK-10434) Parquet compatibility with 1.4 is broken when writing arrays that may contain nulls

2015-09-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730388#comment-14730388 ] Cheng Lian commented on SPARK-10434: Another concern of mine is that, Parquet files written

[jira] [Updated] (PARQUET-364) Parquet-avro cannot decode Avro/Thrift array of primitive array (e.g. array<array>)

2015-09-04 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated PARQUET-364: --- Summary: Parquet-avro cannot decode Avro/Thrift array of primitive array (e.g. array<ar

[jira] [Created] (SPARK-10434) Parquet compatibility with 1.4 is broken when writing arrays that may contain nulls

2015-09-03 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10434: -- Summary: Parquet compatibility with 1.4 is broken when writing arrays that may contain nulls Key: SPARK-10434 URL: https://issues.apache.org/jira/browse/SPARK-10434

[jira] [Commented] (SPARK-10434) Parquet compatibility with 1.4 is broken when writing arrays that may contain nulls

2015-09-03 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730326#comment-14730326 ] Cheng Lian commented on SPARK-10434: It's true that in general forwards-compatibility is not easy

Re: [ compress in-memory column storage used in sparksql cache table ]

2015-09-02 Thread Cheng Lian
Yeah, two of the reasons why the built-in in-memory columnar storage doesn't achieve a compression ratio comparable to Parquet's are: 1. The in-memory columnar representation doesn't handle nested types, so array/map/struct values are not compressed. 2. Parquet may use more than one kind of
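The second point is cut off above, but the general idea of combining encodings can be shown with a toy example: dictionary-encode a low-cardinality column, then run-length-encode the resulting indices. This is only an illustration of the technique, not Parquet's or Spark's actual encoders, and `StackedEncoding` is a hypothetical name:

```scala
object StackedEncoding {
  // Dictionary encoding: store each distinct value once and replace values by their index.
  def dictionaryEncode(values: Seq[String]): (Vector[String], Vector[Int]) = {
    val dictionary = values.distinct.toVector
    val indexOf = dictionary.zipWithIndex.toMap
    (dictionary, values.map(indexOf).toVector)
  }

  // Run-length encoding of the indices as (index, run length) pairs.
  def runLengthEncode(ids: Seq[Int]): Vector[(Int, Int)] =
    ids.foldLeft(Vector.empty[(Int, Int)]) { (runs, id) =>
      runs.lastOption match {
        // Same index as the previous run: extend that run.
        case Some((last, count)) if last == id => runs.updated(runs.length - 1, (last, count + 1))
        // Otherwise start a new run of length 1.
        case _ => runs :+ ((id, 1))
      }
    }
}
```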

Re: Schema From parquet file

2015-09-01 Thread Cheng Lian
What exactly do you mean by "get schema from a parquet file"? - If you are trying to inspect Parquet files, parquet-tools can be pretty neat: https://github.com/Parquet/parquet-mr/issues/321 - If you are trying to get Parquet schema of Parquet MessageType, you may resort to readFooterX() and

Re: Group by specific key and save as parquet

2015-09-01 Thread Cheng Lian
Starting from Spark 1.4, you can do this via dynamic partitioning: sqlContext.table("trade").write.partitionBy("date").parquet("/tmp/path") Cheng On 9/1/15 8:27 AM, gtinside wrote: Hi , I have a set of data, I need to group by specific key and then save as parquet. Refer to the code snippet
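With `partitionBy("date")`, rows are written into one subdirectory per distinct `date` value, named in the Hive style `date=<value>` under the output path. The plain-Scala sketch below only models that directory layout. `PartitionLayout` is a hypothetical helper, not part of Spark, and it skips the escaping a real writer applies to special characters in values:

```scala
object PartitionLayout {
  // Hive-style partition directory: <base>/<column>=<value> (no escaping in this sketch).
  def partitionDir(base: String, column: String, value: String): String =
    s"$base/$column=$value"

  // Groups rows by the directory they would land in when partitioned by `column`.
  def layout[T](base: String, column: String, rows: Seq[T])(valueOf: T => String): Map[String, Seq[T]] =
    rows.groupBy(row => partitionDir(base, column, valueOf(row)))
}
```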

[jira] [Created] (SPARK-10395) Simplify CatalystReadSupport

2015-09-01 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10395: -- Summary: Simplify CatalystReadSupport Key: SPARK-10395 URL: https://issues.apache.org/jira/browse/SPARK-10395 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-10400) Rename or deprecate SQL option "spark.sql.parquet.followParquetFormatSpec"

2015-09-01 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10400: -- Summary: Rename or deprecate SQL option "spark.sql.parquet.followParquetFormatSpec" Key: SPARK-10400 URL: https://issues.apache.org/jira/browse/SPARK-10400

[jira] [Issue Comment Deleted] (SPARK-10289) A direct write API for testing Parquet compatibility

2015-08-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10289: --- Comment: was deleted (was: User 'liancheng' has created a pull request for this issue: https

[jira] [Updated] (SPARK-10289) A direct write API for testing Parquet compatibility

2015-08-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10289: --- Fix Version/s: 1.6.0 > A direct write API for testing Parquet compatibil

[jira] [Resolved] (SPARK-10289) A direct write API for testing Parquet compatibility

2015-08-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10289. Resolution: Fixed Resolved by https://github.com/apache/spark/pull/8454 > A direct write

[jira] [Updated] (SPARK-10365) Support Parquet logical type TIMESTAMP_MICROS

2015-08-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10365: --- Target Version/s: (was: 1.6.0) > Support Parquet logical type TIMESTAMP_MIC

[jira] [Updated] (SPARK-10365) Support Parquet logical type TIMESTAMP_MICROS

2015-08-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10365: --- Summary: Support Parquet logical type TIMESTAMP_MICROS (was: Support Parquet logical types

[jira] [Created] (SPARK-10367) Support Parquet logical type INTERVAL

2015-08-31 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10367: -- Summary: Support Parquet logical type INTERVAL Key: SPARK-10367 URL: https://issues.apache.org/jira/browse/SPARK-10367 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-10366) Support Parquet logical type DATE

2015-08-31 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10366: -- Summary: Support Parquet logical type DATE Key: SPARK-10366 URL: https://issues.apache.org/jira/browse/SPARK-10366 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-10365) Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS

2015-08-31 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10365: -- Summary: Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS Key: SPARK-10365 URL: https://issues.apache.org/jira/browse/SPARK-10365 Project: Spark

[jira] [Created] (SPARK-10364) Support Parquet logical type TIMESTAMP_MILLIS

2015-08-31 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10364: -- Summary: Support Parquet logical type TIMESTAMP_MILLIS Key: SPARK-10364 URL: https://issues.apache.org/jira/browse/SPARK-10364 Project: Spark Issue Type: Sub

[jira] [Updated] (SPARK-8824) Support Parquet time related logical types

2015-08-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-8824: -- Summary: Support Parquet time related logical types (was: Support Parquet logical types

[jira] [Updated] (SPARK-10365) Support Parquet logical type TIMESTAMP_MICROS

2015-08-31 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10365: --- Description: Didn't assign target version for this ticket because neither the most recent parquet

[jira] [Updated] (SPARK-10301) For struct type, if parquet's global schema has fewer fields than a file's schema, data reading will fail

2015-08-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10301: --- Description: We hit this issue when reading a complex Parquet dataset without turning on schema

[jira] [Commented] (SPARK-10301) For struct type, if parquet's global schema has fewer fields than a file's schema, data reading will fail

2015-08-30 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721447#comment-14721447 ] Cheng Lian commented on SPARK-10301: Updated ticket description to provide a more

[jira] [Created] (PARQUET-367) parquet-cat -j doesn't show all records

2015-08-27 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-367: -- Summary: parquet-cat -j doesn't show all records Key: PARQUET-367 URL: https://issues.apache.org/jira/browse/PARQUET-367 Project: Parquet Issue Type: Bug

[jira] [Created] (SPARK-10321) OrcRelation doesn't override sizeInBytes

2015-08-27 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10321: -- Summary: OrcRelation doesn't override sizeInBytes Key: SPARK-10321 URL: https://issues.apache.org/jira/browse/SPARK-10321 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-9170) ORC data source creates a schema with lowercase table names

2015-08-27 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-9170: -- Assignee: Liang-Chi Hsieh ORC data source creates a schema with lowercase table names

[jira] [Updated] (SPARK-9424) Document recent Parquet changes in Spark 1.5

2015-08-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-9424: -- Description: Specifically, the following changes need to be documented/explained: - Metadata discovery

[jira] [Created] (SPARK-10289) A direct write API for testing Parquet compatibility

2015-08-26 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10289: -- Summary: A direct write API for testing Parquet compatibility Key: SPARK-10289 URL: https://issues.apache.org/jira/browse/SPARK-10289 Project: Spark Issue Type

[jira] [Updated] (SPARK-10289) A direct write API for testing Parquet compatibility

2015-08-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10289: --- Issue Type: Sub-task (was: Test) Parent: SPARK-6774 A direct write API for testing Parquet

[jira] [Updated] (SPARK-10289) A direct write API for testing Parquet compatibility

2015-08-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10289: --- Description: Due to a set of unfortunate historical issues, it's relatively hard to achieve full

Re: Spark 1.3.1 saveAsParquetFile hangs on app exit

2015-08-26 Thread Cheng Lian
Could you please show the jstack output of the hung process? Thanks! Cheng On 8/26/15 10:46 PM, cingram wrote: I have a simple test that is hanging when using s3a with spark 1.3.1. Is there something I need to do to cleanup the S3A file system? The write to S3 appears to have worked but this job

[jira] [Commented] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11

2015-08-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711646#comment-14711646 ] Cheng Lian commented on SPARK-10229: Sorry, I was using {{-Pscala-2.11}}. Thanks

[jira] [Created] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11

2015-08-25 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10229: -- Summary: Wrong jline dependency when compiled against Scala 2.11 Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue

[jira] [Resolved] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11

2015-08-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10229. Resolution: Not A Problem I was using {{-Pscala-2.11}} since {{scala-2.11}} is a POM profile

[jira] [Resolved] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10177. Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8400 [https

[jira] [Resolved] (SPARK-10197) Add null check in wrapperFor (inside HiveInspectors).

2015-08-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10197. Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8407 [https

[jira] [Commented] (HIVE-6394) Implement Timestamp in ParquetSerde

2015-08-25 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711046#comment-14711046 ] Cheng Lian commented on HIVE-6394: -- While testing Spark SQL 1.5-SNAPSHOT for Parquet/Hive

[jira] [Updated] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10177: --- Description: Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1

[jira] [Commented] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709430#comment-14709430 ] Cheng Lian commented on SPARK-10177: I'm not quite familiar with Julian date
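For readers who, like the commenter, are less familiar with Julian dates: INT96 timestamps in Parquet files written by Hive are commonly stored as a Julian day number plus nanoseconds within that day, and the Unix epoch (1970-01-01) is Julian day 2440588. The sketch below converts between that pair and microseconds since the epoch. It assumes UTC and ignores time zones entirely, and `JulianTimestamps` is a hypothetical name, not Spark's `DateTimeUtils`:

```scala
object JulianTimestamps {
  // Julian day number of 1970-01-01.
  val JulianDayOfEpoch = 2440588L
  val MicrosPerDay: Long = 24L * 60 * 60 * 1000 * 1000

  // (Julian day, nanoseconds within the day) -> microseconds since the Unix epoch (UTC).
  def toMicros(julianDay: Int, nanosOfDay: Long): Long =
    (julianDay - JulianDayOfEpoch) * MicrosPerDay + nanosOfDay / 1000

  // Inverse conversion; floorDiv/floorMod keep pre-1970 values on the correct day.
  def fromMicros(micros: Long): (Int, Long) = {
    val daysSinceEpoch = Math.floorDiv(micros, MicrosPerDay)
    val microsOfDay = Math.floorMod(micros, MicrosPerDay)
    ((daysSinceEpoch + JulianDayOfEpoch).toInt, microsOfDay * 1000)
  }
}
```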

[jira] [Commented] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708957#comment-14708957 ] Cheng Lian commented on SPARK-10177: [~davies] I'm not sure whether

[jira] [Updated] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10177: --- Description: Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1

[jira] [Created] (HIVE-11625) Map instances with null keys are not written to Parquet tables

2015-08-24 Thread Cheng Lian (JIRA)
Cheng Lian created HIVE-11625: - Summary: Map instances with null keys are not written to Parquet tables Key: HIVE-11625 URL: https://issues.apache.org/jira/browse/HIVE-11625 Project: Hive Issue

[jira] [Updated] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10177: --- Description: Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1

[jira] [Updated] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10177: --- Description: Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1

[jira] [Updated] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-10177: --- Attachment: 00_0 Attached the Parquet file generated by the Hive 0.14.0 SQL statement mentioned

[jira] [Updated] (HIVE-11625) Map instances with null keys are not properly handled for Parquet tables

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HIVE-11625: -- Description: Hive allows maps with null keys: {code:sql} hive> select map(null, 'foo', 1, 'bar', null
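
Context for the null-key problem: the Parquet format specification's MAP logical type requires the key field to be `required`, so a map entry whose key is null has no valid Parquet encoding, and a writer has to either reject or drop it. The following is a hypothetical Python helper (`to_parquet_map_entries` and its `on_null_key` parameter are invented for illustration and are not Hive or Parquet APIs) that shows those two policies when flattening a map into key/value entries.

```python
from typing import Any, Dict, Hashable, List, Tuple


def to_parquet_map_entries(
    m: Dict[Hashable, Any], on_null_key: str = "error"
) -> List[Tuple[Hashable, Any]]:
    """Flatten a map into (key, value) entries whose keys are all non-null.

    Hypothetical helper: on_null_key is "error" (raise) or "drop" (skip the entry).
    """
    if on_null_key not in ("error", "drop"):
        raise ValueError(f"unknown on_null_key policy: {on_null_key!r}")
    entries: List[Tuple[Hashable, Any]] = []
    for key, value in m.items():
        if key is None:
            if on_null_key == "drop":
                continue  # skip the entry instead of failing the write
            raise ValueError("Parquet MAP keys are required and cannot be null")
        entries.append((key, value))
    return entries
```

With `on_null_key="drop"`, `{None: 'foo', 1: 'bar'}` becomes `[(1, 'bar')]`. Silently losing entries is usually the riskier choice, so the sketch defaults to raising.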

[jira] [Commented] (HIVE-11625) Map instances with null keys are not properly handled for Parquet tables

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709076#comment-14709076 ] Cheng Lian commented on HIVE-11625: --- Sorry, according to the following statements

[jira] [Commented] (HIVE-11625) Map instances with null keys are not properly handled for Parquet tables

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709081#comment-14709081 ] Cheng Lian commented on HIVE-11625: --- Updated ticket description according to my comment

[jira] [Commented] (HIVE-11625) Map instances with null keys are not written to Parquet tables

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708885#comment-14708885 ] Cheng Lian commented on HIVE-11625: --- I meant to open this issue as a Parquet bug

[jira] [Updated] (SPARK-8580) Test Parquet interoperability and compatibility with other libraries/systems

2015-08-24 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-8580: -- Summary: Test Parquet interoperability and compatibility with other libraries/systems (was: Add
