[
https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646980#comment-17646980
]
ASF GitHub Bot commented on PARQUET-2075:
-
wgtmac opened a new pull request, #1014:
URL: https://github.com/apache/parquet-mr/pull/1014
### Jira
- This patch aims to solve the first step of
[PARQUET-2075](https://issues.apache.org/jira/browse/PARQUET-2075).
### Tests
- Make sure all tasks pass, especially
>
> I think there's a good case for turning it on as (a) there are lots of
> other filesystems out there, including NTFS on Windows laptops, *and*
> there's the risk of corruption of data in flight between the HDFS data node
> processes, where the CRC checks take place, and the actual reader code.
Yep,
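To illustrate the reader-side check being argued for, here is a minimal Java sketch that recomputes a page's CRC-32 and compares it with the value stored in the page header. It assumes the stored value is the standard CRC-32 computed by `java.util.zip.CRC32`, truncated to a signed 32-bit `int` because the header field is a Thrift `i32`. `PageCrcCheck` and `verifyPageCrc` are hypothetical names, not parquet-mr APIs, and the byte array stands in for a page read from storage.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class PageCrcCheck {

  /**
   * Hypothetical helper: recompute the CRC-32 of the page bytes as read
   * and compare it with the checksum stored in the page header.
   * CRC32.getValue() returns the unsigned 32-bit value in a long, so it is
   * truncated to int to match an i32 header field.
   */
  static boolean verifyPageCrc(byte[] pageBytes, int crcFromHeader) {
    CRC32 crc = new CRC32();
    crc.update(pageBytes, 0, pageBytes.length);
    return (int) crc.getValue() == crcFromHeader;
  }

  public static void main(String[] args) {
    byte[] page = "example page payload".getBytes(StandardCharsets.UTF_8);

    // Simulate a writer computing the checksum and storing it in the header.
    CRC32 writerCrc = new CRC32();
    writerCrc.update(page, 0, page.length);
    int storedCrc = (int) writerCrc.getValue();

    System.out.println(verifyPageCrc(page, storedCrc)); // true

    // Simulate corruption in flight by flipping a single bit.
    byte[] corrupted = page.clone();
    corrupted[3] ^= 0x01;
    System.out.println(verifyPageCrc(corrupted, storedCrc)); // false
  }
}
```

Because CRC-32 detects every single-bit error, the flipped bit always makes the check fail. The check only covers corruption that happens between the writer computing the checksum and the reader recomputing it, which is the in-flight window described in the quote.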
[
https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646943#comment-17646943
]
ASF GitHub Bot commented on PARQUET-2159:
-
jatin-bhateja commented on code in PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1048036480
##
parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/BytePacker.java:
##
@@ -105,4 +116,16 @@ public void unpack8Values(final byte[]
[
https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646941#comment-17646941
]
ASF GitHub Bot commented on PARQUET-2159:
-
jatin-bhateja commented on code in PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1048036480
##
parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/BytePacker.java:
##
@@ -105,4 +116,16 @@ public void unpack8Values(final byte[]
[
https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646899#comment-17646899
]
ASF GitHub Bot commented on PARQUET-2159:
-
jiangjiguang commented on PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#issuecomment-1350287549
> This work looks promising! It would be great if you can add some
> micro-benchmarks to parquet-benchmarks.
@wgtmac I have added the micro-benchmark to parquet-benchmarks, this
[
https://issues.apache.org/jira/browse/PARQUET-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646685#comment-17646685
]
ASF GitHub Bot commented on PARQUET-2218:
-
mapleFU commented on PR #188:
URL: https://github.com/apache/parquet-format/pull/188#issuecomment-1348743274
The change looks good to me! Thanks a lot!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646675#comment-17646675
]
ASF GitHub Bot commented on PARQUET-1539:
-
pitrou commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348718160
I opened https://github.com/apache/parquet-format/pull/188 to clarify the
wording.
[
https://issues.apache.org/jira/browse/PARQUET-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646674#comment-17646674
]
ASF GitHub Bot commented on PARQUET-2218:
-
pitrou commented on PR #188:
URL:
[
https://issues.apache.org/jira/browse/PARQUET-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646672#comment-17646672
]
ASF GitHub Bot commented on PARQUET-2218:
-
pitrou commented on PR #188:
URL: https://github.com/apache/parquet-format/pull/188#issuecomment-1348714880
@bbraams @gszadovszky @mapleFU thoughts?
pitrou opened a new pull request, #188:
URL: https://github.com/apache/parquet-format/pull/188
When trying to implement CRC computation in Parquet C++, we found the
wording to be ambiguous.
Clarify that CRC computation happens on the exact binary serialization
(instead of a
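A minimal Java sketch of the interpretation being proposed: the checksum is taken over the page bytes exactly as they are serialized after the page header, i.e. after compression, rather than over a logical or decompressed form. It assumes standard CRC-32 (`java.util.zip.CRC32`) and uses GZIP through `java.util.zip.GZIPOutputStream` as a stand-in page codec; the class name, helper names and sample payload are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

public class PageCrcOnWrite {

  // Standard CRC-32, truncated to int as it would be stored in an i32 field.
  static int crc32(byte[] bytes) {
    CRC32 crc = new CRC32();
    crc.update(bytes, 0, bytes.length);
    return (int) crc.getValue();
  }

  // Hypothetical stand-in for the page compression codec (GZIP here).
  static byte[] gzip(byte[] input) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(input);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the encoded repetition levels, definition levels and values.
    byte[] encoded = "rep levels | def levels | values".getBytes(StandardCharsets.UTF_8);
    // The bytes that would actually follow the page header in the file.
    byte[] written = gzip(encoded);

    int crcOfWritten = crc32(written); // checksum over the exact serialization
    int crcOfEncoded = crc32(encoded); // checksum over a pre-compression form

    System.out.println("CRC over written bytes: " + Integer.toHexString(crcOfWritten));
    System.out.println("CRC over encoded bytes: " + Integer.toHexString(crcOfEncoded));
  }
}
```

The two checksums differ in general, which is why writers and readers must agree on exactly which bytes are covered: under this interpretation a reader checksums the on-disk bytes before decompressing them.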
[
https://issues.apache.org/jira/browse/PARQUET-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Antoine Pitrou updated PARQUET-2218:
Description:
The format spec on CRC checksumming felt ambiguous when trying to implement
Antoine Pitrou created PARQUET-2218:
---
Summary: [Format] Clarify CRC computation
Key: PARQUET-2218
URL: https://issues.apache.org/jira/browse/PARQUET-2218
Project: Parquet
Issue Type:
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646656#comment-17646656
]
ASF GitHub Bot commented on PARQUET-1539:
-
pitrou commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348674622
@wgtmac No particular rule, no. AFAIU we only synchronize when we want to
get meaningful spec changes.
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646655#comment-17646655
]
ASF GitHub Bot commented on PARQUET-1539:
-
wgtmac commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348672686
> And, yes, it would probably be nice to make the spec wording clearer. I
can try to submit something.
Quick question: is there any rule to sync the `parquet.thrift` file from
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646613#comment-17646613
]
ASF GitHub Bot commented on PARQUET-1539:
-
mapleFU commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348441147
> And, yes, it would probably be nice to make the spec wording clearer. I
can try to submit something.
OK, thanks for your patience. I updated the descriptions in
[
https://issues.apache.org/jira/browse/PARQUET-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646612#comment-17646612
]
Antoine Pitrou commented on PARQUET-1629:
-
[~mwish] for the record. Perhaps you would be
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646609#comment-17646609
]
ASF GitHub Bot commented on PARQUET-1539:
-
pitrou commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348435655
And, yes, it would probably be nice to make the spec wording clearer. I can
try to submit something.
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646610#comment-17646610
]
ASF GitHub Bot commented on PARQUET-1539:
-
pitrou commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348433863
It seems it was done deliberately in parquet-mr and all Parquet committers
there agreed that it was how the spec should be interpreted:
https://github.com/apache/parquet-mr/pull/647
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646595#comment-17646595
]
ASF GitHub Bot commented on PARQUET-1539:
-
mapleFU commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348405417
So, should we update `parquet-format`, or keep it as is and not
implement CRC in the Parquet C++ version? @pitrou
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646591#comment-17646591
]
ASF GitHub Bot commented on PARQUET-1539:
-
pitrou commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348378187
It does seem that parquet-mr writes a CRC value for dictionary pages...
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646585#comment-17646585
]
ASF GitHub Bot commented on PARQUET-1539:
-
mapleFU commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348324323
> (also cc @mapleFU, who's working on CRC support for Parquet C++)
Hi all, I have a question here. The format says:
```
/** The 32bit CRC for the page, to be
```
[
https://issues.apache.org/jira/browse/PARQUET-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646534#comment-17646534
]
ASF GitHub Bot commented on PARQUET-1539:
-
pitrou commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348081137
@bbraams @gszadovszky Could you explain why the spec's wording is so complex?
It seems to me that the CRC is basically computed over the entire serialized
data exactly as it's