Dear All,
I am trying to write data from a table into a Parquet file using the
following code (with parquet-hadoop version 1.8.1):
    ParquetWriter writer =
        AvroParquetWriter.builder(path).withSchema(schema)
            .withCompressionCodec(CompressionCodecName.SNAPPY)
[
https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699732#comment-17699732
]
Gabor Szadovszky commented on PARQUET-2255:
---
But we don't build the dictionary for filtering
[
https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699716#comment-17699716
]
Gang Wu commented on PARQUET-2255:
--
I think there is a similar issue in the dictionary encoding of
[
https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699712#comment-17699712
]
Gabor Szadovszky commented on PARQUET-2255:
---
Bloom filters are for searching for exact
[
https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699705#comment-17699705
]
ASF GitHub Bot commented on PARQUET-2257:
-
wgtmac commented on code in PR #194:
URL:
[
https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699707#comment-17699707
]
ASF GitHub Bot commented on PARQUET-2257:
-
wgtmac commented on code in PR #194:
URL:
wgtmac commented on code in PR #194:
URL: https://github.com/apache/parquet-format/pull/194#discussion_r1134163134
##
src/main/thrift/parquet.thrift:
##
@@ -753,6 +753,9 @@ struct ColumnMetaData {
/** Byte offset from beginning of file to Bloom filter data. **/
14:
[
https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699700#comment-17699700
]
ASF GitHub Bot commented on PARQUET-2257:
-
mapleFU commented on code in PR #194:
URL:
mapleFU commented on code in PR #194:
URL: https://github.com/apache/parquet-format/pull/194#discussion_r1134158075
##
src/main/thrift/parquet.thrift:
##
@@ -753,6 +753,9 @@ struct ColumnMetaData {
/** Byte offset from beginning of file to Bloom filter data. **/
14:
[
https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699699#comment-17699699
]
ASF GitHub Bot commented on PARQUET-2257:
-
wgtmac commented on PR #194:
URL:
wgtmac commented on PR #194:
URL: https://github.com/apache/parquet-format/pull/194#issuecomment-1466350305
@emkornfield @pitrou @gszadovszky @shangxinli @mapleFU
[
https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699696#comment-17699696
]
ASF GitHub Bot commented on PARQUET-2257:
-
wgtmac opened a new pull request, #194:
URL:
wgtmac opened a new pull request, #194:
URL: https://github.com/apache/parquet-format/pull/194
The spec currently provides only `bloom_filter_offset` to locate the bloom filter.
A reader therefore cannot load the bloom filter in a single read: it must first
parse the bloom filter header to learn the total size.
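The difference can be sketched roughly as follows. This is only an illustration under stated assumptions: `CountingFile`, `readWithOffsetOnly`, `readWithLength`, and the fixed 4-byte length prefix are hypothetical stand-ins, not the parquet-mr API and not the real Thrift-encoded bloom filter header.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BloomFilterReadSketch {
    // Assumption: a fixed 4-byte big-endian length prefix stands in for the real header.
    public static final int HEADER_SIZE = 4;

    /** Hypothetical stand-in for a file that counts how many ranged reads are issued. */
    public static final class CountingFile {
        private final byte[] data;
        public int reads = 0;

        public CountingFile(byte[] data) {
            this.data = data;
        }

        public byte[] read(long offset, int length) {
            reads++;
            return Arrays.copyOfRange(data, (int) offset, (int) offset + length);
        }
    }

    /** Offset only: one read for the header to learn the size, a second for the bitset. */
    public static byte[] readWithOffsetOnly(CountingFile file, long offset) {
        int numBytes = ByteBuffer.wrap(file.read(offset, HEADER_SIZE)).getInt();
        return file.read(offset + HEADER_SIZE, numBytes);
    }

    /** Offset plus total length: header and bitset arrive in a single read. */
    public static byte[] readWithLength(CountingFile file, long offset, int length) {
        byte[] headerAndBitset = file.read(offset, length);
        return Arrays.copyOfRange(headerAndBitset, HEADER_SIZE, headerAndBitset.length);
    }

    /** Builds a tiny fake file: 3 unrelated bytes, then header, then an 8-byte bitset. */
    public static byte[] sampleFile() {
        byte[] bitset = {1, 2, 3, 4, 5, 6, 7, 8};
        ByteBuffer buf = ByteBuffer.allocate(3 + HEADER_SIZE + bitset.length);
        buf.put(new byte[3]);
        buf.putInt(bitset.length);
        buf.put(bitset);
        return buf.array();
    }

    public static void main(String[] args) {
        CountingFile a = new CountingFile(sampleFile());
        readWithOffsetOnly(a, 3);
        CountingFile b = new CountingFile(sampleFile());
        readWithLength(b, 3, HEADER_SIZE + 8);
        System.out.println("offset only: " + a.reads + " reads; with length: " + b.reads + " reads");
    }
}
```

On remote storage each ranged read is typically a separate round trip, which is presumably why knowing the total length up front is attractive.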
[
https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699694#comment-17699694
]
Gang Wu commented on PARQUET-2256:
--
Apache ORC supports compression of bloom filters. It would be nice
Gang Wu created PARQUET-2257:
Summary: [Format] Add bloom_filter_length to ColumnMetaData
Key: PARQUET-2257
URL: https://issues.apache.org/jira/browse/PARQUET-2257
Project: Parquet
Issue Type:
[
https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gang Wu updated PARQUET-2256:
-
Component/s: (was: parquet-cpp)
> Adding Compression for BloomFilter
>
[
https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gang Wu updated PARQUET-2256:
-
Component/s: parquet-format
> Adding Compression for BloomFilter
> --
[
https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699691#comment-17699691
]
Gang Wu commented on PARQUET-2255:
--
cc [~gszadovszky] [~emkornfi...@gmail.com]
> BloomFilter and
[
https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699686#comment-17699686
]
Gang Wu commented on PARQUET-2255:
--
These are good questions. Let me try to answer them from the
Xuwei Fu created PARQUET-2256:
-
Summary: Adding Compression for BloomFilter
Key: PARQUET-2256
URL: https://issues.apache.org/jira/browse/PARQUET-2256
Project: Parquet
Issue Type: Improvement
Xuwei Fu created PARQUET-2255:
-
Summary: BloomFilter and float point is ambiguous
Key: PARQUET-2255
URL: https://issues.apache.org/jira/browse/PARQUET-2255
Project: Parquet
Issue Type:
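The snippet above is cut off before the issue description, so the exact ambiguity it reports is not shown here. One well-known way floating-point keys can be ambiguous for a hash-based filter, assumed here purely for illustration, is that `0.0` and `-0.0` compare equal yet have different bit patterns. In this sketch, `FloatKeySketch`, `buildFilter`, and `mightContain` are hypothetical names, and an exact set of bit patterns stands in for a real bloom filter.

```java
import java.util.HashSet;
import java.util.Set;

public class FloatKeySketch {
    /** Stand-in for a bloom filter: an exact set keyed by the raw bit pattern of each value. */
    public static Set<Integer> buildFilter(float... values) {
        Set<Integer> filter = new HashSet<>();
        for (float v : values) {
            filter.add(Float.floatToRawIntBits(v));
        }
        return filter;
    }

    /** Probes by bit pattern, as a filter hashing the encoded bytes would. */
    public static boolean mightContain(Set<Integer> filter, float probe) {
        return filter.contains(Float.floatToRawIntBits(probe));
    }

    public static void main(String[] args) {
        Set<Integer> filter = buildFilter(-0.0f);
        // The two zeros are equal under ==, but the bit-pattern probe for +0.0 misses.
        System.out.println("-0.0f == 0.0f: " + (-0.0f == 0.0f));
        System.out.println("filter might contain 0.0f: " + mightContain(filter, 0.0f));
    }
}
```

If a reader used such a miss to skip data for a predicate like `x = 0.0`, rows holding `-0.0` would be dropped even though they satisfy the comparison.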