Writing to Parquet

2023-03-13 Thread srinivasarao vundavalli
Dear All, I am trying to write data from a table into a parquet file using the following code (using parquet-hadoop 1.8.1 version): ParquetWriter writer = AvroParquetWriter.builder(path).withSchema(schema) .withCompressionCodec(CompressionCodecName.SNAPPY)

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699732#comment-17699732 ] Gabor Szadovszky commented on PARQUET-2255: --- But we don't build the dictionary for filtering

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699716#comment-17699716 ] Gang Wu commented on PARQUET-2255: -- I think there is a similar issue in the dictionary encoding of

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699712#comment-17699712 ] Gabor Szadovszky commented on PARQUET-2255: --- Bloom filters are for searching for exact

[jira] [Commented] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699705#comment-17699705 ] ASF GitHub Bot commented on PARQUET-2257: - wgtmac commented on code in PR #194: URL:

[jira] [Commented] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699707#comment-17699707 ] ASF GitHub Bot commented on PARQUET-2257: - wgtmac commented on code in PR #194: URL:

[GitHub] [parquet-format] wgtmac commented on a diff in pull request #194: PARQUET-2257: Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread via GitHub
wgtmac commented on code in PR #194: URL: https://github.com/apache/parquet-format/pull/194#discussion_r1134163134 ## src/main/thrift/parquet.thrift: ## @@ -753,6 +753,9 @@ struct ColumnMetaData { /** Byte offset from beginning of file to Bloom filter data. **/ 14:

[jira] [Commented] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699700#comment-17699700 ] ASF GitHub Bot commented on PARQUET-2257: - mapleFU commented on code in PR #194: URL:

[GitHub] [parquet-format] mapleFU commented on a diff in pull request #194: PARQUET-2257: Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread via GitHub
mapleFU commented on code in PR #194: URL: https://github.com/apache/parquet-format/pull/194#discussion_r1134158075 ## src/main/thrift/parquet.thrift: ## @@ -753,6 +753,9 @@ struct ColumnMetaData { /** Byte offset from beginning of file to Bloom filter data. **/ 14:

[jira] [Commented] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699699#comment-17699699 ] ASF GitHub Bot commented on PARQUET-2257: - wgtmac commented on PR #194: URL:

[GitHub] [parquet-format] wgtmac commented on pull request #194: PARQUET-2257: Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread via GitHub
wgtmac commented on PR #194: URL: https://github.com/apache/parquet-format/pull/194#issuecomment-1466350305 @emkornfield @pitrou @gszadovszky @shangxinli @mapleFU

[jira] [Commented] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699696#comment-17699696 ] ASF GitHub Bot commented on PARQUET-2257: - wgtmac opened a new pull request, #194: URL:

[GitHub] [parquet-format] wgtmac opened a new pull request, #194: PARQUET-2257: Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread via GitHub
wgtmac opened a new pull request, #194: URL: https://github.com/apache/parquet-format/pull/194 The spec has only added `bloom_filter_offset` to locate the bloom filter. The reader cannot load the bloom filter in a single shot until it parses the bloom filter header to get the total size.
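
A rough sketch of the read pattern described in the pull request above, not the actual Parquet reader. It assumes a hypothetical, simplified layout in which the bloom filter is a 4-byte little-endian size header followed by the bitset (the real format uses a Thrift-encoded header), and the class and method names are made up for illustration. With only an offset, the reader needs one read for the header and a second for the bitset; with offset plus total length it can fetch everything in one read.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class BloomFilterReadSketch {
    // Assumption: fixed 4-byte size header. Real Parquet uses a Thrift BloomFilterHeader.
    static final int HEADER_BYTES = 4;

    /** In-memory stand-in for a file that counts how many range reads were issued. */
    static final class CountingFile {
        final byte[] data;
        int reads = 0;

        CountingFile(byte[] data) {
            this.data = data;
        }

        byte[] read(int offset, int length) {
            reads++;
            return Arrays.copyOfRange(data, offset, offset + length);
        }
    }

    /** Builds a fake file: `padding` zero bytes, then the size header, then the bitset. */
    static byte[] buildFile(int padding, byte[] bitset) {
        ByteBuffer buf = ByteBuffer.allocate(padding + HEADER_BYTES + bitset.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.position(padding);
        buf.putInt(bitset.length);
        buf.put(bitset);
        return buf.array();
    }

    /** Only the offset is known: read the header first, then the bitset it describes. */
    static byte[] loadWithOffsetOnly(CountingFile file, int offset) {
        byte[] header = file.read(offset, HEADER_BYTES);
        int bitsetBytes = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return file.read(offset + HEADER_BYTES, bitsetBytes);
    }

    /** Offset and total length are known: one read covers header and bitset. */
    static byte[] loadWithLength(CountingFile file, int offset, int totalLength) {
        byte[] all = file.read(offset, totalLength);
        return Arrays.copyOfRange(all, HEADER_BYTES, all.length);
    }

    public static void main(String[] args) {
        byte[] bitset = {0x0F, 0x33, 0x55, (byte) 0xAA};
        int offset = 16;
        byte[] data = buildFile(offset, bitset);

        CountingFile twoStep = new CountingFile(data);
        byte[] a = loadWithOffsetOnly(twoStep, offset);

        CountingFile oneShot = new CountingFile(data);
        byte[] b = loadWithLength(oneShot, offset, HEADER_BYTES + bitset.length);

        System.out.println("offset only: " + twoStep.reads + " reads");
        System.out.println("offset + length: " + oneShot.reads + " read(s)");
        System.out.println("same bitset: " + Arrays.equals(a, b));
    }
}
```

In this toy setup the offset-only path issues two reads and the offset-plus-length path issues one, which is the saving a length field in the column metadata would allow on storage where each read is expensive.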

[jira] [Commented] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699694#comment-17699694 ] Gang Wu commented on PARQUET-2256: -- Apache ORC supports compression of bloom filter. It would be nice

[jira] [Created] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-03-13 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2257: Summary: [Format] Add bloom_filter_length to ColumnMetaData Key: PARQUET-2257 URL: https://issues.apache.org/jira/browse/PARQUET-2257 Project: Parquet Issue Type:

[jira] [Updated] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2256: - Component/s: (was: parquet-cpp) > Adding Compression for BloomFilter >

[jira] [Updated] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2256: - Component/s: parquet-format > Adding Compression for BloomFilter > --

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699691#comment-17699691 ] Gang Wu commented on PARQUET-2255: -- cc [~gszadovszky] [~emkornfi...@gmail.com] > BloomFilter and

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699686#comment-17699686 ] Gang Wu commented on PARQUET-2255: -- These are good questions. Let me try to answer them from the

[jira] [Created] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-13 Thread Xuwei Fu (Jira)
Xuwei Fu created PARQUET-2256: - Summary: Adding Compression for BloomFilter Key: PARQUET-2256 URL: https://issues.apache.org/jira/browse/PARQUET-2256 Project: Parquet Issue Type: Improvement

[jira] [Created] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Xuwei Fu (Jira)
Xuwei Fu created PARQUET-2255: - Summary: BloomFilter and float point is ambiguous Key: PARQUET-2255 URL: https://issues.apache.org/jira/browse/PARQUET-2255 Project: Parquet Issue Type: