[jira] [Commented] (PARQUET-1976) Use net.alchim31.maven:scala-maven-plugin instead of org.scala-tools:maven-scala-plugin

2021-02-09 Thread Michael Heuer (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281920#comment-17281920 ] Michael Heuer commented on PARQUET-1976: Re: Scala 2.12.12, note commen

[jira] [Commented] (PARQUET-1894) Please fix the related Shaded Jackson Databind CVEs

2020-08-01 Thread Michael Heuer (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169321#comment-17169321 ] Michael Heuer commented on PARQUET-1894: I would love to hear otherwise, b

Re: Preparing release 1.11.1

2020-01-28 Thread Michael Heuer
Hello Gabor, These Parquet/Avro/Spark version incompatibilities are not new with Parquet 1.11.0; similar issues were present when Spark depended on Avro 1.7.x and also depended on a Parquet version that required Avro 1.8.x. Perhaps if the parquet-avro test scope dependency did not exclude the

Re: Preparing release 1.11.1

2020-01-24 Thread Michael Heuer
bor > > On Thu, Jan 23, 2020 at 7:08 PM Michael Heuer wrote: > >> For example, >> >> https://github.com/bigdatagenomics/adam/pull/2245 >> < >> https:/

Re: Preparing release 1.11.1

2020-01-23 Thread Michael Heuer
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) > On Jan 23, 2020, at 10:52 AM, Michael Heuer wrote: > > Hello Gabor, > > This Spark PR upgrades Parquet but does not upgrade Avro; note the exclusion > for parquet-avro

Re: Parquet Verbose Logging

2020-01-23 Thread Michael Heuer
Hello David, As I mentioned on PARQUET-1758, we have been frustrated by overly verbose logging in Parquet for a long time. Various workarounds have been more or less successful, e.g. https://github.com/bigdatagenomics/adam/issues/851 I wou

Re: Preparing release 1.11.1

2020-01-23 Thread Michael Heuer
Hello Gabor, This Spark PR upgrades Parquet but does not upgrade Avro; note the exclusion for parquet-avro https://github.com/apache/spark/pull/26804/files#diff-600376dffeb79835ede4a0b285078036R2104

[jira] [Commented] (PARQUET-1758) InternalParquetRecordReader Logging it Too Verbose

2020-01-12 Thread Michael Heuer (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013869#comment-17013869 ] Michael Heuer commented on PARQUET-1758: +1, excessive logging from Par

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-22 Thread Michael Heuer
e caller and the callee shipped by parquet-mr. > So, I'm quite sure it is a classpath issue. It seems that the 1.11 version > of the parquet-column jar is not on the classpath. > > > On Fri, Nov 22, 2019 at 1:44 AM Michael Heuer wrote: > >> The dependency v

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-21 Thread Michael Heuer
er called before by AvroSchemaConverter) of this >> method is kept untouched. It seems to me that the Parquet versions on the Spark >> executor are mismatched: parquet-avro is on 1.11.0, but parquet-column is still >> on an older version. >> >> On Thu, Nov 21, 2019 at 11:4

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-21 Thread Michael Heuer
chael, >> >> Unfortunately, I don't have much experience with Spark. But if Spark uses >> the parquet-mr library in an embedded way (that's how Hive uses it) it is >> required to re-build Spark with 1.11 RC parquet-mr. >> >> Regards, >> Gabor >

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-20 Thread Michael Heuer
his is heading out-of-scope for the Parquet mailing list. michael > On Nov 20, 2019, at 10:00 AM, Michael Heuer wrote: > > I am willing to do some benchmarking on genomic data at scale but am not > quite sure what the Spark target version for 1.11.0 might be. Will Pa

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-20 Thread Michael Heuer
I am willing to do some benchmarking on genomic data at scale but am not quite sure what the Spark target version for 1.11.0 might be. Will Parquet 1.11.0 be compatible in Spark 2.4.x? Updating from 1.10.1 to 1.11.0 breaks at runtime in our build … D 0, localhost, executor driver): java.lang.N

[jira] [Commented] (PARQUET-1645) Bump Apache Avro to 1.9.1

2019-11-07 Thread Michael Heuer (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969351#comment-16969351 ] Michael Heuer commented on PARQUET-1645: I am very curious about

[jira] [Commented] (PARQUET-1241) [C++] Use LZ4 frame format

2019-11-03 Thread Michael Heuer (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965675#comment-16965675 ] Michael Heuer commented on PARQUET-1241: For JVM implementations, note

Re: block-vs-frame LZ4 in Parquet

2019-10-27 Thread Michael Heuer
> We are waiting on a volunteer to come forward and sort out the proper > implementation in Parquet C++ for LZ4. It is disabled in the meantime > I think > > On Wed, Oct 23, 2019 at 10:11 AM Michael Heuer wrote: >> >> Hello, >> >> There are a few issues

Re: PARQUET-1441/parquet-mr #560 in 1.11.0 release?

2019-10-24 Thread Michael Heuer
t/1e5fda5310687b0856e74f00a4ea420b6b1ab34d >> So >> it will be included in 1.11. >> >> Cheers, Fokko Driesprong >> >> On Wed, Oct 23, 2019 at 16:52, Michael Heuer wrote: >> >>> Hello, >>> >>> https://github.com/apache/parquet-mr/pu

block-vs-frame LZ4 in Parquet

2019-10-23 Thread Michael Heuer
Hello, There are a few issues related to block-vs-frame LZ4 compression in Parquet/Arrow and related https://issues.apache.org/jira/browse/PARQUET-1241 https://issues.apache.org/jira/browse/PARQUET-1515

PARQUET-1441/parquet-mr #560 in 1.11.0 release?

2019-10-23 Thread Michael Heuer
Hello, https://github.com/apache/parquet-mr/pull/560 was merged to master in May but I am still not able to confirm whether this fix will be in the upcoming Parquet 1.11.0 release. The linked JIRA issue https://issues.apache.org/jira/browse/PARQU

Re: Stalebot

2019-10-23 Thread Michael Heuer
I personally would rather we not use such a bot. If there are long-running pull requests, ping the author and ask if the changes are still relevant, and if so, ask the author to rebase. > On Oct 23, 2019, at 9:12 AM, Xinli shang wrote: > > Agree with Junjie for the longer limit. We have sever

Re: Floating point data compression for Apache Parquet

2019-07-12 Thread Michael Heuer
Hello Martin, I'm willing to run some tests at scale on our genomics data when a parquet-mr pull request for the Java implementation is ready. Cheers, michael > On Jul 11, 2019, at 1:09 PM, Radev, Martin wrote: > > Dear all, > > > I created a Jira issue for the new feature and also mad

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-05-31 Thread Michael Heuer
Might https://github.com/apache/parquet-mr/pull/560 be included in the next 1.11.0 release candidate? michael > On May 31, 2019, at 11:09 AM, Ryan Blue wrote: > > I'm hoping to find some time to get a release candidate out next week or > th

Re: add new custom encodings

2019-04-24 Thread Michael Heuer
s://github.com/apache/parquet-format/blob/master/Encodings.md > > or something different? > > - Wes > > On Wed, Apr 24, 2019 at 1:30 PM Michael Heuer wrote: >> >> Hello, >> >> Might someone be able to point me to the Parquet APIs I would need to work >

add new custom encodings

2019-04-24 Thread Michael Heuer
Hello, Might someone be able to point me to the Parquet APIs I would need to work with to add new custom encodings for String columns? We have a few different bit-packing encodings that take advantage of the particular data stored in those columns. Thank you in advance, michael

[jira] [Commented] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter

2018-11-27 Thread Michael Heuer (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700834#comment-16700834 ] Michael Heuer commented on PARQUET-1441: Sorry, which compatibility check

[jira] [Commented] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter

2018-11-26 Thread Michael Heuer (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699567#comment-16699567 ] Michael Heuer commented on PARQUET-1441: Note as mentioned above that w

[jira] [Commented] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter

2018-10-10 Thread Michael Heuer (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645480#comment-16645480 ] Michael Heuer commented on PARQUET-1441: I've found I can get a simi

[jira] [Created] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter

2018-10-09 Thread Michael Heuer (JIRA)
Michael Heuer created PARQUET-1441: -- Summary: SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter Key: PARQUET-1441 URL: https://issues.apache.org/jira/browse/PARQUET