Re: Improving Parquet Dedupe

2024-10-09 Thread Steve Loughran
flatbuffer would be the obvious place where there would be no compatibility issues with existing readers. Also: that looks like a large amount of information to capture statistics on. Has anyone approached them yet? On Wed, 9 Oct 2024 at 03:39, Gang Wu wrote: > Thanks Antoine for sharing the blog post

Re: today's Parquet sync

2024-09-25 Thread Steve Loughran
https://datadog.zoom.us/j/94032548940?pwd=9hAK4ZiSBHR8pgQ0QGoQTUB79J34L0.1 On Wed, 25 Sept 2024 at 17:07, Aihua Xu wrote: > Hi Julien, > > Where can I find the meeting link or can you share with me? > > Thanks, > Aihua > > On 2024/09/25 15:00:08 Julien Le Dem wrote: > > The Parquet Sync is happe

Re: [VOTE] Apache Parquet Java 1.14.2 RC1

2024-09-20 Thread Steve Loughran
On Wed, 21 Aug 2024 at 23:31, Julien Le Dem wrote: > Thank you Gang, > I wish I had found [1] before, I would not have built one! It is > *inconvenient* that the official tap just removes the previous version > when they upgrade the thrift formula. > Julien > > homebrew isn't a real package mana

Re: [DISCUSS] Moving Variant to Parquet Details

2024-09-11 Thread Steve Loughran
I'm thinking about some implementation issues, especially that well-known obsession of mine: demonstrating the correctness of specifications through machine readable formats such as TLA+, JUnit and scalatest (*) 1. the spec and at least some test suites should be closely linked, so that all

Re: Avro version support in parquet-java

2024-08-27 Thread Steve Loughran
hadoop uses its own shaded avro 1.11 lib internally (hadoop-thirdparty/shaded-avro-1.11). I think it is stuck on the public classpath as some stuff bridges to avro (org.apache.hadoop.fs.AvroFSInput). I'm tempted to tag that as Deprecated to move people off it so that there's no public avro dependen

Re: [ANNOUNCE] New Parquet Committer: Xuwei Fu

2024-07-12 Thread Steve Loughran
congrats Xuwei! On Thu, 11 Jul 2024 at 07:24, Gang Wu wrote: > On behalf of the Apache Parquet PMC, I'm happy to announce that Xuwei Fu > has accepted an invitation to become a committer on Apache Parquet. > Welcome, and thank you for your contributions! > > Thanks, > Gang >

Re: [DISCUSS] Parquet sync day and time

2024-07-09 Thread Steve Loughran
I actually think publicising the date/times on the list would be good; I missed this (though as I was off last week, not sure if I'd have turned up) On Tue, 9 Jul 2024 at 10:33, Gang Wu wrote: > Thanks for the discussion! > > I'm in GMT+8 so I would prefer 8am-10am PT, though it is already > mid

Re: [VOTE] Migration of parquet-* issues from Jira to GitHub

2024-06-17 Thread Steve Loughran
=0 I'm going to miss * the ability to cross-reference stuff from other jira projects * the simplicity of being able to use a string like "PARQUET-123" to refer to an issue * the ease of being able to set up your ide and web browser to go from a reference like this to a jira page * maybe uber-JIRAs I
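The "string to jira page" convenience mentioned above can be sketched in a few lines of Java. This is only an illustration: the class name `JiraLinks`, the method `linkify`, and the fixed list of project keys are hypothetical choices, and the URL prefix is the `issues.apache.org/jira/browse/` form seen in the Jira notifications elsewhere in this listing.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Apache project: expands issue keys into Jira URLs.
final class JiraLinks {
    // Assumption: only these project keys are expanded, so strings such as
    // "CVE-2022-25168" are not mistaken for issue references.
    private static final Pattern ISSUE_KEY =
            Pattern.compile("\\b(PARQUET|HADOOP|SPARK)-(\\d+)\\b");

    private static final String BROWSE_URL = "https://issues.apache.org/jira/browse/";

    /** Replaces every recognised issue key in the text with its Jira browse URL. */
    static String linkify(String text) {
        return ISSUE_KEY.matcher(text).replaceAll(BROWSE_URL + "$1-$2");
    }

    public static void main(String[] args) {
        System.out.println(linkify("see PARQUET-123 and HADOOP-18521"));
    }
}
```

The same one-line pattern is roughly what an IDE or browser shortcut encodes, which is why a stable key format such as "PARQUET-123" is convenient to keep.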

Re: [Parquet-java] Are there release instructions documented any place?

2024-05-31 Thread Steve Loughran
now is your chance to improve them. FWIW Hadoop has a separate module to automate as many of the operations as can be done: https://github.com/apache/hadoop-release-support 1. this includes things like patching the x86 tarball with the arm binaries and generating new checksums 2. GPG

Re: [DISCUSS] Arrow dropping Java 8 support

2024-05-31 Thread Steve Loughran
1. Hadoop doesn't use arrow. 2. The Hadoop team would love to drop java 8 and in the last release said "will happen soon" 3. all the client stuff is happy java17+, it's just some of the server side stuff which is a bit of a pain point. 4. Hadoop 3.2.x is needed to move beyond java8.

Re: [DISCUSS] Integration testing

2024-05-30 Thread Steve Loughran
On Tue, 28 May 2024 at 14:37, Andrew Lamb wrote: > One thing we could that might move the burden on to implementations rather > than some central CI job (which is a substantial effort, I agree, having > worked with the arrow ne) > > Have you any slides/docs on the experience here? > Perhaps we

Re: [DISCUSS] Encoding improvements (follow-up from Parquet "V3" discussion)

2024-05-30 Thread Steve Loughran
be good for a benchmark to be targetable at cloud storage; local stores, especially those with SSD, hide a lot of the costs of datalakes On Tue, 28 May 2024 at 07:17, Micah Kornfield wrote: > As a follow-up to the "V3" Discussions [1][2] I wanted to start a thread on > improvements to encodings.

Re: Is Parquet Meant As a Standalone Database or is a Catalog/Metastore Required?

2024-05-24 Thread Steve Loughran
; > > would be considered to be pushing reasonable boundaries. To some > extent > > > these might be solvable by having libraries have better defaults (e.g. > > only > > > collecting/writing statistics by default for the first N columns). > > > > > >

Re: Interest in Parquet V3

2024-05-22 Thread Steve Loughran
On Tue, 21 May 2024 at 22:40, Jan Finis wrote: > Thanks Weston for posting here! > > I appreciate this a lot, as it gives us the opportunity to discuss modern > formats in depth with the authors themselves, who probably know the design > trade-offs they took best and thus can give us a deeper und

Re: Is Parquet Meant As a Standalone Database or is a Catalog/Metastore Required?

2024-05-21 Thread Steve Loughran
I wish people would use avro over CSV. Not just for the schema or more complex structures, but because the parser recognises corrupt files. Oh, and the well defined serialization formats for things like "string" and "number". That said, I generate CSV in test/utility code because it is trivial to do i

Re: [DISCUSS] Parquet Reference Implementation ?

2024-05-17 Thread Steve Loughran
I'd argue the compatibility across implementation is "can they correctly read the data generated by the others?", so there's less of an RI than compliance testing, the way closed source stuff often works. Specification 1. Files generated by the implementation which are believed to match the

Re: Interest in Parquet V3

2024-05-15 Thread Steve Loughran
On Tue, 14 May 2024 at 17:48, Julien Le Dem wrote: > +1 on Micah starting a doc and following up by commenting in it. > +maybe some conf call where people of interest can talk about it. > > @Raphael, Wish Maple: agreed that changing the metadata representation is > less important. Most engine

Re: [DISCUSS] Parquet Reference Implementation ?

2024-05-14 Thread Steve Loughran
1., yes. IMO parquet-mr should be the RI, though a feature could only be declared as "done" when there is >1 implementation 2. What about interoperability and compliance testing? that is, rather than an RI, a set of test suites which somehow every impl has to pass. Tricky cross-platform though. On

Re: Interest in Parquet V3

2024-05-14 Thread Steve Loughran
BTW, has everyone read "An Empirical Evaluation of Columnar Storage Formats"? https://arxiv.org/abs/2304.05028 good review of how things could be better with real numbers. Highlights that encoding plugins may be inefficient, based on the ORC experience. w.r.t metadata 1. could the old and t

Re: Interest in Parquet V3

2024-05-13 Thread Steve Loughran
call it parquet.ml then. which is what I've had in my head as I was thinking about this last week. as the datatypes and the library uses (GPUs, ...) would be targeted at this. I'd also like a design optimised for high-latency cloud storage where seek sucks but parallel reads are easy, and we can l

Re: Parquet Sync meeting notes - April 23 2024

2024-04-25 Thread Steve Loughran
In my vector IO PR it was raising false positives about a new class. Maybe the process should be something like "submitter needs approval for extra exclusions" more troublesome: once a release has shipped, those exclusions should be stripped from the master branch, as then a regression is a genu

Re: Parquet Sync meeting notes - April 23 2024

2024-04-24 Thread Steve Loughran
where is the timetable for these calls? I think I'd like to join in if the timing works for me (UK) On Tue, 23 Apr 2024 at 15:31, Xinli shang wrote: > 4/23/2024 > > Attendee Fokko Driesprong, Vinoo Ganesh, Xinli Shang > > > Parquet-mr 1.14 release: > > 1. Fokko and Gang will discuss starting th

testing with hadoop 3.4.0

2024-03-23 Thread Steve Loughran
has anyone else been testing parquet with hadoop 3.4.0? the hadoop release is out and it all compiled nicely, but i'm seeing what looks like jackson complaints on TestInputOutput 17:54:03.642 [Thread-417] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1837703063_0014 java.lang.Exception:

Re: Newly-registered IANA Media Type for Parquet

2024-03-15 Thread Steve Loughran
This is awesome! Congratulations! For anyone who wants to get this header onto files going on S3 it is fairly straightforward on hadoop 3.3.5+ FSDataOutputStream out = fs.createFile("s3a://data/output.parquet") .overwrite(true) // saves a HEAD .opt("fs.s3a.create.header.Content-Type", "appli

Re: [DISCUSS] Parquet 1.14.0 and looking forward

2024-02-22 Thread Steve Loughran
Apologies for not making any progress -been too busy with releases. This week I am helping Hadoop 3.4.0 out the door. Hopefully we will only need one more iteration to get the packaging right (essentially strip out as many transient JARs as we can). My release module does actually build parquet as

Re: Guidelines for working on parquet-mr?

2024-01-15 Thread Steve Loughran
hadn't seen that or why; will look at it. I'll probably suggest one use of our LogExactlyOnce logger at info. There's another option here, for testing, which is: downgrade the logging on that class? it'd work across so many more releases On Sat, 13 Jan 2024 at 22:28, Atour Mousavi Gourabi wrote:

Re: Pitch for Pcodec Encoding in Parquet

2024-01-04 Thread Steve Loughran
On Wed, 3 Jan 2024 at 05:10, Martin Loncaric wrote: > I'd like to propose and get feedback on a new encoding for numerical > columns: pco. I just did a blog post demonstrating how this would perform > on various real-world datasets >

[jira] [Commented] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764774#comment-17764774 ] Steve Loughran commented on PARQUET-2346: - I don't know what happen

[jira] [Comment Edited] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764774#comment-17764774 ] Steve Loughran edited comment on PARQUET-2346 at 9/13/23 4:2

[jira] [Commented] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-12 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764248#comment-17764248 ] Steve Loughran commented on PARQUET-2346: - what is this going to do in t

[jira] [Commented] (PARQUET-2338) CVE-2022-25168 in hadoop-common

2023-08-21 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756912#comment-17756912 ] Steve Loughran commented on PARQUET-2338: - pr #1065 did thi

Re: building parquet macbook m1 with thrift 0.15.0

2023-06-15 Thread Steve Loughran
rite a docker container to build parquet on my > macbook. I spent a couple of hours trying and failing to build it directly > and got a docker solution working in far less time. > > On Wed, Jun 14, 2023 at 7:50 AM Steve Loughran > > wrote: > > > How do people get a version

building parquet macbook m1 with thrift 0.15.0

2023-06-14 Thread Steve Loughran
How do people get a version of the native thrift binaries onto their macbook such that parquet builds? 1. as homebrew is on 0.18.1, and if you try to build with that you can see that thrift has added some new things to implement. 2. try to rebuild thrift 0.15 and you end up in cmake pain

[jira] [Commented] (PARQUET-2128) Bump Thrift to 0.16.0

2023-06-14 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732509#comment-17732509 ] Steve Loughran commented on PARQUET-2128: - homebrew doesn't have any

[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

2023-05-19 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724339#comment-17724339 ] Steve Loughran commented on PARQUET-2171: - mukund, is there a PR up for

[jira] [Commented] (PARQUET-2276) ParquetReader reads do not work with Hadoop version 2.8.5

2023-05-01 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718203#comment-17718203 ] Steve Loughran commented on PARQUET-2276: - [~a2l] really? hadoop 2.8?

[jira] [Commented] (PARQUET-2289) Avoid using hasCapability

2023-04-19 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713993#comment-17713993 ] Steve Loughran commented on PARQUET-2289: - I'm not convinced here.

[jira] [Commented] (PARQUET-2276) ParquetReader reads do not work with Hadoop version 2.8.5

2023-04-14 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712325#comment-17712325 ] Steve Loughran commented on PARQUET-2276: - hadoop 2.8 shipped 5 years

[jira] [Commented] (PARQUET-2277) Bump hadoop.version from 3.2.3 to 3.3.5

2023-04-14 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712323#comment-17712323 ] Steve Loughran commented on PARQUET-2277: - happy. Have you considered cut

[jira] [Commented] (PARQUET-1989) Deep verification of encrypted files

2023-04-12 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711367#comment-17711367 ] Steve Loughran commented on PARQUET-1989: - you might want to have a de

[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-28 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705940#comment-17705940 ] Steve Loughran commented on PARQUET-2224: - it's not spark, it's a cycl

[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-27 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705646#comment-17705646 ] Steve Loughran commented on PARQUET-2224: - +SPARK-42380 > Publi

[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-27 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705645#comment-17705645 ] Steve Loughran commented on PARQUET-2224: - HADOOP-18641. didn't actua

[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-27 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705327#comment-17705327 ] Steve Loughran commented on PARQUET-2224: - we had to roll this back

[jira] [Commented] (PARQUET-2239) Replace log4j1 with reload4j

2023-02-06 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684607#comment-17684607 ] Steve Loughran commented on PARQUET-2239: - good, but trickier than you t

[jira] [Resolved] (PARQUET-2173) Fix parquet build against hadoop 3.3.3+

2023-02-02 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved PARQUET-2173. - Fix Version/s: 1.13.0 Resolution: Fixed > Fix parquet build against had

Re: parquet checksum coverage

2022-12-02 Thread Steve Loughran
hecksummed > > 2. I see that verification as set > >by "parquet.page.verify-checksum.enabled" is false by default. Why > > isn't it > >on? is there a significant performance hit. > > Sorry I don't know the answer to this. > > On Mon, N
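For readers unfamiliar with what the "parquet.page.verify-checksum.enabled" option guards, the check amounts to recomputing a CRC-32 over the page bytes and comparing it with the stored value. The sketch below shows only that idea with the JDK's `java.util.zip.CRC32`; the class `PageChecksum` and its methods are hypothetical names, and this is not the parquet-java reader code, which decides for itself which bytes are covered.

```java
import java.util.zip.CRC32;

// Hypothetical illustration of page checksum verification; not parquet-java code.
final class PageChecksum {
    /** Computes the CRC-32 of the given bytes, narrowed to a signed 32-bit int. */
    static int crcOf(byte[] pageBytes) {
        CRC32 crc = new CRC32();
        crc.update(pageBytes, 0, pageBytes.length);
        return (int) crc.getValue();
    }

    /** Returns true when the recomputed checksum matches the stored one. */
    static boolean verify(byte[] pageBytes, int storedCrc) {
        return crcOf(pageBytes) == storedCrc;
    }

    public static void main(String[] args) {
        byte[] page = {1, 2, 3, 4};
        int stored = crcOf(page);
        page[2] ^= 0x01; // simulate a single corrupted bit
        System.out.println(verify(page, stored));
    }
}
```

The cost is one pass over each page's bytes at read time, which is the trade-off behind the question of why verification is off by default.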

[jira] [Commented] (PARQUET-2216) Parquet writer classes don't close underlying output stream in case of errors.

2022-12-02 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642453#comment-17642453 ] Steve Loughran commented on PARQUET-2216: - * OutputFile may not imple

parquet checksum coverage

2022-11-14 Thread Steve Loughran
hi I am busy dealing with a bug where the Azure abfs connector can get the prefetch data blocks of one thread/task overwritten by those of another task whose input stream was closed while a prefetch was in progress. https://issues.apache.org/jira/browse/HADOOP-18521 I have not been able to trigge

[jira] [Created] (PARQUET-2173) Fix parquet build against hadoop 3.3.3+

2022-08-16 Thread Steve Loughran (Jira)
Steve Loughran created PARQUET-2173: --- Summary: Fix parquet build against hadoop 3.3.3+ Key: PARQUET-2173 URL: https://issues.apache.org/jira/browse/PARQUET-2173 Project: Parquet Issue Type

[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

2022-08-11 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578637#comment-17578637 ] Steve Loughran commented on PARQUET-2171: - bq. I have found ByteBuffe

Re: Fail to read back written large parquet file

2022-08-05 Thread Steve Loughran
that has to be an integer wraparound...something is using a signed int for position, so when it goes above 2GB it goes negative, and a seek(negative value) is rejected. fix: find the variable and make it a long On Thu, 4 Aug 2022 at 11:09, Jozef Vilcek wrote: > I came across a case where a job
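The wraparound described above is easy to reproduce in isolation. The following is a minimal sketch, not the actual parquet-java code path: `SeekOffsets` and both methods are hypothetical names chosen to contrast an `int` position with a `long` one.

```java
// Hypothetical illustration of the signed-int wraparound; not parquet-java code.
final class SeekOffsets {
    /** Buggy variant: the sum silently wraps negative once it passes Integer.MAX_VALUE. */
    static int nextOffsetInt(int position, int length) {
        return position + length;
    }

    /** Fixed variant: the position is a long, so offsets above 2GB stay positive. */
    static long nextOffsetLong(long position, int length) {
        return position + length;
    }

    public static void main(String[] args) {
        int nearLimit = Integer.MAX_VALUE - 10;
        System.out.println(nextOffsetInt(nearLimit, 100));  // negative: a seek() to this would be rejected
        System.out.println(nextOffsetLong(nearLimit, 100)); // 2147483737
    }
}
```

Where widening the variable is not immediately possible, `Math.addExact` on the `int` values would at least turn the silent wrap into an `ArithmeticException` at the point of overflow.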

[jira] [Resolved] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-08-01 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved PARQUET-2158. - Fix Version/s: 1.13.0 Resolution: Fixed > Upgrade Hadoop dependency to vers

[jira] [Resolved] (PARQUET-2150) parquet-protobuf to compile on mac M1

2022-07-19 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved PARQUET-2150. - Resolution: Not A Problem with PARQUET-2155 this problem is implicitly fixed

[jira] [Updated] (PARQUET-2165) remove deprecated PathGlobPattern and DeprecatedFieldProjectionFilter to compile on hadoop 3.2+

2022-07-12 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated PARQUET-2165: Summary: remove deprecated PathGlobPattern and DeprecatedFieldProjectionFilter to

[jira] [Created] (PARQUET-2165) remove deprecated PathGlobPattern to compile on hadoop 3.2+

2022-07-12 Thread Steve Loughran (Jira)
Steve Loughran created PARQUET-2165: --- Summary: remove deprecated PathGlobPattern to compile on hadoop 3.2+ Key: PARQUET-2165 URL: https://issues.apache.org/jira/browse/PARQUET-2165 Project: Parquet

[jira] [Commented] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553552#comment-17553552 ] Steve Loughran commented on PARQUET-2158: - build is broken by HADOOP-1

[jira] [Created] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread Steve Loughran (Jira)
Steve Loughran created PARQUET-2158: --- Summary: Upgrade Hadoop dependency to version 3.2.0 Key: PARQUET-2158 URL: https://issues.apache.org/jira/browse/PARQUET-2158 Project: Parquet Issue

[jira] [Commented] (PARQUET-2155) Upgrade protobuf version to 3.20.1

2022-06-10 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552638#comment-17552638 ] Steve Loughran commented on PARQUET-2155: - there's a 3.21.1 ou

[jira] [Updated] (PARQUET-2151) Drop Hadoop 2 input stream reflection from parquet-hadoop

2022-06-07 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated PARQUET-2151: Description: Parquet uses reflection to load a hadoop2 input stream, falling back to a

[jira] [Updated] (PARQUET-2151) Drop Hadoop 2 input stream reflection from parquet-hadoop

2022-06-07 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated PARQUET-2151: Summary: Drop Hadoop 2 input stream reflection from parquet-hadoop (was: Drop Hadoop 1

[jira] [Updated] (PARQUET-2151) Drop Hadoop 1 input stream support from parquet-hadoop

2022-06-07 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated PARQUET-2151: Description: Parquet uses reflection to load a hadoop2 input stream, falling back to a

[jira] [Updated] (PARQUET-2151) Drop Hadoop 1 input stream support from parquet-hadoop

2022-06-06 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated PARQUET-2151: Summary: Drop Hadoop 1 input stream support from parquet-hadoop (was: parquet-hadoop

[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-06-06 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550592#comment-17550592 ] Steve Loughran commented on PARQUET-2134: - have you a full stack trace. as

[jira] [Created] (PARQUET-2151) parquet-hadoop to drop Hadoop 1 input stream support

2022-06-06 Thread Steve Loughran (Jira)
Steve Loughran created PARQUET-2151: --- Summary: parquet-hadoop to drop Hadoop 1 input stream support Key: PARQUET-2151 URL: https://issues.apache.org/jira/browse/PARQUET-2151 Project: Parquet

[jira] [Commented] (PARQUET-2150) parquet-protobuf to compile on mac M1

2022-05-23 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541068#comment-17541068 ] Steve Loughran commented on PARQUET-2150: - same issue and solution as HA

[jira] [Created] (PARQUET-2150) parquet-protobuf to compile on mac M1

2022-05-23 Thread Steve Loughran (Jira)
Steve Loughran created PARQUET-2150: --- Summary: parquet-protobuf to compile on mac M1 Key: PARQUET-2150 URL: https://issues.apache.org/jira/browse/PARQUET-2150 Project: Parquet Issue Type

[jira] [Commented] (PARQUET-1615) getRecordWriter shouldn't hardcode CREAT mode when new ParquetFileWriter

2021-08-02 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391699#comment-17391699 ] Steve Loughran commented on PARQUET-1615: - just looking at this. Any spec

[jira] [Comment Edited] (PARQUET-1984) Some tests fail on windows

2021-07-08 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377466#comment-17377466 ] Steve Loughran edited comment on PARQUET-1984 at 7/8/21, 3:4

[jira] [Commented] (PARQUET-1984) Some tests fail on windows

2021-07-08 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377466#comment-17377466 ] Steve Loughran commented on PARQUET-1984: - FYI, this change stops the