[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-12-04 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1838584976 @amousavigourabi , that's actually what I did and it's working for us now. Thanks > Parquet without Hadoop dependencies > --- > > Key: PARQUET-

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-12-01 Thread ASF GitHub Bot (Jira)
ing the InputFile, OutputFile implementations from this pull request before the next release is out. If you need to fully drop Hadoop, this is still being worked on. > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-12-01 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1836333591 Our project needs this feature as well, is there a date for the next major release? > Parquet without Hadoop dependencies > --- > > K

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-11-03 Thread ASF GitHub Bot (Jira)
n 1.13.2 will not have it too. > > I am not sure what is the best time for the next major release. Could you please advise? @gszadovszky @shangxinli Thanks @wgtmac . This is noted. Guess we'll have to find an alternative solution for now while waiting for the next major rele

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-08-28 Thread ASF GitHub Bot (Jira)
: URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1695559126 @amousavigourabi, I would suggest to join the mailing list dev@parquet.apache.org and start a discussion about a potential minor release in the near future. > Parquet without Hadoop depen

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-08-28 Thread ASF GitHub Bot (Jira)
entations in their own little Maven artifact in the meantime, as there does seem to be some demand. > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-08-27 Thread ASF GitHub Bot (Jira)
sion has been introduced in the last minor releases and we have the fix for it. > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-1822 >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-08-25 Thread ASF GitHub Bot (Jira)
.2 will not have it too. I am not sure what is the best time for the next major release. Could you please advise? @gszadovszky @shangxinli > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > U

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-08-25 Thread ASF GitHub Bot (Jira)
blished https://mvnrepository.com/artifact/org.apache.parquet/parquet-common. This PR seems to be merged to master already though, any reason why I am not seeing the changes in the pushed jar in maven? Thanks for the help. :) > Parquet without Ha

[jira] [Updated] (PARQUET-1822) Parquet without Hadoop dependencies

2023-07-03 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-1822: - Fix Version/s: 1.14.0 > Parquet without Hadoop dependenc

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-07-03 Thread ASF GitHub Bot (Jira)
ttps://github.com/apache/parquet-mr/pull/ > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-1822 > Project: Parquet >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-24 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1605889798 @gszadovszky Do you want to take another pass? > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apa

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-24 Thread ASF GitHub Bot (Jira)
n PR #: URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1605749008 > @amousavigourabi Will you have any update on this? In crunch mode atm so it took a bit longer, but everything has been addressed now. > Parquet without Hadoop de

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-24 Thread ASF GitHub Bot (Jira)
void testEmptyArrayLocal() throws Exception { Review Comment: Good idea, done > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-1822 >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-22 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1602917455 @amousavigourabi Will you have any update on this? > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://iss

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-16 Thread ASF GitHub Bot (Jira)
th, openOption), buffer); +} + +@Override +public long getPos() { + return pos; +} + +@Override +public void write(int data) throws IOException { + pos++; Review Comment: `OutputStream#write` writes only one byte, even when calling this method with an `int`.

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-16 Thread ASF GitHub Bot (Jira)
th, openOption), buffer); +} + +@Override +public long getPos() { + return pos; Review Comment: No, it abstracts it away. We have to do this bookkeeping ourselves when using `BufferedOutputStream`. > Parquet without Hadoop dependencies > -
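
The bookkeeping discussed in this review thread can be sketched roughly as follows. This is a minimal illustration, not the class from the pull request: `PositionTrackingStream` is a hypothetical name, and it extends plain `java.io.OutputStream` rather than Parquet's `PositionOutputStream` so that it compiles without any Parquet dependency. It assumes the position should advance by exactly one per `write(int)` call, since `OutputStream#write(int)` writes only the low-order byte of its argument.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration; not the implementation from the pull request.
class PositionTrackingStream extends OutputStream {
  private final BufferedOutputStream out;
  private long pos = 0; // bytes accepted so far, including bytes still buffered

  PositionTrackingStream(OutputStream target, int bufferSize) {
    this.out = new BufferedOutputStream(target, bufferSize);
  }

  // BufferedOutputStream does not expose a position, so it is tracked here.
  long getPos() {
    return pos;
  }

  @Override
  public void write(int data) throws IOException {
    out.write(data); // writes only the low-order byte of the int
    pos++;
  }

  @Override
  public void write(byte[] data, int off, int len) throws IOException {
    out.write(data, off, len);
    pos += len;
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }
}
```

The counter is incremented only after the delegate call returns, so a failed write does not advance the reported position.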

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
ic void testEmptyMap() throws Exception { } } + @Test + public void testEmptyMapLocal() throws Exception { Review Comment: ditto > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://i

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
ile; + } + + @Override + public long getLength() throws IOException { +RandomAccessFile file = new RandomAccessFile(path.toFile(), "r"); +long length = file.length(); Review Comment: It is stored in a `long` after the first read now. >
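
One way to store the length after the first read, as the reply above describes, is sketched below. The class name `CachedLengthFile` and the `-1` sentinel are assumptions made for illustration, and the sketch swaps in `Files.size` for the `RandomAccessFile` shown in the diff; it is not the code that was merged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not the implementation from the pull request.
class CachedLengthFile {
  private final Path path;
  private long length = -1; // -1 marks "not read yet"

  CachedLengthFile(Path path) {
    this.path = path;
  }

  long getLength() throws IOException {
    if (length == -1) {
      // Query the file system once; later calls reuse the stored value.
      length = Files.size(path);
    }
    return length;
  }
}
```

The trade-off is that the cached value goes stale if the file changes after the first call, which seems acceptable for a file that is only opened for reading.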

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
+ + @Override + public boolean supportsBlockSize() { +return true; + } + + @Override + public long defaultBlockSize() { +return 512; Review Comment: OK, that sounds good! > Parquet without Hadoop dependencies > --- > >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
s signalled by the `true` return in `supportsBlockSize`) > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-1822 > Project:

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
ile) { +path = file; + } + + @Override + public PositionOutputStream create(long buffer) throws IOException { Review Comment: Could you please add a test case for `create` and `createOrOverwrite` to make sure they are as expected? > Parquet without Hadoop dependencies > -
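
The difference such a test would likely check can be sketched with the standard `java.nio.file` open options. This is only an assumption about the intended semantics (fail if the file exists versus replace it); `LocalCreateModes` and its method signatures are hypothetical and return plain `OutputStream` instead of Parquet's `PositionOutputStream`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not the implementation from the pull request.
class LocalCreateModes {
  // Fails with FileAlreadyExistsException if the file is already present.
  static OutputStream create(Path path) throws IOException {
    return Files.newOutputStream(
        path, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
  }

  // Creates the file if missing, otherwise truncates it to zero length.
  static OutputStream createOrOverwrite(Path path) throws IOException {
    return Files.newOutputStream(
        path,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE);
  }
}
```

A test along these lines would call `create` twice on the same path and expect the second call to throw, then call `createOrOverwrite` and check that the old contents are gone.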

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
h(); Review Comment: Should it be cached in case of repeated read? Or would `path.toFile().length()` do the same thing? > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://i

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
: URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1593132661 @amousavigourabi, please, also update the class comments of `LocalInputFile` and `LocalOutputFile` accordingly. > Parquet without Hadoop dependencies > --- > >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
es; +import java.nio.file.Path; + +/** + * {@code DiskOutputFile} is an implementation needed by Parquet to write + * data files to disk using {@link PositionOutputStream} instances. + */ +public class DiskOutputFile implements OutputFile { Review Comment: See comme

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-15 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1592740248 @gszadovszky @shangxinli @Fokko Do you have time to take a look? This has been discussed in the mailing list: https://lists.apache.org/thread/d33757j99xqn63hrfz415sq60v3x9hmy > Parquet without Hadoop depen

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-13 Thread ASF GitHub Bot (Jira)
: URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1588738776 And I greatly appreciate your work! > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apa

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-13 Thread ASF GitHub Bot (Jira)
hadoop-client-runtime. But first, this PR is to allow users to avoid the bigger Hadoop issues more easily. > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-13 Thread ASF GitHub Bot (Jira)
File) And the lower level classes are a jar of their own. Just dreaming... > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-1822 >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2023-06-12 Thread ASF GitHub Bot (Jira)
) 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in t

Re: Parquet without Hadoop dependencies

2023-06-11 Thread Gang Wu
> users that oftentimes do not need them. > > It seems to me that properly decoupling the reader/writer code from this > > dependency will likely require breaking changes in the future as it is > > hardwired in a large part of the logic. Maybe something to consider for > the >

Re: Parquet without Hadoop dependencies

2023-06-10 Thread Atour Mousavi Gourabi
give you a timeline. Best regards, Atour From: Gang Wu Sent: Saturday, June 10, 2023 12:20 PM To: dev@parquet.apache.org Subject: Re: Parquet without Hadoop dependencies My main concern of breaking change is the effort to take for downstream projects to ado

Re: Parquet without Hadoop dependencies

2023-06-10 Thread Gang Wu
to consider for the > next major release? > > Best regards, > Atour > > From: Gang Wu > Sent: Friday, June 9, 2023 4:32 PM > To: dev@parquet.apache.org > Subject: Re: Parquet without Hadoop dependencies > > That may break many downstream projects. At least we cannot b

Re: Parquet without Hadoop dependencies

2023-06-09 Thread Atour Mousavi Gourabi
st regards, Atour From: Gang Wu Sent: Friday, June 9, 2023 4:32 PM To: dev@parquet.apache.org Subject: Re: Parquet without Hadoop dependencies That may break many downstream projects. At least we cannot break parquet-hadoop (and any existing module). If you can

Re: Parquet without Hadoop dependencies

2023-06-09 Thread Gang Wu
_ > From: Gang Wu > Sent: Friday, June 9, 2023 3:32 AM > To: dev@parquet.apache.org > Subject: Re: Parquet without Hadoop dependencies > > Hi Atour, > > Thanks for bringing this up! > > From what I observed from PARQUET-1822, I think it is a valid use

Re: Parquet without Hadoop dependencies

2023-06-09 Thread Atour Mousavi Gourabi
ay, June 9, 2023 3:32 AM To: dev@parquet.apache.org Subject: Re: Parquet without Hadoop dependencies Hi Atour, Thanks for bringing this up! From what I observed from PARQUET-1822, I think it is a valid use case to support parquet reading/writing without hadoop installed. The challenge i

Re: Parquet without Hadoop dependencies

2023-06-08 Thread Gang Wu
Hi Atour, Thanks for bringing this up! From what I observed from PARQUET-1822, I think it is a valid use case to support parquet reading/writing without hadoop installed. The challenge is backward compatibility. It would be great if you can work on it. Best, Gang On Fri, Jun 9, 2023 at 12:24 A

Parquet without Hadoop dependencies

2023-06-08 Thread Atour Mousavi Gourabi
Dear all, The Java implementations of the Parquet readers and writers seem pretty tightly coupled to Hadoop (see: PARQUET-1822). For some projects, this can cause issues as it's an unnecessary and big dependency when you might just need to write to disk. Is there any appetite here for separatin

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2022-09-08 Thread Xinyu Zeng (Jira)
here is any update to this issue? > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-1822 > Project: Parquet >

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2020-09-11 Thread mark juchems (Jira)
Y) .withWriteMode(Mode.OVERWRITE)//probably not good for prod. (overwrites files). .build(); for (Map row : theData) { StopWatch stopWatch = StopWatch.createStarted(); final GenericRecord record = new GenericData.Record(avroSchema); row.forEach((k, v) -> { record.put(k, v); }); writer.write(

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2020-09-05 Thread Ben Watson (Jira)
ion|https://stackoverflow.com/questions/63655421/writing-parquet-avro-genericrecord-to-json-while-maintaining-logicaltypes]). > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2020-09-02 Thread Gabor Szadovszky (Jira)
the current community activity I wouldn't say parquet 2.0 is feasible any time soon. :( > Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-18

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2020-09-01 Thread David Mollitor (Jira)
Parquet without Hadoop dependencies > --- > > Key: PARQUET-1822 > URL: https://issues.apache.org/jira/browse/PARQUET-1822 > Project: Parquet > Issue Type: Improvement > Components: parque

[jira] [Commented] (PARQUET-1822) Parquet without Hadoop dependencies

2020-09-01 Thread Jira
ame issue. We would like to write to parquet for the convenience and the obvious benefits of the format but it just seems impossible to do without a lot of overhead, including a Hadoop installed? > Parquet without Hadoop dependencies > --- > >

[jira] [Created] (PARQUET-1822) Parquet without Hadoop dependencies

2020-03-19 Thread mark juchems (Jira)
mark juchems created PARQUET-1822: - Summary: Parquet without Hadoop dependencies Key: PARQUET-1822 URL: https://issues.apache.org/jira/browse/PARQUET-1822 Project: Parquet Issue Type