URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1838584976
@amousavigourabi, that's actually what I did and it's working for us now.
Thanks
> Parquet without Hadoop dependencies
> ---
>
> Key: PARQUET-
ing the InputFile, OutputFile implementations
from this pull request before the next release is out. If you need to fully
drop Hadoop, this is still being worked on.
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1836333591
Our project needs this feature as well. Is there a date for the next major
release?
n 1.13.2 will not have it either.
>
> I am not sure what is the best time for the next major release. Could you
please advise? @gszadovszky @shangxinli
Thanks @wgtmac. This is noted. Guess we'll have to find an alternative
solution for now while waiting for the next major rele
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1695559126
@amousavigourabi, I would suggest joining the mailing list
dev@parquet.apache.org and starting a discussion about a potential minor release
in the near future.
entations in their own little
Maven artifact in the meantime, as there does seem to be some demand.
sion
has been introduced in the last minor releases and we have the fix for it.
.2 will not have it either.
I am not sure what is the best time for the next major release. Could you
please advise? @gszadovszky @shangxinli
blished
https://mvnrepository.com/artifact/org.apache.parquet/parquet-common.
This PR seems to have been merged to master already, though. Is there any reason why I am not
seeing the changes in the jar published to Maven?
Thanks for the help. :)
[
https://issues.apache.org/jira/browse/PARQUET-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gang Wu updated PARQUET-1822:
-
Fix Version/s: 1.14.0
ttps://github.com/apache/parquet-mr/pull/
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1605889798
@gszadovszky Do you want to take another pass?
n PR #:
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1605749008
> @amousavigourabi Will you have any update on this?
In crunch mode atm so it took a bit longer, but everything has been
addressed now.
void testEmptyArrayLocal() throws Exception {
Review Comment:
Good idea, done
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1602917455
@amousavigourabi Will you have any update on this?
th,
openOption), buffer);
+}
+
+@Override
+public long getPos() {
+ return pos;
+}
+
+@Override
+public void write(int data) throws IOException {
+ pos++;
Review Comment:
`OutputStream#write` writes only one byte, even when calling this method
with an `int`.
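To make the point concrete: per the `java.io.OutputStream` contract, `write(int)` writes only the eight low-order bits of its argument and ignores the 24 high-order bits, so advancing `pos` by exactly one per call is correct. The standalone snippet below (the class name `WriteIntDemo` is illustrative, not from the PR) shows this with a plain `ByteArrayOutputStream`.

```java
import java.io.ByteArrayOutputStream;

final class WriteIntDemo {
  // Writes one int through OutputStream#write(int) and returns the bytes that landed.
  static byte[] writeOne(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(value); // only the low-order 8 bits are written
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] bytes = writeOne(0x1234);
    // The high-order bits (0x12) are discarded; one byte, 0x34, is written.
    System.out.println(bytes.length + " byte(s): 0x" + Integer.toHexString(bytes[0] & 0xFF));
    // prints "1 byte(s): 0x34"
  }
}
```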
th,
openOption), buffer);
+}
+
+@Override
+public long getPos() {
+ return pos;
Review Comment:
No, it abstracts it away. We have to do this bookkeeping ourselves when
using `BufferedOutputStream`.
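A minimal sketch of that bookkeeping, assuming the stream simply wraps a `BufferedOutputStream`: every forwarded byte is counted, so `getPos()` reports the logical position even while bytes are still sitting in the buffer. The class name `CountingBufferedStream` is hypothetical, and it extends `java.io.OutputStream` instead of Parquet's `PositionOutputStream` so that it compiles without Parquet on the classpath; this is not the PR's actual code.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for a PositionOutputStream implementation:
// tracks the logical write position on top of a BufferedOutputStream.
final class CountingBufferedStream extends OutputStream {
  private final BufferedOutputStream out;
  private long pos = 0;

  CountingBufferedStream(OutputStream target, int bufferSize) {
    this.out = new BufferedOutputStream(target, bufferSize);
  }

  /** Bytes written so far, including bytes not yet flushed from the buffer. */
  public long getPos() {
    return pos;
  }

  @Override
  public void write(int data) throws IOException {
    out.write(data);
    pos++; // write(int) always emits exactly one byte
  }

  @Override
  public void write(byte[] data, int off, int len) throws IOException {
    out.write(data, off, len);
    pos += len;
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }
}
```

The counter is advanced only after the delegate call returns, so a failed write does not move the reported position. Overriding the three-argument `write` also covers `write(byte[])`, which `OutputStream` routes through it.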
ic void testEmptyMap() throws Exception {
}
}
+ @Test
+ public void testEmptyMapLocal() throws Exception {
Review Comment:
ditto
ile;
+ }
+
+ @Override
+ public long getLength() throws IOException {
+RandomAccessFile file = new RandomAccessFile(path.toFile(), "r");
+long length = file.length();
Review Comment:
It is stored in a `long` after the first read now.
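Caching the length as described can be sketched with a lazily filled `long` field. This hypothetical `CachedLengthFile` class is not the PR's `LocalInputFile`; it uses `java.nio.file.Files.size`, which, unlike the `RandomAccessFile` in the diff above, leaves no file handle that has to be closed. It assumes the file is not modified while it is being read, since the cached value never refreshes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: compute the file length once, then serve it from a field.
final class CachedLengthFile {
  private final Path path;
  private long length = -1; // -1 means "not read yet"

  CachedLengthFile(Path path) {
    this.path = path;
  }

  public long getLength() throws IOException {
    if (length < 0) {
      // Files.size throws NoSuchFileException for a missing file.
      length = Files.size(path);
    }
    return length;
  }
}
```

On the alternative of `path.toFile().length()`: it also avoids opening the file, but `java.io.File#length` returns `0L` when the file does not exist, whereas `Files.size` throws, which surfaces a bad path earlier. The sketch is not synchronized; concurrent first calls may each read the size, which is harmless here.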
+
+ @Override
+ public boolean supportsBlockSize() {
+return true;
+ }
+
+ @Override
+ public long defaultBlockSize() {
+return 512;
Review Comment:
OK, that sounds good!
s signalled by the `true`
return in `supportsBlockSize`)
ile) {
+path = file;
+ }
+
+ @Override
+ public PositionOutputStream create(long buffer) throws IOException {
Review Comment:
Could you please add a test case for `create` and `createOrOverwrite` to
make sure they are as expected?
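One way to write the requested test, assuming `create` is meant to fail on an existing file and `createOrOverwrite` to truncate it, is to model both with plain `java.nio` open options (`CREATE_NEW` versus `CREATE` plus `TRUNCATE_EXISTING`). The helpers in `LocalCreateSketch` are hypothetical stand-ins, not the PR's `create(long)`/`createOrOverwrite(long)` methods, and the block-size argument is omitted.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helpers modelling the assumed create/createOrOverwrite contract.
final class LocalCreateSketch {
  // Refuses to touch an existing file (throws FileAlreadyExistsException).
  static OutputStream create(Path path) throws IOException {
    return Files.newOutputStream(path, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
  }

  // Creates the file, or truncates an existing one before writing.
  static OutputStream createOrOverwrite(Path path) throws IOException {
    return Files.newOutputStream(
        path,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE);
  }

  // Test-style check of both behaviours in a fresh temporary directory.
  static void check() throws IOException {
    Path dir = Files.createTempDirectory("parquet-local");
    Path file = dir.resolve("out.bin");

    try (OutputStream out = create(file)) {
      out.write(new byte[] {1, 2, 3});
    }

    try {
      create(file).close();
      throw new AssertionError("create must not overwrite an existing file");
    } catch (FileAlreadyExistsException expected) {
      // Expected: the original three bytes are left untouched.
    }
    if (Files.size(file) != 3) {
      throw new AssertionError("existing file was modified by create");
    }

    try (OutputStream out = createOrOverwrite(file)) {
      out.write(new byte[] {9});
    }
    if (Files.size(file) != 1) {
      throw new AssertionError("createOrOverwrite should truncate old contents");
    }

    Files.delete(file);
    Files.delete(dir);
  }

  public static void main(String[] args) throws IOException {
    check();
    System.out.println("create/createOrOverwrite behave as expected");
  }
}
```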
h();
Review Comment:
Should it be cached in case of repeated read?
Or would `path.toFile().length()` do the same thing?
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1593132661
@amousavigourabi, please, also update the class comments of `LocalInputFile`
and `LocalOutputFile` accordingly.
es;
+import java.nio.file.Path;
+
+/**
+ * {@code DiskOutputFile} is an implementation needed by Parquet to write
+ * data files to disk using {@link PositionOutputStream} instances.
+ */
+public class DiskOutputFile implements OutputFile {
Review Comment:
See comme
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1592740248
@gszadovszky @shangxinli @Fokko Do you have time to take a look? This has
been discussed in the mailing list:
https://lists.apache.org/thread/d33757j99xqn63hrfz415sq60v3x9hmy
URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1588738776
And I greatly appreciate your work!
hadoop-client-runtime. But first, this PR is
to allow users to avoid the bigger Hadoop issues more easily.
File)
And the lower level classes are a jar of their own.
Just dreaming...
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
### Documentation
- [x] In case of new functionality, my PR adds documentation that describes
how to use it.
- All the public functions and the classes in t
> users that oftentimes do not need them.
> > It seems to me that properly decoupling the reader/writer code from this
> > dependency will likely require breaking changes in the future as it is
> > hardwired in a large part of the logic. Maybe something to consider for
> the
>
give you a timeline.
Best regards,
Atour
From: Gang Wu
Sent: Saturday, June 10, 2023 12:20 PM
To: dev@parquet.apache.org
Subject: Re: Parquet without Hadoop dependencies
My main concern of breaking change is the effort to take for downstream
projects to ado
to consider for the
> next major release?
>
> Best regards,
> Atour
>
> From: Gang Wu
> Sent: Friday, June 9, 2023 4:32 PM
> To: dev@parquet.apache.org
> Subject: Re: Parquet without Hadoop dependencies
>
> That may break many downstream projects. At least we cannot b
st regards,
Atour
From: Gang Wu
Sent: Friday, June 9, 2023 4:32 PM
To: dev@parquet.apache.org
Subject: Re: Parquet without Hadoop dependencies
That may break many downstream projects. At least we cannot break
parquet-hadoop (and any existing module). If you can
> From: Gang Wu
> Sent: Friday, June 9, 2023 3:32 AM
> To: dev@parquet.apache.org
> Subject: Re: Parquet without Hadoop dependencies
>
> Hi Atour,
>
> Thanks for bringing this up!
>
> From what I observed from PARQUET-1822, I think it is a valid use
ay, June 9, 2023 3:32 AM
To: dev@parquet.apache.org
Subject: Re: Parquet without Hadoop dependencies
Hi Atour,
Thanks for bringing this up!
From what I observed from PARQUET-1822, I think it is a valid use
case to support parquet reading/writing without hadoop installed.
The challenge i
Hi Atour,
Thanks for bringing this up!
From what I observed from PARQUET-1822, I think it is a valid use
case to support parquet reading/writing without hadoop installed.
The challenge is backward compatibility. It would be great if you can
work on it.
Best,
Gang
On Fri, Jun 9, 2023 at 12:24 A
Dear all,
The Java implementations of the Parquet readers and writers seem pretty tightly
coupled to Hadoop (see: PARQUET-1822). For some projects, this can cause issues
as it's an unnecessary and big dependency when you might just need to write to
disk. Is there any appetite here for separatin
here is any update to this issue?
Y)
.withWriteMode(Mode.OVERWRITE) // probably not good for prod (overwrites files).
.build();
for (Map row : theData) {
StopWatch stopWatch = StopWatch.createStarted();
final GenericRecord record = new GenericData.Record(avroSchema);
row.forEach((k, v) -> {
record.put(k, v);
});
writer.write(
ion|https://stackoverflow.com/questions/63655421/writing-parquet-avro-genericrecord-to-json-while-maintaining-logicaltypes]).
the current community activity I wouldn't say parquet
2.0 is feasible any time soon. :(
Parquet without Hadoop dependencies
> ---
>
> Key: PARQUET-1822
> URL: https://issues.apache.org/jira/browse/PARQUET-1822
> Project: Parquet
> Issue Type: Improvement
> Components: parque
ame issue. We would like to write to parquet for the
convenience and the obvious benefits of the format, but it just seems impossible
to do without a lot of overhead, including having Hadoop installed?
mark juchems created PARQUET-1822:
-
Summary: Parquet without Hadoop dependencies
Key: PARQUET-1822
URL: https://issues.apache.org/jira/browse/PARQUET-1822
Project: Parquet
Issue Type