Yes.

Due to the below error, the Flink bulk writer never closes the part file and keeps on creating new part files continuously. Is Flink not handling exceptions like the one below?

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 20 September 2023 05:54 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi

I tested it on my side and also got the same error. This should be a limitation 
of Parquet.

```
java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
    at org.apache.parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:153) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.<init>(RunLengthBitPackingHybridEncoder.jav
```


So I think the current minimum page size that can be set for parquet is 64B.
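
For reference, a rough sketch of what seems to be going on (reconstructed from the stack trace above, not copied from the Parquet source): the encoder sizes its first slab to at least 64 bytes and passes the configured page size down as maxCapacityHint, so any page size below 64 bytes trips the precondition.

```
// Illustrative sketch only, reconstructed from the stack trace above; the
// constant name and exact logic inside Parquet may differ.
class PageSizeFloorSketch {
    static void checkPageSizeHint(int pageSizeBytes) {
        final int initialSlabSize = 64; // assumed minimum slab size: the "64" in the message
        if (pageSizeBytes < initialSlabSize) {
            throw new IllegalArgumentException(
                    "maxCapacityHint can't be less than initialSlabSize "
                            + initialSlabSize + " " + pageSizeBytes); // prints "... 64 1" for a 1-byte page
        }
    }
}
```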

Best,
Feng


On Tue, Sep 19, 2023 at 6:06 PM Kamal Mittal <kamal.mit...@ericsson.com> wrote:
Hello,

If the page size is given as 1 byte, then the exception ‘maxCapacityHint can't be less than initialSlabSize %d %d’ is encountered.

This comes from the class CapacityByteArrayOutputStream, which is contained in the parquet-common library.

Rgds,
Kamal

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 19 September 2023 01:01 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi Kamal

What exception did you encounter? I have tested it locally and it works fine.


Best,
Feng


On Mon, Sep 18, 2023 at 11:04 AM Kamal Mittal <kamal.mit...@ericsson.com> wrote:
Hello,

Checkpointing is enabled and works fine if the configured parquet page size is at least 64 bytes; otherwise an exception is thrown at the back end.

This looks to be an issue that is not handled by the file sink bulk writer?

Rgds,
Kamal

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 15 September 2023 04:14 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi Kamal

Check whether checkpointing is enabled for the task and is triggered correctly. By default, the parquet bulk writer rolls to a new file on every checkpoint.
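
A minimal sketch of the intended setup (assumed names, schema and paths, not your actual job): with checkpointing enabled, the bulk Parquet writer finalizes in-progress part files on every checkpoint; without completed checkpoints they are never closed.

```
// Minimal sketch, assuming an Avro GenericRecord stream; schema, path and
// checkpoint interval are placeholders.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParquetSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // bulk writers roll to a finished part file on each checkpoint

        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"f\",\"type\":\"string\"}]}");

        FileSink<GenericRecord> sink = FileSink
                .forBulkFormat(new Path("file:///tmp/parquet-out"),
                        AvroParquetWriters.forGenericRecord(schema))
                .build();

        // dataStream.sinkTo(sink);   // attach to your DataStream<GenericRecord>
        // env.execute("parquet-sink-sketch");
    }
}
```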


Best,
Feng

On Thu, Sep 14, 2023 at 7:27 PM Kamal Mittal via user <user@flink.apache.org> wrote:
Hello,

I tried parquet file creation with the file sink bulk writer.

If the parquet page size is configured as low as 1 byte (an allowed configuration), then Flink keeps on creating multiple ‘in-progress’ part files whose only content is ‘PAR1’, and never closes them.

I want to know the reason for not closing the file and creating multiple ‘in-progress’ part files, and why no error is reported, if applicable.
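
For context, a hedged sketch of how such a small page size might be wired in (the actual configuration is not shown in this mail, so the builder calls below are an assumption): a custom ParquetBuilder that calls withPageSize(1). The builder accepts the value, and Parquet only rejects it later while writing data, which would explain part files that contain nothing but the ‘PAR1’ magic bytes.

```
// Assumed wiring, not the original job: a ParquetWriterFactory built from a
// custom ParquetBuilder with a 1-byte page size.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetWriterFactory;
import org.apache.parquet.avro.AvroParquetWriter;

public class TinyPageSizeSketch {
    public static FileSink<GenericRecord> buildSink(Schema schema) {
        final String schemaString = schema.toString(); // keep the lambda serializable
        ParquetWriterFactory<GenericRecord> factory = new ParquetWriterFactory<>(
                out -> AvroParquetWriter.<GenericRecord>builder(out)
                        .withSchema(new Schema.Parser().parse(schemaString))
                        .withPageSize(1) // 1 byte: accepted here, rejected later inside Parquet
                        .build());
        return FileSink.forBulkFormat(new Path("file:///tmp/parquet-out"), factory).build();
    }
}
```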

Rgds,
Kamal
