You should be able to configure the number of partitions like this:

https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Java/src/main/java/jdbc/ReadPartitionsJdbc.java#L132

The code that auto-infers the number of partitions seems to be unreachable (I
haven't checked this carefully). More details are here:
https://issues.apache.org/jira/browse/BEAM-12456
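
If the table size is not known when the pipeline is written, one option is to
estimate the row count yourself (for example with a COUNT(*) query or table
statistics) before building the pipeline and derive the partition count from
that. Below is a minimal sketch of that idea. PartitionEstimator, its
parameters, and the rows-per-partition target are hypothetical names and
tuning assumptions of mine. This is not Beam's own heuristic and not part of
the Beam API.

```java
/** Hypothetical helper, not part of Apache Beam. */
final class PartitionEstimator {
    private PartitionEstimator() {}

    /**
     * Returns ceil(estimatedRows / targetRowsPerPartition), clamped to [1, maxPartitions].
     *
     * @param estimatedRows          row count obtained separately (assumption: e.g. COUNT(*))
     * @param targetRowsPerPartition rows each partition should hold (a tuning assumption)
     * @param maxPartitions          upper bound on the number of partitions
     */
    static int estimateNumPartitions(
            long estimatedRows, long targetRowsPerPartition, int maxPartitions) {
        if (targetRowsPerPartition <= 0 || maxPartitions <= 0) {
            throw new IllegalArgumentException(
                "targetRowsPerPartition and maxPartitions must be positive");
        }
        if (estimatedRows <= 0) {
            // Empty or unknown table: fall back to a single partition.
            return 1;
        }
        // Ceiling division written so that it cannot overflow for large row counts.
        long partitions = estimatedRows / targetRowsPerPartition;
        if (estimatedRows % targetRowsPerPartition != 0) {
            partitions++;
        }
        return (int) Math.min(partitions, (long) maxPartitions);
    }
}
```

The result could then be passed to withNumPartitions(...) on
JdbcIO.readWithPartitions(), as in the cookbook example linked above.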

On Fri, May 31, 2024 at 7:40 AM Vardhan Thigle via user <
user@beam.apache.org> wrote:

> Hi Beam Experts, I have a small query about `JdbcIO#readWithPartitions`.
>
>
> Context
>
> JdbcIO#readWithPartitions seems to always default
> to 200 partitions (DEFAULT_NUM_PARTITIONS). This is set by default when the
> object is constructed here
> <https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L362>.
> There seems to be no way to override this with a null value. Hence it
> seems that the code
> <https://github.com/apache/beam/blob/b50ad0fe8fc168eaded62efb08f19cf2aea341e2/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1398>
> that checks for the null value and tries to auto-infer the number of
> partitions never runs.
>
> I am trying to use this for reading a tall table of unknown size, and the
> pipeline always defaults to 200 if the value is not set. The default of 200
> seems to fall short, as the worker runs out of memory in the reshuffle
> stage. Running with a higher number of partitions, like 4K, helps for my
> test setup.
>
> Since the size is not known at the time of implementing the pipeline, the
> auto-inference might help by setting maxPartitions to a reasonable value as
> per the heuristic decided by Beam code.
>
> Request for help
>
> Could you please clarify a few doubts around this?
>
>    1. Is this behavior intentional?
>    2. Could you please explain the rationale behind the heuristic in L1398
>    
> <https://github.com/apache/beam/blob/b50ad0fe8fc168eaded62efb08f19cf2aea341e2/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1398>
>     and DEFAULT_NUM_PARTITIONS=200?
>
>
> I have also raised this as issues/31467 in case it needs any changes in
> the implementation.
>
>
> Regards and Thanks,
> Vardhan Thigle,
> +919535346204 <+91%2095353%2046204>
>
