[ 
https://issues.apache.org/jira/browse/BEAM-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kasia Kucharczyk resolved BEAM-6115.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.9.0

> SyntheticSource bundle size parameter sometimes is casted to invalid type
> -------------------------------------------------------------------------
>
>                 Key: BEAM-6115
>                 URL: https://issues.apache.org/jira/browse/BEAM-6115
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: Kasia Kucharczyk
>            Assignee: Kasia Kucharczyk
>            Priority: Minor
>             Fix For: 2.9.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In specific situations, the parameter {code}bundle_size_in_elements{code} in
> the Python SyntheticSources becomes a `float` instead of an `int`, which
> causes a failure on Dataflow:
> {code:java}
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 198, in execute
>     self._split_task)
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 206, in _perform_source_split_considering_api_limits
>     desired_bundle_size)
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 243, in _perform_source_split
>     for split in source.split(desired_bundle_size):
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/testing/synthetic_pipeline.py", line 222, in split
>     bundle_size_in_elements):
> TypeError: range() integer step argument expected, got float.{code}
>  
> Debugging showed that on Dataflow the following expression causes this
> problem (lines 213-214):
> {code:python}max(1, self._num_records / self._initial_splitting_num_bundles){code}.
> On line 218 there is:
> {code:python}math.floor(math.sqrt(self._num_records)){code} which also
> returns a float on Python 2.
> On line 222, _bundle_size_in_elements_ is passed as the step argument to
> _range_, which requires an _int_.
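
The failure mode described above can be sketched in isolation. This is a minimal illustration and not the actual change made in synthetic_pipeline.py: the helper names bundle_size_in_elements and bundle_starts and their arguments are hypothetical. It assumes num_records is an int and shows one way to make sure the step passed to range is always an int.

```python
import math


def bundle_size_in_elements(num_records, initial_splitting_num_bundles=None):
    """Hypothetical helper: return an int bundle size that is safe for range()."""
    if initial_splitting_num_bundles:
        # Floor division stays integral for int operands; int() also covers
        # the case where one of the values arrives as a float.
        return max(1, int(num_records // initial_splitting_num_bundles))
    # math.floor returns a float on Python 2, so coerce explicitly.
    return max(1, int(math.floor(math.sqrt(num_records))))


def bundle_starts(num_records, initial_splitting_num_bundles=None):
    """Hypothetical helper: start offsets of each bundle."""
    step = bundle_size_in_elements(num_records, initial_splitting_num_bundles)
    # range() raises TypeError if step is a float, hence the coercion above.
    return list(range(0, num_records, step))
```

Coercing at the point where the size is computed keeps every later use of the value, including the range call, free of type checks.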



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
