On Mon, Jan 13, 2020 at 12:45 PM Bo Shi <[email protected]> wrote:

> Python 3.7.5
> Beam 2.17
>
> I've used both WriteToBigQuery and BigQueryBatchFileLoads successfully
> using DirectRunner.  I've boiled the issue down to a small
> reproducible case (attached).  The following works great:
>
> $ pipenv run python repro-direct.py \
>   --project=CHANGEME \
>   --runner=DirectRunner \
>   --temp_location=gs://beam-to-bq--tmp/bshi/tmp \
>   --staging_location=gs://beam-to-bq--tmp/bshi/stg \
>   --experiment=beam_fn_api \   # <<<< NB! (I assume this is a no-op w/ DirectRunner)
>   --save_main_session
>
> $ bq --project=CHANGEME query 'select * from bshi_test.direct'
> Waiting on bqjob_r53488a6b9ed6cb20_0000016fa05fb570_1 ... (0s) Current status: DONE
> +----------+----------+
> | some_str | some_int |
> +----------+----------+
> | hello    |        1 |
> | world    |        2 |
> +----------+----------+
>
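> (For reference, repro-direct.py is roughly the following. This is a
> minimal sketch reconstructed from the schema and rows above; the actual
> attachment may differ.)
>
>   import apache_beam as beam
>   from apache_beam.options.pipeline_options import PipelineOptions
>
>   def run(argv=None):
>       # Project, runner, temp/staging locations, and experiments all
>       # come from the command-line flags shown above.
>       options = PipelineOptions(argv)
>       with beam.Pipeline(options=options) as p:
>           (p
>            | 'Create' >> beam.Create([
>                {'some_str': 'hello', 'some_int': 1},
>                {'some_str': 'world', 'some_int': 2},
>            ])
>            | 'WriteToBQ' >> beam.io.WriteToBigQuery(
>                'bshi_test.direct',
>                schema='some_str:STRING,some_int:INTEGER',
>                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
>
>   if __name__ == '__main__':
>       run()
>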
> When changing to --runner=DataflowRunner, I get a cryptic Java error
> with no other useful Python feedback (see class_cast_exception.txt):
>
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException:
> Dataflow pipeline failed. State: FAILED, Error:
> java.util.concurrent.ExecutionException: java.lang.ClassCastException:
> [B cannot be cast to org.apache.beam.sdk.values.KV
>
> On a hunch, I removed --experiment=beam_fn_api and DataflowRunner
> worked fine.  Am I doing something against the intent of the SDK?
>

Can you try adding "--experiment=use_beam_bq_sink"?
Also, out of curiosity, why are you setting "--experiment=beam_fn_api"?
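
E.g. the same invocation as before, with the experiment flag swapped (a
sketch; keep the rest of your flags as-is):

$ pipenv run python repro-direct.py \
  --project=CHANGEME \
  --runner=DataflowRunner \
  --temp_location=gs://beam-to-bq--tmp/bshi/tmp \
  --staging_location=gs://beam-to-bq--tmp/bshi/stg \
  --experiment=use_beam_bq_sink \
  --save_main_session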

> Happy to share access to the GCP project if that would assist in debugging.
>
