Thanks all for the information!
Eleanore
On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka wrote:
> Beam does support parallelism for the job, which applies to all the
> transforms in the job when executing on Flink, via the "--parallelism"
> flag.
>
> For the use case you mentioned, the Kafka read operations will be
> over-parallelised, but that should be OK as they will only have a small
> memory impact.
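For reference, a minimal Java sketch of setting that flag programmatically
(the FlinkPipelineOptions calls are real Beam API; the rest of the pipeline
is omitted):

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Parse flags such as --runner=FlinkRunner --parallelism=4 from args ...
FlinkPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
// ... or set the job-wide value directly; it applies to every transform.
options.setParallelism(4);
options.setRunner(FlinkRunner.class);
Pipeline p = Pipeline.create(options);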
+dev +Brian Hulette +Reuven Lax
On Wed, Apr 29, 2020 at 4:21 AM Paolo Tomeo wrote:
> Hi all,
>
> I think the method AvroUtils.toBeamSchema has an unexpected side effect.
> I found out that, if you invoke it and then you run a pipeline of
> GenericRecords containing a timestamp (I tried with logical-type
> timestamp-millis), Beam converts such timestamps from long to
> org.joda.time.DateTime.
Beam doesn't expose such a thing directly, but the FlinkRunner may be able
to take some pipeline options to configure this.
On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin wrote:
> Hi Kyle,
>
> I am using Flink Runner (v1.8.2)
>
> Thanks!
> Eleanore
>
> On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver wrote:
Hi Kyle,
I am using Flink Runner (v1.8.2)
Thanks!
Eleanore
On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver wrote:
> Which runner are you using?
>
> On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin wrote:
>
>> Hi all,
>>
>> I just wonder: does Beam allow setting parallelism for each operator
>> (PTransform) separately? Flink provides such a feature.
Which runner are you using?
On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin wrote:
> Hi all,
>
> I just wonder: does Beam allow setting parallelism for each operator
> (PTransform) separately? Flink provides such a feature.
>
> The use case I have: the source is Kafka topics, which have fewer
> partitions, while we have a heavy PTransform that we would like to scale
> with more parallelism.
Hi all,
I just wonder: does Beam allow setting parallelism for each operator
(PTransform) separately? Flink provides such a feature.
The use case I have: the source is Kafka topics, which have fewer
partitions, while we have a heavy PTransform that we would like to scale
with more parallelism.
Thanks a lot!
Eleanore
> This seems to have worked, as the output file is created on the host
> system. However, the pipeline silently fails, and the output file remains
> empty.
Have you checked the SDK container logs? They are most likely to contain
relevant failure information.
> I don't know if this is a result of me re
@Sruthi: You'll have to implement your own factory. If you include the
mentioned dependency, e.g. if you use Gradle:
runtimeOnly "org.apache.flink:flink-statebackend-rocksdb_2.11:$flink_version"
Then create the factory:
options.setStateBackend(o -> new RocksDBStateBackend("file://..."));
Al
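Putting those two pieces together, a minimal sketch (the checkpoint URI is
illustrative, and the exact setStateBackend signature depends on your Beam
version; the lambda form follows the snippet above):

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;

FlinkPipelineOptions options = PipelineOptionsFactory.as(FlinkPipelineOptions.class);
// The factory is invoked by the FlinkRunner when the job starts; the URI
// tells RocksDB where to write checkpoint data (file://, hdfs://, s3://...).
options.setStateBackend(o -> new RocksDBStateBackend("file:///tmp/flink-checkpoints"));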
Many thanks
On Wed, Apr 29, 2020, 5:56 PM André Rocha Silva
<a.si...@portaltelemedicina.com.br> wrote:
> Hey
>
> You can simply use the output PCollection from one step in as many pipes
> as you want. E.g.:
>
> p = beam.Pipeline(options=pipeline_options)
>
> data = (
>     p
>     | 'Get data' >> beam.io.ReadFromText(user_options.input_file)
> )
Hey
You can simply use the output PCollection from one step in as many pipes as
you want. E.g.:

p = beam.Pipeline(options=pipeline_options)

data = (
    p
    | 'Get data' >> beam.io.ReadFromText(user_options.input_file)
)

output1 = (
    data
    | 'Transform 1' >> beam.ParDo(Transf1())  # Transf1: your DoFn
    | 'Write transform 1 results' >> beam.io.WriteToText(user_options.output_file)  # illustrative sink
)
Yes, this is not a problem. We do it regularly in our streaming pipelines,
as a single stream doesn't have enough load to justify its own Dataflow
job. So we run different streams in parallel in a single Beam pipeline.
Data-wise the streams have nothing to do with each other, but
transformation-wise the
Hi all,
Is it possible in Beam to create a pipeline where two tasks run in
parallel, as opposed to sequentially?
A simple use case would be: step 3 generates some data, from which I
produce e.g. 3 completely different outcomes (e.g. 3 different files
stored in a bucket).
Thanks
Marco
Hi all,
We're working on a project where we're limited to one big development
machine for now. We want to start developing data processing pipelines in
Python, which should eventually be ported to a currently unknown setup on a
separate cluster or cloud, so we went with Beam for its portability.
Hi Max,
The ingestion covers EOD processing from a Kafka source, so we get a lot
of data from 5pm-8pm and outside of that window we get no data. The
checkpoint just stores the Kafka offset for restart.
Sounds like during the period of no data there could be an open buffer. I would
have
Hi all,
I think the method AvroUtils.toBeamSchema has an unexpected side effect.
I found out that, if you invoke it and then you run a pipeline of
GenericRecords containing a timestamp (I tried with logical-type
timestamp-millis), Beam converts such timestamps from long to
org.joda.time.DateTime.
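A minimal sketch that would surface the reported behaviour (the schema is
hypothetical, and whether the conversion lands in the global GenericData
singleton depends on the Beam and Avro versions in use):

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.beam.sdk.schemas.utils.AvroUtils;

Schema timestamped = SchemaBuilder.record("Event").fields()
    .name("ts")
    .type(LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG)))
    .noDefault()
    .endRecord();

// Before: nothing registered, timestamp-millis fields decode as Long.
System.out.println(GenericData.get().getConversionFor(LogicalTypes.timestampMillis()));

AvroUtils.toBeamSchema(timestamped);

// After: per the report, a Joda-time conversion may now be registered, so
// GenericRecords decoded from here on carry org.joda.time.DateTime instead.
System.out.println(GenericData.get().getConversionFor(LogicalTypes.timestampMillis()));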