I think you are right, since "writer.close()” contains a business logic, it
must be moved to @FinishBundle. The same thing about DeleteFn.
I’ll create a Jira for that.
> On 25 Mar 2021, at 00:49, Kenneth Knowles wrote:
>
> Alex's idea sounds good and like what Vincent maybe implemented. I am j
The reason I was checking out the code is that sometimes a natural thing to
output would be a summary of what was written. So each chunk of writes and
the final chunk written in @FinishBundle. This is, for example, what SQL
engines do (output # of rows written).
You could output both the summary a
Yeah, the entire input is not always what is needed, and can generally be
achieved via
input -> wait(side input of write) -> do something with the input
Of course one could also do
entire_input_as_output_of_wait -> MapTo(KV.of(null, null)) ->
CombineGlobally(TrivialCombineFn)
to reduce this
I believe that the Wait transform turns this output into a side input, so
outputting the input PCollection might be problematic.
On Wed, Mar 24, 2021 at 4:49 PM Kenneth Knowles wrote:
> Alex's idea sounds good and like what Vincent maybe implemented. I am just
> reading really quickly so sorry i
On Wed, Mar 24, 2021 at 4:49 PM Kenneth Knowles wrote:
> Alex's idea sounds good and like what Vincent maybe implemented. I am just
> reading really quickly so sorry if I missed something...
>
> Checking out the code for the WriteFn I see a big problem:
>
> @Setup
> public void setup() {
Alex's idea sounds good and like what Vincent maybe implemented. I am just
reading really quickly so sorry if I missed something...
Checking out the code for the WriteFn I see a big problem:
@Setup
public void setup() {
writer = new Mutator<>(spec, Mapper::saveAsync, "writes");
On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía wrote:
> +dev
>
> Since we all agree that we should return something different than
> PDone the real question is what should we return.
>
My proposal is that one returns a PCollection that consists, internally,
of something contentless like nulls. Thi
+dev
Since we all agree that we should return something different than
PDone the real question is what should we return.
As a reminder we had a pretty interesting discussion about this
already in the past but uniformization of our return values has not
happened.
This thread is worth reading for Vi
I thought that was said about returning a PCollection of write results as it’s
done in other IOs (as I mentioned as examples) that have _additional_ write
methods, like “withWriteResults()” etc, that return PTransform<…,
PCollection>.
In this case, we keep backwards compatibility and just add n
Returning PDone is an anti-pattern that should be avoided, but changing it
now would be backwards incompatible. PRs to add non-PDone returning
variants (probably as another option to the builders) that compose well
with Wait, etc. would be welcome.
On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko
In this way, I think “Wait” PTransform should work for you but, as it was
mentioned before, it doesn’t work with PDone, only with PCollection as a
signal.
Since you already adjusted your own writer for that, it would be great to
contribute it back to Beam in the way as it was done for other IO
This has been a common problem I've run into with lots of built-in IOs,
I've generally submitted PRs for them to add support for emitting something
once writed are completed.
On Wed, Mar 24, 2021 at 1:04 PM Vincent Marquez
wrote:
>
> *~Vincent*
>
>
> On Wed, Mar 24, 2021 at 10:01 AM Reuven Lax
On Wed, Mar 24, 2021 at 10:04 AM Vincent Marquez
wrote:
>
> *~Vincent*
>
>
> On Wed, Mar 24, 2021 at 10:01 AM Reuven Lax wrote:
>
>> Does that work if cassandra returns a PDone?
>>
>
> No, it doesn't work. I wrote my own CassandraIO.Write that is a
> PTransform, PCollection> instead.
>
> I'm ju
*~Vincent*
On Wed, Mar 24, 2021 at 10:01 AM Reuven Lax wrote:
> Does that work if cassandra returns a PDone?
>
No, it doesn't work. I wrote my own CassandraIO.Write that is a
PTransform, PCollection> instead.
I'm just asking if there's a better way of doing this because I'm having to
do this
No, it only needs to ensure that one record seen on Pubsub has successfully
written to a database. So "record by record" is fine, or even "bundle".
*~Vincent*
On Wed, Mar 24, 2021 at 9:49 AM Alexey Romanenko
wrote:
> Do you want to wait for ALL records are written for Cassandra and then
> wri
Does that work if cassandra returns a PDone?
On Wed, Mar 24, 2021 at 10:00 AM Chamikara Jayalath
wrote:
> If you want to wait for all records are written (per window) to Cassandra
> before writing that window to PubSub, you should be able to use the Wait
> transform:
> https://github.com/apache/
If you want to wait for all records are written (per window) to Cassandra
before writing that window to PubSub, you should be able to use the Wait
transform:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Wait.java
Thanks,
Cham
On Wed, Mar 2
Do you want to wait for ALL records are written for Cassandra and then write
all successfully written records to PubSub or it should be performed "record by
record"?
> On 24 Mar 2021, at 04:58, Vincent Marquez wrote:
>
> I have a common use case where my pipeline looks like this:
> CassandraIO
I have a common use case where my pipeline looks like this:
CassandraIO.readAll -> Aggregate -> CassandraIO.write -> PubSubIO.write
I do NOT want my pipeline to look like the following:
CassandraIO.readAll -> Aggregate -> CassandraIO.write
19 matches
Mail list logo