On Mon, May 27, 2019 at 4:05 PM Reza Rokni wrote:
> "Many APIs that have been in place for years and are used by most Beam
> users are still marked Experimental."
>
> Should there be a formal process in place to start 'graduating' features
> out of @Experimental? Perhaps even target an upcoming
On Mon, May 27, 2019 at 3:44 PM Reuven Lax wrote:
> We generally use Experimental for two different things, which leads to
> confusion.
> 1. Features that work stably, but where we think we might still make
> some changes to the API.
> 2. New features that we think might not yet be stable.
>
Hi Flink experts,
I am getting ready to push a PR around a utility class for a timeseries join:
each left.timestamp is matched to the closest right.timestamp where
right.timestamp <= left.timestamp.
It makes very heavy use of event-time timers and has to do some manual DoFn
cache work to get around some O(heavy)
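The matching rule described above (each left timestamp paired with the closest right timestamp at or before it) can be sketched outside of Beam with a simple bisect lookup. This only illustrates the matching semantics; it is not the timer-based implementation in the PR, and the function name is made up for the sketch:

```python
import bisect

def match_closest_at_or_before(left_ts, right_ts):
    """For each left timestamp, find the closest right timestamp
    that is <= the left timestamp (or None if none exists)."""
    rts = sorted(right_ts)
    matches = {}
    for lt in left_ts:
        # First index i with rts[i] > lt; the match is the element before it.
        i = bisect.bisect_right(rts, lt)
        matches[lt] = rts[i - 1] if i > 0 else None
    return matches

print(match_closest_at_or_before([5, 10, 2], [1, 4, 9]))
# {5: 4, 10: 9, 2: 1}
```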
Could we first figure out the process (where to push, how to push,
permissions needed, how to validate etc.) as part of the snapshots and
update the release guide based on that?
On Tue, May 28, 2019 at 2:43 AM Robert Bradshaw wrote:
> In the future (read, next release) the SDK will likely have r
PR link
https://github.com/apache/beam/pull/8416
On Tue, May 28, 2019 at 4:25 PM Alex Amato wrote:
> I've had a lingering PR for about a month now. I'm trying to get this
> passing presubmits and submitted, but I don't have enough output from the
> failing task to debug this.
>
> I think it's
I've had a lingering PR for about a month now. I'm trying to get this
passing presubmits and submitted, but I don't have enough output from the
failing task to debug this.
I think it's from a wordcount timeout, but I don't know how to get more
info. I don't think it's a dataflow job with any lin
Open cherry-pick PRs for the Spark runner
https://github.com/apache/beam/pull/8705
https://github.com/apache/beam/pull/8706
On Tue, May 28, 2019 at 3:42 PM Valentyn Tymofieiev
wrote:
> Yes, looking into that.
>
> On Tue, May 28, 2019 at 3:37 PM Ankur Goenka wrote:
>
>> Valentyn, Can you please send
Yes, looking into that.
On Tue, May 28, 2019 at 3:37 PM Ankur Goenka wrote:
> Valentyn, Can you please send the cherry pick PR for
> https://issues.apache.org/jira/browse/BEAM-7439
>
> On Tue, May 28, 2019 at 3:04 PM Ankur Goenka wrote:
>
>> Sure, I will cherry pick those PRs.
>>
>> On Tue, May
Hi All,
In the meantime, please validate RC1 to catch any other issues.
Thanks,
Ankur
On Tue, May 28, 2019 at 3:37 PM Ankur Goenka wrote:
> Valentyn, Can you please send the cherry pick PR for
> https://issues.apache.org/jira/browse/BEAM-7439
>
> On Tue, May 28, 2019 at 3:04 PM Ankur Goenka wr
Valentyn, Can you please send the cherry pick PR for
https://issues.apache.org/jira/browse/BEAM-7439
On Tue, May 28, 2019 at 3:04 PM Ankur Goenka wrote:
> Sure, I will cherry pick those PRs.
>
> On Tue, May 28, 2019 at 2:19 PM Kyle Weaver wrote:
>
>> Hi Ankur,
>>
>> It's not a blocker, but I'd
Sure, I will cherry pick those PRs.
On Tue, May 28, 2019 at 2:19 PM Kyle Weaver wrote:
> Hi Ankur,
>
> It's not a blocker, but I'd like to see
> https://github.com/apache/beam/pull/8558 and
> https://github.com/apache/beam/pull/8569 be included so TFX examples can
> be run without errors on the
Hi Ankur,
It's not a blocker, but I'd like to see
https://github.com/apache/beam/pull/8558 and
https://github.com/apache/beam/pull/8569 be included so TFX examples can be
run without errors on the 2.13.0 Spark runner (
https://github.com/tensorflow/tfx/pull/84).
Kyle Weaver | Software Engineer |
I am in the same boat as Robert: I am in favor of autoformatters, but I am
not familiar with this one. My concerns are:
- The product is clearly marked as beta with a big warning.
- It looks like mostly a single-person project. For the same reason I also
strongly prefer not using a fork for a spec
Thanks for the validation.
I have marked the fix version of
https://issues.apache.org/jira/browse/BEAM-7406
https://issues.apache.org/jira/browse/BEAM-6380 as 2.13.0 and will
cherry-pick the associated commits to the JIRA.
On Tue, May 28, 2019 at 11:19 AM Lukasz Cwik wrote:
> I would also sug
I would also suggest getting https://github.com/apache/beam/pull/8668 into
2.13.0 since it fixes a logging setup issue on Dataflow (BEAM-7406).
On Tue, May 28, 2019 at 10:22 AM Chamikara Jayalath
wrote:
> I would also like to get https://github.com/apache/beam/pull/8661 in to
> 2.13.0 that fixes
Alexey,
sorry for the confusion then. Let me explain this better once more:
1. IO tests:
In IO tests we do not use the Synthetic Sources that generate the records.
We use a GenerateSequence class that generates a sequence of long values
and then map it to some records to finally write that to a
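The pattern described above, generating a sequence of long values and mapping each to a record, can be sketched in plain Python. This stands in for GenerateSequence plus a map transform rather than reproducing the Beam API, and `record_size` is an illustrative parameter, not a value from the thread:

```python
def generate_records(n, record_size=100):
    """Produce longs 0..n-1 and map each to a synthetic record,
    mirroring the GenerateSequence-then-map pattern.
    record_size is illustrative, not from the thread."""
    for i in range(n):
        yield {"key": i, "payload": b"x" * record_size}

records = list(generate_records(3, record_size=4))
```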
I like the concept of expressing type coercion as a wrapper coder which
says that this language treats this type as Foo. This seems to be useful in
general for cross language pipelines since it is much more likely that two
languages will understand an encoding but may want to express the type
withi
I would also like to get https://github.com/apache/beam/pull/8661 into
2.13.0; it fixes https://issues.apache.org/jira/browse/BEAM-6380. It's not
a new issue, but it has affected a number of users.
- Cham
On Tue, May 28, 2019 at 9:31 AM Valentyn Tymofieiev
wrote:
> Thanks, Juta Staes, for reporti
On Sun, May 26, 2019 at 1:25 PM Reuven Lax wrote:
>
>
> On Fri, May 24, 2019 at 11:42 AM Brian Hulette
> wrote:
>
>> *tl;dr:* SchemaCoder represents a logical type with a base type of Row
>> and we should think about that.
>>
>> I'm a little concerned that the current proposals for a portable
>>
The Go SDK doesn't yet have these counters implemented or published
(sampling elements & counting between DoFns, etc.).
On Tue, May 28, 2019, 9:08 AM Alexey Romanenko
wrote:
> On 28 May 2019, at 17:31, Łukasz Gajowy wrote:
>
>
> I'm not quite following what these sizes are needed for--aren't the
A slightly larger concern: it also will force users to create stateful
DoFns everywhere to generate these sequence numbers. If I have a ParDo that
is not a simple 1:1 transform (i.e. not MapElements), then the ParDo will
need to generate its own sequence numbers for ordering, and the only safe
way
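The per-key bookkeeping such a stateful DoFn would have to carry can be illustrated with a toy counter. In Beam the counter would live in per-key state rather than an in-memory dict, and the class and method names here are made up for the sketch:

```python
from collections import defaultdict

class SequenceAssigner:
    """Toy illustration of the state a per-key sequence-numbering
    DoFn would need: one monotonically increasing counter per key.
    (In a real Beam stateful DoFn this would live in per-key state.)"""
    def __init__(self):
        self._counters = defaultdict(int)

    def assign(self, key, element):
        seq = self._counters[key]
        self._counters[key] += 1
        return (key, seq, element)

assigner = SequenceAssigner()
print(assigner.assign("a", "x"))  # ('a', 0, 'x')
print(assigner.assign("a", "y"))  # ('a', 1, 'y')
print(assigner.assign("b", "z"))  # ('b', 0, 'z')
```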
Thanks, Juta Staes, for reporting this issue.
On Tue, May 28, 2019, 9:19 AM Valentyn Tymofieiev
wrote:
> -1.
> I would like us to fix
> https://issues.apache.org/jira/browse/BEAM-7439 for 2.13.0. It is a
> regression that happened in 2.12.0, but was not caught by existing tests.
>
> Thanks,
> Va
-1.
I would like us to fix
https://issues.apache.org/jira/browse/BEAM-7439 for 2.13.0. It is a
regression that happened in 2.12.0, but was not caught by existing tests.
Thanks,
Valentyn
On Wed, May 22, 2019, 4:30 PM Ankur Goenka wrote:
> Hi everyone,
>
> Please review and vote on the release ca
On 28 May 2019, at 17:31, Łukasz Gajowy wrote:
>
> I'm not quite following what these sizes are needed for--aren't the
> benchmarks already tuned to be specific, known sizes?
>
> Maybe I wasn't clear enough. Such a metric is useful mostly in IO tests -
> different IOs generate records of differen
I'm not quite following what these sizes are needed for--aren't the
benchmarks already tuned to be specific, known sizes?
Maybe I wasn't clear enough. Such a metric is useful mostly in IO tests -
different IOs generate records of different size. It would be ideal for us
to have a universal way to ge
I'm not quite following what these sizes are needed for--aren't the
benchmarks already tuned to be specific, known sizes? I agree that
this can be expensive; especially for benchmarking purposes a 5x
overhead means you're benchmarking the sizing code, not the pipeline
itself.
Beam computes estimat
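One way to keep the sizing overhead out of the measured path is to encode only a sample of records and extrapolate. This is a sketch of that idea, with `pickle` standing in for a Beam coder; the sampling approach and all names here are assumptions, not what Beam actually does:

```python
import pickle
import random

def estimated_total_bytes(records, sample_rate=0.1, seed=0):
    """Estimate the total encoded size of `records` by encoding
    only a sampled subset, then scaling up by the sample fraction.
    pickle stands in for a coder; names are illustrative."""
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    sampled = 0
    sampled_bytes = 0
    total = 0
    for r in records:
        total += 1
        if rng.random() < sample_rate:
            sampled += 1
            sampled_bytes += len(pickle.dumps(r))
    if sampled == 0:
        return 0
    return int(sampled_bytes * total / sampled)
```

With `sample_rate=1.0` this degrades to the exact (expensive) per-element encoding the thread is worried about; lower rates trade accuracy for less overhead.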
Hi Reuven,
> It also gets awkward with Flatten - the sequence number is no longer
enough, you must also encode which side of the flatten each element came
from.
That is a generic need. Even if you read data from Kafka, the offsets
are comparable only inside a single partition. So, for Kafka to
Sequence metadata does have the disadvantage that users can no longer use
the types coming from the source. You must create a new type that contains
a sequence number (unless Beam provides this). It also gets awkward with
Flatten - the sequence number is no longer enough, you must also encode
which
This sounds really good. A lot of Jenkins job failures are caused by lint
problems.
I think it would be great to have something similar to Spotless in Java SDK
(I heard there is problem with configuring Black with IntelliJ).
On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw
wrote:
> I'm generall
Hi all,
part of our work while creating benchmarks for Beam is to collect total
data size (bytes) that was put inside the testing pipeline. We need that in
load tests of core beam operations (to see how big was the load really) and
IO tests (to calculate throughput). The "not so good" way we're do
As I understood it, Kenn was supporting the idea that sequence metadata
is preferable over FIFO. I was trying to point out that it should even
provide the same functionality as FIFO, plus one more important property -
reproducibility and the ability to be persisted and reused the same way
in batch and st
On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote:
>
> On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote:
>>
>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
>>>
>>> Some great comments!
>>>
>>> Aljoscha: absolutely this would have to be implemented by runners to be
>>> efficient. W
In the future (read, next release) the SDK will likely have reference
to the containers, so this will have to be part of the release. But I
agree for 2.13 it should be more about figuring out the process and
not necessarily holding back.
On Mon, May 27, 2019 at 7:42 PM Ankur Goenka wrote:
>
> +1
Huge +1 to all Kenn said.
Jan, batch sources can have orderings too, just like Kafka. I think
it's reasonable (for both batch and streaming) that if a source has an
ordering that is an important part of the data, it should preserve
this ordering into the data itself (e.g. as sequence numbers, offs