I would definitely support a PR making this an option. Changing the default
would be a rather big change that would require more thought.
On Tue, Oct 10, 2023 at 4:24 PM Joey Tran wrote:
> Bump on this. Sorry to pester - I'm trying to get a few teams to adopt
> Apache Beam at my company and I'm
Bump on this. Sorry to pester - I'm trying to get a few teams to adopt
Apache Beam at my company and I'm trying to foresee parts of the API they
might find inconvenient.
If there's a conclusion to make the behavior similar to Java, I'm happy to
put up a PR.
On Thu, Oct 5, 2023, 12:49 PM Joey Tran
On Tue, Oct 10, 2023 at 4:05 PM Chamikara Jayalath
wrote:
>
> On Tue, Oct 10, 2023 at 4:02 PM Robert Bradshaw
> wrote:
>
>> On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath
>> wrote:
>>
>>>
>>> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote:
>>>
I suspect some simple pattern
On Tue, Oct 10, 2023 at 4:02 PM Robert Bradshaw wrote:
> On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath
> wrote:
>
>>
>> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote:
>>
>>> I suspect some simple pattern templating would solve most use cases. We
>>> probably would want to support
On Tue, Oct 10, 2023 at 4:03 PM Robert Bradshaw wrote:
> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote:
>
>> I suspect some simple pattern templating would solve most use cases.
>>
>
> That's what I'm leaning towards as well.
>
>
>> We probably would want to support timestamp formatting
On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote:
> I suspect some simple pattern templating would solve most use cases.
>
That's what I'm leaning towards as well.
> We probably would want to support timestamp formatting (e.g. $ $M $D)
> as well.
>
Although we have several timestamps to
On Tue, Oct 10, 2023 at 3:53 PM Chamikara Jayalath
wrote:
>
> On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote:
>
>> I suspect some simple pattern templating would solve most use cases. We
>> probably would want to support timestamp formatting (e.g. $ $M $D) as
>> well.
>>
>> On Tue, Oct
That's a good point: in the dbt case they're almost always treating that
as a precomputation. I suppose a JinjaTransform isn't totally insane, but
not sure I'd want to introduce Yet Another Way Of Writing A Lambda :-)
On Tue, Oct 10, 2023 at 3:22 PM Robert Bradshaw wrote:
> On Tue, Oct 10,
On Tue, Oct 10, 2023 at 3:41 PM Reuven Lax wrote:
> I suspect some simple pattern templating would solve most use cases. We
> probably would want to support timestamp formatting (e.g. $ $M $D) as
> well.
>
> On Tue, Oct 10, 2023 at 3:35 PM Robert Bradshaw
> wrote:
>
>> On Mon, Oct 9, 2023
I suspect some simple pattern templating would solve most use cases. We
probably would want to support timestamp formatting (e.g. $ $M $D) as
well.
On Tue, Oct 10, 2023 at 3:35 PM Robert Bradshaw wrote:
> On Mon, Oct 9, 2023 at 3:09 PM Chamikara Jayalath
> wrote:
>
>> I would say:
>>
>>
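[Editor's sketch] The "simple pattern templating" idea with timestamp formatting, as suggested in the message above, might look roughly like the following. This is only an illustration using Python's standard `string.Template`; the function name `render_shard_name` and the placeholder names (`$Y`, `$M`, `$D`, `$shard`, `$total`) are assumptions made for this example and are not Beam's actual API or syntax.

```python
from datetime import datetime, timezone
from string import Template


def render_shard_name(pattern: str, shard: int, num_shards: int, ts: datetime) -> str:
    """Fill a hypothetical file-name pattern.

    All placeholder names here are illustrative assumptions,
    not part of any Beam SDK.
    """
    return Template(pattern).substitute(
        Y=f"{ts.year:04d}",          # four-digit year
        M=f"{ts.month:02d}",         # zero-padded month
        D=f"{ts.day:02d}",           # zero-padded day
        shard=f"{shard:05d}",        # zero-padded shard index
        total=f"{num_shards:05d}",   # zero-padded shard count
    )


ts = datetime(2023, 10, 10, tzinfo=timezone.utc)
name = render_shard_name("out/$Y/$M/$D/part-$shard-of-$total.parquet", 3, 10, ts)
print(name)  # out/2023/10/10/part-00003-of-00010.parquet
```

Keeping the shard index and count inside the pattern lets the writer guarantee unique file names, while the user controls only the surrounding prefix, suffix, and date layout.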
On Mon, Oct 9, 2023 at 3:09 PM Chamikara Jayalath
wrote:
> I would say:
>
> sink:
>   type: WriteToParquet
>   config:
>     path: /beam/filesystem/dest
>     prefix:
>     suffix:
>
> Underlying SDK will add the middle part of the file names to make sure
> that files
On Tue, Oct 10, 2023 at 7:21 AM Kenneth Knowles wrote:
> Another perspective:
>
> We should focus on the fact that FileIO writes what I would call a "big
> file-based dataset" to a filesystem. The primary characteristic of a "big
> file-based dataset" is that it is sharded and that the shards
On Tue, Oct 10, 2023 at 7:22 AM Byron Ellis via dev
wrote:
> FWIW dbt (which is also YAML and has this problem for other reasons) does
> something like this. It also chooses to assume that everything is a string
> but allows users to use the Jinja templating language to make those strings
>
Thank you for the information.
I agree with Kenn in that case. This could wait for the next release.
Unless there is another reason to do the RC2.
On Tue, Oct 10, 2023 at 12:30 PM Yi Hu wrote:
>
> Would it impact all python users including breaking the new user, quick
>> start experience? Or
> Would it impact all python users including breaking the new user, quick
> start experience? Or would it impact users of a specific IO or
> configuration?
>
It is the latter. It will impact users of a specific IO (BigQueryIO read)
with a specific configuration (Direct_Read). Note that the default
Would it impact all python users including breaking the new user, quick
start experience? Or would it impact users of a specific IO or
configuration? If it is the former, I think it would be worth fixing it
just to have a working new user experience. With new user experience I am
thinking about
After thinking this through a bit more, I am inclined to release RC1 with
this noted as a known issue, unless there are other more compelling reasons
to issue a second RC.
Why?
- It is more-or-less by design that end users of Beam Python have
dependencies shift under them; breakage and
Top-post comment: Aggregation of test results is hard. We've had a million
threads on it. You want to have a clear green "this runner works" signal
but you also want completely isolated "this runner works for Java" / "this
runner works for Python" / etc signals. The tension between these is
Conversely, by unifying to Gradle command names, it also teaches how folks
can run these things locally.
Doesn't help entirely with discoverability, or initial scrutability, but it
feels lower impedance than someone needing to look at the action manually
to learn what it's running under the hood
> Just to clarify: I'm not proposing tying them to gradle tasks (I'm fine
> with `go test` for example) or doing this in situations where it is
> unnatural.
> My example probably confused this because I left off the `./gradlew` just
> to save space. I'm proposing naming them after their obvious repro
+1 to the general proposal.
I'm not bothered if something says a gradle command and in execution, that
task ends up running multiple different commands. Arguably, if we're
running a gradle task manually to prepare for a subsequent task that latter
task should be adding the former to its
On Tue, Oct 10, 2023 at 10:21 AM Danny McCormick via dev <
dev@beam.apache.org> wrote:
> I'm +1 on:
> - standardizing our naming
> - making job names match their commands exactly (though I'd still like the
> `Run` prefix so that you can do things like say "Suite XYZ failed" without
> triggering
FWIW dbt (which is also YAML and has this problem for other reasons) does
something like this. It also chooses to assume that everything is a string
but allows users to use the Jinja templating language to make those strings
dynamic where needed. Syntactically I think that's a bit nicer to look at
I'm +1 on:
- standardizing our naming
- making job names match their commands exactly (though I'd still like the
`Run` prefix so that you can do things like say "Suite XYZ failed" without
triggering the automation)
- removing pre/postcommit from the naming (we actually already run many
precommits
Another perspective:
We should focus on the fact that FileIO writes what I would call a "big
file-based dataset" to a filesystem. The primary characteristic of a "big
file-based dataset" is that it is sharded and that the shards should not
have any individual distinctiveness. The dataset should
Thanks for raising this. This generally works, though some jobs run more
than one gradle task (e.g. some IO_Direct_PreCommit run both :build (which
executes unit tests) and :integrationTest).
Another option is to normalize the naming of every job, saying the job name
is X, then workflow name is
I'm +1 on standardizing the names and while I don't have a strong opinion
on which standard (so long as it's only one) using the Gradle name seems
like a perfectly good choice... I don't know the GHA setup well enough, but
would that help maintain those? Presumably the various actions all
Since I've been in GHA files lately...
I think they have a very useful pattern which we could borrow from or learn
from, where setting up the variables happens separately, like
FWIW I am aware of the README in
https://github.com/apache/beam/tree/master/.test-infra/jenkins that lists
the phrases alongside the jobs. This is just wasted work to maintain IMO.
Kenn
On Tue, Oct 10, 2023 at 9:46 AM Kenneth Knowles wrote:
> *Proposal:* make all the job names exactly match the
*Proposal:* make all the job names exactly match the GH comment to run them
and make it also as close as possible to how to reproduce locally
*Example problems*:
- We have really silly redundant job results like 'Chicago Taxi Example
on Dataflow ("Run Chicago Taxi on Dataflow")' and
This is your daily summary of Beam's current high priority issues that may need
attention.
See https://beam.apache.org/contribute/issue-priorities for the meaning and
expectations around issue priorities.
Unassigned P1 Issues:
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis