Data Engineering track for Community Over Code NA is calling for presentations

2023-07-07 Thread Jarek Potiuk
Hello Beam community, Just a reminder that there are just 6 days left to submit your proposal for the Community Over Code NA (formerly ApacheCon) conference. This is the flagship event for the ASF, held in Halifax, Nova Scotia, Canada, October 7-10, 2023, and together with Ismael, we want to encourage

Invitation for CFP for Data Engineering Track at the Community Over Code NA

2023-06-16 Thread Jarek Potiuk
Hello Beam community members! TL;DR: The Call For Papers for the Community Over Code NA conference in Halifax in October *ends in 4 weeks (13th of July!)*, and this is about the last moment to prepare and submit your proposals: https://communityovercode.org/call-for-presentations/ *Community Over Code

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-26 Thread Jarek Potiuk
expertise here. > I think Beam would benefit from adopting some of these practices. > Kerry > > On Fri, Aug 26, 2022, 7:35 AM Jarek Potiuk wrote: > >> >>> I'm curious Jarek, does Airflow take any dependencies on popular >>> libraries like pandas, numpy, pyarrow,

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-26 Thread Jarek Potiuk
> > I'm curious Jarek, does Airflow take any dependencies on popular libraries > like pandas, numpy, pyarrow, scipy, etc... which users are likely to have > their own dependency on? I think these dependencies are challenging in a > different way than the client libraries - ideally we would support

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-24 Thread Jarek Potiuk
Comment (from a bit of an outsider): Fantastic document, Valentyn. Very, very insightful and interesting. We feel a lot of the same pain in Apache Airflow (actually even more, because we have not 20 but 620+ dependencies), but we are also a bit more advanced in the way we are managing the dependencies

Re: Apache Thrift vs GRPC summary

2022-07-16 Thread Jarek Potiuk
ation formats (NO - there are enough impedance mismatches that it is > just not worthwhile, even though proto has lots of problems at least we can > develop workarounds only once) > > We never did develop with anything other than proto+gRPC in mind. > > Kenn > > On Thu, Feb

Re: Trigger phrases in Github Actions

2022-07-15 Thread Jarek Potiuk
future improvements to come to GHA, not Probot. > - Actions logging shows up in the repo where anyone can view it. Probot > requires a separate logging service. > > Cons: > - Probot is a little snappier, even with private runners > - Maintaining state is easier with probot (no

Re: Trigger phrases in Github Actions

2022-07-15 Thread Jarek Potiuk
My 3 cents. We've been playing with similar approaches in Apache Airflow, and I think GitHub Actions workflows are not a good idea for this kind of behaviour. GitHub Actions workflows are really "heavy-weight" in many ways; you should really think of them as being spun up to actually do some

Data Engineering Track at ApacheCon (October 3-6, New Orleans) - CFP ends 23rd of May !

2022-05-10 Thread Jarek Potiuk
Hello Beam developers! ApacheCon North America is back in person this year in October. https://apachecon.com/acna2022/ Together with Ismaël Mejía, we are organizing, for the first time, a Data Engineering Track as part of ApacheCon. You might be wondering why a different track if we already have

Re: Data Engineering track at ApacheCon (October 3-6, New Orleans)

2022-04-13 Thread Jarek Potiuk
Cool. 23 May is the deadline (I forgot to mention it). On Wed, Apr 13, 2022 at 9:16 PM Pablo Estrada wrote: > Thanks Jarek! > This is a great idea, and I'll try and submit something for this : ) > Best > -P. > > On Wed, Apr 13, 2022 at 1:22 AM Jarek Potiuk wrote: >

Data Engineering track at ApacheCon (October 3-6, New Orleans)

2022-04-13 Thread Jarek Potiuk
Hello Beam Friends. There is an ApacheCon NA coming this year in October (https://apachecon.com/acna2022/), and it's going to be an "ONSITE" event: 3-6 October, New Orleans, Louisiana! It's one of the best events ever when it comes to community building at Apache, so I heartily invite everyone.

Re: Apache Thrift vs GRPC summary

2022-02-17 Thread Jarek Potiuk
1: https://beam.apache.org/roadmap/portability/ >> >> On Wed, Feb 16, 2022 at 2:38 AM Jarek Potiuk wrote: >> >>> Hello Beam friends, >>> >>> I have a question, we are preparing (as part of >>> https://cwiki.apache.org/confluence/display/AIRFL

Apache Thrift vs GRPC summary

2022-02-16 Thread Jarek Potiuk
Hello Beam friends, I have a question: we are preparing (as part of https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API) to split Airflow into more components that will communicate using RPC. Basically, we need to extract some of the internal methods into a

Re: [RFC][Design] Automate Reviewer Assignment

2022-02-11 Thread Jarek Potiuk
art of the Apache org? I can't 100% tell). > > 3. Don't break the existing use case where a contributor wants a review > from a specific person. > > Thanks, > Danny > > On Thu, Feb 10, 2022 at 7:52 AM Jarek Potiuk wrote: > >> Very interesting one - as an outsider I

Re: [RFC][Design] Automate Reviewer Assignment

2022-02-10 Thread Jarek Potiuk
Very interesting one; as an outsider, I am interested to see how this initiative will work out for the Beam community. Just one comment: maybe you do not know, but GitHub has a "CODEOWNERS" feature (I notice you are not using it). Quote from

Re: Developing on an M1 Mac

2022-02-08 Thread Jarek Potiuk
ine should work > pretty well. This is what Apache Beam's Jenkins setup effectively does. > > > > No experience with developing on an ARM based CPU. > > > > On Wed, Jan 12, 2022 at 9:28 AM Jarek Potiuk wrote: > >> > >> Comment from the side - If you

Re: [ANNOUNCE] Apache Beam 2.36.0 Release

2022-02-08 Thread Jarek Potiuk
Thanks a lot for that, Emily! It's been a release we were waiting for at Apache Airflow. I believe it will unblock a number of "modernizations" in our pipeline: Python 3.10 and ARM support depended quite a bit on it (mostly through a numpy transitive dependency limitation). Great to see this one

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-01-31 Thread Jarek Potiuk
f using a "closing > keyword". (For reference: Linking a pull request to an issue > <https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue> > ) > > I'm not sure how much this could sway the decisions but thought it was >

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-01-31 Thread Jarek Potiuk
>>>>>>> are not especially useful anyway. They are too detailed for a quick summary, and not precise enough to show everything. For a readable summary, we use CHANGES.md to highlight changes we

Re: Python SDK release good for Python 3.10/M1

2022-01-24 Thread Jarek Potiuk
y#L145 > > > On Sun, Jan 23, 2022 at 11:34 PM Jarek Potiuk wrote: > >> Hello Apache Beam Friends, >> >> I have attempted today (that was yet another attempt) to prepare an >> Apache Airflow CI image for testing with Python 3.10. >> >> Unlike previo

Python SDK release good for Python 3.10/M1

2022-01-23 Thread Jarek Potiuk
Hello Apache Beam Friends, I attempted today (yet another attempt) to prepare an Apache Airflow CI image for testing with Python 3.10. Unlike previous attempts (where quite a few deps lagged behind), this one was **almost** successful. I think the last (or at

Re: Developing on an M1 Mac

2022-01-12 Thread Jarek Potiuk
Comment from the side: if you use Docker (experience from Airflow), until we get ARM images, the Docker experience is next to unusable (Docker filesystem slowness + emulation). J. On Wed, Jan 12, 2022 at 6:21 PM Daniel Collins wrote: > > I regularly develop on a non-m1 mac using intellij,

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2021-12-07 Thread Jarek Potiuk
to >> initiate this process and what are the show-stoppers for us with a current >> Jira workflow? >> >> — >> Alexey >> >> On 6 Dec 2021, at 19:48, Udi Meiri wrote: >> >> +1 on migrating to GH issues. >> We will need to update

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2021-12-04 Thread Jarek Potiuk
>>>> spective of a new Beam contributor. +1 on Github issues. I feel like it would be easier to learn about and contribute to existing issues/bugs if it were tracked in the same place as that of the source code, rather than bouncing back and forth betw

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2021-12-03 Thread Jarek Potiuk
Comment from a friendly outsider. TL;DR: Yes. Do migrate. Highly recommended. There were already similar discussions happening recently (on the community and infra mailing lists), and as a result I captured Airflow's experiences and recommendations in the BUILD wiki. You might find some hints and

Re: Debugging GitHub Actions workflows

2021-11-07 Thread Jarek Potiuk
You can try https://github.com/nektos/act J. On Wed, Nov 3, 2021 at 9:09 PM Valentyn Tymofieiev wrote: > > Does anybody know how one can replicate an environment used by GitHub actions > so that one can SSH (or some equivalent) modify the environment in realtime, > and try out commands that

Re: Following up on the migration of the GA runners over to Google Cloud.

2021-08-11 Thread Jarek Potiuk
Just one more caveat and a few comments so that you realise the full scope (at least of what I know) of the danger and can make informed decisions. I think you should simply weigh the risks vs. costs. As usual with security, there is never a 0-1 case; it's always a question of how much you can invest to

Re: Help needed with migration of GitHub Action Runners from GitHub to GKE.

2021-08-05 Thread Jarek Potiuk
I'd love to help, but I am on vacation next week. Just one word of warning: if you want to run a GitHub Runner on your own infrastructure, that might introduce several security risks. Basically, anyone who makes a PR to your repo can compromise your runners. The dangers of compromising runners are

Re: LGPL-2.1 in beam-vendor-grpc

2021-05-10 Thread Jarek Potiuk
Also, we have a very similar discussion about it in https://issues.apache.org/jira/browse/LEGAL-572. Just to be clear about the context: it's not a legal requirement of the Apache License; it's an Apache Software Foundation policy that we should not limit our users in using our software. If the LGPL

Re: Consider Cloudpickle instead of dill for Python pickling

2021-05-01 Thread Jarek Potiuk
Just my 2 cents from the users' perspective. In Airflow, the narrow version limits on `dill` caused some problems with dependencies. We had to add some exceptions to our process for that: https://github.com/apache/airflow/blob/master/Dockerfile#L246

Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-25 Thread Jarek Potiuk
+1 (non-binding) Thanks for tirelessly working on improving the Python client :). This is a friendly visit from Apache Airflow. I've just tested 2.29.0rc1 in our "apache.beam" provider's tests, and they are all green. Just to give a bit of context here: we are eagerly waiting for the