Re: [Discuss] AIP-44 Airflow Database API

2022-08-05 Thread Jarek Potiuk
This is a very good question. What actually makes it a bit easier to reason about it, is that in all our cases a single RPC method call spans a single transaction (this is the boundary of the RPC methods we have and the main decision point on whether to split it or not). There are few error cases

Re: [Discuss] AIP-44 Airflow Database API

2022-08-04 Thread Ping Zhang
Hi Jarek, Sorry for another question. It is regarding the error handling. In the db session world, when the client fails during session.commit due to any reasons, the client knows the db transaction will be rolled back. With the internal API, that might not always be the case as there is another

Re: [Discuss] AIP-44 Airflow Database API

2022-08-04 Thread Ping Zhang
Hi Jarek, Thanks for the thorough explanation. They all make sense, especially the performance impact and scalability part. I believe with the internal API, it should reduce lots of db connections. Looking forward to it! Thanks, Ping On Thu, Aug 4, 2022 at 12:11 AM Jarek Potiuk wrote: > I w

Re: [Discuss] AIP-44 Airflow Database API

2022-08-04 Thread Jarek Potiuk
I would like to see more about how you do this because I have noticed that there are tons of different configurations in the airflow and some combinations are less covered by the test as there are some default values, for example this code

Re: [Discuss] AIP-44 Airflow Database API

2022-08-03 Thread Ping Zhang
Hi Jarek, Thanks for pushing forward with this AIP. I am very interesting in this part: All tests for DagProcessor, Workers (including Operators) and triggers are executed in both modes "DBDirect'' and "DBIsolation ''. The code for Airflow Database API, the DbApiClient and DBSession are extended

Re: [Discuss] AIP-44 Airflow Database API

2022-08-02 Thread Jarek Potiuk
Hello, I have updated AIP-44 with the findings from the POC. If there are any more questions/comments, please let me know, otherwise I plan to start voting tomorrow. https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API On Mon, Jul 25, 2022 at 10:18 PM Jarek Potiuk wr

Re: [Discuss] AIP-44 Airflow Database API

2022-07-25 Thread Jarek Potiuk
I think the drop will be less than those numbers - especially the high-end ones. The DB operations we use are quite "coarse" grained (they span a single transaction, not a function call). Also I deliberately chose "example dags" as the benchmark playground - the example DAGs we have are small. And

Re: [Discuss] AIP-44 Airflow Database API

2022-07-15 Thread Jarek Potiuk
Two more things: 1) there is one "caveat" I had to handle - timeout handling in DagFileProcessor (signals do not play well with threads of GRPC - but that should be easy to fix) 2) The way I proposed it (with very localized changes needed) will be very easy to split the job among multiple people a

Re: [Discuss] AIP-44 Airflow Database API

2022-07-15 Thread Jarek Potiuk
Hello Everyone, First of all - apologies for those who waited for it, I've been dragged in multiple directions, but finally I got some quality time to take a look at open questions and implement POC for AIP-44 as promised before. TL;DR; I finally came back to the AIP-44 and multi-tenancy and have

Re: [Discuss] AIP-44 Airflow Database API

2022-02-01 Thread Jarek Potiuk
Since we have AIP-43 already approved, I think I would love to have more questions and discussions about AIP-44 - The "Airflow Internal API" https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API (as the new name is). For those who would like to get more context - recordin

Re: [Discuss] AIP-44 Airflow Database API

2022-01-03 Thread Jarek Potiuk
Also AIP-44 - which is the DB isolation mode is much more detailed and ready for deeper discussion if needed. https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Database+API On Tue, Dec 14, 2021 at 12:07 PM Jarek Potiuk wrote: > > And just to add to that. > > Thanks again for the

Re: [Discuss] AIP-44 Airflow Database API

2021-12-14 Thread Jarek Potiuk
And just to add to that. Thanks again for the initial comments and pushing us to provide more details. That allowed us to discuss and focus on many of the aspects that were raised and we have many more answers now: * first of all - we focused on making sure impact on the existing code and "behavi

Re: [Discuss] AIP-44 Airflow Database API

2021-12-03 Thread Jarek Potiuk
Cool. Thanks for the guidance. On Fri, Dec 3, 2021 at 6:37 PM Ash Berlin-Taylor wrote: > > - make an inventory: It doesn't need to be exhaustive, but a representative > sample. > - More clearly define _what_ the API calls return -- object type, methods on > them etc. > > From the AIP you have t

Re: [Discuss] AIP-44 Airflow Database API

2021-12-03 Thread Ash Berlin-Taylor
- make an inventory: It doesn't need to be exhaustive, but a representative sample. - More clearly define _what_ the API calls return -- object type, methods on them etc. From the AIP you have this example: def get_dag_run(self, dag_id, execution_date): return self.db_client.get_dag_run(da

Re: [Discuss] AIP-44 Airflow Database API

2021-12-03 Thread Jarek Potiuk
Surely - if you think that we need to do some more work to get confidence, that's fine. I am sure we can improve it to the level that we will not have to do full performance tests, and you are confident in the direction. Just to clarify your concerns and make sure we are on the same page - as I un

Re: [Discuss] AIP-44 Airflow Database API

2021-12-03 Thread Ash Berlin-Taylor
This is a fundamental change to the architecture with significant possible impacts on performance, and likely requires touching a large portion of the code base. Sorry, you're going to have to do expand on the details first and work out what would actually be involved and what the impacts will

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Jarek Potiuk
Oh yeah - good point and we spoke about performance testing/implications. Performance is something we were discussing as the next step when we get general "OK" in the direction - we just want to make sure that there are no "huge" blockers in the way this is proposed and explain any doubts first, s

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Andrew Godwin
Ah, my bad, I missed that. I'd still like to see discussion of the performance impacts, though. On Thu, Dec 2, 2021 at 11:14 AM Ash Berlin-Taylor wrote: > The scheduler was excluded from the components that would use the dbapi - > the mini scheduler is the odd one out here is it (currently) runs

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Ash Berlin-Taylor
The scheduler was excluded from the components that would use the dbapi - the mini scheduler is the odd one out here is it (currently) runs on the work but shares much of the code from the scheduling path. -a On 2 December 2021 17:56:40 GMT, Andrew Godwin wrote: >I would also like to see some

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Andrew Godwin
I would also like to see some discussion in this AIP about how the data is going to be serialised to and from the database instances (obviously Connexion is involved, but I presume more transformation code is needed than that) and the potential slowdown this would cause. In my experience, a somewh

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Jarek Potiuk
Yeah - I thik Ash you are completely right we need some more "detailed" clarification. I believe, I know what you are - rightfully - afraid of (re impact on the code), and maybe we have not done a good job on explaining it with some of our assumptions we had when we worked on it with Mateusz. Sim

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Ash Berlin-Taylor
I just provided a general idea for the approach - but if you want me to put more examples then I am happy to do that Yes please. It is /too/ general for me and I can't work out what effect it would actually have on the code base, especially how it would look with the config option to enable/d

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Ash Berlin-Taylor
I'm sorry to say it, but this proposal right just doesn't contain enough detail to say what the actual changes to the code would be, and what the impact would be To take the one example you have so far: def get_dag_run(self, dag_id, execution_date): return self.db_client.get_dag_run(dag_i

Re: [Discuss] AIP-44 Airflow Database API

2021-12-02 Thread Jarek Potiuk
Cool. Just to clarify both AIP-43 and AIP-44 are part of the Multitenancy stuff - which are under the "AIP-1 Improve Airflow Security" "Umbrella". We will likely have more AIP's under the same "AIP-1" umbrella - notably we discussed with Cloudera their need to get more selective access to resource