Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Daniel Standish
Lots of good discussion here. We should create separate threads for the questions about (1) whether to keep or drop mysql / mssql / sqlite / mongodb / just-pipe-to-/dev/null/ and (2) to UUID or not to UUID and (3) database agnosticism a.k.a. an interface. But some responses... Using UUIDS was

[RESULT][VOTE] May 2024 PR of the Month

2024-05-31 Thread Briana Okyere
Hey All, Congratulations to Daniel Standish on winning PR of the Month with 8 votes for PR #39336: "Scheduler to handle incrementing of try_number". Well done all! PR #39336 will be featured in the May 2024 Newsletter, and thank you to all those who

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Andrey Anshin
> however it might or might not be affected. Similarly MariaDB, but MySQL does not seem to have proper UUID support, so we should really use UUID7 rather than UUID4 for such UUIDs in case we do not want to affect insert performance on MySQL. Some NITs here, I guess better to use deterministic

Re: [VOTE] Airflow Providers prepared on May 30, 2024

2024-05-31 Thread Pankaj Koti
+1 (non-binding) Concurring with Wei! Best regards, *Pankaj Koti* Senior Software Engineer (Airflow OSS Engineering team) Location: Pune, Maharashtra, India Timezone: Indian Standard Time (IST) On Fri, May 31, 2024 at 8:55 AM Wei Lee wrote: > +1 (non-binding) > > Tested my changes and our

Re: [DISCUSS] Restore the SQL server backend

2024-05-31 Thread Wei Lee
I agree with Jed and the following comments. If my memory serves me right, this topic has been discussed a few times in the past. 5% doesn't seem very convincing. Even if it's biased, I'm still not persuaded that there are a large number of users that are worth the community's effort. And Jarek

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Vincent Beck
Interesting thread. I think what makes this discussion complex is that Airflow makes a lot of different queries (API, Scheduler, ...). I think it is even harder to keep track of all the different queries Airflow makes and thus hard to figure out whether such an index is needed. Also, Airflow evolves (and

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Andrey Anshin
IMHO, blindly adding new indexes to the `dag_run` and `task_instance` tables will cause additional maintenance costs. There are already 8 indexes on each of these tables: SELECT pi.schemaname schema_name, pi.tablename table_name, count(*) num FROM pg_indexes pi WHERE

Re: [DISCUSS] Restore the SQL server backend

2024-05-31 Thread Elad Kalif
I agree with Jarek. I am a bit worried about the mental model of this proposal, as you are offering to deliver a feature but you are not offering to be a community member. I had a lot of frustration with the MsSQL backend tests, it really caused me pain as a contributor. According to your mental

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Pierre Jeambrun
Indeed, Jarek, I feel like this is another point in favor of sticking to "Postgres". As mentioned, maybe we were a little reckless when adding all these kinds of filters. If they are not often used and we rarely / never see performance-related GitHub issues on those, marking them as 'non optimised but here for

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Jarek Potiuk
And to be perfectly honest - if people (like me) hesitate on settling on architectural decisions because they are afraid that their changes might have unintended consequences, because we want to support all the different kinds of databases - this is one more reason we should stick to "Postgres

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Jarek Potiuk
Using UUIDs was the proposal for how we can bypass the limitation of MySQL for Airflow 2 when we discussed whether to do a "simple" version of team-prefix in dag id, or whether we want to mess with adding yet-another-field-to-indexes-that-are-already-too-long-for-mysql and it was based on the

Re: [DISCUSS] Restore the SQL server backend

2024-05-31 Thread Jarek Potiuk
> We also understand and are ready to address the concerns stated in the vote about support and resolving CI issues Hello James, Could you please explain how exactly you are planning to help a number of maintainers who are working on developing new features to make sure they know and realise

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Daniel Standish
Yes, UUID is risky and problematic as a primary key. If you do it, you need to do it carefully / sequentially. But I think that we are not going with a UUID pk on any tables at this time. BUT I do want to add a UUID for every TI try that is not the PK but can be used as a more convenient identifier when tying

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Jarek Potiuk
Also if we are speaking about indexes - a bit tangential but I know we were planning to replace some of the primary keys (mainly because of MySQL limitations) with synthetic keys (for the DAG versioning case, where we planned to use UUIDs). We should be very, very careful when doing it because I've

Re: [DISCUSS] Restore the SQL server backend

2024-05-31 Thread James Duong
Many of the MSSQL customers using Airflow with MSSQL as the backend are unlikely to participate in those types of surveys, unfortunately, so I fear the numbers are biased. We have had direct feedback from multiple very large MSSQL customers who see the removal of this support as a large

Re: [DISCUSS] Restore the SQL server backend

2024-05-31 Thread Ephraim Anierobi
I also agree with others and aside from the survey, MSSQL was a headache. I think so many pain points would delay Airflow 3 development if we reconsider MSSQL. Maybe any reconsideration should be after Airflow 3? On Thu, 30 May 2024 at 23:48, Jarek Potiuk wrote: > Agree with all comments above.

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Pankaj Koti
Addressing one of Pierre's questions: Should I index foreign keys? Is that done by default or should I explicitly do it? The answer varies depending on the database engine. PostgreSQL and SQLite do not add indexes on foreign keys by default, while MySQL does. Developers should keep this in mind.
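
A quick way to check the SQLite part of this point is sketched below, using only Python's standard `sqlite3` module. The `parent` / `child` tables and the `idx_child_parent_id` index name are hypothetical and are not taken from the Airflow schema. The sketch only shows that declaring a foreign key in SQLite does not create an index on the referencing column, so the index has to be added explicitly. It does not exercise PostgreSQL or MySQL.

```python
import sqlite3


def indexes_on(conn, table):
    """Return the names of all indexes SQLite reports for `table`."""
    # PRAGMA index_list rows are (seq, name, unique, origin, partial).
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


conn = sqlite3.connect(":memory:")

# Hypothetical schema: `child.parent_id` is a foreign key to `parent.id`.
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (id)
    );
    """
)

# The foreign key declaration alone creates no index on child.parent_id.
before = indexes_on(conn, "child")

# The index has to be created explicitly.
conn.execute("CREATE INDEX idx_child_parent_id ON child (parent_id)")
after = indexes_on(conn, "child")

print("before:", before)
print("after:", after)
```

On an engine that behaves like SQLite here, filters or joins on the foreign key column fall back to a table scan until such an index is added.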

Re: [DISCUSS] indexes for API calls

2024-05-31 Thread Daniel Standish
I would be in favor of this for sure. Let's see what others think :) On Thu, May 30, 2024 at 10:55 PM Jarek Potiuk wrote: > Simply speaking - let's make "lack of optimisation for these and that" part > of the API specification. > > On Fri, May 31, 2024 at 7:54 AM Jarek Potiuk wrote: > > > So