Re: Django ORM annotate performance

Naresh Jonnala Fri, 15 Jan 2021 20:21:32 -0800

Hi,

Please run print(qs.query) and share the output.
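
In case it helps, here is a minimal sketch of how the SQL text and some rough wall-clock timings could be captured together. `time_queryset_phases` is a hypothetical helper name, not a Django API. It only assumes that the object you build has a `.query` attribute whose string form is the SQL and that iterating it runs the query, as Django QuerySets do. Timing the phases without a profiler attached may be a useful cross-check on what the profilers report.

```python
import time


def time_queryset_phases(build_qs):
    """Time queryset construction, SQL rendering, and evaluation separately.

    `build_qs` is a zero-argument callable (hypothetical, supplied by you)
    that returns a queryset-like object: it must expose `.query` and be
    iterable. Returns (sql_text, row_count, timings_in_seconds).
    """
    timings = {}

    start = time.perf_counter()
    qs = build_qs()  # e.g. the .annotate(...).order_by(...).values(...) chain
    timings["build"] = time.perf_counter() - start

    start = time.perf_counter()
    sql_text = str(qs.query)  # the same text that print(qs.query) shows
    timings["compile"] = time.perf_counter() - start

    start = time.perf_counter()
    rows = list(qs)  # forces evaluation: runs the query and builds the rows
    timings["fetch"] = time.perf_counter() - start

    return sql_text, len(rows), timings
```

You would pass it a function or lambda that builds the queryset from your view, then share the returned SQL text along with the three timings.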


On Wednesday, January 13, 2021 at 11:25:06 PM UTC+5:30 pawe...@gmail.com 
wrote:

> Hi all,
>
> I wanted to cross-post my question about Django's ORM `annotate` 
> performance. I'm not sure whether I should post it here or on the Django 
> developers mailing list, but I wanted to start here.
>
>
> https://stackoverflow.com/questions/65506731/django-orm-annotate-performance
>
> ---
>
> I'm using Django and Django REST Framework at work and we've been having 
> some performance issues with a couple of endpoints lately. We started by 
> making sure that the SQL part is optimized: no unnecessary N+1 queries, 
> indexes where possible, etc.
>
> Looking at the database part itself, it seems to be very fast (3 SQL 
> queries total, under a second), even with larger datasets, but the API 
> endpoint still took >5 seconds to return. I started profiling the Python 
> code using a couple of different tools, and the majority of the time is 
> always spent inside the `annotate` and `set_group_by` functions in Django.
>
> [image: le0oG.png]
>
> I tried Googling about `annotate` and performance, looking at Django docs, 
> but there's no mention of it being a 'costly' operation, especially when 
> used with the `F` function.
>
> The `annotate` part of the code looks something like this:
>
>     qs = qs.annotate(
>         foo_name=models.F("foo__core__name"),
>         foo_birth_date=models.F("foo__core__birth_date"),
>         bar_name=models.F("bar__core__name"),
>         spam_id=models.F("baz__spam_id"),
>         spam_name=models.F("baz__spam__core__name"),
>         spam_start_date=models.F("baz__spam__core__start_date"),
>         eggs_id=models.F("baz__spam__core___eggs_id"),
>         eggs_name=models.F("baz__spam__eggs__core___name"),
>     )
>
>     qs = (
>         qs.order_by("foo_id", "eggs_id", "-spam_start_date", "bar_name")
>         .values(
>             "foo_name",
>             "foo_birth_date",
>             "bar_name",
>             "spam_id",
>             "spam_name",
>             "eggs_id",
>             "eggs_name",
>         )
>         .distinct()
>     )
>
> The query is quite big, spans multiple relationships, so I was sure that 
> the problem is database related, but it doesn't seem to be. All the 
> `select_related` and `prefetch_related` are there, indexes too.
>
> I tried rewriting the code without `annotate` at all, but it didn't seem 
> to help. I started wondering whether the time spent in `annotate` is really 
> a red herring and it's only how the profiler sees it, but all profilers I 
> tried showed the same thing.
>
> While I feel like I know Django quite well and had success optimising API 
> endpoints before, I'm not sure what 'thread' to pull in this case. I tried 
> looking at Django internals, especially around `annotate` and 
> `set_group_by`, but couldn't pinpoint the time spent there. My last-ditch 
> effort will be trying to rewrite those couple of endpoints with raw SQL, but 
> I'd very much like to avoid that.
>
> All help will be much appreciated : )
>
