Django queryset exclude() with multiple related field clauses

You can use the new django SubQuery to avoid doing 2 queries to the server:

User.objects.exclude(id__in=SubQuery(User.objects.filter(preferences__preference="PREF_A", preferences__value=True)))

I implemented a quick-and-dirty solution so I could move on, expecting it to be horribly inefficient; however, on inspection of the SQL generated, it turned out to not be that bad:

>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True))

I thought that the ORM would load the results of the subordinate query into the webserver's memory before completing (this is a problem because our production app will have in the millions of users), but in fact it properly uses a subquery:

>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True)).values('id').query.sql_with_params()
(u'SELECT "sgauth_user"."id" FROM "sgauth_user" WHERE NOT ("sgauth_user"."id" IN (SELECT U0."id" FROM "sgauth_user" U0 INNER JOIN "feeds_preference" U1 ON (U0."id" = U1."user_id") WHERE (U1."preference" = %s  AND U1."value" = %s )))', ('PREF_A', True))

I'm putting this up as one possible answer, but I am still interested if there is a way to do it with the straightforward exclude clause, or a way of generating a query through the ORM that works with straightforward joins and no subquery whatsoever.


This is a still-present gotcha in Django, where exclude() is not acting as the reverse of filter(). Here's the documentation explaining the difference:

Note

The behavior of filter() for queries that span multi-value relationships, as described above, is not implemented equivalently for exclude(). Instead, the conditions in a single exclude() call will not necessarily refer to the same item.

For example, the following query would exclude blogs that contain both entries with “Lennon” in the headline and entries published in 2008:

Blog.objects.exclude(
    entry__headline__contains='Lennon',
    entry__pub_date__year=2008,
)

However, unlike the behavior when using filter(), this will not limit blogs based on entries that satisfy both conditions. In order to do that, i.e. to select all blogs that do not contain entries published with “Lennon” that were published in 2008, you need to make two queries:

Blog.objects.exclude(
    entry__in=Entry.objects.filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)

What you've done is probably the way to go.