Django ORM for Beginners | We optimize requests



Django ORM (Object Relational Mapping) is one of the most powerful features of Django. This allows us to interact with the database using Python code, not SQL.

To demonstrate, I will describe the following model:

from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=250)
    url = models.URLField()

    def __str__(self):
        return self.name

class Author(models.Model):
    name = models.CharField(max_length=250)

    def __str__(self):
        return self.name

class Post(models.Model):
    title = models.CharField(max_length=250)
    content = models.TextField()
    published = models.BooleanField(default=True)
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    authors = models.ManyToManyField(Author, related_name="posts")

I will use django-extentions to get useful information with:

python manage.py shell_plus --print-sql

So, let's begin:

>>> post = Post.objects.all()
>>> post
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 LIMIT 21
Execution time: 0.000172s [Database: default]
<QuerySet [<Post: Post object (1)>]>

1. Use ForeignKey values โ€‹โ€‹directly


>>> Post.objects.first().blog.id
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 ORDER BY "blog_post"."id" ASC
 LIMIT 1
Execution time: 0.000225s [Database: default]
SELECT "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_blog"
 WHERE "blog_blog"."id" = 1
 LIMIT 21
Execution time: 0.000144s [Database: default]
1

And so we get 1 query in the database:

>>> Post.objects.first().blog_id
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 ORDER BY "blog_post"."id" ASC
 LIMIT 1
Execution time: 0.000155s [Database: default]
1

2. OneToMany Relations


If we use the OneToMany relationship we use the ForeignKey field and the query looks something like this:

>>> post = Post.objects.get(id=1)
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 WHERE "blog_post"."id" = 1
 LIMIT 21
Execution time: 0.000161s [Database: default]

And if we want to access the blog object from the post object, we can do:

>>> post.blog
SELECT "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_blog"
 WHERE "blog_blog"."id" = 1
 LIMIT 21
Execution time: 0.000211s [Database: default]
<Blog: Django tutorials>

However, this triggered a new request to get information from the blog. So use select_related to avoid this. To use it, we can update our original request:

>>> post = Post.objects.select_related("blog").get(id=1)
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id",
       "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_post"
 INNER JOIN "blog_blog"
    ON ("blog_post"."blog_id" = "blog_blog"."id")
 WHERE "blog_post"."id" = 1
 LIMIT 21
Execution time: 0.000159s [Database: default]

Please note that Django is using JOIN now! And the query execution time is shorter than before. In addition, now post.blog will be cached!

>>> post.blog
<Blog: Django tutorials>

select_related also works with QurySets:

>>> posts = Post.objects.select_related("blog").all()
>>> for post in posts:
...     post.blog
...
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id",
       "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_post"
 INNER JOIN "blog_blog"
    ON ("blog_post"."blog_id" = "blog_blog"."id")
Execution time: 0.000241s [Database: default]
<Blog: Django tutorials>

3. ManyToMany Relations


To get post authors, we use something like this:

>>> for post in Post.objects.all():
...     post.authors.all()
...
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
Execution time: 0.000242s [Database: default]
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
 INNER JOIN "blog_post_authors"
    ON ("blog_author"."id" = "blog_post_authors"."author_id")
 WHERE "blog_post_authors"."post_id" = 1
 LIMIT 21
Execution time: 0.000125s [Database: default]
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>, <Author: Guido van Rossum>]>
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
 INNER JOIN "blog_post_authors"
    ON ("blog_author"."id" = "blog_post_authors"."author_id")
 WHERE "blog_post_authors"."post_id" = 2
 LIMIT 21
Execution time: 0.000109s [Database: default]
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>]>

Looks like we got a request for each post object. Therefore, we must use prefetch_related . This is similar to select_related but used with ManyToMany Fields:

>>> for post in Post.objects.prefetch_related("authors").all():
...     post.authors.all()
...
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
Execution time: 0.000300s [Database: default]
SELECT ("blog_post_authors"."post_id") AS "_prefetch_related_val_post_id",
       "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
 INNER JOIN "blog_post_authors"
    ON ("blog_author"."id" = "blog_post_authors"."author_id")
 WHERE "blog_post_authors"."post_id" IN (1, 2)
Execution time: 0.000379s [Database: default]
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>, <Author: Guido van Rossum>]>
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>]>

What just happened ??? We reduced the number of queries from 2 to 1 to get 2 QuerySet-a!

4. Prefetch object


prefetch_related is sufficient for most cases, but it doesnโ€™t always help to avoid additional requests. For example, if we use filtering, Django cannot use our cached posts , since they were not filtered when they were requested in the first request. And we get:

>>> authors = Author.objects.prefetch_related("posts").all()
>>> for author in authors:
...     print(author.posts.filter(published=True))
...
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
Execution time: 0.000580s [Database: default]
SELECT ("blog_post_authors"."author_id") AS "_prefetch_related_val_author_id",
       "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE "blog_post_authors"."author_id" IN (1, 2, 3)
Execution time: 0.000759s [Database: default]
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post_authors"."author_id" = 1 AND "blog_post"."published" = 1)
 LIMIT 21
Execution time: 0.000299s [Database: default]
<QuerySet [<Post: Post object (1)>, <Post: Post object (2)>]>
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post_authors"."author_id" = 2 AND "blog_post"."published" = 1)
 LIMIT 21
Execution time: 0.000336s [Database: default]
<QuerySet [<Post: Post object (1)>, <Post: Post object (2)>]>
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post_authors"."author_id" = 3 AND "blog_post"."published" = 1)
 LIMIT 21
Execution time: 0.000412s [Database: default]
<QuerySet [<Post: Post object (1)>]>

That is, we used prefetch_related to reduce the number of requests, but we actually increased it. To avoid this, we can customize the request using the Prefetch object :

>>> authors = Author.objects.prefetch_related(
...     Prefetch(
...             "posts",
...             queryset=Post.objects.filter(published=True),
...             to_attr="published_posts",
...     )
... )
>>> for author in authors:
...     print(author.published_posts)
...
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
Execution time: 0.000183s [Database: default]
SELECT ("blog_post_authors"."author_id") AS "_prefetch_related_val_author_id",
       "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post"."published" = 1 AND "blog_post_authors"."author_id" IN (1, 2, 3))
Execution time: 0.000404s [Database: default]
[<Post: Post object (1)>, <Post: Post object (2)>]
[<Post: Post object (1)>, <Post: Post object (2)>]
[<Post: Post object (1)>]

We used a specific request to receive posts through the request parameter and saved the filtered messages in a new attribute. As we can see, now we have only 2 queries to the database.

All Articles