初学者的Django ORM | 我们优化要求



Django ORM(对象关系映射)是Django最强大的功能之一。这使我们可以使用Python代码而不是SQL与数据库进行交互。

为了演示,我将描述以下模型:

from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=250)
    url = models.URLField()

    def __str__(self):
        return self.name

class Author(models.Model):
    name = models.CharField(max_length=250)

    def __str__(self):
        return self.name

class Post(models.Model):
    title = models.CharField(max_length=250)
    content = models.TextField()
    published = models.BooleanField(default=True)
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    authors = models.ManyToManyField(Author, related_name="posts")

我将使用django-extensions通过以下方法获取有用的信息:

python manage.py shell_plus --print-sql

因此,让我们开始:

>>> post = Post.objects.all()
>>> post
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 LIMIT 21
Execution time: 0.000172s [Database: default]
<QuerySet [<Post: Post object (1)>]>

1.直接使用ForeignKey值


>>> Post.objects.first().blog.id
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 ORDER BY "blog_post"."id" ASC
 LIMIT 1
Execution time: 0.000225s [Database: default]
SELECT "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_blog"
 WHERE "blog_blog"."id" = 1
 LIMIT 21
Execution time: 0.000144s [Database: default]
1

因此,我们在数据库中获得了1个查询:

>>> Post.objects.first().blog_id
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 ORDER BY "blog_post"."id" ASC
 LIMIT 1
Execution time: 0.000155s [Database: default]
1

2.一对多关系


如果我们使用OneToMany关系,则使用ForeignKey字段,查询看起来像这样:

>>> post = Post.objects.get(id=1)
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
 WHERE "blog_post"."id" = 1
 LIMIT 21
Execution time: 0.000161s [Database: default]

如果要从post对象访问blog对象,则可以执行以下操作:

>>> post.blog
SELECT "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_blog"
 WHERE "blog_blog"."id" = 1
 LIMIT 21
Execution time: 0.000211s [Database: default]
<Blog: Django tutorials>

但是,这触发了从博客获取信息的新请求。因此,请使用select_related来避免这种情况。要使用它,我们可以更新原始请求:

>>> post = Post.objects.select_related("blog").get(id=1)
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id",
       "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_post"
 INNER JOIN "blog_blog"
    ON ("blog_post"."blog_id" = "blog_blog"."id")
 WHERE "blog_post"."id" = 1
 LIMIT 21
Execution time: 0.000159s [Database: default]

请注意,Django现在正在使用JOIN!而且查询执行时间比以前短。另外,现在post.blog将被缓存!

>>> post.blog
<Blog: Django tutorials>

select_related还与QurySets:

>>> posts = Post.objects.select_related("blog").all()
>>> for post in posts:
...     post.blog
...
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id",
       "blog_blog"."id",
       "blog_blog"."name",
       "blog_blog"."url"
  FROM "blog_post"
 INNER JOIN "blog_blog"
    ON ("blog_post"."blog_id" = "blog_blog"."id")
Execution time: 0.000241s [Database: default]
<Blog: Django tutorials>

3.多对多关系


为了获得帖子作者,我们使用如下代码:

>>> for post in Post.objects.all():
...     post.authors.all()
...
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
Execution time: 0.000242s [Database: default]
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
 INNER JOIN "blog_post_authors"
    ON ("blog_author"."id" = "blog_post_authors"."author_id")
 WHERE "blog_post_authors"."post_id" = 1
 LIMIT 21
Execution time: 0.000125s [Database: default]
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>, <Author: Guido van Rossum>]>
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
 INNER JOIN "blog_post_authors"
    ON ("blog_author"."id" = "blog_post_authors"."author_id")
 WHERE "blog_post_authors"."post_id" = 2
 LIMIT 21
Execution time: 0.000109s [Database: default]
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>]>

看来我们收到了每个post对象的请求。因此,我们必须使用prefetch_related这类似于select_related,但用于ManyToMany字段:

>>> for post in Post.objects.prefetch_related("authors").all():
...     post.authors.all()
...
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."blog_id"
  FROM "blog_post"
Execution time: 0.000300s [Database: default]
SELECT ("blog_post_authors"."post_id") AS "_prefetch_related_val_post_id",
       "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
 INNER JOIN "blog_post_authors"
    ON ("blog_author"."id" = "blog_post_authors"."author_id")
 WHERE "blog_post_authors"."post_id" IN (1, 2)
Execution time: 0.000379s [Database: default]
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>, <Author: Guido van Rossum>]>
<QuerySet [<Author: Dmytro Parfeniuk>, <Author: Will Vincent>]>

刚才发生了什么 ???我们将查询数量从2减少到1,以获得2 QuerySet-a!

4.预取对象


prefetch_related在大多数情况下足够了,但并不总是有助于避免其他请求。例如,如果我们使用过滤,则Django无法使用缓存的帖子,因为在第一个请求中被请求时,它们并未被过滤。我们得到:

>>> authors = Author.objects.prefetch_related("posts").all()
>>> for author in authors:
...     print(author.posts.filter(published=True))
...
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
Execution time: 0.000580s [Database: default]
SELECT ("blog_post_authors"."author_id") AS "_prefetch_related_val_author_id",
       "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE "blog_post_authors"."author_id" IN (1, 2, 3)
Execution time: 0.000759s [Database: default]
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post_authors"."author_id" = 1 AND "blog_post"."published" = 1)
 LIMIT 21
Execution time: 0.000299s [Database: default]
<QuerySet [<Post: Post object (1)>, <Post: Post object (2)>]>
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post_authors"."author_id" = 2 AND "blog_post"."published" = 1)
 LIMIT 21
Execution time: 0.000336s [Database: default]
<QuerySet [<Post: Post object (1)>, <Post: Post object (2)>]>
SELECT "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post_authors"."author_id" = 3 AND "blog_post"."published" = 1)
 LIMIT 21
Execution time: 0.000412s [Database: default]
<QuerySet [<Post: Post object (1)>]>

也就是说,我们使用prefetch_related减少了请求数量,但实际上增加了它。为了避免这种情况,我们可以使用Prefetch对象来自定义请求

>>> authors = Author.objects.prefetch_related(
...     Prefetch(
...             "posts",
...             queryset=Post.objects.filter(published=True),
...             to_attr="published_posts",
...     )
... )
>>> for author in authors:
...     print(author.published_posts)
...
SELECT "blog_author"."id",
       "blog_author"."name"
  FROM "blog_author"
Execution time: 0.000183s [Database: default]
SELECT ("blog_post_authors"."author_id") AS "_prefetch_related_val_author_id",
       "blog_post"."id",
       "blog_post"."title",
       "blog_post"."content",
       "blog_post"."published",
       "blog_post"."blog_id"
  FROM "blog_post"
 INNER JOIN "blog_post_authors"
    ON ("blog_post"."id" = "blog_post_authors"."post_id")
 WHERE ("blog_post"."published" = 1 AND "blog_post_authors"."author_id" IN (1, 2, 3))
Execution time: 0.000404s [Database: default]
[<Post: Post object (1)>, <Post: Post object (2)>]
[<Post: Post object (1)>, <Post: Post object (2)>]
[<Post: Post object (1)>]

我们使用特定查询通过query参数接收帖子,并将过滤后的消息保存在新属性中。如我们所见,现在我们只有2个查询到数据库。

All Articles