Computer stuff, Ruby

Rails: Be wary of N+1 queries with `order`

I had to solve an interesting performance issue today, so I thought I might as well document it for the posterity. It involved fixing an N+1 query in a snippet of code where all the objects were correctly fetched with includes.

As you might have guessed from the title, the issue stemmed from an interesting interaction with order. I've prepared a nice little example to showcase the issue and the solution. Sounds interesting? Then let's go!

The setup

Let's say I'm building an app to manage a library of consoles and video games.

classDiagram direction LR Console *-- Game Console: +string name Game: +string title

Such an app can be bootstrapped very easily thanks to Rails' scaffolds. In fact, you can head over to my Github and find a simple implementation there.

https://github.com/Richard-Degenne/rails-includes-order

In this application, I would like to create a route to export the consoles and their respective games in a JSON document.

A simple implementation could look something like this:

Console.all.map do |console|
  games_data = console.games.map { |game| game.slice(:title) }
  console.slice(:name).merge(games: games_data)
end

And this works just as expected:

[
  {
    "name": "NES",
    "games": [
      {
        "title": "The Legend of Zelda"
      },
      {
        "title": "Teenage Mutant Ninja Turtles"
      },
      // ...
    ]
  },
  // ...
]

The catch

N+1 queries

If you have a bit of experience with ActiveRecord, you can probably already see a problem. If we check the logs of the application, we can see the following.

  Console Load (0.5ms)  SELECT "consoles".* FROM "consoles"
  ↳ app/controllers/exports_controller.rb:14:in `map'
  Game Load (0.8ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ?  [["console_id", 1]]
  ↳ app/controllers/exports_controller.rb:15:in `map'
  Game Load (0.3ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ?  [["console_id", 2]]
  ↳ app/controllers/exports_controller.rb:15:in `map'
  Game Load (0.2ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ?  [["console_id", 3]]
  ↳ app/controllers/exports_controller.rb:15:in `map'
  Game Load (0.3ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ?  [["console_id", 4]]
  ↳ app/controllers/exports_controller.rb:15:in `map'

ActiveRecord executes a first SQL query that selects the consoles from the consoles table. This query originates from the Console.all part of the snippet above.

Below that, we can see a bunch of queries that look exactly the same. These queries load the game for each console. This is also where they get their name from: 1 query for the first model, and then N more queries for the second model, for a total of N+1 query.

Obviously, such queries are performance killers. With 3 or 4 consoles, the difference in not even noticeable. But, should our application scale, we could be doing hundreds of even thousands of superfluous SQL queries which to grind our export route to a complete halt.

For instance, with just 1000 consoles (and 5 games each), the snippet above takes above 900 ms to process.

includes to the rescue

To get rid of these nasty N+1 queries, ActiveRecord allows to "include" nested records when querying for a model. We can improve on the first implementation by including the games when querying for all consoles.

Console.all.includes(:games).map do |console|
  games_data = console.games.map { |game| game.slice(:title) }
  console.slice(:name).merge(games: games_data)
end

We can now see in the logs that the multiple queries got replacing two queries.

  Console Load (2.3ms)  SELECT "consoles".* FROM "consoles"
  ↳ app/controllers/exports_controller.rb:14:in `map'
  Game Load (26.2ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" IN (1, 2, 3, 4)
  ↳ app/controllers/exports_controller.rb:14:in `map'

The first one is the same than before, but the second one now fetches the games for all consoles at once, in a single request.

All is well that ends well, our app will now scale to the moon. The end. 👌

Or is it?

My export is nice and all, but now I want the consoles to be sorted by name, and the games in each console sorted by title. ActiveRecord offers the order method which roughly translates into an ORDER BY clause in the SQL query. We can add that into our export code:

Console.all.includes(:games).order(:name).map do |console|
  games_data = console.games.order(:title).map { |game| game.slice(:title) }
  console.slice(:name).merge(games: games_data)
end

Neato. The result looks just as expected.

[
  {
    "name": "Mega Drive",
    "games": [
      {
        "title": "Earthworm Jim"
      },
      {
        "title": "Mortal Kombat 2"
      },
      {
        "title": "Sonic the Hedgehog 2"
      },
      {
        "title": "Street of Rage 2"
      }
    ]
  },
  {
    "name": "NES",
    "games": [
      {
        "title": "Metroid"
      },
      {
        "title": "Super Mario Bros."
      },
      // ...
    ]
  },
  {
    "name": "PlayStation",
    "games": [
      // ...
    ]
  },
  {
    "name": "SNES",
    "games": [
      // ...
    ]
  }
]

Hovever, should we take a look at the logs, we notice that the N+1 queries are back, even though we use includes!

  Console Load (0.1ms)  SELECT "consoles".* FROM "consoles" ORDER BY "consoles"."name" ASC LIMIT ?  [["LIMIT", 4]]
  ↳ app/controllers/exports_controller.rb:14:in `map'
  Game Load (0.2ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" IN (?, ?, ?, ?)  [["console_id", 4], ["console_id", 1], ["console_id", 3], ["console_id", 2]]
  ↳ app/controllers/exports_controller.rb:14:in `map'
  Game Load (0.2ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ? ORDER BY "games"."title" ASC  [["console_id", 4]]
  ↳ app/controllers/exports_controller.rb:15:in `map'
  Game Load (0.2ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ? ORDER BY "games"."title" ASC  [["console_id", 1]]
  ↳ app/controllers/exports_controller.rb:15:in `map'
  Game Load (0.1ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ? ORDER BY "games"."title" ASC  [["console_id", 3]]
  ↳ app/controllers/exports_controller.rb:15:in `map'
  Game Load (0.1ms)  SELECT "games".* FROM "games" WHERE "games"."console_id" = ? ORDER BY "games"."title" ASC  [["console_id", 2]]
  ↳ app/controllers/exports_controller.rb:15:in `map'

I don't know nearly enough about ActiveRecord's internals to explain clearly what is happening here, but I'm pretty sure that it recognizes that, because of the order, the data that was fetched by the includes query is not applicable here because it might not be in the correct order. And because of that, ActiveRecord decides to fetch data again, but this time console by console, hence the N+1 queries.

Of course, we could simply replace the order(:title) with something like sort_by(&:title). The difference between the two is that sort_by is going to do all the work in-memory, and I would like the example to stay focused on the ORM aspect on the problem.

order is cooler than you think

So, what do we to get rid of these nasty N+1 queries that we've introduced?

Well, this is where the cool trick I mentioned in the introduction comes into play. As it turns out, it is possible to order an ActiveRecord relation using an association. To do this, we have to pass it a string (and not a symbol) with the name of the associated table and the sort column, separated by a dot, SQL style.

Console.all.includes(:games).order(:name, 'games.title')

(We add 'games.title' as a secondary sort because we still want to keep :name as our primary sort.)

And because the games' records were fetched in the right order, we don't need to specify an order or a sort in the inner loop of our code. Here's what it looks like now.

Console.all.includes(:games).order(:name, 'games.title').map do |console|
  games_data = console.games.map { |game| game.slice(:title) }
  console.slice(:name).merge(games: games_data)
end

I think the coolest part in all of this is how the query gets translated into SQL. Let's take a look at the logs.

  SQL (1.8ms)  SELECT "consoles"."id" AS t0_r0, "consoles"."name" AS t0_r1, "consoles"."created_at" AS t0_r2, "consoles"."updated_at" AS t0_r3, "games"."id" AS t1_r0, "games"."title" AS t1_r1, "games"."console_id" AS t1_r2, "games"."created_at" AS t1_r3, "games"."updated_at" AS t1_r4 FROM "consoles" LEFT OUTER JOIN "games" ON "games"."console_id" = "consoles"."id" ORDER BY "consoles"."name" ASC, games.title
  ↳ app/controllers/exports_controller.rb:14:in `map'

So first things first: the N+1 queries are gone! Yay! But instear of the two queries that we had earlier (one for the consoles, one for the games), we only have one single query.

It is a bit longer than the others, so here it is prettified.

SELECT 
  "consoles"."id" AS t0_r0, 
  "consoles"."name" AS t0_r1, 
  "consoles"."created_at" AS t0_r2, 
  "consoles"."updated_at" AS t0_r3, 
  "games"."id" AS t1_r0, 
  "games"."title" AS t1_r1, 
  "games"."console_id" AS t1_r2, 
  "games"."created_at" AS t1_r3, 
  "games"."updated_at" AS t1_r4 
FROM 
  "consoles" 
  LEFT OUTER JOIN "games" ON "games"."console_id" = "consoles"."id" 
ORDER BY 
  "consoles"."name" ASC, 
  games.title

Again, I don't know much about how ActiveRecord handles that, but it has identified that we ordered on an association field, and thus the includes is turned into a LEFT OUTER JOIN clause, fetching all consoles and their respective games, all sorted according to our criteria.

Jeff Goldblum GIFs are the best.

Conclusion

And there your have it! In short: always keep an eye on your SQL logs when developing with Rails. Not only will you catch lots of sub-optimal queries that risk harming the performance of your app, but you will also find weird little interactions everywhere in ActiveRecord, and I love it.

A question, a feedback? Write a comment!