In this post I want to describe how to fetch attributes from files uploaded via ActiveStorage, without causing N+1 queries. I hope this is relevant to anyone running into the same issues while building a Rails application.

Setup

Users can upload multiple PDF documents. Each PDF document is stored in S3 and uploaded via Rails ActiveStorage. The user can see a list of attributes of each PDF in a typical index view. The list contains data extracted from each PDF after it was analysed. We store that data in an outputs table.

# app/models/upload.rb

class Upload < ApplicationRecord
  has_many_attached :pdfs
  has_many :outputs
end

# app/models/output.rb

class Output < ApplicationRecord
  belongs_to :upload
end

Note that the app has no model files of its own for the active_storage tables that Rails creates. By default the tables look as follows in the schema:

# db/schema.rb

create_table "active_storage_attachments", force: :cascade do |t|
  t.string "name", null: false
  t.string "record_type", null: false
  t.bigint "record_id", null: false
  t.bigint "blob_id", null: false
  t.datetime "created_at", null: false
  t.index ["blob_id"], name: "index_active_storage_attachments_on_blob_id"
  t.index ["record_type", "record_id", "name", "blob_id"], name: "index_active_storage_attachments_uniqueness", unique: true
end

create_table "active_storage_blobs", force: :cascade do |t|
  t.string "key", null: false
  t.string "filename", null: false
  t.string "content_type"
  t.text "metadata"
  t.string "service_name", null: false
  t.bigint "byte_size", null: false
  t.string "checksum"
  t.datetime "created_at", null: false
  t.index ["key"], name: "index_active_storage_blobs_on_key", unique: true
end

The outputs index view lists attributes on the outputs table, but should also show the filename of the PDF and link to it. The filename lives on active_storage_blobs by default.

Furthermore, the index view should link to the uploaded PDF document, so that the user can download it. That requires the blob record for the uploaded PDF, so it can be used in the rails_blob_path helper.

rails_blob_path(output.blob, disposition: 'attachment')

Challenge

The challenge is how to get from the output to the active_storage_blob record to generate the link to the PDF and display the filename. As a first step I stored the blob_id on each output record.

Outputs were created after the upload of the PDFs had finished, through an after_commit callback. This allowed assigning the blob_id of each PDF attachment to its output record.

# app/models/upload.rb

def create_outputs
  pdfs.each do |pdf|
    Output.create(upload: self, blob_id: pdf.blob_id)
  end
end

# db/schema.rb

create_table "outputs", force: :cascade do |t|
  t.bigint "blob_id"
  t.bigint "upload_id"
  t.string "title"
  t.integer "page_count"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.index ["upload_id"], name: "index_outputs_on_upload_id"
end

The link between output and active_storage_blob records now exists. It made it possible to get the filename of the PDF for each output with a filename method on output:

# app/models/output.rb

def filename
  ActiveStorageService.find_filename_from_blob_id(blob_id)
end

The ActiveStorageService had a method that wrapped a raw SQL query to fetch the filename:

# app/services/active_storage_service.rb

def self.find_filename_from_blob_id(blob_id)
  find_filename_from_blob_id_sql = "
  SELECT filename FROM active_storage_blobs
  WHERE id = '#{blob_id}';"

  ActiveRecord::Base.connection.execute(find_filename_from_blob_id_sql).values.flatten.first
end

This worked. However, fetching the filename this way issued one extra query per output, which painfully slowed down the index page: a classic N+1 scenario. The repeated queries can be observed in the Rails server logs, and the bullet gem can help debug the issue further.
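To make the query count concrete, here is a minimal plain-Ruby sketch of the pattern. It does not touch Rails or a database: NaiveStore and its methods are hypothetical stand-ins that only count how often a "query" would be issued, and the sample rows are made up for illustration.

```ruby
# Plain-Ruby simulation of the N+1 pattern; no Rails or database involved.
# NaiveStore and its methods are hypothetical stand-ins for real queries.
class NaiveStore
  attr_reader :query_count

  def initialize(outputs, blobs)
    @outputs = outputs # rows of the outputs table
    @blobs = blobs     # blob id => filename
    @query_count = 0
  end

  # Stands in for: SELECT * FROM outputs
  def all_outputs
    @query_count += 1
    @outputs
  end

  # Stands in for: SELECT filename FROM active_storage_blobs WHERE id = ?
  def filename_for(blob_id)
    @query_count += 1
    @blobs.fetch(blob_id)
  end
end

store = NaiveStore.new(
  [{ id: 1, blob_id: 10 }, { id: 2, blob_id: 11 }, { id: 3, blob_id: 12 }],
  { 10 => "a.pdf", 11 => "b.pdf", 12 => "c.pdf" }
)

# One query for the list, then one more per row: 1 + N in total.
filenames = store.all_outputs.map { |output| store.filename_for(output[:blob_id]) }

puts filenames.inspect
puts "queries: #{store.query_count}"
```

With three outputs the counter ends at four, and it keeps growing by one for every additional row on the page.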

Solution

ActiveStorage tables are generated with a Rails command, but no model files appear in the app, so they fly under the radar. The models do exist: Rails ships them as ActiveStorage::Blob and ActiveStorage::Attachment. In the ActiveStorage documentation, attributes such as filename and content_type are accessed through the model of the record_type, the Upload model in our case.

The solution was to build a relationship between Output and ActiveStorage::Blob. That way the filename could be accessed through the output.blob relationship.

# app/models/output.rb

belongs_to :blob, class_name: 'ActiveStorage::Blob'

# ...

def filename
  blob.filename.to_s
end

We just need to make sure to eager load blob when fetching outputs. This is achieved by using includes in the ActiveRecord query.

# app/controllers/outputs_controller.rb

class OutputsController < ApplicationController
  def index
    @outputs = Output.not_exported.includes(:blob).find_each(batch_size: 100, order: :desc)
  end
end
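
As a rough illustration of why eager loading helps, the sketch below simulates the batching idea in plain Ruby: collect the blob ids first, fetch all matching blobs in a single lookup, then read filenames from memory. It is only a simulation under assumed names (PreloadingStore and its methods are hypothetical), not how ActiveRecord is implemented internally.

```ruby
# Plain-Ruby simulation of eager loading; no Rails or database involved.
# PreloadingStore and its methods are hypothetical stand-ins for real queries.
class PreloadingStore
  attr_reader :query_count

  def initialize(outputs, blobs)
    @outputs = outputs # rows of the outputs table
    @blobs = blobs     # blob id => filename
    @query_count = 0
  end

  # Stands in for: SELECT * FROM outputs
  def all_outputs
    @query_count += 1
    @outputs
  end

  # Stands in for: SELECT * FROM active_storage_blobs WHERE id IN (...)
  def blobs_by_id(ids)
    @query_count += 1
    @blobs.slice(*ids)
  end
end

store = PreloadingStore.new(
  [{ id: 1, blob_id: 10 }, { id: 2, blob_id: 11 }, { id: 3, blob_id: 12 }],
  { 10 => "a.pdf", 11 => "b.pdf", 12 => "c.pdf" }
)

outputs = store.all_outputs
# A single batched lookup replaces the per-row queries.
blobs = store.blobs_by_id(outputs.map { |output| output[:blob_id] }.uniq)
# Filenames now come from memory, so no further queries are issued.
filenames = outputs.map { |output| blobs.fetch(output[:blob_id]) }

puts filenames.inspect
puts "queries: #{store.query_count}"
```

The counter stays at two regardless of how many outputs are listed, which is the effect includes(:blob) aims for in the controller above.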