Data Mapper vs Active Record

Published Apr 20, 2012

Martin Fowler described two main patterns of object persistence. Here they are with very simplistic descriptions:

Active Record — Objects manage their own persistence
Data Mapper — Object persistence is managed by a separate mapper class

There are Ruby gems with these names, but both actually implement the Active Record pattern. The DataMapper team, however, is currently working on version 2.0, which will implement the Data Mapper pattern. Also, as an academic exercise, off and on for the past few months I've been working on a gem called Perpetuity, an implementation of the Data Mapper pattern. I actually began it about two days before Piotr Solnica posted the above article. The name was chosen because "perpetuity" is the quality of lasting indefinitely.

Active Record vs Single Responsibility Principle

The entire reason for my writing that gem has been caused by the ActiveRecord gem's violation of the Single Responsibility Principle. All objects and methods in a system should follow the Unix principle of "do one thing well". Active Record (both the pattern and the gem) combine business logic and persistence logic in the same class.

An object based on the Active Record pattern represents not only the singular object but the representation in the database, as well. Additionally, methods on ActiveRecord classes are intended to operate over the entire collection of objects. This is entirely too much functionality and it should be separated.

The only semi-proper use case for an ActiveRecord-style class is when the class exists solely to represent data (a glorified struct, really) and has no behavior.

Why do we care about SRP?

Most programmers could probably skip this section, but feel free to read through it.

The Single Responsibility Principle is important in computer science because it allows us to make modifications to code without changing every single thing that that particular piece of code works with. It's like modifying the engine of a car. Let's say you want to give your engine some more power by adding a performance carburetor. But in order to let the carburetor perform at its peak capacity, you need an intake manifold that's designed to handle the increased volume of fuel/air mixture. Then you need to install a larger camshaft because a stock cam will still only draw in the same amount of oomph from the carburetor.

But then you realize that your cylinder heads aren't designed to handle that much juice coming in all at once, so you need to remachine them for that. But then you realize that that only handles the intake. You can pull some serious power into the combustion chamber, but after ignition, your exhaust system has to give it somewhere to go efficiently, so you have to modify that, too!

Before you know it, you've shaved the skin right off that yak all because you wanted to change the carburetor.

Applying this to programming, we can write code that allows us to figuratively change the carburetor without having to change the rest of the intake and exhaust systems. We'd be able to change just the carb.

How does ActiveRecord make it harder?

The hardest part about ActiveRecord (the gem) is testing. In order to run a single model spec, I need the ActiveRecord gem because the model class is a derivative of ActiveRecord::Base. Then, in order to instantiate an ActiveRecord class, I need to connect to a database server, just to test a single method that has nothing to do with persistence. When you're specifying the domain logic of your application and haven't written a single piece of Rails-specific code, there is absolutely no need for persistence.

This is definitely made ridiculously easy by leaning entirely on Rails generators for all of the boilerplate and then loading your Rails environment in tests, but all that does is move the pain from configuring ActiveRecord specifically for model specs to loading your entire Rails app to execute a single spec. For small apps, loading the Rails environment takes several seconds on a reasonably fast machine. For large apps, loading the Rails environment could take over 30 seconds. If you're doing TDD properly, this means that a simple red/green/refactor cycle could potentially take several minutes instead of a single minute or so.

Side note: If your Rails app takes more than 5-10 seconds to load, consider moving significant portions of it into another app. I'll write another post about this soon.

ActiveRecord isn't all bad

I'm really talking a lot of shit about ActiveRecord here. I don't hate it. I just disagree with it. The magic of it is what drew me to Rails back in 2005 and now, oddly, we're learning that that magic is bad.

Reinstate SRP with Data Mapper

Before we step into the Data Mapper pattern, let's have a look at how Corey Haines and Gary Bernhardt separate concerns in order to achieve their renowned fast tests.

The "Fast Rails Tests" gurus

The way I've seen Corey and Gary discuss their tests is that they extract the behavior of their models into a separate module or class and call that behavior from the model.

class CalculatesTotalPrice
  def self.for(products)
    products.map(&:price).reduce(0, &:+)
  end
end

describe CalculatesTotalPrice do
  it 'returns 0 for an empty product list' do
    no_products = []
    CalculatesTotalPrice.for(no_products).should == 0
  end

  it 'returns the sum of all product prices for a list' do
    products = [stub(price: 10), stub(price: 15)]
    CalculatesTotalPrice.for(products).should == 25
  end
end

This is an outstanding way to separate behavior from data, but I'm not sure I agree with it. This is not meant as an insult to them — I think they're both very talented people — I may just have a different view of OOP than they do.

My own views

It is my belief that data and behavior should not be separate. The two are organic to each other and they exist solely because the other exists. They're like bread and butter, love and marriage, or Jenny and Forrest. They're soulmates. Don't split them up.

Let's see some code

So, since using ActiveRecord means the objects are subclassed from ActiveRecord::Base, that means that the Data Mapper objects are the subclass of some DataMapper base class, right?

Nah, not even close. The idea behind the Data Mapper pattern is that the objects don't know anything about persistence or even the classes/objects that map them to the database. We just use plain-old Ruby objects!

For example, the Article class can look like this:

class Article
  attr_reader :comments
  def initialize(args = {})
    @comments = args.fetch(:comments) { Array.new }
  end

  def << comment
    comments << comment
  end
end

describe Article do
  describe :comments do
    it 'has an empty collection of comments upon init' do
      subject.comments.should be_empty
    end

    it 'can be given a list of comments' do
      article = Article.new(comments: [:first, :second])
      article.comments.should include :first, :second
    end

    it 'returns a collection of comments' do
      comment = Object.new
      subject << comment
      subject.should have(1).comments
    end
  end
end

All that matters is that we provide some sort of interface to the data so we can persist it. In this case, we just use an attr_reader. Ideally, we'd want to be able to write to it, too, so attr_accessor would be better, but you can use custom getter/setter methods if that works better for that particular piece of data (such as encrypted text).

With ActiveRecord, we wouldn't be able to add any object to Article#comments that isn't an instance of the Comment class due to the has_many macro. In the above spec, we don't care that what we're putting something that isn't a comment into the comments collection. We're only testing that we can put comments into it. Additionally, testing a plain Ruby object is fast. This example runs in 174ms on my machine, which includes loading the Ruby VM and the RSpec gem. At that speed, the feedback loop is limited only by how fast your fingers hit the keys.

The same example using an ActiveRecord class would take anywhere up to 30 seconds in larger apps and requires configuration — all because we inherited from ActiveRecord::Base.

The tight feedback loop isn't mandatory for developing quality software, but the tighter it is, the more you'll run the tests and the more likely you'll actually do TDD properly, which is more likely to result in better code.

Jamie Gaskins

Ruby/Rails developer, coffee addict