Jamie Gaskins

Ruby/Rails developer, coffee addict

Data Mapper vs Active Record

Apr 20, 2012 @ 12:00am

Martin Fowler described two main patterns of object persistence. Here they are with very simplistic descriptions:

  1. Active Record — Objects manage their own persistence
  2. Data Mapper — Object persistence is managed by a separate mapper class
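To make the difference concrete, here's a minimal sketch of both patterns against a hypothetical in-memory "database" (a hash of arrays). The class names and the DB constant are illustrative only and don't come from either gem.

```ruby
# Hypothetical in-memory "database": table name => array of row hashes.
DB = Hash.new { |hash, table| hash[table] = [] }

# Active Record style: the object persists itself.
class ActiveRecordArticle
  attr_accessor :title

  def initialize(title)
    @title = title
  end

  def save
    DB[:ar_articles] << { title: title }
    true
  end
end

# Data Mapper style: a plain object that knows nothing about storage...
class Article
  attr_accessor :title

  def initialize(title)
    @title = title
  end
end

# ...and a separate mapper that moves it to and from the store.
class ArticleMapper
  def insert(article)
    DB[:articles] << { title: article.title }
  end

  def all
    DB[:articles].map { |row| Article.new(row[:title]) }
  end
end

# Usage: the first object saves itself; the second is saved by its mapper.
ActiveRecordArticle.new('Hello').save
ArticleMapper.new.insert(Article.new('Hello'))
```

Note that Article has no save method at all; everything it knows about persistence lives in ArticleMapper.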

There are Ruby gems with these names, but both actually implement the Active Record pattern. The DataMapper team, however, is currently working on version 2.0, which will implement the Data Mapper pattern. Also, as an academic exercise, off and on for the past few months I've been working on a gem called Perpetuity, an implementation of the Data Mapper pattern. I actually began it about two days before Piotr Solnica posted the above article. The name was chosen because "perpetuity" is the quality of lasting indefinitely.

Active Record vs Single Responsibility Principle

The entire reason I've been writing that gem is the ActiveRecord gem's violation of the Single Responsibility Principle. All objects and methods in a system should follow the Unix principle of "do one thing well". Active Record (both the pattern and the gem) combines business logic and persistence logic in the same class.

An object based on the Active Record pattern represents not only the individual object but also its representation in the database. Additionally, class methods on ActiveRecord models (such as Article.find and Article.where) are intended to operate over the entire collection of objects. This is entirely too much functionality for one class, and it should be separated.

The only semi-proper use case for an ActiveRecord-style class is when the class exists solely to represent data (a glorified struct, really) and has no behavior.

Why do we care about SRP?

Most programmers could probably skip this section, but feel free to read through it.

The Single Responsibility Principle is important in computer science because it allows us to make modifications to code without changing every single thing that that particular piece of code works with. It's like modifying the engine of a car. Let's say you want to give your engine some more power by adding a performance carburetor. But in order to let the carburetor perform at its peak capacity, you need an intake manifold that's designed to handle the increased volume of fuel/air mixture. Then you need to install a larger camshaft because a stock cam will still only draw in the same amount of oomph from the carburetor.

But then you realize that your cylinder heads aren't designed to handle that much juice coming in all at once, so you need to remachine them for that. But then you realize that that only handles the intake. You can pull some serious power into the combustion chamber, but after ignition, your exhaust system has to give it somewhere to go efficiently, so you have to modify that, too!

Before you know it, you've shaved the skin right off that yak all because you wanted to change the carburetor.

Applying this to programming, we can write code that allows us to figuratively change the carburetor without having to change the rest of the intake and exhaust systems. We'd be able to change just the carb.

How does ActiveRecord make it harder?

The hardest part about ActiveRecord (the gem) is testing. In order to run a single model spec, I need the ActiveRecord gem because the model class is a subclass of ActiveRecord::Base. Then, in order to instantiate an ActiveRecord class, I need to connect to a database server, just to test a single method that has nothing to do with persistence. When you're specifying the domain logic of your application and haven't written a single piece of Rails-specific code, there is absolutely no need for persistence.

You can make this ridiculously easy by leaning entirely on Rails generators for all of the boilerplate and then loading your Rails environment in tests, but all that does is move the pain from configuring ActiveRecord specifically for model specs to loading your entire Rails app to execute a single spec. For small apps, loading the Rails environment takes several seconds on a reasonably fast machine. For large apps, it could take over 30 seconds. If you're doing TDD properly, you run your specs several times per red/green/refactor cycle, so a single cycle could take several minutes instead of a minute or so.

Side note: If your Rails app takes more than 5-10 seconds to load, consider moving significant portions of it into another app. I'll write another post about this soon.

ActiveRecord isn't all bad

I'm really talking a lot of shit about ActiveRecord here. I don't hate it. I just disagree with it. The magic of it is what drew me to Rails back in 2005 and now, oddly, we're learning that that magic is bad.

Reinstate SRP with Data Mapper

Before we step into the Data Mapper pattern, let's have a look at how Corey Haines and Gary Bernhardt separate concerns in order to achieve their renowned fast tests.

The "Fast Rails Tests" gurus

From what I've seen of Corey and Gary discussing their tests, they extract the behavior of their models into a separate module or class and call that behavior from the model.

class CalculatesTotalPrice
  def self.for(products)
    products.map(&:price).reduce(0, &:+)
  end
end

describe CalculatesTotalPrice do
  it 'returns 0 for an empty product list' do
    no_products = []
    CalculatesTotalPrice.for(no_products).should == 0
  end

  it 'returns the sum of all product prices for a list' do
    products = [stub(price: 10), stub(price: 15)]
    CalculatesTotalPrice.for(products).should == 25
  end
end
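The other half of that approach is the model itself, which just delegates to the extracted class. Here's a rough sketch; Order and Product are hypothetical, and a Struct stands in for what would be an ActiveRecord model in a real app, so the example runs without a database.

```ruby
# Extracted behavior, as in the example above.
class CalculatesTotalPrice
  def self.for(products)
    products.map(&:price).reduce(0, &:+)
  end
end

# Stand-in for a Rails model. In a real app this would be something like
# `class Order < ActiveRecord::Base` with `has_many :products`; a Struct
# keeps the sketch runnable without a database.
Order = Struct.new(:products) do
  # The model only delegates; the logic lives (and is specified) elsewhere.
  def total_price
    CalculatesTotalPrice.for(products)
  end
end

Product = Struct.new(:price)
```

The model's method becomes a one-liner that barely needs a test, while the interesting logic is specified in isolation.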

This is an outstanding way to separate behavior from data, but I'm not sure I agree with it. This is not meant as an insult to them — I think they're both very talented people — I may just have a different view of OOP than they do.

My own views

It is my belief that data and behavior should not be separate. The two are organic to each other and they exist solely because the other exists. They're like bread and butter, love and marriage, or Jenny and Forrest. They're soulmates. Don't split them up.

Let's see some code

So, since ActiveRecord objects are subclasses of ActiveRecord::Base, Data Mapper objects must be subclasses of some DataMapper base class, right?

Nah, not even close. The idea behind the Data Mapper pattern is that the objects don't know anything about persistence or even the classes/objects that map them to the database. We just use plain-old Ruby objects!

For example, the Article class can look like this:

class Article
  attr_reader :comments

  def initialize(args = {})
    @comments = args.fetch(:comments) { Array.new }
  end

  # Appends a comment to this article's comments collection.
  def <<(comment)
    comments << comment
  end
end

describe Article do
  describe :comments do
    it 'has an empty collection of comments upon init' do
      subject.comments.should be_empty
    end

    it 'can be given a list of comments' do
      article = Article.new(comments: [:first, :second])
      article.comments.should include :first, :second
    end

    it 'can add a comment to the collection' do
      comment = Object.new
      subject << comment
      subject.should have(1).comments
    end
  end
end

All that matters is that we provide some sort of interface to the data so we can persist it. In this case, we just use an attr_reader. Ideally, we'd want to be able to write to it, too, so attr_accessor would be better, but you can use custom getter/setter methods if that works better for that particular piece of data (such as encrypted text).
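Here's a minimal sketch of what a mapper for this Article might look like, assuming a plain hash as the "table". ArticleMapper, insert, and find are hypothetical names, not Perpetuity's API; the point is that the mapper only touches Article through its public interface (the comments reader and the constructor).

```ruby
# The plain Ruby Article from above.
class Article
  attr_reader :comments

  def initialize(args = {})
    @comments = args.fetch(:comments) { Array.new }
  end

  def <<(comment)
    comments << comment
  end
end

# Hypothetical mapper: Article never references it.
class ArticleMapper
  def initialize
    @rows = {}     # stand-in for a database table: id => row hash
    @next_id = 1
  end

  # Serializes the article into a plain row and returns the new id.
  def insert(article)
    id = @next_id
    @next_id += 1
    @rows[id] = { comments: article.comments.dup }
    id
  end

  # Rebuilds a fresh Article from the stored row.
  def find(id)
    row = @rows.fetch(id)
    Article.new(comments: row[:comments].dup)
  end
end
```

Swapping the hash for a real database would change the mapper only; Article and its specs would stay exactly as they are.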

With ActiveRecord, we wouldn't be able to add any object to Article#comments that isn't an instance of the Comment class, because of the has_many macro. In the above spec, we don't care that we're putting something that isn't a comment into the comments collection; we're only testing that we can put things into it. Additionally, testing a plain Ruby object is fast. This example runs in 174ms on my machine, which includes loading the Ruby VM and the RSpec gem. At that speed, the feedback loop is limited only by how fast your fingers hit the keys.

The same example using an ActiveRecord class would take anywhere up to 30 seconds in larger apps and requires configuration — all because we inherited from ActiveRecord::Base.

The tight feedback loop isn't mandatory for developing quality software, but the tighter it is, the more you'll run the tests and the more likely you'll actually do TDD properly, which is more likely to result in better code.

Unordered List Helper for Rails

Mar 28, 2012 @ 12:00am

The Rails helper idea of "one HTML element per helper method" is a silly abstraction. I'm not sure it's the best idea for the general case. Here's an example:

<%= form_for @article do |f| %>
  <div class="field">
    <%= f.label :title %>
    <%= f.text_field :title %>
  </div>
  <div class="field">
    <%= f.label :body %>
    <%= f.text_area :body %>
  </div>
  <%= f.submit %>
<% end %>

That includes a lot of boilerplate. All we care about is rendering a form that specifies 2 fields.

Reduce that boilerplate code

With the SimpleForm gem, we can reduce the code down to something like this:

<%= simple_form_for @article do |f| %>
  <%= f.input :title %>
  <%= f.input :body %>
  <%= f.submit %>
<% end %>

That's perfect! All of the labels are inserted automatically and the wrappers for the form inputs are handled through SimpleForm configuration with sensible defaults. In this form, we've reduced the boilerplate down to 2 lines (submit and form end tag) from 8.

Forms have always been an area with a lot of unnecessary code, especially given how powerful Rails helpers are. I'd actually like to see this merged into Rails at some point, but SimpleForm has a lot of functionality and customization, and merging every piece of it would be a bit much. However, Rails could easily optimize for the general case (wrapping the inputs in a div and inserting labels).

Now, where else can we do something like this?

Abstracting away any extra code that we don't need is always a huge win, but where else in Rails can we do such a thing?

Lists

Ordered and unordered lists are one of the biggest areas where we still write the same boilerplate by hand every time.

<ul>
  <% @items.each do |item| %>
    <li><%= item.name %></li>
  <% end %>
</ul>

Sure, that's not a lot of code, but how many times over the course of a project do you see this? If it's a project of any decent size, it'll be at least a dozen. Looping over each item within an array (or an ActiveRecord::Relation for people that care so much about precision ;-)) within the ul has always felt odd to me. I know it's required, but that doesn't make it sit well. It's just one of those things.

Awkward feelings aside, how much better would it feel if you could, instead, write this:

<%= unordered_list(@items) { |item| item.name } %>

Regardless of how you feel about the loop within the containing element, this presents more cleanly. We could also use other view helpers, say linking to each item:

<%= unordered_list(@items) { |item| link_to item.name, item } %>
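As a rough idea of what such a helper could do, here's a minimal standalone sketch. ListHelper and this unordered_list are hypothetical plain Ruby, not a Rails API: the helper yields each item, HTML-escapes whatever the block returns, and wraps each result in an li inside a single ul.

```ruby
require 'erb'

# Hypothetical standalone version of the proposed helper.
module ListHelper
  def unordered_list(items)
    list_items = items.map do |item|
      # Escape the block's output so item data can't inject markup.
      "<li>#{ERB::Util.html_escape(yield(item))}</li>"
    end
    "<ul>#{list_items.join}</ul>"
  end
end
```

Inside a real Rails app you'd want to build this from content_tag and safe_join instead, so that output that's already HTML-safe (like link_to's) isn't escaped a second time; this plain version would escape it.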

There are other pieces of HTML that go together all the time. I'll update this article with more as I think of them.

UPDATE: I submitted this as a pull request to Rails, but it was rejected. I'd forgotten that you can already use content_tag_for to iterate over a collection, but I still wanted this abstraction at the list level. I still believe that the way we do one HTML tag per helper method is silly; it feels more like translating HTML into Ruby instead of abstracting away the HTML.

Every ul tag has as its children nothing but li tags. This means that writing li at all is solely to delimit the list items themselves and hence it becomes unnecessary boilerplate that can be abstracted away.

I very much disagree with them for rejecting this and I don't think the reason of "you can already do this by writing more code that's harder to read" is a good reason to reject it. :-) However, they are the core team and I'm sure they deal with indignant pull requests all the time, so I won't add to it.

Perpetuity Object Declarations

Feb 27, 2012 @ 12:00am

So far, this is what I've got set up for Mapper declarations in Perpetuity:

class ArticleMapper < Perpetuity::Mapper
  attribute :title, String
  attribute :body, String
  attribute :author, User

  id { title.gsub(/\W+/, '-').downcase } # SEO-friendly URLs
end

What ends up happening is that it sees that it can serialise the title and body attributes of the Article model because they're String objects. But it knows it can't serialise the author attribute because it's a User instance, so it saves the id of the author object (it must already be persisted or Perpetuity throws an error … for now) in its place. When loading the association from the database, it calls UserMapper.find(model.author) and places the result into the Article instance.
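Here's a rough sketch of that serialization rule in plain Ruby. It's an illustration, not Perpetuity's actual code: User, Article, ARTICLE_ATTRIBUTES, and serialize_article are hypothetical, with the attribute hash mirroring the ArticleMapper declaration.

```ruby
# Illustrative stand-ins for model classes.
User    = Struct.new(:id, :name)
Article = Struct.new(:title, :body, :author)

# Mirrors the mapper declaration: attribute name => declared class.
ARTICLE_ATTRIBUTES = { title: String, body: String, author: User }.freeze

# Turns an article into a row the database can store. Strings are stored
# as-is; anything else is replaced by the id of the (already persisted)
# object it points to.
def serialize_article(article)
  ARTICLE_ATTRIBUTES.each_with_object({}) do |(name, type), row|
    value = article.public_send(name)
    if type == String
      row[name] = value
    elsif value.id
      row[name] = value.id
    else
      raise ArgumentError, "#{name} must be persisted before it can be referenced"
    end
  end
end
```

With an author whose id is 42, the stored row looks like { title: 'Hi', body: 'Body', author: 42 }; an unsaved author raises instead.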

Storing the id in place of the author isn't the cleanest idea, though: until the association is loaded, the author attribute holds an actual value (the author's id), and it's the wrong kind of value. I've thrown around several ideas for this:

Load all associations when the object is retrieved from the database

This has the obvious pro of never having to worry about associations in user code, but we end up retrieving extra data from the database which we may never use. Clearly, a more efficient way of handling it would be to do lazy loading, but then we end up moving away from the Data Mapper pattern and start implementing the Active Record pattern by telling the object how to work with the database.

Granted, this is done through metaprogramming (we would inject the code into the object at load time) so the written code is pure Data Mapper, but it still feels wrong. We do currently apply a little magic to the object to assign it an id when it's been persisted into/loaded from the database, but I'm trying to limit it as much as humanly possible.
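To show what "injecting the code into the object at load time" might mean, here's a hypothetical sketch using define_singleton_method. The names (find_user, load_article, USERS, QUERIES) are made up for illustration, and the QUERIES array only records simulated database hits.

```ruby
# Hypothetical sketch of lazy loading by injecting a method at load time.
User = Struct.new(:id, :name)

class Article
  attr_accessor :title, :author

  def initialize(title, author)
    @title = title
    @author = author
  end
end

USERS   = { 42 => User.new(42, 'Jamie') }  # stand-in for the users collection
QUERIES = []                               # records each simulated query

def find_user(id)
  QUERIES << [:users, id]
  USERS.fetch(id)
end

def load_article(row)
  article = Article.new(row[:title], nil)
  author_id = row[:author]
  # Override the reader on this one object only: the first call queries
  # the "database" and caches the result; later calls reuse it.
  article.define_singleton_method(:author) do
    @author ||= find_user(author_id)
  end
  article
end
```

The Article class itself stays plain, but every loaded instance quietly knows how to reach the database, which is exactly the part that feels like Active Record.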

Force a user to load all associations manually

This is the way it is currently implemented and seems to be the most true to the Data Mapper pattern. In order to get associated objects, a user would have to call something like so:

article = ArticleMapper.find(params[:id])
ArticleMapper.load_association!(article, :author)

This code executes a database query and assigns the proper User object to the author attribute of the article. The disadvantage of this is that we must be mindful of every single database query, but the advantage is that … we end up being mindful of every single database query. :-)

Being mindful of every query is an advantage because it doesn't let queries get away from us. ActiveRecord-style associations let us link up objects without caring about the consequences. I've caught myself doing this, only to look at the logs and notice that I'd executed dozens of queries when I only meant to execute 3 or 4, simply because I was treating associations as data instead of separate database rows.
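Here's a minimal sketch of what explicit loading could look like, assuming hashes as stand-ins for the collections. The internals (ROWS, ASSOCIATIONS) are hypothetical and not Perpetuity's implementation; only the find/load_association! call shape comes from the example above.

```ruby
# Illustrative stand-ins; `author` holds a user id until it's loaded.
User    = Struct.new(:id, :name)
Article = Struct.new(:id, :title, :author)

class UserMapper
  ROWS = { 42 => { name: 'Jamie' } }.freeze  # stand-in for the users collection

  def self.find(id)
    User.new(id, ROWS.fetch(id)[:name])
  end
end

class ArticleMapper
  ROWS = { 1 => { title: 'Data Mapper vs Active Record', author: 42 } }.freeze
  # Which mapper knows how to load each association.
  ASSOCIATIONS = { author: UserMapper }.freeze

  def self.find(id)
    row = ROWS.fetch(id)
    Article.new(id, row[:title], row[:author])
  end

  # Replaces the stored id with the real object. This is the one extra
  # query, and the caller has to ask for it explicitly.
  def self.load_association!(object, name)
    mapper = ASSOCIATIONS.fetch(name)
    object.public_send("#{name}=", mapper.find(object.public_send(name)))
  end
end
```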

The verdict

If you haven't already figured this out, I'm strongly leaning toward manual loading of associations, but I want to do something a little cleaner. Calling two class methods in two lines is ugly (I understand that everything being a class method on the mapper class is a code smell in itself, as well, but I'm working on that), so I think something more like this is in order:

article = ArticleMapper.find(params[:id], associations: [:author])

That should help keep things clean while still forcing you to think about each query you're doing. We could also extend this to the Mapper.retrieve method so that we don't end up doing N+1 queries. Both MongoDB (currently the only supported DB) and SQL (I'd like to support it, but I think I'll work on that later) support selecting records whose field value is in a given list (MongoDB's $in operator, SQL's IN clause), so we could optimise from N+1 queries down to 2.

For example, if we're displaying a list of blog articles that will be shown with comment information within the list, we could do something like this:

articles = ArticleMapper.retrieve(published: true, associations: [:comments])

Instead of iterating over each retrieved article and loading its respective associations (N queries for comments + 1 original article query), it gets all the articles for the specified criteria and then retrieves all comments whose article ids are in that list of articles: two queries, just as with retrieving a single article and its comments. Just as importantly, it only requires a single line of clean code.
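Here's an in-memory sketch of that two-query strategy. Everything in it is hypothetical (the query helper stands in for one database round trip and logs each call), but it shows how an "in list" lookup plus an in-memory group_by replaces the N per-article queries.

```ruby
# Illustrative stand-ins for stored documents/rows.
Article = Struct.new(:id, :title, :published, :comments)
Comment = Struct.new(:id, :article_id, :body)

ARTICLE_ROWS = [
  Article.new(1, 'First', true, nil),
  Article.new(2, 'Second', true, nil),
  Article.new(3, 'Draft', false, nil)
]
COMMENT_ROWS = [
  Comment.new(1, 1, 'Nice'),
  Comment.new(2, 1, 'Agreed'),
  Comment.new(3, 2, 'Hmm')
]
QUERY_LOG = []

# One simulated query. `where` maps field => value, or field => array for
# "value in list" (like MongoDB's $in or SQL's IN).
def query(rows, where)
  QUERY_LOG << where
  rows.select do |row|
    where.all? do |field, value|
      value.is_a?(Array) ? value.include?(row[field]) : row[field] == value
    end
  end
end

class ArticleMapper
  def self.retrieve(criteria)
    associations = criteria.fetch(:associations, [])
    where = criteria.reject { |key, _| key == :associations }
    articles = query(ARTICLE_ROWS, where).map(&:dup)            # query 1
    if associations.include?(:comments)
      ids = articles.map(&:id)
      # Query 2: every comment for every article at once, grouped in memory.
      by_article = query(COMMENT_ROWS, { article_id: ids }).group_by(&:article_id)
      articles.each { |article| article.comments = by_article.fetch(article.id, []) }
    end
    articles
  end
end
```

However many articles match, the comments cost exactly one more query.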

Rails Scaffolded Create Is Wrong

Feb 26, 2012 @ 12:00am

A lot of developers have gotten used to the "Rails scaffold" method of saving and loading data inside their controllers. This is roughly what happens in about 95% of #create actions:

def create
  @article = Article.new(params[:article])

  if @article.save
    redirect_to @article, notice: 'Article saved.'
  else
    render :new
  end
end

Why I disagree with this

An if statement implies that both branches are reasonably expected, but the only way this one hits the else clause is if the object is invalid. You shouldn't be expecting an invalid object. The expected outcome of this action is that it creates a record in the database. That is its only function. That is where it gets its name. Being passed an invalid object is an exception and should be treated as such.

ActiveRecord (and similar ORMs like Mongoid) provides a #save! method that raises an exception when the object is invalid. My objections to that particular naming scheme aside (in Ruby, convention is that bang methods modify the object in place rather than return a modified copy), this should be the default behavior of the #save method.

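def create
  @article = Article.new(params[:article])
  @article.save! # raises ActiveRecord::RecordInvalid if validations fail
  redirect_to @article, notice: 'Article saved.'
rescue ActiveRecord::RecordInvalid
  render :new # @article still holds the invalid object and its errors
end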

This makes more sense.
