Jamie Gaskins

Ruby/Rails developer, coffee addict

Why I Like Developing with MongoDB

Published Jan 28, 2013

MongoDB is a document-oriented database. When I say "document", I don't mean the Microsoft Office variety. Specifically, it stores BSON documents. BSON is a binary-encoded form of JSON: rather than using JSON's text representation, it is stored in binary form for efficiency. For the purposes of this article, I'll use BSON and JSON interchangeably. Keep in mind that they are mostly equivalent, just stored in different forms.

Some people hate MongoDB

There is a lot of MongoDB hate. A lot. I'm not going to go into examples here, but a lot of people have historically lost data with MongoDB due to not really knowing how to configure it properly. This is probably also a fault of the authors for not making it painfully obvious how to configure the database server/cluster for their purposes.

The problem comes from the database defaults being tuned for performance. This gives it excellent benchmarks, and in a single-server installation this is fine, but it makes durability across a cluster an issue. However, a cluster can be tuned for durability. I won't go into that here, though, because this article isn't about configuring MongoDB.

The only reasonable complaint I've personally seen is from people losing data after upgrading their MongoDB installation. This is bad, but as with any upgrade, you should back up your data first. Importing it afterward is pretty straightforward.

Why I love MongoDB

When it comes to programming, there are a lot of reasons to choose one language or framework over another. I choose Ruby because, even with all of its drawbacks, it still conforms to my tastes better than any other language I've used. One of the core philosophies of Rails (besides "do what DHH feels like") is that minor details get out of the way and let you focus on building web apps. This is why we don't have to think about things like CSRF protection and HTTP headers/requests/responses except in special cases. I love that I don't have to defend against CSRF in every POST request or even think about HTTP at all in the vast majority of my controllers.

MongoDB shows similar qualities to both Ruby and Rails and that's what I love about developing applications with it.

Flexibility of data types

For the most part, JSON values are straightforward. A value:

  • surrounded by quotes is a string
  • surrounded by square brackets is an array/list
    • Each value in the array can also be of any type
  • surrounded by curly braces is a JSON object/hash/map/dictionary with keys and values (Note: BSON has a couple restrictions on keys)
  • without any decoration is numeric or one of the literals true, false and null (or a variable reference, if supported by whatever you're doing with JSON)

There is no type declaration for your data. You don't tell the DB that all "email" attributes have to be a string. They don't even all have to be the same type. If you want your values to be numeric in some, strings in others, and objects in others, you can do that.

SQL databases, on the other hand, are pretty inflexible, which is annoying in development. Every field in the same column of every row has to have the same type — I realize that this is fine for most cases, but there are times when that's infeasible. One reason I develop in Ruby is so that I'm not constrained by types. Every object can be any type of object.

Databases are used primarily to store the state of an object, so if an object can hold different types of data in the same attribute, I should be able to store that as such in the database. I might have a legitimate reason to store strings and numeric values in the same field — and storing every single value as a string, then converting back to integers/floats may not be what I want to do.
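To make that concrete, here is a minimal sketch in plain Ruby using only the standard json library. It doesn't talk to MongoDB at all; the documents and the "contact" key are names I made up for illustration. It just shows one key holding a string, a number and a nested object, and each value keeping its type after a round trip through JSON.

```ruby
require 'json'

# Hypothetical documents for one collection. The "contact" key is an
# illustrative name; it holds a different type in each document.
documents = [
  { 'name' => 'Alice', 'contact' => 'alice@example.com' },      # string
  { 'name' => 'Bob',   'contact' => 5551234 },                  # number
  { 'name' => 'Carol', 'contact' => { 'twitter' => '@carol' } } # nested object
]

# Round-trip each document through JSON text, roughly the way a document
# store would hand it back to the application.
restored = documents.map { |doc| JSON.parse(JSON.generate(doc)) }

# Collect the Ruby class of each restored "contact" value.
contact_types = restored.map { |doc| doc['contact'].class }
puts contact_types.inspect
```

Nothing had to be declared up front, and no value was squeezed into a string just to fit a column.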

HERE BE DRAGONS

The flexibility of data types can cause trouble with existing data if you decide to change things down the line. If, for example, one of your classes starts assuming that an attribute is stored as a string when it's been nothing but numeric so far, you'll need to convert the existing documents so that the assumption actually holds.

# Using Perpetuity: convert my_attribute to a string on every stored MyClass object
my_objects = Perpetuity[MyClass].all
my_objects.each do |object|
  object.my_attribute = object.my_attribute.to_s
  Perpetuity[MyClass].save object
end

The only way I could find to do this was to update each document individually. I was hoping I could pass a JS function to the update, which would let me run the update in a single query, but I couldn't figure out a way to do that. If anyone knows if this is possible, tweet at me.

Flexibility of structure

In a SQL DB, every time you add a new data attribute to an object that needs to be persisted, you need to add a column to the DB. In apps with large amounts of data, this can cause downtime, which can cost you money. In development, this stops the developer's momentum while she runs a migration. If that data changes for any reason, that's another migration.

The single best thing from a developer's point of view is that adding an attribute to a MongoDB collection is as simple as adding the key to the document. There is no ALTER TABLE. You just pretend it was there all along. You can treat documents without that key as having nil as that attribute's value (including in queries). This is the default state of any instance variable or hash lookup in Ruby anyway.
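Here's a small sketch of the "treat a missing key as nil" idea, using plain Ruby hashes as stand-ins for documents. This is not the MongoDB driver or Perpetuity, and the "nickname" key and user names are hypothetical. It only illustrates that a document written before the key existed and a document that stores nil explicitly look the same to the code reading them.

```ruby
# Plain hashes standing in for documents in one collection. The
# "nickname" key was added later, so the oldest document lacks it.
users = [
  { 'name' => 'Alice' },                    # written before the key existed
  { 'name' => 'Bob',   'nickname' => nil }, # key present, value nil
  { 'name' => 'Carol', 'nickname' => 'Caz' }
]

# Hash#[] returns nil for a missing key, so both kinds of document match.
without_nickname = users.select { |user| user['nickname'].nil? }
names = without_nickname.map { |user| user['name'] }
puts names.inspect
```

Alice and Bob both come back, even though only Bob's document mentions the key at all.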

Some people claim this is actually a weakness, that it can hide bugs in your code, that a rigid structure will raise exceptions when you try to give it an invalid attribute. That last part is true, but I have my doubts about it hiding bugs in your code. I guess it depends on how these documents are generated. For example, in Perpetuity, all BSON documents are generated from object state. The only way you can put the wrong data into your database is if your objects are storing things in the wrong instance variables or your mappers are serializing the wrong attributes, which means your testing could use some improvement.

Some also claim that it's a weakness because it bloats your data — every document has to explicitly specify which attributes hold which values (whereas in a SQL database, this is determined by the value's position in the row). This is true, but that's the cost of flexibility. SQL databases aren't exempt from this type of overhead, though. Every NULL field in a SQL row carries extra cost, as well (though arguably not as much, depending on the column type), whereas document databases can simply leave that attribute out. It's definitely a trade-off, but I can't imagine it'd make or break most applications. If keys are a significant portion of your documents' size and data size is an issue in your application, maybe a document database isn't the best use case for you.
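If you want a rough feel for how much of a document its keys take up, something like the following sketch can help. It measures JSON text with the standard json library as a stand-in for BSON, so the exact numbers will differ from what MongoDB stores, and the document and its field names are made up for illustration.

```ruby
require 'json'

# Hypothetical document; the field names are illustrative only.
document = { 'first_name' => 'Ada', 'last_name' => 'Lovelace', 'age' => 36 }

# Size of the whole document as JSON text.
total_bytes = JSON.generate(document).bytesize

# Every document carries its own keys; count each key plus its two quotes.
key_bytes = document.keys.sum { |key| key.bytesize + 2 }

# Fraction of the serialized document spent on keys.
key_share = key_bytes.fdiv(total_bytes)
puts format('%d of %d bytes are keys (%.0f%%)', key_bytes, total_bytes, key_share * 100)
```

For a tiny document like this one, the keys are about half of the bytes. With larger values the share shrinks, which is why I doubt it matters for most applications.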

That last objection is completely outside the scope of this article because I'm aiming for a developer-happiness perspective and data size means sweet frak-all in that light, but I figure someone who reads this would probably mention it.

It plays along with whatever I do

When you start developing on an existing Ruby on Rails application backed by a SQL database, you have to:

  1. create the database
  2. ensure the DBMS you're using for development has the right user account on it (for example, "root" with no password in MySQL) and configure your app to use that
  3. load your schema
  4. check Twitter
  5. write code that talks to the database

When you start working with an existing app backed by MongoDB, you:

  1. write code that talks to the database
  2. there is no step 2

It creates the DB on the fly. It defaults to no authentication. If you write to a collection that doesn't exist, it creates that, too. You get to stop worrying about the details and focus on the stuff that matters.
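As a loose analogy only, this is the same behavior you get from a Ruby Hash with a default block. The sketch below is an in-memory toy, not MongoDB or its driver, and the "blog" and "posts" names are made up. It just models the "there is no step 2" feeling: the first write brings the database and collection into existence.

```ruby
# In-memory stand-in for a server: asking for a database or collection
# that does not exist yet creates it, instead of raising an error.
server = Hash.new do |databases, db_name|
  databases[db_name] = Hash.new do |collections, coll_name|
    collections[coll_name] = []
  end
end

# No setup step: the first write creates "blog" and "posts" on the way.
server['blog']['posts'] << { 'title' => 'Why I Like Developing with MongoDB' }

puts server.keys.inspect
puts server['blog'].keys.inspect
```

Note that this toy also creates things when you merely read from them, so don't take it as a model of exactly when MongoDB does the creating.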

If you're logged into a PostgreSQL server as a user that has permission to create databases and you try to access a database that doesn't exist, why is the response "it doesn't exist"? I can't imagine a situation where I'm trying to talk to a database that isn't there and an error is the best result (unless the DB can't be created). Why do I have to make my intent explicit? When I say "talk to this database", it pretty much implies that I want to talk to it unless there is no way you can possibly let me talk to it, such as a disk error, network error or insufficient permissions.

I'm not saying there aren't plenty of times you want things to be explicit in programming. There are a lot of cases where being explicit is superior. This is not one of those times.

Conclusion

Maybe MongoDB isn't right for your particular use case because your app has requirements that are more important than developer happiness. Maybe your ops person/team doesn't have enough experience with MongoDB to keep it in their toolbelt. Maybe you need a graph database or table joins or transactions. But for most apps, I use MongoDB because I find it more fun to work with; this keeps me motivated and helps me work quickly.
