Jamie Gaskins

Ruby/Rails developer, coffee addict

Hashie vs OpenStruct vs POROs

Published Dec 20, 2014

At work the other day, we were discussing the merits of Hashie::Mash vs OpenStruct (then yesterday I saw this article from Richard Schneeman which also discusses them). One of the things we discussed was the performance aspect of one over the other. We didn't have any actual benchmarks to use, so I went ahead and wrote some out.

        Hashie alloc    233.387k (± 3.3%) i/s -      1.184M
    OpenStruct alloc     78.249k (± 4.3%) i/s -    395.395k
       Hashie access    549.476k (± 2.1%) i/s -      2.772M
      OStruct access      2.863M (± 4.3%) i/s -     14.390M

This shows that initializing a Hashie::Mash is roughly 3x as fast as initializing an OpenStruct, but when accessing attributes (each object had 3 attributes and I hit them all), OpenStruct led by a factor of 5.2.

When you consider that initialization happens once and attribute access happens frequently (you will always access attributes at least as often as initializing, otherwise the object isn't worth a whole heck of a lot), you'd want the one with faster accessors, so you'd want to use OpenStruct, right?

Nope.

The Method Cache and You

All of the major Ruby implementations have a data structure that stores the locations of methods in memory. Looking these methods up every time you access them is slow, so they make use of a cache that speeds up lookup of methods you're actually using. This explanation is intended to be a bit hand-wavy because there are other articles that talk about the method cache. Just know that it's important to the performance of your app.

Several things you can do in Ruby will invalidate some or all of that method cache, depending on which Ruby implementation you're using: define_singleton_method, extend, and defining a new class all do it. When this happens, the VM has to fall back on slow method lookup.

Why do I tell you all this? Because this happens every time you initialize an OpenStruct, since it defines accessors for its attributes with define_singleton_method. This is the reason it's slow to instantiate.

The only way to provide those dynamic accessors without using define_singleton_method is to use method_missing. That is what Hashie::Mash uses. Unfortunately, that comes with its own drawbacks.
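To make the two strategies concrete, here is a minimal sketch. SingletonBag and MissingBag are hypothetical stand-ins written for this example; they are not the actual OpenStruct or Hashie::Mash source, only the general shape of each approach.

```ruby
# Hypothetical stand-ins, not the real OpenStruct or Hashie::Mash code.

# Approach 1: define an accessor on each instance's singleton class.
class SingletonBag
  def initialize(attributes)
    attributes.each do |name, value|
      define_singleton_method(name) { value }
    end
  end
end

# Approach 2: keep a hash and answer unknown messages in method_missing.
class MissingBag
  def initialize(attributes)
    @attributes = attributes
  end

  def method_missing(name, *args)
    return @attributes[name] if args.empty? && @attributes.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name) || super
  end
end

a = SingletonBag.new(title: "Hello", views: 3)
b = MissingBag.new(title: "Hello", views: 3)

a.title                  # => "Hello"
b.title                  # => "Hello"
a.singleton_methods.sort # => [:title, :views]
b.singleton_methods      # => []
```

The first version pays at construction time (one method definition per attribute, per object); the second defines nothing up front and pays on every access instead.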

Excuse me, have you seen my method?

method_missing is a powerful tool in Ruby, just like dynamic method definition. It helps us do a lot of amazing metaprogramming things. However, that flexibility comes with performance tradeoffs. It's important to realize what all happens before that method is invoked.

  • Send an object a message
  • VM doesn't have a method cached for that message, so we invoke the slow-lookup route I mentioned in the previous section
    • VM checks the object's singleton class (every object has one that contains the methods defined using define_singleton_method; you can check it out using object.singleton_class)
    • VM walks up the ancestor chain of the object's class (object.class.ancestors) to find a method that handles that message.
  • Invoke method_missing

In other words, when you send an object a message for which the VM has no cached method (because the method doesn't exist), it has to take the slow-lookup route I mentioned in the previous section. That route checks the object's singleton class first, then walks up the ancestor chain of the object's class looking for a method that handles the message, and only when nothing is found does the VM invoke method_missing.
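You can poke at that lookup path from Ruby itself. The sketch below uses a hypothetical Ghost class that only handles :title through method_missing; it inspects the path the VM has to search and confirms that nothing on it defines the method.

```ruby
# Hypothetical class for illustration only.
class Ghost
  def method_missing(name, *args)
    return "method_missing handled #{name}" if name == :title
    super
  end

  def respond_to_missing?(name, include_private = false)
    name == :title || super
  end
end

ghost = Ghost.new

# The places searched, in order, before the VM gives up:
lookup_path = ghost.singleton_class.ancestors
lookup_path.first == ghost.singleton_class # => true
lookup_path.include?(Ghost)                # => true
lookup_path.last                           # => BasicObject

# No class or module on that path defines #title...
found = lookup_path.any? { |mod| mod.instance_methods(false).include?(:title) }
found # => false

# ...so the call falls through to method_missing.
ghost.title # => "method_missing handled title"
```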

For Hashie::Mash in a Rails app, there are no fewer than 17 classes and modules it has to check. In the benchmark above, there were 9. This means Hashie::Mash will perform significantly worse after Rails injects all those mixins into your ancestor chain. Our app had roughly a 12% decrease in Hashie::Mash access performance.
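A small sketch of how the chain grows. The class and module names here are made up, and mixing into Object is just one assumed way a framework might lengthen the chain; the point is only that every class inherits the extra entries.

```ruby
# Hypothetical class and modules, used only to show the chain growing.
class Bag; end

before = Bag.ancestors.size

module FakeSerialization; end
module FakeValidation; end

# Mixing modules into Object lengthens every class's ancestor chain.
Object.include(FakeSerialization, FakeValidation)

after = Bag.ancestors.size
after - before # => 2
```

Every extra entry is one more place to look before a missing method finally reaches method_missing.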

So … what do you want me to do?

If you want to be able to access attributes with a dot instead of brackets, it really is best to write a class for it. It's faster to allocate and faster to get attributes. Here's the same benchmark as above including plain-old Ruby objects (POROs).

        Hashie alloc    233.387k (± 3.3%) i/s -      1.184M
    OpenStruct alloc     78.249k (± 4.3%) i/s -    395.395k
          PORO alloc    558.183k (± 2.7%) i/s -      2.812M
       Hashie access    549.476k (± 2.1%) i/s -      2.772M
      OStruct access      2.863M (± 4.3%) i/s -     14.390M
         PORO access      8.096M (± 4.9%) i/s -     40.490M

Allocating a PORO is faster than accessing 3 attributes on a Hashie::Mash. In fact, going by the numbers above, you could allocate a PORO and access all of its attributes in about the time it takes a pre-allocated Hashie::Mash just to touch its attributes.
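If you want to try a comparison like this yourself, here is a rough stdlib-only sketch. It is not the benchmark that produced the numbers above: the Point class and the ips helper are hypothetical, the timing is a simple loop, and Hashie::Mash is left out so that no gem is needed (with the hashie gem installed you could add it the same way).

```ruby
require "ostruct"

# Hypothetical PORO with the same three attributes as the OpenStruct below.
class Point
  attr_reader :x, :y, :z

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
  end
end

# Returns iterations per second for the given block.
def ips(iterations = 50_000)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end

ostruct = OpenStruct.new(x: 1, y: 2, z: 3)
poro    = Point.new(1, 2, 3)

results = {
  "OpenStruct alloc" => ips { OpenStruct.new(x: 1, y: 2, z: 3) },
  "PORO alloc"       => ips { Point.new(1, 2, 3) },
  "OStruct access"   => ips { ostruct.x; ostruct.y; ostruct.z },
  "PORO access"      => ips { poro.x; poro.y; poro.z },
}

results.each { |label, rate| puts format("%18s %12.0f i/s", label, rate) }
```

Absolute numbers will vary by machine and Ruby version, so treat the output as a relative comparison only.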

Don't get me wrong, using Hashie::Mash or OpenStruct is awesome for prototyping and just getting a feature out the door, but performance is important, so I'd always recommend writing a class for the object you're trying to pass or return. This also has a happy side effect of documenting what that data is; an object telling me explicitly that it's an Article is better than me having to figure it out based on keys and values of a hash.
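For instance, a class for that Article might look something like this. The attribute names are assumptions picked for the example; use whatever your data actually carries.

```ruby
# Hypothetical Article class; the attribute names are made up for illustration.
class Article
  attr_reader :title, :body, :author

  def initialize(title:, body:, author:)
    @title = title
    @body = body
    @author = author
  end
end

article = Article.new(title: "Hashie vs OpenStruct", body: "...", author: "Jamie")
article.title # => "Hashie vs OpenStruct"
article.class # => Article
```

The required keyword arguments also mean a missing attribute fails loudly at construction instead of quietly returning nil later.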
