From working the last 6 plus years in the web industry, I’ve come to learn …
Here in the bucolic surroundings of Centresource, we have many tools at
our disposal. We build many types of products and consult with even more
types of businesses. So we have to be flexible and adapt to the needs of
our clients. Chief among the tools we use for web product development are Ruby on Rails and Drupal.
In the interest of full disclosure, I learned web development via the
Ruby on Rails framework and am heavily biased towards it; I’m equally ignorant
of Drupal and I am thankful we have a team of highly talented developers
who specialize in that particular CMS so I can focus on learning Rails
thoroughly. Rarely do our products in the different frameworks need to
Such was exactly the case the other day, however, when I was tasked
with importing content from one of our Drupal sites into a Rails
application. My initial reaction: how hard could it be? The answer: not
too hard but interesting enough to be worthy of a blog post. So here we
- First and perhaps most important, this cross-framework mating was not to be an ongoing affair, at least with the current set of requirements. I was told it was strictly a one-off.
- The Drupal site has many different types/categories of blog posts, but the client only wanted to port over one specific type.
- The model already exists on the Rails side, but the imported content obviously needed to match up with the existing schema.
Connecting to the Content
The first step was to get a dump of the existing Drupal database. Most
providers, including the one in this case (who shall remain nameless to
protect the innocent), provide tools to dump your database. Once that
was done, I imported the dump in Sequel Pro. I can hear a few of you out
there groaning about the use of such a tool; for the task at hand
and the short time to accomplish it, having a GUI tool really helped
move things along. It took some time to navigate my way through all of
the join tables that Drupal employs, but I eventually found the content
and identifying category id I was looking for. From there I edited my
database.yml file in my Rails project. Because the Rails side is using
PostgreSQL and the Drupal side is using MySQL, I had to add the mysql2
gem to my Gemfile. After a quick bundle, I was ready to go.
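For reference, the second connection might look roughly like the following. This is a sketch rather than the project's actual configuration: the database name, user, and environment variable names are placeholders invented for illustration, and the hash simply mirrors the keys a `database.yml` entry would carry. `ActiveRecord::Base.establish_connection` accepts a hash of this shape.

```ruby
# Connection settings for the imported Drupal dump (MySQL), kept separate
# from the app's primary PostgreSQL connection.
# Every value below is an illustrative placeholder, not a real setting.
DRUPAL_DB = {
  adapter:  "mysql2", # adapter supplied by the mysql2 gem
  host:     ENV.fetch("DRUPAL_DB_HOST", "127.0.0.1"),
  database: ENV.fetch("DRUPAL_DB_NAME", "drupal_dump"),
  username: ENV.fetch("DRUPAL_DB_USER", "root"),
  password: ENV.fetch("DRUPAL_DB_PASSWORD", ""),
  encoding: "utf8"
}.freeze
```

The matching Gemfile change is a single `gem "mysql2"` line, followed by `bundle install`.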
Using a Rake Task to Update My Local Database
The most important part of this stage is the first step: get an
up-to-date copy of the database (staging first, and eventually
production), and be sure to grab it and run these steps during a time
of low user activity; you’ll see why shortly. With a local copy of both the
Rails and Drupal databases, I set up a rake task to pull in content from
the Drupal side to the Rails side. In my rake task, I built
ActiveRecord-backed classes to give the data I was accessing that old
familiar ActiveRecord feel.
One could very rightly make a case for using a migration instead of a rake task in this situation; this data manipulation is only going to happen once and one should be wary of cluttering `lib/tasks` with such frivolity. I erred on the side of leaving as small a footprint as possible and deleting the rake task when I completed my goal.
Some big snags I ran into had to do with ActiveRecord’s (and, in a
larger context, Rails’) naming conventions butting up against Drupal’s naming
conventions. For example, Drupal’s version of Rails’ `updated_at` is
called `changed`. Rails very much threw a fit when one of its “models”
had a column named `changed`, because ActiveRecord already defines a
method by that name. Similarly, one of the Drupal tables
used `type` as a column name. Rails automatically assumes that any
column named `type` is there for single-table inheritance (STI), which was clearly not the case
in this instance. Luckily for me, I did not need the information in
either of the aforementioned columns, so I could effectively tell Rails
to ignore them.
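To make the column problem concrete, here is a small, framework-free sketch of the mapping step. It is not the code I actually ran: the Drupal column names (`title`, `body`, `created`), the `category_id` key, and the Rails-side attribute names are all assumptions for illustration, and it assumes `created` holds a Unix timestamp. The idea is simply to copy only the columns you need, so `type` and `changed` never reach a Rails model.

```ruby
# Hypothetical mapping from Drupal column names to Rails attribute names.
# `type` and `changed` are intentionally absent, so they are never copied.
COLUMN_MAP = {
  "title"   => :title,
  "body"    => :body,
  "created" => :created_at
}.freeze

# Convert one Drupal row (a plain Hash) into attributes for the Rails model.
def rails_attributes(drupal_row)
  COLUMN_MAP.each_with_object({}) do |(drupal_name, rails_name), attrs|
    value = drupal_row[drupal_name]
    # Assumption: `created` is stored as seconds since the Unix epoch.
    value = Time.at(value).utc if rails_name == :created_at && value
    attrs[rails_name] = value
  end
end

# Keep only the rows in the one category the client asked for.
# `category_id` is a made-up key standing in for whatever the join tables yield.
def rows_for_category(rows, category_id)
  rows.select { |row| row["category_id"] == category_id }
end

# Example rows, invented for illustration.
drupal_rows = [
  { "type" => "blog", "title" => "Hello", "body" => "First post",
    "created" => 0, "changed" => 10, "category_id" => 7 },
  { "type" => "blog", "title" => "Other", "body" => "Skipped",
    "created" => 0, "changed" => 10, "category_id" => 3 }
]

imported = rows_for_category(drupal_rows, 7).map { |row| rails_attributes(row) }
```

If you would rather stay inside ActiveRecord, the usual knobs are `self.inheritance_column` (set it to `nil`, or to a column name that doesn't exist) for the `type` problem and, from Rails 5 on, `self.ignored_columns` for a column like `changed`.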
Here is where this quick-and-dirty maneuver involves some major
risk: once I had my local copy of the database updated with the content I
had pulled, I had to make it decidedly not local. To accomplish this, I
made a dump of my local database, uploaded it to Amazon S3, made it
public, and then used Heroku’s pg_backups add-on to make it my live
database. Obviously the big caveat here is that any changes made to your
site’s database while you are doing all of the above will be lost once
you run the pg_backups restore. If losing this data will cripple your (or
your client’s) business, DO NOT DO IT.
Are there more elegant ways to solve this problem? My answer to that is
only that I wish the demands of my time were such that I could write
nothing but elegant code all the time. Alas, that’s not the world we
live in. Will this solution work for all use cases? No, but it could be
the path of least resistance for the right situation. In the end, the
requirements were satisfied, the client was happy, and I didn’t lose too
much sleep over it. That’s a win in my book.