Backups and the Web

Backup solutions and disaster recovery are a mainstay of technology, and they have been well analyzed in the past -- typically from an IT angle. But choosing a solid backup for a website is just as important, and it's an oft-overlooked final step in the successful deployment of any website or web application. Backup technology has changed a lot over the years, and figuring out which solution works for you can be a daunting task. This is not a technical guide -- there are countless technical solutions to any given backup need. The intent of this guide is to help you understand the nature of your own website and data, and to narrow down the solution that is ideal for your needs. Without a proper understanding of what data your website has and where it's stored, any backup solution has the potential to languish or prove unrestorable when the worst happens. Read on:

  1. The first step is to consider the nature of your website. Is it merely a "static" HTML/CSS website with no dynamic data? If not, does it use a database of any kind? How and where does that database store its data? Lastly, does your website store a large number of files (multimedia, documents, etc.)? And if so, are these crucial to the operation of your website?
  2. Next, how much data can you afford to lose? Of course, no one likes to think or talk about losing data, but any solution is a compromise between the resources you're willing to pay for and the amount of data that's guaranteed protection. This "amount" is best measured in time. Any backup solution will capture "snapshots" of your website -- copies of the data at a certain point in time -- and the age of the most recent snapshot will vary anywhere from "realtime" to hours or days. If you cannot afford to lose any data, then a realtime backup solution is the only option. These solutions typically rely on a program running non-stop in the background that continuously copies your data somewhere else. If the answer is "some", then you need to determine what your acceptable loss is. Is it a day? A week? 6 hours?
  3. The second "time" factor is the time-to-restore. How much time is acceptable to go from a catastrophic failure to a complete restore of your website functionality? This time is determined by many things in addition to the actual type of backup you have, including the type of web hosting you have and the form of the failure. If you host your website on a VPS ("Virtual Private Server"), for example -- and this VPS is backed up in its entirety on a regular basis -- then restoring is as simple as copying over the VPS backup and restarting it from that point. If your website has a more complicated architecture (multiple frontend webservers and multiple databases, for example), then the restore process is likely to be a lengthier ordeal, since the servers have to be reconfigured before any data is put back in place.
  4. Where is the data backed up? It should be obvious that the best backup is to a location that is physically separate from where your website is hosted. This is not an absolute (in fact it's quite relative -- is another server rack in the same datacenter "off-site"? What about a different datacenter owned by the same company and on the same network?), but in general you should attempt to distance your data from the potential points of failure: fires, tornadoes, hosting company incompetence, nuclear strikes -- you name it.
  5. Can you test the restore functionality regularly? This is a factor of cost, to some extent, and also practicality. If you opted for any backup solution other than "realtime", you've decided on an acceptable level of data (time) loss -- so testing a full restore straight to production is not advisable, since you'd lose whatever changed after the last snapshot. But working with your hosting/backup vendor to verify the backup, and perhaps to test a restore to another location, is something you should pursue if possible.
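The "acceptable loss" question from step 2 boils down to simple arithmetic on timestamps. What follows is only a minimal sketch in Python, not a recommended tool: the function names and the example times are made up for illustration, and in practice the timestamp of the last snapshot would come from whatever backup product you use.

```python
from datetime import datetime, timedelta


def data_at_risk(last_snapshot: datetime, now: datetime) -> timedelta:
    """Time span of changes that would be lost if the site failed right now."""
    return now - last_snapshot


def within_acceptable_loss(last_snapshot: datetime, now: datetime,
                           acceptable_loss: timedelta) -> bool:
    """True if the newest snapshot is recent enough for the chosen loss window."""
    return data_at_risk(last_snapshot, now) <= acceptable_loss


# Hypothetical example: a nightly snapshot taken at 02:00, and a failure at 17:30.
last_snapshot = datetime(2024, 1, 15, 2, 0)
failure_time = datetime(2024, 1, 15, 17, 30)

print(data_at_risk(last_snapshot, failure_time))                                # 15:30:00
print(within_acceptable_loss(last_snapshot, failure_time, timedelta(hours=6)))  # False
print(within_acceptable_loss(last_snapshot, failure_time, timedelta(days=1)))   # True
```

In this made-up case a nightly snapshot is fine if you can live with losing a day, but it falls well short of a 6-hour target.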

These are all important questions that you need to ask of yourself, and also of your web hosting vendor and your web developer. Together, you can decide on a backup solution that makes sense for your website. That said, what kind of solutions are there? This is not a comprehensive list, but it's a general summary of the ways you can protect the continuity of your website. Interestingly, computing/server technology has come a long way in the last couple of years. Virtualization and cloud computing have blurred the lines between continuity, load-balancing, and backup. After all, if your website is running on a virtual server in the cloud, data loss is impossible, right? Well, no, not really.

  • VPS (Virtual Private Servers) - as mentioned above, a VPS can be a good way of assuring a very quick time from hardware failure to restoration. A VPS is a "server" that is entirely self-contained and run in a "virtual" environment. This means that your server itself is essentially a file or set of files that can be copied (regularly, or in realtime) to any other backup medium for later restoration. Most hosting providers that offer hosting on a VPS will also offer a backup service that makes restoration of your VPS a cinch. The above questions, however, are still quite pertinent and should be asked: How often do they back up the VPS? How quickly can they restore? Some hosting vendors host their VPS on a "cloud" server architecture, which makes hardware failure a nearly negligible factor. However, you should still consider how much you trust the competence of your hosting vendor, and explore the option of backing up your data somewhere else, independent of the VPS itself. Many other things can go wrong -- virtualization software failure, data corruption, etc. These things can all leave your virtual server inoperable or unsalvageable.
  • Backup "daemons" - "daemon" is computer nerd slang for any software that runs non-stop in the background. There are many common backup solutions that use a daemon running in the background to continuously back up your website and its data. Typically, this sort of solution is what's required for a "realtime" backup of your data. Most hosting vendors that host larger dedicated managed server environments will have some offering in this regard.
  • Scheduled backup jobs - there are a wide variety of ways this can be implemented, but loosely speaking, this is any sort of backup script/program that is simply scheduled to run on a regular basis (say, nightly) and makes a plain copy of the website data (files and database dump) to an arbitrary off-site location.
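To make the "scheduled backup job" idea a little more concrete, here is a rough sketch in Python of what such a script might do. It is an illustration under stated assumptions rather than a finished solution: the function names and the archive naming scheme are invented for this example, the backup directory is assumed to be an off-site location that is already reachable as a folder (and that sits outside the website folder), and the database dump is assumed to have been written into the website folder beforehand by your database's own export tool. The script also records a checksum, so the copy can later be verified as intact.

```python
import hashlib
import tarfile
from datetime import datetime
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_site(site_dir: Path, backup_dir: Path, now: datetime) -> Path:
    """Archive site_dir into a timestamped .tar.gz inside backup_dir.

    backup_dir is assumed to be outside site_dir (ideally off-site).
    A checksum is written next to the archive for later verification.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"site-{now:%Y%m%d-%H%M%S}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(site_dir), arcname=site_dir.name)
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(f"{sha256_of(archive)}  {archive.name}\n")
    return archive


def verify_backup(archive: Path) -> bool:
    """Check that the archive still matches the checksum recorded at backup time."""
    checksum_file = archive.with_name(archive.name + ".sha256")
    recorded = checksum_file.read_text().split()[0]
    return recorded == sha256_of(archive)


# Hypothetical usage (both paths are placeholders, not real defaults):
# archive = backup_site(Path("/var/www/example"), Path("/mnt/offsite"), datetime.now())
# print(verify_backup(archive))
```

A script like this would then be run on a schedule (nightly, say) by whatever job scheduler your server provides. Note that a matching checksum only shows the archive has not been damaged since it was made; it is no substitute for the test restore discussed in step 5.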

It's very likely that any hosting vendor you choose will have one or more of the above options -- and possibly others. But it's not enough to simply sign off on an arbitrary product without understanding what it does and what it guarantees. Hopefully, armed with the above questions and information, you can make the right decision about how to back up your website. In an age where a website is increasingly the face of your business, you can't afford to leave it to chance.