Webfaction fail. over.

This post starts as a rant about webfaction, but somehow turns into a rave. I recently discovered (the hard way) that I can failover almost any site to a secondary host in a different data centre, all with a few scripts on a webfaction shared hosting account.

Rant

I was unfortunate enough to suffer two rather extensive periods of downtime, both lasting a number of hours. The first was some kind of misunderstanding when the site was moved to a new server as part of scheduled maintenance. Instead of taking only a few minutes, the migration lasted about 5 hours. Ouch. The second case wasn’t so much within webfaction’s control, but was nevertheless disappointing. A DDoS attack brought the server down for at least 7 hours, with many smaller periods of downtime throughout the day (I counted well over 100 different up and down alerts in a single day!). Perhaps I got spoiled, but until those two incidents, I’d hardly had any problems with my sites on webfaction. In fact, up to that point a few weeks ago, our website statistics showed 99.98% availability for the year (including scheduled maintenance). During June and July, availability dropped to 99.19% and 97.28%, respectively (yep, July isn’t over yet, so it will hopefully climb up a little). This is not too bad for a shared hosting account, but I was expecting much more from Webfaction.
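
To put those percentages in perspective, here is a rough sketch that converts an availability figure into hours of downtime. The function name is made up for illustration, and the 365-day window for the yearly figure is an assumption (the statistics above only covered the year up to that point).

```python
def downtime_hours(availability_percent, period_days):
    """Hours of downtime implied by an availability percentage over a period."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * period_days * 24


# June has 30 days; 365 days is an assumed window for the yearly figure.
print(round(downtime_hours(99.19, 30), 1))
print(round(downtime_hours(99.98, 365), 1))
```

For June this works out to roughly 5.8 hours of downtime, compared with under 2 hours for a whole year at 99.98%.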

Failover – on a budget

I don’t tend to use webfaction for very mission-critical stuff. I usually split hosts across different data centres on linode, and also use rackspace or aws. But I ended up discovering that I can achieve much more with webfaction than I expected. I guess I underestimated it, since it is ‘only’ shared hosting. Webfaction is a little different from other shared hosting providers, and its unique offering allows you to easily create a failover service. Not only on a different server, but even in a completely different data centre, on a different continent! This is similar to what you can do with VPS or cloud services, which are usually much more expensive and harder to manage, but for only a few extra dollars per month.

Case Study – failover wordpress

I’ve had a couple of wordpress sites running on the shared hosting account. As I mentioned, after suffering some downtime, I wanted to increase their availability and avoid a similar situation in future. I won’t try to achieve full load-balancing, but rather a fail-over solution. If the server is not available for some reason, the site can be manually/semi-automatically switched over to the failover server, where a recent copy of the website is running. I am also willing to make some compromises in terms of how ‘lively’ the failover server is going to be. For example, it’s a reasonable sacrifice not to be able to change content or update the site while the failover is active. Only once the primary server is back online can we switch back and make those changes. Seems like a fair price to pay as long as we can increase our availability.

WordPress is a good example, since it combines static files in the application folder, as well as a mysql database. The aim was therefore to sync both of these from the primary to the failover server regularly and be able to switch over when necessary.

First step – adding a failover server

Assuming you already have an account with webfaction, you can add another server to your account by contacting their support. That’s probably the easiest way, but you can also use the Upgrade / downgrade menu option from your control panel. You can choose from one of the three locations webfaction hosts in: USA, Europe (Amsterdam) and Asia (Singapore). I would personally recommend choosing a different location for the primary and failover servers. This way, even if an entire data centre is somehow experiencing issues, you are still likely to be able to fail over to a different location.

Cloning your apps

Once you have two servers set up on your account, whenever you configure a website, an application or a database, you are given a choice of the server you want it on. Make sure you remember which server is where, since they all have pretty generic names like Web250 or Web193. For the sake of clarity, let’s say that our primary server is in the USA on Web100, and the failover server is in Europe on Web200. We already have one wordpress application on Web100. We now want to create an identical application with the same name on Web200.

  • Go to Domains / websites -> Applications and click on Add new application.
  • Use the exact same Name, App Category and App Type, only this time we add this application to Web200.

Setting database password

The new app we created on Web200 has the same name, but the database password is different. We must make sure the database has the same password, otherwise when we fail over from Web100 to Web200, our application will not be able to connect to the database. Changing the database password is quite simple.

First of all, we need to get the existing password for the database on Web100. This is usually available in the Extra info section of the application. Just click on the application (on Web100!) and copy the password from the Extra info section.

Now we need to change the password of the database on Web200:

  • Click on Databases -> Databases
  • Search for the database on Web200. It will have a name that is something like <account_name>_<application_name>
  • Next to the database name, there’s a little icon that allows you to edit the database. Click on it.
  • On the next screen click on Change password and paste in the password twice.

SSH Access

In order to synchronize data between the two servers, you must set up the failover server’s SSH public key as trusted on your primary server. There are plenty of online guides on setting up SSH public key access, so I won’t give step-by-step instructions, but the process involves a few relatively simple steps:

  • Log on to Web200 via SSH
  • Generate an ssh key using ssh-keygen
  • Copy the file contents of ~/.ssh/id_rsa.pub
  • Log on to Web100 via SSH
  • Append the contents to the authorized_keys file, e.g. echo {contents of id_rsa.pub} >> ~/.ssh/authorized_keys
  • Log on to Web200 via SSH again
  • Check that if you ssh to Web100 you can login without a password

Rsync application folder

Once key-based SSH access is in place, you can use a simple rsync script to mirror the contents of the application folder on the primary server (Web100) onto the failover server (Web200).

    #!/bin/bash

    # set username with your webfaction user name
    username=

    rsync -e ssh -vaz --delete-after ${username}@web100.webfaction.com:webapps/www/ /home/${username}/webapps/www/

    # optional - to prevent modification of the failover wordpress instance, uncomment the next line
    #echo "deny from all" > /home/${username}/webapps/www/wp-admin/.htaccess
    
  • Save this script in a subfolder of your home folder, e.g. ~/scripts/rsync-web100
  • Schedule this script to run regularly (daily, hourly, depending on your needs), using crontab -e

With this all set, we now have our primary application folder regularly mirrored to the failover server.

Database sync

There are probably more advanced ways to create a mysql slave or to replicate data on-the-fly, but for the sake of this solution, a simpler process is sufficient. The database is dumped into a file on the primary server, then rsync’d across to the failover server, where it is loaded from the dump file.

First of all, let’s create a script that writes a database backup dump on Web100:

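    #!/bin/bash

    # set username with your webfaction user name
    username=
    # set the database_name and password
    # (the commands below log in with a database user named after the database)
    database_name=
    password=

    # create the backup folder and close it to other users before writing the dump
    mkdir -p "/home/${username}/backups"
    chmod o-rwx "/home/${username}/backups"
    # variables are quoted so that special characters in the password are passed through intact
    mysqldump -u"${database_name}" -p"${password}" "${database_name}" | gzip > "/home/${username}/backups/${database_name}.sql.gz"
    chmod -R o-rwx "/home/${username}/backups"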
    
  • Save this script, e.g. in ~/scripts/db-backup on Web100
  • Set it to run regularly via crontab -e

Now let’s modify the rsync script above and add a command to rsync our database backups from Web100 to Web200.
We also add a command to load the database from the dump file produced on Web100:

    #!/bin/bash

    # set username with your webfaction user name
    username=
    # set the database_name and password
    database_name=
    password=

    # syncing www folder
    rsync -e ssh -vaz --delete-after "${username}@web100.webfaction.com:webapps/www/" "/home/${username}/webapps/www/"

    # syncing database backup dump
    rsync -e ssh -vaz --delete-after "${username}@web100.webfaction.com:backups/" "/home/${username}/backups/"
    # loading database from dump (variables are quoted so that special characters in the password are passed through intact)
    gunzip -c "/home/${username}/backups/${database_name}.sql.gz" | mysql -u"${database_name}" -p"${password}" -D "${database_name}"

    # optional - to prevent modification of the failover wordpress instance, uncomment the next line
    #echo "deny from all" > "/home/${username}/webapps/www/wp-admin/.htaccess"
    

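One thing worth guarding against is loading a dump that is stale or truncated, for example if the rsync ran while the dump on Web100 was still being written. The following is only a sketch of such a check, not part of the setup described here: the function name and the two-hour threshold are assumptions. Since rsync -a preserves modification times, the file's age on Web200 should reflect when the dump was produced on Web100.

```python
import gzip
import os
import time
import zlib


def dump_is_usable(path, max_age_seconds=2 * 60 * 60):
    """Return True if the gzipped dump exists, is recent and decompresses cleanly."""
    if not os.path.isfile(path):
        return False
    # Reject dumps older than the allowed age (threshold is an assumption).
    if time.time() - os.path.getmtime(path) > max_age_seconds:
        return False
    has_data = False
    try:
        with gzip.open(path, "rb") as dump:
            # Reading to the end makes gzip verify the whole stream,
            # so a truncated or corrupted file raises an error here.
            while True:
                chunk = dump.read(1024 * 1024)
                if not chunk:
                    break
                has_data = True
    except (OSError, EOFError, zlib.error):
        return False
    return has_data
```

A check like this could run just before the gunzip step, skipping the database load when it returns False so that the failover keeps its previous copy.
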
The missing piece of the puzzle

So far we’ve managed to synchronize the folder and database between our two different webfaction servers. We still need to control which server the app is active on though. If you are using webfaction for your DNS, then this can be done manually through the web console, or via a small script, using the webfaction API. Otherwise, if you manage your DNS externally (which I do), you can set up the app on webfaction in such a way that switching over only requires a DNS change. I will try to cover these options briefly.

External DNS

If you manage your DNS externally to webfaction, I recommend using aws or zerigo if you want to be able to automate things. This is however not strictly necessary. I recommend doing the following in order to be able to switch-over quickly between the primary and failover servers:

  • Set the TTL of your records to a low value (depending on how fast you want to be able to switch – I suggest less than 10 minutes)
  • You can set your main website record as a CNAME of the primary webfaction server, e.g. something like www.yoursite.com 600 CNAME web100.webfaction.com
  • When you need to switch over to the failover server, simply update the record so it looks like www.yoursite.com 600 CNAME web200.webfaction.com

For this setup to work, webfaction should have both sites set up simultaneously:

  • Go to the web console
  • Click Domains / websites -> Websites
  • Click to add a new website
  • Give it a name, perhaps something like www-failover
  • Choose the IP address of the failover server (web200.webfaction.com)
  • Choose the subdomain for your site (e.g. www.yoursite.com)
  • Add an app – and choose the app you configured already on web200

Now both sites are configured on webfaction, and you control which one is active via DNS.
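
Before relying on a DNS switch, it can be reassuring to confirm that each server really answers for the site. The sketch below (Python 3) sends a request straight to a given server while presenting the site name in the Host header, so it works regardless of where DNS currently points. The function name is hypothetical; the hostnames are the example ones used in this post.

```python
import http.client


def site_status(server_host, site_name, port=80, timeout=10):
    """Request '/' from one specific server, presenting site_name as the Host header."""
    connection = http.client.HTTPConnection(server_host, port, timeout=timeout)
    try:
        connection.request("GET", "/", headers={"Host": site_name})
        return connection.getresponse().status
    finally:
        connection.close()


# Example usage (hostnames are the placeholders from this post):
# for server in ("web100.webfaction.com", "web200.webfaction.com"):
#     print(server, site_status(server, "www.yoursite.com"))
```

A 200 response, or a redirect issued by wordpress, from both servers suggests the failover site is wired up correctly and ready for the CNAME change.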

Webfaction DNS

If your DNS is managed on webfaction, configuring two sites like we did above might create a round-robin setup, which is not suitable for this failover setup. In that case, we can manually change the website when there’s a problem and we want to fail over to the other server. To do so:

  • Click Domains / websites -> Websites
  • Find the website that runs your site (e.g. www)
  • Click the Edit icon
  • Change the IP address from web100 to web200
  • Change the app from the one running on web100 to the one on web200
  • Click Update

Alternatively, you can automate this process with a small Python script that uses the webfaction API:

    #!/usr/bin/env python

    # xmlrpclib was renamed to xmlrpc.client in Python 3
    try:
        import xmlrpclib
    except ImportError:
        import xmlrpc.client as xmlrpclib

    # enter your webfaction user name and password
    username = 'your-webfaction-username'
    password = 'your-webfaction-password'

    # update the details below based on your details
    # app_name is passed both as the first argument, which identifies the website, and as the app mounted at '/'
    app_name = 'www'  # name of app on webfaction
    ip_addr = '1.2.3.4'  # ip address of the failover server (web200)
    server_dns = 'www.yoursite.com'  # the subdomain on which your app is running
    use_https = False  # whether or not your app uses SSL / HTTPS

    server = xmlrpclib.ServerProxy('https://api.webfaction.com/')
    session_id, account = server.login(username, password)

    server.update_website(session_id, app_name, ip_addr, use_https, [server_dns], (app_name, '/'))
    

Running this script will automatically switch your site over from Web100 to Web200 without having to use the webfaction web console.
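
Since you will eventually want to switch back to the primary as well, the same call can be wrapped in a small function that takes the target address as a parameter. This is only a sketch in Python 3: the function name, the SERVERS table and the IP addresses are placeholders, and it reuses the update_website call exactly as it appears in the script above.

```python
# Placeholder addresses; replace with the real IPs of web100 and web200.
SERVERS = {
    'primary': '192.0.2.10',
    'failover': '192.0.2.20',
}


def switch_website(server, session_id, website_name, target_ip, subdomain,
                   app_name, use_https=False):
    """Point the website at target_ip and mount app_name at '/'."""
    return server.update_website(
        session_id, website_name, target_ip, use_https,
        [subdomain], (app_name, '/'))


# Example usage, assuming the same login steps as the script above:
# import xmlrpc.client
# server = xmlrpc.client.ServerProxy('https://api.webfaction.com/')
# session_id, account = server.login(username, password)
# switch_website(server, session_id, 'www', SERVERS['failover'],
#                'www.yoursite.com', 'www')
# ...and later, to go back:
# switch_website(server, session_id, 'www', SERVERS['primary'],
#                'www.yoursite.com', 'www')
```

Because the app has the same name on both servers, only the IP address changes between the two directions.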

Further thoughts and ideas

I tried to capture the essence of building a failover solution using webfaction shared hosting. Since experiencing a rather long and frustrating downtime recently, I’ve switched to using this method, and I hope it provides a really good solution for temporary problems at a very reasonable cost. Here are some other ideas I’m already toying with, to take this solution a step further:

  • Change the solution from a primary/failover to an active/passive setup. This is only a subtle difference, but with an active/passive configuration, the passive server can be promoted to active at any time and the active demoted to passive. This means the service can switch from one server to another, but still remain fully active. It should be relatively simple to implement, but there are a number of small but important points to be aware of, particularly to avoid overwriting data by accident. The essence is that each server ‘knows’ whether it’s active or passive.
  • Automatically perform the failover once the website is down. I have been playing around with the pingdom API already, and this seems very easy to plug into. This way I can detect that the server is down, and act immediately. However, I still prefer at this point to have a human-operator involved to make a judgement call whether a switchover is really necessary.
  • Load-balancing setup. Webfaction already supports a round-robin DNS setup, and my external DNS providers support this too. To load-balance between two or more running servers is therefore possible. The main problem of course is how to synchronize changes between all servers. For example if a blog post comment is made, or a new post is added. This is not quite trivial, but an interesting direction to explore in future.
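
On the automatic failover idea: given that a single bad day produced well over 100 up and down alerts, acting on every individual alert would make the site flap between servers. One simple way to dampen this, sketched here with a made-up function name and threshold, is to require several consecutive failed checks before suggesting a switch-over.

```python
def should_fail_over(recent_checks, required_failures=3):
    """Return True when the last required_failures checks all failed.

    recent_checks is a list of booleans, oldest first, where True means
    the site responded to that check.
    """
    if len(recent_checks) < required_failures:
        return False
    return not any(recent_checks[-required_failures:])


# Example usage:
# should_fail_over([True, False, False, False])  -> three failures in a row
# should_fail_over([False, False, True])         -> the site just recovered
```

The result does not have to trigger the switch directly; it could just as well notify a human operator, who then makes the judgement call.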

Hope this information is helpful. I recommend trying it out for yourselves. It’s actually easier than it might look.

11 Responses to “Webfaction fail. over.”

  1. Olaf Lederer

    Hi Yoav,

    this is a great article! I’ve known webfaction since 2010 and used their services for more than a year; later I moved most of my sites to some VPS hosting. These days I will move my blog back to Webfaction and I’m sure my next step will be the failover configuration as described here. BTW is the failover really needed in your situation? Maybe you can share more information about how you use the pingdom API to switch between the servers?
    Thanks!

  2. Yoav Aner

    Thanks Olaf,

    I don’t actually use a fully-automated switch-over. So far we’re keeping it manual. As reliable as pingdom might be, and considering how (not so) critical our website on webfaction is, those kinds of decisions are probably still best left to a person to make manually, even if it means longer downtime.

    However, technically it shouldn’t be too hard to write a script that monitors pingdom via the API. When it detects an alert, it runs something like the script above that performs the failover operation on webfaction.

    We haven’t had much trouble with webfaction. But the failover was recently very useful when webfaction upgraded one of the servers. Shortly before the maintenance period we switched to the failover, and once the upgrade was done, switched back.

    p.s. if you are considering the automated / API route, a webhook might be a better option to trigger the failover switch. However, pingdom don’t appear to support webhooks (yet?) – but you can use something like zapier, which looks nice and simple https://zapier.com/zapbook/pingdom/webhook/

    (note I haven’t tried it myself, so can’t vouch that it works, but in principle it should).

  3. Avi

    Hi Yoav,
    I’m trying to find the best solution for my ajax-intensive Django web application. It’s a SAAS web app with a drag&drop interface – so lots of queries, all the time. I’m kicking off with 20 users, looking to one day be in the thousands.

    I’m new to this game, so please forgive the noob questions. Would you recommend Webfaction as a starting point and do you think it’s a scalable environment for when the web app gets more popular?

    Thanks!

  4. Yoav Aner

    I don’t have any personal experience using django on webfaction. The django apps I’m dealing with are typically installed on a VPS. However, I see no reason why you couldn’t run even a relatively-busy django app on webfaction.

    The benefits: pretty good performance most of the time; webfaction support is pretty great and might even help you setting up your django (up to a point…); price is likely to be a little lower than a vps; no need to worry about the OS, software patches and upgrades etc.

    The downsides: shared hosting is less predictable, and you might be negatively impacted by other users; there are probably not that many online guides on doing this with webfaction, and there are plenty for setting django in an environment where you have root access; less flexibility and ease to control your environment (from the OS to the different packages you might need).

    Give it a try, and see how it goes for you. If you get stuck, try a VPS and see if that feels better. There’s no simple answer I’m afraid.

  5. Jesin

    There is one main problem when using DNS failover. The clients must “obey” the TTL value.

    Some DNS caching servers don’t do this and keep the record past TTL. In this case the visitor will hit the server which is down.

    Also Route53 recommends setting the TTL to 1 min.

  6. Yoav Aner

    Yes, you’re right. From my experience TTL settings are obeyed pretty well on most DNS servers. Otherwise, they’re not doing their job properly. And of course the Route53 recommendation for low TTL is there to ensure it works (at least for clients who use well-behaving DNS servers).

  7. JP

    Hi Yoav,

    Thank you for this interesting article. What I’m most interested in is the active/passive setup. Did you go further in exploring the possibilities there, with any luck?

  8. Yoav Aner

    Hi JP,

    Yes, I did. I implemented a health-check on Route53, and if the primary host is down, it will automatically direct users to the secondary. The secondary/failover is set not to allow any updates, to avoid losing any updates whilst the primary is down (it’s purely to avoid having to sync it back manually since the configuration is primary->secondary only).

    Works pretty well so far. Hope it helps.

    Cheers
    Yoav

  9. JP

    Hi Yoav,

    Thanks for your reply. Not allowing any updates on the failover seems like a simple solution indeed.
    Did you also happen to find a way to do live data syncing between the 2 servers (files+mysql) in order to have load balancing? (Which is the option I’m most interested in to be honest :) )

  10. i3factory

    I’m using Webfaction but my server is often unreachable. They suffer DDoS attacks but they don’t have any professional answer.
    I’m desperate because I had migrated many of my sites to Webfaction but I cannot work… I might recommend a more serious and reliable hosting service.

  11. Yoav Aner

    I think that many hosting companies occasionally face DDoS attacks. My personal experience isn’t particularly different with Webfaction. With the fail-over in place however, especially across a different physical location (Webfaction lets you choose between US, EU and Asia) it’s far more unlikely that two servers will both suffer from a DDoS attack at the same time.
