<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Gingerlime &#187; Technology</title>
	<atom:link href="http://blog.gingerlime.com/category/technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gingerlime.com</link>
	<description></description>
	<lastBuildDate>Fri, 11 May 2012 07:41:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Fabric Installer for Graphite</title>
		<link>http://blog.gingerlime.com/fabric-installer-for-graphite/</link>
		<comments>http://blog.gingerlime.com/fabric-installer-for-graphite/#comments</comments>
		<pubDate>Tue, 01 May 2012 11:04:34 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=1098</guid>
		<description><![CDATA[fabric-graphite is a fabric script to install Graphite, Nginx, uwsgi and all dependencies on a debian-based host. Why? I was reading a few interesting posts about graphite. When I tried to install it however, I couldn&#8217;t find anything that really &#8230; <a href="http://blog.gingerlime.com/fabric-installer-for-graphite/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p><a href="https://github.com/gingerlime/graphite-fabric">fabric-graphite</a> is a fabric script to install <a href="http://graphite.wikidot.com/">Graphite</a>, Nginx, uwsgi and all dependencies on a debian-based host.</p>
<h2>Why?</h2>
<p>I was reading a few <a href="http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/">interesting</a> <a href="http://obfuscurity.com/Tags/Graphite">posts</a> about graphite. When I tried to install it however, I couldn&#8217;t find anything that really covered all the steps. Some covered it well for Apache, others covered Nginx, but had steps missing or assumed the reader knows about them etc.</p>
<p>I&#8217;m a big fan of fabric, and try to do all deployments and installations using it. This way I can re-run the process, and also better document what needs to be done. So instead of writing another guide, I created this fabric script.</p>
<p><span id="more-1098"></span></p>
<h2>Requirements</h2>
<ul>
<li>Workstation running python (version 2.7 recommended). All platforms should be supported.</li>
<li><a href="http://docs.fabfile.org/en/1.4.1/index.html">Fabric</a> &#8211; can be installed via <code>pip install fabric</code> or <code>easy_install fabric</code></li>
<li>a new VPS/Dedicated server running a Debian-based distribution (Debian, Ubuntu etc)</li>
</ul>
<h3>Target Host</h3>
<p>Best to execute this on a clean virtual machine running Debian 6 (Squeeze).<br />
Also tested successfully on Ubuntu 12.04 VPS.</p>
<h2>Installation Instructions</h2>
<p><a href="https://github.com/gingerlime/graphite-fabric/zipball/master">Download</a> the project zip file, or <a href="https://github.com/gingerlime/graphite-fabric">clone the project</a> from github.</p>
<p>Run <code>fab graphite_install -H root@{hostname}</code><br />
(hostname should be the name of a virtual server you&#8217;re installing onto)</p>
<p>It might prompt you for the root password on the host you are trying to install onto.</p>
<p>You can use it with a user other than root, as long as this user can <code>sudo</code>.</p>
<p>During the installation, you will be asked to set up the django superuser account. You might want to create an account, but it&#8217;s not strictly necessary. If you answer <code>no</code>, the installation will still work fine.</p>
<h2>After Installation</h2>
<p>Simply open your browser and go to <code>http://[your-hostname]/graphite/</code><br />
It should be up and running.</p>
<p>Of course there&#8217;s a lot more configuration to be done, but at the very least you should have a working environment to play with Graphite.</p>
<h2>Thanks</h2>
<p>Thanks to the authors of these online guides and resources who provided very useful information that I stitched together into this fabric script, and others who provided inspiration about Graphite in general:</p>
<ul>
<li><a href="http://readthedocs.org/docs/graphite/en/latest/install.html">Graphite Docs</a></li>
<li><a href="http://www.frlinux.eu/?p=199">frl1nuX &#8211; Graphite and Nginx</a></li>
<li><a href="http://agiletesting.blogspot.de/2011/04/installing-and-configuring-graphite.html">Agile Testing &#8211; Installing and configuring Graphite</a></li>
<li><a href="http://coreygoldberg.blogspot.de/2012/04/installing-graphite-099-on-ubuntu-1204.html">Corey Goldberg &#8211; Installing Graphite 0.9.9 on Ubuntu 12.04 LTS</a></li>
<li><a href="http://tompurl.com/2011/08/12/installing-graphite-on-ubuntu-10-4-lts/">Tom Purl &#8211; Installing Graphite on Ubuntu 10.4 LTS</a></li>
</ul>
<p>Although not installed with this fabric script, I&#8217;d love to try these some time:</p>
<ul>
<li><a href="http://jondot.github.com/graphene/">Graphene</a></li>
<li><a href="https://github.com/ripienaar/gdash">GDash</a></li>
<li><a href="https://github.com/etsy/logster">Logster</a></li>
</ul>
<h2>DISCLAIMER</h2>
<p>Please try this at your own risk. Please run this only with a newly installed host that you can easily throw away!<br />
I tested it with both Debian 6 and Ubuntu 12.04 successfully. However, you may experience different results.</p>
<h2>Help and Contribution</h2>
<p>I&#8217;d be happy to try to help if I can, but given the complexity of linux-based operating-systems, and my limited time, I might not be able to know why a certain operation fails or an error is generated. Feel free to <a href="https://github.com/gingerlime/graphite-fabric">fork the project on github</a> for your own special requirements or needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/fabric-installer-for-graphite/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>bootstrap shooting at the clouds</title>
		<link>http://blog.gingerlime.com/bootstrap-shooting-at-the-clouds/</link>
		<comments>http://blog.gingerlime.com/bootstrap-shooting-at-the-clouds/#comments</comments>
		<pubDate>Mon, 09 Apr 2012 15:57:21 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=1063</guid>
		<description><![CDATA[One of my primary aims when building a resilient cloud architecture is being able to spawn instances quickly. Many cloud providers give you tools to create images or snapshots of existing cloud instances and launch them. This is great, but &#8230; <a href="http://blog.gingerlime.com/bootstrap-shooting-at-the-clouds/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>One of my primary aims when building a resilient cloud architecture is being able to spawn instances quickly. Many cloud providers give you tools to create images or snapshots of existing cloud instances and launch them. This is great, but not particularly portable. If I have one instance on Linode and I want to clone it to Rackspace, I can&#8217;t easily do that. </p>
<p>That&#8217;s one of the reasons I am creating bootstrap scripts that completely automate a server (re)build process. Given an IP address and root password, the script should connect to the instance, install all necessary packages, pull the code from the repository, initialize the database, configure the web server and get the server ready for restore of user-data. </p>
<p>I&#8217;m primarily using <a href="http://docs.fabfile.org/en/1.4.1/index.html">fabric</a> for automating this process, and use a standard operating system across different cloud providers. This allows fairly consistent deployments across different providers. This also means the architecture is not dependent on a single provider, which in my opinion is a huge benefit. Not only can my architecture run on different data centres or geographic locations, but I can also be flexible in the choice of hosting providers.</p>
<p>All that aside however, building and refining this bootstrapping process allowed me to run it across different cloud providers, namely: <a href="http://www.rackspace.com/cloud/">Rackspace</a>, <a href="http://www.linode.com">Linode</a>, and <a href="http://aws.amazon.com/ec2/">EC2</a>. Whilst running the bootstrapping process many times, I thought it might be a great opportunity to compare performance of those providers side-by-side. My bootstrap process runs the same commands in order, and covers quite a variety of operations. This should give an interesting indication of how each of the cloud providers performs.<br />
<span id="more-1063"></span></p>
<h2>Tested platforms</h2>
<p>The tests were carried out using the default Debian 6 Squeeze on the lowest-end cloud instances on all three providers:</p>
<ul>
<li>Rackspace 256MB and 512MB &#8211; using the London data centre.</li>
<li>Linode 512 &#8211; using the London data centre.</li>
<li>EC2 micro instance (EBS volume) &#8211; using the Ireland data centre.</li>
</ul>
<h2>Bootstrap process</h2>
<p>The bootstrap process executes the following tasks:</p>
<ul>
<li><code>apt-get update &amp;&amp; apt-get upgrade</code> and installing a list of prerequisite packages</li>
<li>Installing Postgresql from backports</li>
<li>Downloading, compiling and installing ruby and sphinx from source</li>
<li>Setting up SSH keys</li>
<li>Pulling code from a remote git repository</li>
<li>Creating a couple of small (empty) databases and user accounts</li>
<li>Tweaking some configuration files</li>
<li>Performing <code>bundle install</code> on a rails project</li>
<li>Performing rake tasks to set the database schema and seed the database</li>
</ul>
<p>These are relatively I/O intensive operations, but also involve CPU tasks (compiling code) and network access (downloading sources and packages), so should provide a reasonable benchmark for comparing the performance of those cloud providers.</p>
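<p>The post does not show how the total time was captured. A minimal sketch of one way to do it, assuming a bash shell on the workstation; the <code>time_cmd</code> helper, the <code>bootstrap</code> task name and the IP address are hypothetical and only for illustration:</p>
<pre class="brush: bash; title: ; notranslate">
# Run any command and report how many whole seconds it took.
# Usage: time_cmd COMMAND [ARGS...]
time_cmd() {
    local start end
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "elapsed: $((end - start)) seconds"
}

# Hypothetical usage (task name and address are placeholders):
# time_cmd fab bootstrap -H root@203.0.113.10
</pre>
<p>Wall-clock time measured this way includes network latency between the workstation and the instance, so it is only comparable when every run starts from the same place.</p>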
<h2>Results</h2>
<p>These highly-unscientific results are quite basic. No fancy charts or anything. All I measured was how long the entire bootstrap operation took on each of the cloud providers.</p>
<ul>
<li>Rackspace 256: 1269 seconds (~21 minutes)</li>
<li>Rackspace 512: 1144 seconds (~19 minutes)</li>
<li>Linode 512: 1053 seconds (~17.5 minutes)</li>
<li>EC2 micro: 4090 seconds (1 hour and 8 minutes!!??)</li>
</ul>
<p>Linode seems to be the winner, running around 20% faster than Rackspace 256 and 8% faster than Rackspace 512. What&#8217;s much more surprising however (for me anyway), is how slow EC2 is in comparison, taking nearly four times as long as Linode (4090 vs. 1053 seconds, about 3.9x)&#8230; I am guessing this is down to EBS storage. Quite a big performance hit for persistent storage though.</p>
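<p>The ratios can be checked with a little arithmetic. This is only a sketch using the timings listed above; the <code>how_much_longer</code> helper name is made up for illustration:</p>
<pre class="brush: bash; title: ; notranslate">
# how_much_longer SLOW FAST: print the time ratio and the extra time in percent.
how_much_longer() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        printf "%.2fx as long, %.1f%% longer\n", slow / fast, (slow / fast - 1) * 100
    }'
}

how_much_longer 1269 1053   # Rackspace 256 vs Linode 512: 1.21x as long, 20.5% longer
how_much_longer 1144 1053   # Rackspace 512 vs Linode 512: 1.09x as long, 8.6% longer
how_much_longer 4090 1053   # EC2 micro vs Linode 512:     3.88x as long, 288.4% longer
</pre>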
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/bootstrap-shooting-at-the-clouds/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How much (cache) is too much?</title>
		<link>http://blog.gingerlime.com/how-much-cache-is-too-much/</link>
		<comments>http://blog.gingerlime.com/how-much-cache-is-too-much/#comments</comments>
		<pubDate>Sat, 17 Mar 2012 16:49:16 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[optimization]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=876</guid>
		<description><![CDATA[One of the best rules of thumb I know is the 80/20 rule. I can&#8217;t think of a more practical rule in almost any situation. Combined with the law of diminishing returns, it pretty much sums up how the universe &#8230; <a href="http://blog.gingerlime.com/how-much-cache-is-too-much/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>One of the best rules of thumb I know is the <a href="https://en.wikipedia.org/wiki/Pareto_principle">80/20 rule</a>. I can&#8217;t think of a more practical rule in almost any situation. Combined with the <a href="https://en.wikipedia.org/wiki/Diminishing_returns#Diminishing_marginal_returns">law of diminishing returns</a>, it pretty much sums up how the universe works. One case-study that hopes to illustrate both of these, if only a little, is a short experiment in optimization I carried out recently. I was reading <a href="http://www.go2linux.org/linux/2011/04/nginx-varnish-compared-nginx-941">so</a> <a href="http://elivz.com/blog/single/wordpress_with_w3tc_on_nginx/">many</a> <a href="http://danielmiessler.com/blog/optimizing-wordpress-with-nginx-varnish-w3-total-cache-amazon-s3-and-memcached">posts</a> <a href="http://ocaoimh.ie/2011/08/09/speed-up-wordpress-with-apache-and-varnish/">about</a> optimizing wordpress using <a href="http://nginx.org/">nginx</a>, <a href="https://www.varnish-cache.org/">varnish</a>, <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3-Total-Cache</a> and <a href="http://php-fpm.org/">php-fpm</a>. The <a href="http://www.axelsegebrecht.com/how-to/benchmark-nginx-varnish-wordpress-site/">results</a> on <a href="http://danielmiessler.com/blog/optimizing-wordpress-with-nginx-varnish-w3-total-cache-amazon-s3-and-memcached">some of them</a> were staggering in terms of improvements, and I was inspired to try to come up with a similar setup that will really push the boundary of how fast I can serve a wordpress site. </p>
<h2>Spoiler &#8211; Conclusion</h2>
<p>So I know there isn&#8217;t such a thing as too much <em>cash</em>, but does the same apply to <strong>cache</strong>?<br />
<span id="more-876"></span><br />
The results were rather disappointing for me. It turns out that my existing configuration of <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3TC</a> already came close to matching a much more complex set-up with two or three proxy layers of varnish and nginx. The <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3TC</a> disk-enhanced page caching is simple but powerful enough to give nearly the same performance. How? The basic principle is that it creates a static version of every page, and hooks the webserver (Apache or Nginx) to serve this file directly if it exists. On most Linux platforms, the kernel&#8217;s file cache keeps recently read files in memory, so those files are effectively served from memory. This means that it does in fact match the performance characteristics of an in-memory cache. Simple solutions are not necessarily weaker. With 20% of the effort, I reached 80% of my performance goal. In actual fact, I believe the results are more like 10/90&#8230; More on that (and some caveats) at the end.</p>
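<p>The &#8220;serve the static file if it exists&#8221; principle can be sketched in a few lines of shell. This is an illustration only: the <code>serve_page</code> helper and the one-<code>index.html</code>-per-URI cache layout are assumptions made for the example, not the actual layout W3TC uses on disk.</p>
<pre class="brush: bash; title: ; notranslate">
# Decide how a request would be served: from a pre-rendered file if one
# exists, otherwise by falling through to PHP/WordPress.
# Usage: serve_page CACHE_DIR URI   (URI starts and ends with a slash)
serve_page() {
    local cache_dir="$1" uri="$2"
    local candidate="${cache_dir}${uri}index.html"   # assumed layout
    if [ -f "$candidate" ]; then
        echo "static $candidate"
    else
        echo "dynamic $uri"
    fi
}
</pre>
<p>In the real setup this decision is made by the webserver&#8217;s rewrite rules, so a cache hit never reaches PHP at all.</p>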
<h2>Hosting platform</h2>
<p>Unlike most guides and optimization benchmarks, I decided to try this &#8220;on a budget&#8221;. Not on a high-performance dedicated server, not even on a VPS, but rather on a shared hosting account with <a href="http://www.webfaction.com?affiliate=yoavaner">webfaction</a>. As shared hosting providers go, webfaction is quite unique in that it allows, perhaps even <em>encourages</em> you, to compile your own tools and run them. It doesn&#8217;t give you as much freedom as a VPS, but it&#8217;s actually good enough. This post doubles-up as a mini-guide on how to set-up php-fpm, nginx and varnish specifically on webfaction (unlike most guides, which assume you have root access and can use some package manager).</p>
<h2>Original Setup</h2>
<p>The baseline was a (real!) wordpress site running on webfaction, and already using <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3TC</a> with disk-enhanced page caching using <code>.htaccess</code> rules. W3TC is clever enough to manage those rewrite rules for you and also to detect whether your instance is running behind Apache or Nginx. My initial thinking was that since those static pages generated by W3TC are served by Apache, and are saved on-disk (rather than in-memory), I could benefit by introducing some in-memory caching (Varnish), and taking Apache out of the equation. It&#8217;s important to note that <a href="http://www.webfaction.com?affiliate=yoavaner">webfaction</a> already uses nginx as a front-end proxy. However, with a wordpress app, they forward all requests via Apache.</p>
<pre><code>BASELINE:

{Internet} -&gt; Webfaction Nginx -&gt; Webfaction Apache -&gt; W3TC / WordPress
</code></pre>
<p>This is what the baseline setup looks like (pretty, isn&#8217;t it?)</p>
<h2>Nginx Setup</h2>
<p>The first variation was to install my own nginx instance, and serve the wordpress pages via it, instead of Apache. In order to also serve php pages effectively, I opted for <a href="http://php-fpm.org/">php-fpm</a>, which <a href="http://interfacelab.com/nginx-php-fpm-apc-awesome/">seems like the most recommended option</a> with nginx. <a href="http://php-fpm.org/">php-fpm</a> (fastcgi process manager) means you need to run a mini server that listens to requests to serve php files&#8230; Nginx uses it when it needs to serve a php page. I include some installation instructions for <a href="http://php-fpm.org/">php-fpm</a> here too. I couldn&#8217;t find any guide online on how to do it with <a href="http://www.webfaction.com?affiliate=yoavaner">webfaction</a>.</p>
<pre><code>NGINX:

{Internet} -&gt; Webfaction Nginx -&gt; My Nginx / php-fpm -&gt; W3TC / WordPress
</code></pre>
<p>and this is what the Nginx setup will look like (once we install it, it&#8217;s not THAT easy)</p>
<h3>Installing Nginx on webfaction</h3>
<p>The easiest option to run your own nginx instance on webfaction is to use the ready-made <code>passenger</code> application. This is usually used to host a ruby-on-rails app, but there&#8217;s nothing stopping us from removing all the passenger stuff, and just using it as an nginx instance. In the webfaction control panel go to Applications, Add a new app, let&#8217;s call it <code>engine</code>, and then choose <code>Passenger</code> for <em>app category</em> and then <code>Passenger 3.0.11 (nginx 1.0.10/Ruby 1.9.3)</code> in the <em>App type</em>. This will install it under <code>~/webapps/engine</code>.</p>
<h3>Installing PHP-FPM</h3>
<p>Next up is getting php-fpm compiled, installed and running. This is done by compiling php with the right option to include php-fpm. SSH to your webfaction account and run these commands:</p>
<pre class="brush: bash; title: ; notranslate">
    mkdir ~/src
    cd ~/src
    wget -O php-5.3.10.tar.gz http://www.php.net/get/php-5.3.10.tar.gz/from/sg2.php.net/mirror
    tar -zxvf php-5.3.10.tar.gz
    cd php-5.3.10
    ./configure --prefix=$HOME --with-pdo-mysql --with-pdo-pgsql=/usr/pgsql-9.1 --enable-fpm --enable-bcmath --enable-calendar --enable-exif --enable-ftp --enable-mbstring --enable-soap --enable-zip --with-curl --with-freetype-dir --with-gd --with-gettext --with-gmp --with-iconv --with-jpeg-dir --with-kerberos --with-mhash --with-mysql --with-mysqli --with-openssl --with-pgsql=/usr/pgsql-9.1 --with-png-dir --with-regex --with-xmlrpc --with-xsl --with-zlib-dir --without-pear --enable-sockets --enable-intl --with-mysql-sock=/var/lib/mysql/mysql.sock
    make
    make install
    </pre>
</p>
<p>We should now have php-fpm installed under <code>~/sbin</code> and a default configuration file created in <code>~/etc/php-fpm.conf.default</code>.</p>
<h3>PHP-FPM configuration</h3>
<p>Follow these steps to copy the configuration file and create a folder where our Unix socket will live:</p>
<pre class="brush: bash; title: ; notranslate">
    cp ~/etc/php-fpm.conf.default ~/etc/php-fpm.conf
    mkdir -p ~/var/spool
    </pre>
</p>
<p>Then edit our <code>php-fpm.conf</code> file. We only need to replace the line that says <code>listen = 127.0.0.1:9000</code> with</p>
<pre class="brush: plain; title: ; notranslate">
    listen = /home/WEBFACTION_USER/var/spool/phpfpm.sock
    </pre>
<p>Replace <code>WEBFACTION_USER</code> with your webfaction username.</p>
<p>Now we can launch our php-fpm and it will listen on the unix socket. To launch it, simply run</p>
<pre class="brush: bash; title: ; notranslate">
    ~/sbin/php-fpm
    </pre>
</p>
<p>Note that if you plan to use this for your server you will need to create a cron job that checks whether it&#8217;s running and launches it. Follow this <a href="http://docs.webfaction.com/software/custom.html">webfaction guide about custom apps</a> for more info.</p>
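<p>A minimal sketch of such a check, assuming <code>pgrep</code> is available on the host. The <code>ensure_running</code> helper, the script location and the cron interval are placeholders; adapt them to the webfaction guide linked above.</p>
<pre class="brush: bash; title: ; notranslate">
#!/usr/bin/env bash
# Start a program only if no process with that exact name is already
# running for the current user.
# Usage: ensure_running PROCESS_NAME COMMAND [ARGS...]
ensure_running() {
    local name="$1"
    shift
    if [ -n "$(pgrep -u "$(id -u)" -x "$name")" ]; then
        echo "$name already running"
    else
        "$@"
        echo "$name started"
    fi
}

# In the watchdog script (hypothetical path ~/bin/check_php_fpm.sh):
# ensure_running php-fpm "$HOME/sbin/php-fpm"
#
# Matching crontab entry, checking every 10 minutes:
# */10 * * * * $HOME/bin/check_php_fpm.sh
</pre>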
<p>One last thing is to create the <code>fastcgi_params</code> file required by nginx in the nginx conf folder (<code>~/webapps/engine/nginx/conf/fastcgi_params</code>):</p>
<pre class="brush: plain; title: ; notranslate">
fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;

fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  SERVER_PROTOCOL    $server_protocol;

fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;

fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
fastcgi_param  SERVER_ADDR        $server_addr;
fastcgi_param  SERVER_PORT        $server_port;
fastcgi_param  SERVER_NAME        $server_name;

# PHP only, required if PHP was built with --enable-force-cgi-redirect
fastcgi_param  REDIRECT_STATUS    200;
</pre>
<h3>Nginx Configuration</h3>
<p>This part got a little tricky, since I wasn&#8217;t sure which guide to follow for configuring nginx in the best way for use with <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3TC</a>. It also seemed to me that all the online guides miss an important part. W3TC actually generates an nginx configuration for you. This is not a complete configuration, but you need to make sure it is included within your nginx.conf in order to really get the most out of the W3TC plugin.</p>
<p>First step was to create a folder for the W3TC-generated nginx.conf file:</p>
<pre class="brush: bash; title: ; notranslate">
    mkdir -p ~/webapps/engine/nginx/conf/sites-enabled
    </pre>
</p>
<p>Then edit your <code>~/webapps/engine/nginx/conf/nginx.conf</code> file, so it looks something like this:</p>
<section>
<pre class="brush: bash; title: ; notranslate">
    worker_processes  1;

    events {
        worker_connections  1024;
    }

    http {
        access_log  /home/WEBFACTION_USER/logs/user/access_engine.log  combined;
        error_log   /home/WEBFACTION_USER/logs/user/error_engine.log   crit;

        include         mime.types;
        sendfile        on;
        tcp_nodelay on;
        tcp_nopush on;
        port_in_redirect off;

        server {
            listen             99999; # make sure the port is the same as configured on your webfaction `engine` app
            server_name        localhost;
            root               /home/WEBFACTION_USER/webapps/wptst;  # point this to your wordpress folder
            index              index.php;
            include /home/WEBFACTION_USER/webapps/engine/nginx/conf/sites-enabled/*;

            location / {
                try_files $uri $uri/ /index.php?q=$uri&amp;$args;
            }
            # Deny access to hidden files
            location ~* /\.ht {
                deny            all;
                access_log      off;
                log_not_found   off;
            }

            # Pass PHP scripts on to PHP-FPM
            location ~* \.php$ {
                try_files       $uri /index.php;
                fastcgi_index   index.php;
                fastcgi_pass    unix:/home/WEBFACTION_USER/var/spool/phpfpm.sock;
                include         fastcgi_params;
                fastcgi_param   SCRIPT_FILENAME    $document_root$fastcgi_script_name;
                fastcgi_param   SCRIPT_NAME        $fastcgi_script_name;
            }

        }
    }
    </pre>
</section>
<p>Make sure to replace <code>WEBFACTION_USER</code> with your username, so the folders are correct. Also update the listen port from 99999 to the port that was given to your app.</p>
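<p>If you would rather not edit the placeholders by hand, a small sketch along these lines could do the substitution. The <code>fill_placeholders</code> helper and the example username and port are hypothetical; it assumes the file still contains the literal <code>WEBFACTION_USER</code> and <code>99999</code> placeholders.</p>
<pre class="brush: bash; title: ; notranslate">
# Replace the WEBFACTION_USER placeholder and the example port 99999 in a
# config file. The original is kept next to it as FILE.bak.
# Usage: fill_placeholders FILE USERNAME PORT
fill_placeholders() {
    local file="$1" user="$2" port="$3"
    sed -i.bak -e "s/WEBFACTION_USER/${user}/g" -e "s/99999/${port}/g" "$file"
}

# Hypothetical usage (username and port are placeholders):
# fill_placeholders ~/webapps/engine/nginx/conf/nginx.conf myuser 12345
</pre>
<p>The same helper works for the <code>listen</code> line in <code>php-fpm.conf</code>, since it uses the same username placeholder.</p>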
<p>(Re)start your nginx process by running</p>
<pre class="brush: bash; title: ; notranslate">
    ~/webapps/engine/bin/restart
    </pre>
</p>
<h3>Configuring wordpress via nginx</h3>
<p>Now that we have both our php-fpm and nginx processes running, let&#8217;s plug our site into this new configuration. This is quite easy for those familiar with the webfaction control panel. Either change your existing website instance to point to the <code>engine</code> app, or create a new website, and point it to <code>engine</code>. Note that the domain name should match that of your wordpress instance. This is all done under the <code>Domains / websites -&gt; Websites</code> menu.</p>
<p>Give webfaction a couple of minutes to sync, and you should be ready to access your wordpress, now running under nginx. Next step is to configure W3TC to generate the configuration for nginx.</p>
<h3>W3TC configuration</h3>
<p>Login to your wordpress admin and go to the W3TC <code>Performance</code> menu. Ignore any errors or warnings you might see for now.</p>
<p>Scroll down to the <code>Miscellaneous</code> section. You should see a <code>Nginx server configuration file path</code> option. You should enter this path</p>
<pre class="brush: plain; title: ; notranslate">
    /home/WEBFACTION_USER/webapps/engine/nginx/conf/sites-enabled/nginx.conf
    </pre>
<p>(replace with your own webfaction username).</p>
<p>Then click <code>Save all settings</code>.</p>
<p>Now you can click <code>auto-install</code> on all the warning messages that W3TC spits out. This will generate a custom nginx.conf file for you in the right folder. W3TC will probably keep complaining with an error that says </p>
<blockquote>
<p><code>It appears Page Cache URL rewriting is not working. If using apache, verify that the server configuration allows .htaccess or if using nginx verify all configuration files are included in the configuration.</code></p>
</blockquote>
<p>You now need to restart your nginx to pick up the new configuration file generated by W3TC.</p>
<pre class="brush: bash; title: ; notranslate">
    ~/webapps/engine/bin/restart
    </pre>
<h2>Varnish</h2>
<p>The second variation was to use the previous nginx configuration, but also place Varnish cache in front of it.</p>
<pre><code>VARNISH:

{Internet} -&gt; Webfaction Nginx -&gt; My Varnish -&gt; My Nginx / php-fpm -&gt; W3TC / WordPress
</code></pre>
<p>and this is what it would look like with Varnish. It&#8217;s like the cherry on the cake. Probably not as sweet though.</p>
<h3>Installing Varnish on Webfaction</h3>
<p>Compiling and installing varnish is quite similar to php-fpm and nginx</p>
<pre class="brush: bash; title: ; notranslate">
    cd ~/src
    wget http://repo.varnish-cache.org/source/varnish-3.0.2.tar.gz
    tar -zxvf varnish-3.0.2.tar.gz
    cd varnish-3.0.2
    ./autogen.sh
    ./configure --prefix=$HOME
    make
    # this small hack was also required... (from http://community.webfaction.com/questions/5470/easy-reverse-proxy-cache-with-webfaction/5514)
    mv ./libtool ./libtool_old
    ln -s /usr/bin/libtool ./libtool
    make install
    </pre>
</p>
<p>Unlike php-fpm, which can use unix sockets, varnish needs a TCP port to listen on. Luckily, <a href="http://www.webfaction.com?affiliate=yoavaner">webfaction</a> makes it relatively easy. To do that, create a <code>varnish</code> app on webfaction. We&#8217;ll call it <code>varnish</code>, and choose <code>Custom</code> for the <em>App category</em> and <code>Custom app (listening on port)</code> for the <em>App type</em>. Click <em>Create</em> and notice the port number. We&#8217;ll use this as the listening port when we launch varnish.</p>
<p>The configuration lives under <code>~/etc/varnish/default.vcl</code>. You can simply overwrite the sample file already created there with this configuration:</p>
<pre class="brush: bash; title: ; notranslate">
backend default {
    .host = &quot;localhost&quot;;
    # Replace this with your NGINX (Engine app) port !
    .port = &quot;99999&quot;;
}
acl purge {
    &quot;localhost&quot;;
}
sub vcl_recv {
    if (req.request == &quot;PURGE&quot;) {
        if (!client.ip ~ purge) {
            error 405 &quot;Not allowed.&quot;;
        }
        return(lookup);
    }
    if (req.url ~ &quot;^/$&quot;) {
        unset req.http.cookie;
    }
}
sub vcl_hit {
    if (req.request == &quot;PURGE&quot;) {
        set obj.ttl = 0s;
        error 200 &quot;Purged.&quot;;
    }
}
sub vcl_miss {
    if (req.request == &quot;PURGE&quot;) {
        error 404 &quot;Not in cache.&quot;;
    }
    if (!(req.url ~ &quot;wp-(login|admin)&quot;)) {
        unset req.http.cookie;
    }
    if (req.url ~ &quot;^/[^?]+.(jpeg|jpg|png|gif|ico|js|css|txt|gz|zip|lzma|bz2|tgz|tbz|html|htm)(\?.|)$&quot;) {
        unset req.http.cookie;
        set req.url = regsub(req.url, &quot;\?.$&quot;, &quot;&quot;);
    }
    if (req.url ~ &quot;^/$&quot;) {
        unset req.http.cookie;
    }
}
sub vcl_fetch {
    if (req.url ~ &quot;^/$&quot;) {
        unset beresp.http.set-cookie;
    }
    if (!(req.url ~ &quot;wp-(login|admin)&quot;)) {
        unset beresp.http.set-cookie;
    }
}
</pre>
<p>Notice that the port number in this configuration file is of the <strong>NGINX</strong> server (which we created earlier and called it <code>engine</code>)!</p>
<p>Now to run varnish:</p>
<pre class="brush: bash; title: ; notranslate">
    ~/sbin/varnishd -f ~/etc/varnish/default.vcl -s malloc,64M -a 127.0.0.1:55555
    </pre>
</p>
<p>The port here is of the newly created <code>varnish</code> app (replace <strong>55555</strong> with your own port). You can tweak this to use more or less memory. <a href="http://www.webfaction.com?affiliate=yoavaner">webfaction</a> gives each account 256MB of application-usable memory, so it depends on how much stuff you have running already.</p>
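<p>A quick way to see whether varnish is actually caching is to look at the <code>X-Varnish</code> response header, which to my knowledge carries one transaction ID on a cache miss and two on a hit. A rough sketch, assuming that behaviour holds for your varnish version; the <code>varnish_status</code> helper and the hostname in the usage line are made up for illustration:</p>
<pre class="brush: bash; title: ; notranslate">
# Read HTTP response headers on stdin and report whether the response
# looks like a varnish cache hit or miss.
varnish_status() {
    awk 'tolower($1) == "x-varnish:" {
        found = 1
        # NF counts the header name plus the IDs: 3 fields means two IDs
        if (NF == 3) print "hit"; else print "miss"
    }
    END { if (!found) print "no X-Varnish header" }'
}

# Hypothetical usage from the webfaction shell, talking to varnish directly:
# curl -sI -H "Host: www.example.com" http://127.0.0.1:55555/ | varnish_status
</pre>
<p>Requesting the same URL twice should normally report <code>miss</code> the first time and <code>hit</code> the second.</p>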
<h2>Benchmark results</h2>
<p>To carry out those tests, I used <a href="https://en.wikipedia.org/wiki/ApacheBench">ApacheBench</a> from a nearby location with a good internet connection and low latency (ping test showed around 4ms). I repeated each test several times and checked that there were no configuration problems or issues that might skew the results. The tests were carried out against the exact same wordpress site, testing both static and dynamic pages against each of the configurations (BASELINE, NGINX, VARNISH). I also used httperf as a sanity check, to make sure the ApacheBench results were accurate, and during the varnish tests made sure <code>varnishhist</code> showed realistic information about cache hits/misses.</p>
<p>The command I used was</p>
<pre class="brush: plain; title: ; notranslate">
    ab -kc 10 -n 1000 {url}
    </pre>
</p>
<h2>Static Pages</h2>
<p>Static pages are not just plain html, but rather pages that W3TC caches and converts into static files. Ideally most, if not all, of the pages on a typical wordpress setup can be cached this way. I tested three pages of different sizes.</p>
<h2>Pretty Big Page (~100kb)</h2>
<h3>VARNISH</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /category/dinosaurs/
Document Length:        113742 bytes

Concurrency Level:      10
Time taken for tests:   9.779 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      114047000 bytes
HTML transferred:       113742000 bytes
Requests per second:    102.26 [#/sec] (mean)
Time per request:       97.793 [ms] (mean)
Time per request:       9.779 [ms] (mean, across all concurrent requests)
Transfer rate:          11388.76 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   10   2.1     10      33
Processing:    57   87   6.2     87     112
Waiting:        3   11   2.3     11      34
Total:         67   97   6.4     97     124

Percentage of the requests served within a certain time (ms)
  50%     97
  66%    100
  75%    101
  80%    103
  90%    105
  95%    107
  98%    112
  99%    118
 100%    124 (longest request)
</pre>
<p></code></pre>
<h3>NGINX</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /category/dinosaurs/
Document Length:        113742 bytes

Concurrency Level:      10
Time taken for tests:   9.723 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    993
Total transferred:      114059965 bytes
HTML transferred:       113742000 bytes
Requests per second:    102.85 [#/sec] (mean)
Time per request:       97.232 [ms] (mean)
Time per request:       9.723 [ms] (mean, across all concurrent requests)
Transfer rate:          11455.74 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   5.5      0      87
Processing:    43   96  17.5     97     310
Waiting:        3   66  22.5     74      88
Total:         43   97  20.9     97     397

Percentage of the requests served within a certain time (ms)
  50%     97
  66%     97
  75%     97
  80%     98
  90%    105
  95%    117
  98%    129
  99%    155
 100%    397 (longest request)
</pre>
<p></code></pre>
<h3>BASELINE</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /category/dinosaurs/
Document Length:        113753 bytes

Concurrency Level:      10
Time taken for tests:   9.736 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    994
Total transferred:      114099970 bytes
HTML transferred:       113753000 bytes
Requests per second:    102.72 [#/sec] (mean)
Time per request:       97.356 [ms] (mean)
Time per request:       9.736 [ms] (mean, across all concurrent requests)
Transfer rate:          11445.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   3.6      0      63
Processing:    29   97  51.3     91     643
Waiting:        3   50  14.8     50     116
Total:         29   97  51.8     92     643

Percentage of the requests served within a certain time (ms)
  50%     92
  66%     98
  75%    101
  80%    106
  90%    126
  95%    149
  98%    213
  99%    407
 100%    643 (longest request)
</pre>
<p></code></pre>
<p>Notice that nginx and our baseline setup were able to use HTTP keep-alive, whereas varnish wasn&#8217;t. I&#8217;m not sure why this happens.</p>
<h2>Smaller Page (~35kb)</h2>
<h3>VARNISH</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /category/elephants/
Document Length:        38304 bytes

Concurrency Level:      10
Time taken for tests:   3.317 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      38607989 bytes
HTML transferred:       38304000 bytes
Requests per second:    301.50 [#/sec] (mean)
Time per request:       33.167 [ms] (mean)
Time per request:       3.317 [ms] (mean, across all concurrent requests)
Transfer rate:          11367.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    8   1.5      8      11
Processing:    17   25   2.2     26      35
Waiting:        2    8   1.5      8      13
Total:         23   33   2.2     33      40

Percentage of the requests served within a certain time (ms)
  50%     33
  66%     34
  75%     34
  80%     35
  90%     36
  95%     37
  98%     38
  99%     38
 100%     40 (longest request)
</pre>
<p></code></pre>
<h3>NGINX</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /category/elephants/
Document Length:        38304 bytes

Concurrency Level:      10
Time taken for tests:   3.308 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    993
Total transferred:      38620965 bytes
HTML transferred:       38304000 bytes
Requests per second:    302.30 [#/sec] (mean)
Time per request:       33.080 [ms] (mean)
Time per request:       3.308 [ms] (mean, across all concurrent requests)
Transfer rate:          11401.33 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.4      0      23
Processing:    10   33  13.5     31     117
Waiting:        3   18   4.2     18      28
Total:         10   33  13.7     31     117

Percentage of the requests served within a certain time (ms)
  50%     31
  66%     36
  75%     40
  80%     42
  90%     46
  95%     51
  98%     79
  99%     95
 100%    117 (longest request)
</pre>
<p></code></pre>
<h3>BASELINE</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /category/elephants/
Document Length:        38315 bytes

Concurrency Level:      10
Time taken for tests:   3.333 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    994
Total transferred:      38660970 bytes
HTML transferred:       38315000 bytes
Requests per second:    300.05 [#/sec] (mean)
Time per request:       33.327 [ms] (mean)
Time per request:       3.333 [ms] (mean, across all concurrent requests)
Transfer rate:          11328.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.5      0      23
Processing:     9   33  13.8     29     131
Waiting:        3   20   4.6     20      45
Total:          9   33  14.1     29     131

Percentage of the requests served within a certain time (ms)
  50%     29
  66%     34
  75%     39
  80%     43
  90%     50
  95%     52
  98%     76
  99%     93
 100%    131 (longest request)
</pre>
<p></code></pre>
<h2>Even smaller (~15kb)</h2>
<h3>VARNISH</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /cateogry/kittens/
Document Length:        16986 bytes

Concurrency Level:      10
Time taken for tests:   1.594 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      17290000 bytes
HTML transferred:       16986000 bytes
Requests per second:    627.16 [#/sec] (mean)
Time per request:       15.945 [ms] (mean)
Time per request:       1.594 [ms] (mean, across all concurrent requests)
Transfer rate:          10589.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    5   1.7      4      11
Processing:     5   11   2.2     11      17
Waiting:        2    6   1.7      5      13
Total:          9   16   2.2     16      25

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     17
  75%     17
  80%     18
  90%     19
  95%     19
  98%     21
  99%     21
 100%     25 (longest request)
</pre>
<p></code></pre>
<h3>NGINX</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /cateogry/kittens/
Document Length:        16986 bytes

Concurrency Level:      10
Time taken for tests:   1.482 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    990
Total transferred:      17302950 bytes
HTML transferred:       16986000 bytes
Requests per second:    674.94 [#/sec] (mean)
Time per request:       14.816 [ms] (mean)
Time per request:       1.482 [ms] (mean, across all concurrent requests)
Transfer rate:          11404.67 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       3
Processing:     5   15   1.0     15      24
Waiting:        3   13   1.1     13      14
Total:          5   15   1.0     15      24

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     15
  75%     15
  80%     15
  90%     15
  95%     15
  98%     16
  99%     18
 100%     24 (longest request)
</pre>
<p></code></pre>
<h3>BASELINE</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /cateogry/kittens/
Document Length:        16997 bytes

Concurrency Level:      10
Time taken for tests:   1.488 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    992
Total transferred:      17342960 bytes
HTML transferred:       16997000 bytes
Requests per second:    671.83 [#/sec] (mean)
Time per request:       14.885 [ms] (mean)
Time per request:       1.488 [ms] (mean, across all concurrent requests)
Transfer rate:          11378.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      13
Processing:     6   15   0.9     15      26
Waiting:        3   13   1.2     13      15
Total:          6   15   1.1     15      26

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     15
  75%     15
  80%     15
  90%     15
  95%     15
  98%     16
  99%     19
 100%     26 (longest request)
</pre>
<p></code></pre>
<h2>Dynamic Pages</h2>
<p>I chose <code>wp-login.php</code> for the test. Ideally most pages will be cached anyway. Nevertheless, it&#8217;s interesting to see what overhead (if any) is added by varnish or nginx, and also to compare PHP-FPM with the standard php-cgi provided by default.</p>
<h3>VARNISH</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /wp-login.php
Document Length:        3643 bytes

Concurrency Level:      10
Time taken for tests:   32.551 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    993
Total transferred:      4309921 bytes
HTML transferred:       3643000 bytes
Requests per second:    30.72 [#/sec] (mean)
Time per request:       325.514 [ms] (mean)
Time per request:       32.551 [ms] (mean, across all concurrent requests)
Transfer rate:          129.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       3
Processing:   137  324  42.2    316     545
Waiting:      137  323  42.1    314     544
Total:        139  324  42.3    316     548

Percentage of the requests served within a certain time (ms)
  50%    316
  66%    339
  75%    355
  80%    360
  90%    380
  95%    396
  98%    408
  99%    413
 100%    548 (longest request)
</pre>
<p></code></pre>
<h3>NGINX</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /wp-login.php
Document Length:        3643 bytes

Concurrency Level:      10
Time taken for tests:   32.085 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      4213000 bytes
HTML transferred:       3643000 bytes
Requests per second:    31.17 [#/sec] (mean)
Time per request:       320.852 [ms] (mean)
Time per request:       32.085 [ms] (mean, across all concurrent requests)
Transfer rate:          128.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    3   0.5      3       6
Processing:   142  317  40.9    309     427
Waiting:      141  315  41.0    307     427
Total:        144  319  40.9    312     431

Percentage of the requests served within a certain time (ms)
  50%    312
  66%    332
  75%    348
  80%    357
  90%    380
  95%    397
  98%    412
  99%    416
 100%    431 (longest request)
</pre>
<p></code></pre>
<h3>BASELINE</h3>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
Document Path:          /wp-login.php
Document Length:        3654 bytes

Concurrency Level:      10
Time taken for tests:   41.740 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      4201000 bytes
HTML transferred:       3654000 bytes
Requests per second:    23.96 [#/sec] (mean)
Time per request:       417.396 [ms] (mean)
Time per request:       41.740 [ms] (mean, across all concurrent requests)
Transfer rate:          98.29 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    3   0.5      3       4
Processing:   236  414  56.0    404     607
Waiting:      215  380  53.0    371     555
Total:        238  416  56.0    407     610

Percentage of the requests served within a certain time (ms)
  50%    407
  66%    434
  75%    452
  80%    465
  90%    494
  95%    519
  98%    554
  99%    577
 100%    610 (longest request)
</pre>
<p></code></pre>
<h2>Analysis</h2>
<p>If I&#8217;m reading the numbers from ApacheBench correctly, then nginx does offer some (relatively modest) improvement in performance. It is most noticeable on dynamic php pages (around 31 requests per second versus 24 for the baseline), while the static pages came out virtually identical across all three setups. This is not a huge surprise, since my nginx setup was configured with php-fpm, whereas the baseline setup had to use php-cgi. Adding Varnish on top does not boost performance that much further though. Perhaps my configuration wasn&#8217;t optimal, or maybe you really start to benefit when you have more memory. I don&#8217;t know. Considering the added complexity of two proxying layers, I&#8217;d say it&#8217;s not really worth it for me. I even wonder if managing my own nginx is worth the hassle. This is because with nginx, there&#8217;s a bigger risk of some weirdness in compatibility with wordpress and the myriad of plugins it supports, most of which are developed and tested on Apache.</p>
<p>It&#8217;s worth remembering that these benchmarks don&#8217;t really simulate real-life scenarios. They hit the server with 1,000 requests in a short span. Sure, it&#8217;s great for simulating your site getting slashdotted, but it&#8217;s not that realistic. Furthermore, browsing experience is affected by many other factors and elements on the page: images, javascript, browser caching. The location of the user in relation to the server also makes a huge difference of course. These tests were run from a close-proximity location with low latency. Doing the same from across the globe might produce very different results.</p>
<p>And of course, I also might have made some blatant configuration mistakes or used sub-optimal settings. I&#8217;d be happy to hear some ideas on how to make things work even better!</p>
<h2>Insights</h2>
<p>I was reading a <a href="http://samsaffron.com/archive/2012/03/01/why-upgrading-your-linux-kernel-will-make-your-customers-much-happier?utm_source=Coder+Weekly&amp;utm_campaign=2293b739df-Coder_Weekly_Issue_6&amp;utm_medium=email">fascinating post recently, talking about why the web is slow</a>. I know it&#8217;s not strictly related to my optimization, but there are many things to consider when trying to improve your site&#8217;s performance. Oddly, the thing that really hit me hard, and which I had ignored completely before starting this process, was how much the actual <em>size of the page</em> matters. The few performance benchmarks which I came across online failed to even mention which pages were tested, or used really tiny out-of-the-box wordpress pages. If you really want to boost your site&#8217;s performance &#8211; <strong>make your web pages as small as possible</strong>. I am now starting to experiment with minifying my html using W3TC (a feature which isn&#8217;t enabled by default), as well as trying to reduce page size by moving non-essential stuff to be fetched via ajax.</p>
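<p>A quick back-of-the-envelope check on the static-page results above illustrates the point. Multiplying the document length by the requests per second gives roughly the same figure, somewhere around 10 to 11 MB per second, for all three page sizes. That hints (without proving it) that these runs were limited by how many bytes could be pushed down the connection rather than by the server stack. The sketch below is plain Node-style JavaScript; the helper name <code>htmlKbPerSecond</code> is made up for illustration, and the numbers are copied from the VARNISH runs.</p>
<pre class="brush: jscript; title: ; notranslate">
// [label, document length in bytes, requests per second]
// Values copied from the VARNISH static-page runs above.
var runs = [
  ['~100kb page', 113742, 102.26],
  ['~35kb page',   38304, 301.50],
  ['~15kb page',   16986, 627.16]
];

// Hypothetical helper: HTML throughput in KB/s (1 KB = 1024 bytes),
// ignoring HTTP headers.
function htmlKbPerSecond(bytes, requestsPerSecond) {
  return Math.round(bytes * requestsPerSecond / 1024);
}

runs.forEach(function (run) {
  console.log(run[0] + ': ' + htmlKbPerSecond(run[1], run[2]) + ' KB/s of HTML');
});
</pre>
<p>The results sit just below the transfer rate that ApacheBench itself reports, which also counts the HTTP headers.</p>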
<h2>Thanks</h2>
<p>Special thanks to <a href="http://www.webfaction.com?affiliate=yoavaner">webfaction</a> support. It seems to me like they went beyond the call of duty to help me install stuff on the host on a couple of occasions. Considering it&#8217;s a shared-hosting provider, they really do a fantastic job. I hope this post might give some pointers to people who want to install nginx, php-fpm, varnish or anything else on webfaction. Perhaps it&#8217;s not as easy as using <code>apt-get install</code>, but it&#8217;s definitely possible.</p>
<h2>Update</h2>
<p>Looking at the results I realised that ApacheBench does not request gzip compression by default. However, it can be used with gzip compression, which makes page sizes considerably smaller, and hence the results much faster. I only did a very quick, cursory test, in which the <em>pretty big page</em> became rather skinny, dropping from around 100kb to only around 15kb (with the response times dropping accordingly). To test your site with gzip switched on, use</p>
<pre><code>
<pre class="brush: plain; title: ; notranslate">
ab -kc 10 -n 1000 -H 'Accept-Encoding: gzip' {url}
</pre>
<p></code></pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/how-much-cache-is-too-much/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>encryption is not the right solution</title>
		<link>http://blog.gingerlime.com/encryption-is-not-the-right-solution/</link>
		<comments>http://blog.gingerlime.com/encryption-is-not-the-right-solution/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 19:40:53 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=372</guid>
		<description><![CDATA[When talking about security, the first thing that usually comes to mind is encryption. Spies secretly coding (or de-coding) some secret message that should not be revealed to the enemy. Encryption is this mysterious thing that turns all text into &#8230; <a href="http://blog.gingerlime.com/encryption-is-not-the-right-solution/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>When talking about security, the first thing that usually comes to mind is encryption. Spies secretly coding (or de-coding) some secret message that should not be revealed to the enemy. Encryption is this mysterious thing that turns all text into a part of the matrix. Developers generally like encryption. It&#8217;s kinda cool. You pass stuff into a function, get some completely scrambled output. Nobody can tell what&#8217;s in there. You pass it back through another function &#8211; the text is clear again. Magic.<br />
<br/><br />
Encryption is cool. It is fundamental to doing lots of things on the Internet. How could you pay with your credit card on Amazon without encryption? How could you check your bank balance? How could MI5 pass their secret messages without Al-Qaida intercepting them?<br />
<br />
But encryption is actually not as useful as people think. It is often used in the wrong place. It can easily give a false sense of security. Why? People forget that encryption, by itself, is usually not sufficient. You <strong>cannot read</strong> the encrypted data. But nothing stops you from <strong>changing it</strong>. In many cases, it is very easy to change encrypted data, <strong>without</strong> knowledge of the encryption key. <span id="more-372"></span>This seemingly small &#8216;misconception&#8217; can lead to some really serious security holes. I think it is primarily a perception issue. As soon as something becomes obfuscated, scrambled, illegible (as encrypted data does), it&#8217;s hard to intuitively figure out the risks to it. People tend to make assumptions that are not always valid about it.<br />
</p>
<h2>WEP</h2>
<p>There must be plenty of examples of encryption being used incorrectly, and of how it led to unforeseen consequences. The most famous example I am aware of is <a href="http://en.wikipedia.org/wiki/Wired_Equivalent_Privacy">WEP</a>. I won&#8217;t go into the details. There are plenty of resources online describing the flaws in great detail. The essence of it however was that WEP used <strong>relatively good</strong> ciphers <strong>incorrectly</strong>, or insufficiently. The encryption algorithm it used (RC4) was not &#8216;broken&#8217;. The algorithm itself is quite robust (no algorithm is perfect, but this wasn&#8217;t the major flaw in WEP). It&#8217;s the design of WEP, not the core algorithm, that was broken.</p>
<h2>Some illustration</h2>
<p>Let&#8217;s walk through a simplified, imaginary scenario, to illustrate when NOT to use encryption. It goes something like this:<br />
<br />
Developer: &#8220;We need to allow access to the X section of the website to more people&#8221;<br />
Me: &#8220;Ok, then ask them to login before they can have access&#8221;<br />
Developer: &#8220;No. We want to send them a secure link with expiry date and not all users will have an account&#8221;<br />
Me: &#8220;In that case, we should use something like HMAC or OAuth. It&#8217;s designed for this purpose&#8221;<br />
Developer: &#8220;No. I don&#8217;t have time to read about OAuth, it&#8217;s too confusing. But I&#8217;ve written a real cool function that encrypts part of the URL, and only if we can decrypt it properly, we allow access&#8230; It uses Triple-DES, so it&#8217;s super-secure. Wikipedia says that <a href="http://en.wikipedia.org/wiki/Data_Encryption_Standard">The algorithm is believed to be practically secure in the form of Triple DES, although there are theoretical attacks</a>&#8221;<br />
Me: &#8220;Let&#8217;s have a look at your &#8216;secure&#8217; solution then&#8230;&#8221;<br />
<br />
The solution was something along those lines (hugely simplified for the sake of illustration):</p>
<pre class="brush: plain; title: ; notranslate">

http://site.com/secure-access/{encrypted_string}
</pre>
<p>where encrypted_string contained the expiry date of the page, e.g. 20120315 (15th March 2012).<br />
Can you see where this is flawed?<br />
<br/><br />
There are at least a couple of possible attacks here that don&#8217;t involve breaking the encryption or obtaining the key:</p>
<ol>
<li>Randomly change the encrypted string &#8211; perhaps not very sophisticated, but since all we need is a date later than today, there is a fair chance we might get lucky. Even without calculating the entire search space, it&#8217;s easy to see that it is quite narrow.</li>
<li>Replace it with another encrypted string &#8211; this is even easier. All you need is another URL with a known-to-work encrypted string. Copy this string and paste it, and you&#8217;re good to go!</li>
</ol>
<p>
Another flaw the developer hadn&#8217;t worked out was that he was using triple-DES in ECB mode. This is the most basic mode, in which every block is encrypted independently, and it makes the attacks I described much easier. So even if the algorithm itself is robust, the way it is applied can be much more important. In addition to that, under some circumstances, if the decryption process failed (which it easily could, since there was no problem manipulating the encrypted string), the code was kind enough to output the decrypted string inside the error message&#8230; Back to the user (or wannabe hacker as the case may be).<br />
<br />
Of course, the real solution was a little more complicated. It contained not just an encrypted date, but some other data. Nevertheless, the same principles of attack apply. There are many <a href="http://en.wikipedia.org/wiki/Bit-flipping_attack">bit-flipping</a> attacks that can alter encrypted data and generate a predictable output. The bottom line is this: <strong>Encrypting data does not prevent modification</strong>.</p>
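<p>To make the &#8216;replace it with another encrypted string&#8217; attack concrete, here is a minimal sketch using Node&#8217;s built-in <code>crypto</code> module. It is an illustration under stated assumptions, not the developer&#8217;s actual code: it swaps in AES-128 in ECB mode (16-byte blocks) instead of Triple-DES (8-byte blocks), and the key, the token layout and the page names are all made up. The principle is the same for any block cipher in ECB mode: the attacker splices ciphertext blocks from two legitimate tokens and never touches the key.</p>
<pre class="brush: jscript; title: ; notranslate">
var crypto = require('crypto');

// Hypothetical 16-byte key, known only to the server.
var key = Buffer.from('0123456789abcdef', 'utf8');

function encrypt(plaintext) {
  // ECB mode takes no IV, hence the null.
  var cipher = crypto.createCipheriv('aes-128-ecb', key, null);
  return Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
}

function decrypt(ciphertext) {
  var decipher = crypto.createDecipheriv('aes-128-ecb', key, null);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

// Two legitimate tokens, each made of two 16-byte fields (made-up layout).
var expired = encrypt('page=/reports/q1' + 'expires=20120101');
var valid = encrypt('page=/public/faq' + 'expires=20991231');

// The attacker only copies and pastes ciphertext blocks.
var forged = Buffer.concat([
  expired.subarray(0, 16),  // block 1: the page they want
  valid.subarray(16)        // block 2 plus padding block: a far-future expiry
]);

console.log(decrypt(forged));  // page=/reports/q1expires=20991231
</pre>
<p>The forged token decrypts cleanly, so a server that only checks &#8216;did it decrypt properly?&#8217; would let it through.</p>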
<h2>The alternative</h2>
<p>In this case, all we want to achieve is authentication. We want to verify that the request was legitimate, and that nobody else can fake such a request. There&#8217;s nothing in particular that needs hiding. The expiry date of the access is not such a big secret. All we need is some kind of a signature, to validate the legitimacy of the request. This is where hash functions, and HMAC / OAuth come into play. Those mechanisms, for some strange reason, are less appealing to many developers. I&#8217;m not entirely sure why, but maybe it&#8217;s just not as fun to see an extra hash at the end of the url as it is to encrypt a string. These mechanisms are much more effective in this case. This is what they&#8217;re designed to do.<br />
<br/><br />
So how does this work? Again, for the sake of simplicity I&#8217;m not going to cover the detailed aspects of those algorithms. But the principle is quite simple: You take a string you want to &#8216;sign&#8217;, plus a secret key, and generate a unique hash value from both of them. This value will be totally different even with the slightest modification to the original string (or the key). The key will never be published or shown, only the string with the generated &#8216;signature&#8217; (this unique hash value we produced). How is this more secure? As I mentioned earlier, even the slightest modification of the string will produce a completely different signature. This will prevent any undetected modification of the url. Producing a valid signature is virtually impossible without knowledge of the secret key. Voila! Simple. Secure. Elegant. Of course, as always, god is in the details, so even these algorithms can be used wrongly, producing an insecure outcome. However, from my experience, following the guidelines and using the OAuth/HMAC libraries is far easier and less error-prone than using any encryption algorithm.</p>
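<p>As a rough sketch of the signing idea, here is how an expiring link could be signed and checked with HMAC-SHA256 using Node&#8217;s built-in <code>crypto</code> module. The secret, the message format and the function names (<code>sign</code>, <code>verify</code>) are assumptions made for illustration; a real deployment would more likely lean on an existing OAuth/HMAC library.</p>
<pre class="brush: jscript; title: ; notranslate">
var crypto = require('crypto');

// Hypothetical secret, kept on the server and never sent to the user.
var SECRET = 'replace-with-a-long-random-secret';

// Sign the expiry date (YYYYMMDD). The date itself stays readable in the URL.
function sign(expires) {
  return crypto.createHmac('sha256', SECRET)
    .update('secure-access:' + expires)
    .digest('hex');
}

// Accept the link only if the signature matches and the date has not passed.
function verify(expires, signature, today) {
  var expected = Buffer.from(sign(expires), 'hex');
  var given = Buffer.from(String(signature), 'hex');
  // timingSafeEqual throws on unequal lengths, so check the length first.
  if (given.length !== expected.length) { return false; }
  if (!crypto.timingSafeEqual(given, expected)) { return false; }
  // True when the expiry date is today or later.
  return Math.max(Number(expires), Number(today)) === Number(expires);
}

var signature = sign('20120315');
// The link would look like /secure-access/20120315/ followed by the signature.
console.log(verify('20120315', signature, '20120301'));  // true
console.log(verify('20991231', signature, '20120301'));  // false: the date was changed
</pre>
<p>Changing the date in the URL without knowing the secret leaves the signature mismatched, so the request is rejected, and nothing needed encrypting in the first place.</p>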
<h2>Size doesn&#8217;t matter, it&#8217;s how you use it</h2>
<p>The <a href="http://www.isaac.cs.berkeley.edu/isaac/wep-faq.html">Security of the WEP algorithm</a> page sums it up quite nicely: &#8220;The [WEP] protocol&#8217;s problems are a result of misunderstanding of some cryptographic primitives and therefore combining them in insecure ways&#8221;. Even if you pick the best and most secure encryption algorithm, it might not be enough to make your solution secure. In fact, as I tried to illustrate, encryption might not be necessary at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/encryption-is-not-the-right-solution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>dynamic goal values in google analytics</title>
		<link>http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/</link>
		<comments>http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 00:05:38 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=511</guid>
		<description><![CDATA[Scoring a goal against google is never easy. Google analytics allows you to do some strange and wonderful things, but not without some teeth grinding. I was struggling with this for a little while, and it was a great source &#8230; <a href="http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>Scoring a goal against google is never easy. Google analytics allows you to do some strange and wonderful things, but not without some teeth grinding. I was struggling with this for a little while, and it was a great source of frustration, since there&#8217;s hardly any info out there about it. Or maybe there is lots of info, but no solution to this particular problem. I think I finally nailed it. </p>
<h2>Dynamic Goal Conversion Values</h2>
<p>I was trying to get some <strong>dynamic</strong> goal conversion values into Analytics. I ended up reading about <a href="http://code.google.com/apis/analytics/docs/tracking/gaTrackingEcommerce.html#Guidelines">Ecommerce tracking</a> and it seemed like the way to go. Not only would I be able to pick the goal conversion value dynamically, it gives you a breakdown of each and every transaction. Very nice. After implementing it, I was quite impressed to see each transaction, product, sku etc appear neatly on the ecommerce reports. So far so good. But somehow, goals &#8211; which were set on the very same page as the ecommerce tracking code &#8211; failed to add the transaction value. The goals were tracked just fine, I could see them adding up, but not the goal <strong>value</strong>. grrrr&#8230;<br />
<span id="more-511"></span><br />
I was particularly frustrated after meticulously reading through the <a href="http://support.google.com/analytics/bin/answer.py?hl=en&#038;answer=1116091#ecommerce">Ecommerce Transaction Page Goals</a>, which is very clear about this being possible, as long as you set the conversion value to zero.</p>
<h2>The solution</h2>
<p>There&#8217;s a small thing that doesn&#8217;t seem to get mentioned anywhere, and I&#8217;m still not sure why it causes a problem. If it doesn&#8217;t work for you, I have a few alternative approaches worth trying too&#8230;</p>
<p>I discovered that in my case, the google analytics code was included on the goal conversion page twice. Once, as on every page, as the standard tracking code in the header. And again in the body, together with the ecommerce tracking values. Both were running fine, and as I mentioned the results appeared. But somehow it seems to create some conflict. Removing the &#8216;standard&#8217; tracking section seems to have solved this mysterious problem.</p>
<p>Here&#8217;s what the HTML looked like. Remove the first section:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;head&gt;
...
&lt;!-- FIRST SECTION ON THE HEADER : TO REMOVE --&gt;
&lt;script&gt;

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-xxxxxxxx-x']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

&lt;/script&gt;
&lt;!-- DON'T REMOVE ANY FURTHER --&gt;
&lt;/head&gt;
&lt;body&gt;
...
&lt;!-- SECOND SECTION - KEEP THIS ! --&gt;
&lt;script&gt;
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);
  _gaq.push(['_trackPageview']); // this tracks the page view
  _gaq.push(['_addTrans',
    '1234',           // order ID - required
    'Acme Clothing',  // affiliation or store name
    '11.99',          // total - required
    '1.29',           // tax
    '5',              // shipping
    'San Jose',       // city
    'California',     // state or province
    'USA'             // country
  ]);

   // add item might be called for every item in the shopping cart
   // where your ecommerce engine loops through each item in the cart and
   // prints out _addItem for each
  _gaq.push(['_addItem',
    '1234',           // order ID - required
    'DD44',           // SKU/code - required
    'T-Shirt',        // product name
    'Green Medium',   // category or variation
    '11.99',          // unit price - required
    '1'               // quantity - required
  ]);
  _gaq.push(['_trackTrans']); //submits transaction to the Analytics servers

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
&lt;/script&gt;
...
&lt;/body&gt;
</pre>
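<p>The comments in the snippet above mention that the ecommerce engine loops through the cart and prints one _addItem call per item. As a rough server-side sketch of that loop in Python: the function name, the dictionary keys and the sample cart are all made up for illustration, and this is not hardened for untrusted input.</p>
<pre class="brush: python; title: ; notranslate">
import json


def add_item_calls(order_id, items):
    # Build one _gaq.push(['_addItem', ...]) line per cart item.
    # json.dumps renders the argument list as a JavaScript array literal.
    lines = []
    for item in items:
        args = ['_addItem', order_id, item['sku'], item['name'],
                item['category'], item['price'], item['quantity']]
        lines.append('_gaq.push(%s);' % json.dumps(args))
    return '\n'.join(lines)


# Hypothetical cart contents, reusing the values from the HTML example
cart = [
    {'sku': 'DD44', 'name': 'T-Shirt', 'category': 'Green Medium',
     'price': '11.99', 'quantity': '1'},
]
print(add_item_calls('1234', cart))
</pre>
<p>The generated lines would sit between the _addTrans and _trackTrans calls in the second script section.</p>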
<h2>Alternative Method</h2>
<p>Since analytics seem to encompass lots of black magic, I&#8217;m still not 100% sure that&#8217;s the only possible solution. However, along the way I discovered some other interesting things&#8230;</p>
<h3>Event Tracking</h3>
<p>Instead of goal tracking by URL, you can actually <a href="http://code.google.com/apis/analytics/docs/tracking/eventTrackerGuide.html">track an event</a>. If you&#8217;ve already coded the ecommerce tracking section anyway, adding event tracking to it is a doddle. Simply add this line somewhere before or after the ecommerce tracking code</p>
<pre class="brush: jscript; title: ; notranslate">
_gaq.push(['_trackEvent', 'Cart', 'Checkout', 'Success', 39.99]);
</pre>
<p>In this case I defined the event category to be &#8216;Cart&#8217;, the action to be &#8216;Checkout&#8217;, the label to be &#8216;Success&#8217;, and the value to be 39.99. Once you do that, you can set your goal to use an Event instead of a URL. Here&#8217;s more info on <a href="http://analytics.blogspot.com/2011/04/new-google-analytics-events-goals.html">setting the event goals</a> and choosing the value as the conversion value.</p>
<p>If your main concern is tracking conversion values, and you&#8217;re not too fussed about the whole ecommerce tracking, then this is actually a much easier way. All you need is one line of code. It can however work perfectly well together with your existing ecommerce tracking. </p>
<h2>Wet paint &#8211; slow updates</h2>
<p>Analytics seems to update rather slowly. It also seems to update and un-update some data if you keep refreshing. You see a new transaction. You refresh. It&#8217;s gone. Refresh again, it&#8217;s back&#8230; This is normally not a huge problem, just confusing, but when you&#8217;re trying to debug something, it can make things much more difficult. For example, suppose you <strong>update your Goal configuration</strong>: you change the URL, or the tracking event, or add a new goal. It <strong>might take a good few minutes</strong> for the change to take effect. If you trigger the goal in the meantime, e.g. by going to the goal URL, <strong>it might not register</strong>, because Analytics still uses the old goal settings. You simply have no way to find out whether the goal is active or not. You must wait for the paint to dry before you try anything new.</p>
<h2>Other stuff</h2>
<p>A couple more things to watch out for:</p>
<ul>
<li>The Ecommerce page says that<br />
<blockquote><p>For URL: Supply the URL for your shopping cart. For example: http://www.we-sell-for-you.com/mysite/myCart.asp</p></blockquote>
<p>In most cases, you only need to supply the relative path, e.g. /mysite/myCart.asp</p>
</li>
<li>
I&#8217;m not sure whether it makes a difference, but if you&#8217;re using <strong>_gaq.push(['_setDomainName', '.domain.com']);</strong>, this might also cause some issues? (didn&#8217;t test this explicitly though)
</li>
</ul>
<h2>Final grunt</h2>
<p>Hope this helps someone. I was pulling my hair out over this, so I hope I can save some other people&#8217;s hairlines from receding. The time it takes for the goal/ecommerce tracking to appear in Analytics also makes testing slow and problems much harder to resolve. Having tested a few too many different combos, I&#8217;m still not 100% sure this is a bullet-proof solution, but I&#8217;m definitely seeing some goal conversion values now.</p>
<p>UPDATE: looks like it will only work if you also use event tracking. With URL tracking alone it doesn&#8217;t seem to pick the goal values.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>unicode url double-encoding 404 redirect trick</title>
		<link>http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/</link>
		<comments>http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 08:39:33 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=486</guid>
		<description><![CDATA[I&#8217;ve come across a small nuisance that seemed to appear occasionally with unicode urls. Some websites seem to encode/escape/quote urls as soon as they see any symbol (particularly % sign). They appear to assume it needs to be encoded, and &#8230; <a href="http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve come across a small nuisance that seemed to appear occasionally with unicode urls. Some websites seem to encode/escape/quote urls as soon as they see any symbol (particularly % sign). They appear to assume it needs to be encoded, and convert any such character to its URL-Encoded form. For example, percent (%) symbol will convert to %25, ampersand (&#038;) to %26 and so on.</p>
<p>This is not normally a problem, unless the URL is <strong>already</strong> encoded. Since all unicode-based urls use this encoding, they are more prone to these errors. What happens then is that a URL that looks like this:<br />
<a href="http://www.frau-vintage.com/2011/%E3%81%95%E3%81%8F%E3%82%89%E3%82%93%E3%81%BC%E3%81%AE%E3%82%AD%E3%83%83%E3%83%81%E3%83%B3%E3%82%AF%E3%83%AD%E3%82%B9/">http://www.frau-vintage.com/2011/%E3%81%95%E3%81%8F%E3%82%89 &#8230;</a></p>
<p>will be encoded again to this:<br />
<a href="http://www.frau-vintage.com/2011/%25E3%2581%2595%25E3%2581%258F%25E3%2582%2589%25E3%2582%2593%25E3%2581%25BC%25E3%2581%25AE%25E3%2582%25AD%25E3%2583%2583%25E3%2583%2581%25E3%2583%25B3%25E3%2582%25AF%25E3%2583%25AD%25E3%2582%25B9">http://www.frau-vintage.com/2011/%25E3%2581%2595%25E3%25 &#8230;</a></p>
<p>So clicking on such a double-encoded link will unfortunately lead to a 404 page (don&#8217;t try it with the links above, because the workaround was already applied there).</p>
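<p>To see the effect in isolation, here is a small Python sketch that encodes the same path twice. The path is a shortened, hypothetical version of the URL above, and the standard library&#8217;s quote function simply stands in for whatever the over-eager website does.</p>
<pre class="brush: python; title: ; notranslate">
from urllib.parse import quote, unquote

# Hypothetical path, using the first few characters of the URL above
path = '/2011/さくらんぼ/'

once = quote(path)    # the correct, single encoding
twice = quote(once)   # what happens when an already-encoded URL is encoded again

print(once)
print(twice)

# Decoding once undoes the second pass and restores the valid URL
print(unquote(twice) == once)
</pre>
<p>In the second pass only the % signs change, each one turning into %25, which is the pattern the workaround below looks for.</p>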
<h2>A workaround</h2>
<p>This workaround is specific to wordpress 404.php, but can be applied quite easily in other frameworks like django, drupal, and maybe even using apache htaccess rule(?).</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
/* Detect 'double-encoded' URLs:
 *  if the request URI contains %25 (the URL-encoded form of the '%' symbol)
 *  within the first few characters, decode the URL once and redirect to it
 */
$pos = strpos($_SERVER['REQUEST_URI'],'%25');
if ($pos!==false &amp;&amp; $pos &lt; 10) :
    header(&quot;Status: 301 Moved Permanently&quot;);
    header(&quot;Location: &quot; . urldecode($_SERVER['REQUEST_URI']));
    exit; // stop here so nothing else is sent after the redirect headers
else:
    get_header(); ?&gt;
    &lt;h2&gt;Error 404 - Page Not Found&lt;/h2&gt;
    &lt;?php get_sidebar(); ?&gt;
    &lt;?php get_footer();
endif; ?&gt;
</pre>
<p>This is placed only in the 404 page. It then grabs the request URI and checks if it contains the string &#8216;%25&#8217; within the first 10 characters (you can modify the check to suit your needs). If it finds it, it redirects to a <strong>urldecoded</strong> version of the same page&#8230;</p>
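<p>For frameworks other than wordpress, the same check can be sketched as a plain Python function. This is only an illustration of the logic: the function name and the window parameter are hypothetical, and wiring it into a real 404 handler (in django, for example) is left out.</p>
<pre class="brush: python; title: ; notranslate">
from urllib.parse import unquote


def double_encoded_redirect(request_uri, window=10):
    # Return the decoded URI to redirect to, or None to show the normal 404.
    # Mirrors the PHP check: '%25' must start within the first `window` characters.
    pos = request_uri.find('%25')
    if pos in range(window):
        return unquote(request_uri)
    return None


print(double_encoded_redirect('/2011/%25E3%2581%2595/'))
print(double_encoded_redirect('/2011/no-such-page/'))
</pre>
<p>A 404 handler would issue a permanent redirect whenever the function returns a value, and fall through to the regular error page otherwise.</p>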
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>django memory leaks, part II</title>
		<link>http://blog.gingerlime.com/django-memory-leaks-part-ii/</link>
		<comments>http://blog.gingerlime.com/django-memory-leaks-part-ii/#comments</comments>
		<pubDate>Fri, 16 Dec 2011 13:55:30 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=449</guid>
		<description><![CDATA[On my previous post I talked about django memory management, the little-known maxrequests parameter in particular, and how it can help &#8216;pop&#8217; some balloons, i.e. kill and restart some django processes in order to release some memory. On this post &#8230; <a href="http://blog.gingerlime.com/django-memory-leaks-part-ii/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>On my <a href="/django-memory-leaks-part-i">previous post</a> I talked about django memory management, the little-known <strong>maxrequests</strong> parameter in particular, and how it can help &#8216;pop&#8217; some balloons, i.e. kill and restart some django processes in order to release some memory. On this post I&#8217;m going to cover some of the things to do or avoid in order to keep memory usage low from within your code. In addition, I am going to show at least one method to monitor (and act automatically!) when memory usage shoots through the roof.<br />
<span id="more-449"></span></p>
<h2>Efficient code</h2>
<p>Django makes a lot of things very easy. One of its most prominent features is its ORM (Object Relational Mapper). The django ORM really makes it easy to write complex database queries without a single line of SQL. This however might come at a price of sub-optimal queries, which can take a very long time to process, and also consume more memory. If you hit django memory issues, it&#8217;s very likely related to some heavy data being loaded from the database, consuming lots and lots of memory. Luckily, there are a few rules that if followed can hugely improve both the queries and memory usage. Most of those are documented clearly on the <a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/" title="Database access optimization">Database access optimization</a> page. I won&#8217;t repeat everything that&#8217;s being documented there, but rather focus on a few key points. </p>
<h2>Where do you start?</h2>
<p>First of all, you might already see django crashing or get reports from users about pages loading slowly, and you&#8217;ve hopefully checked your server (e.g. using the <em>top</em>/<a href="http://htop.sourceforge.net/" title="htop">htop</a> or <em>free</em> commands) and noticed high memory utilization. The recommendations in Part I should give you a little stability until you manage to figure out where the problem lies, but they are by no means a sufficient solution.</p>
<h3>Monitor log files</h3>
<p>The easiest place to start is your log files. If you&#8217;re not logging requests already, you seriously should. One helpful method that doesn&#8217;t add much overhead to django logging is adding the time that the request took to the logs. This can be achieved with a simple <a href="http://djangosnippets.org/snippets/2624/">logging middleware</a>. The time each request takes usually gives a fair indication of what&#8217;s going on, and from my experience there&#8217;s usually a close link between execution time and memory footprint. The same logging middleware can be used with DEBUG=True to view how many SQL queries are executed for each request. This can give another indication of &#8216;hot-spots&#8217; to look for.</p>
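<p>To give an idea of what such a middleware involves, here is a minimal sketch. It is not the linked snippet: the class name, logger name and the _timing_start attribute are made up, and it assumes the older process_request/process_response middleware hooks that match the django version this post was written against.</p>
<pre class="brush: python; title: ; notranslate">
import logging
import time

# Hypothetical logger name; configure its handlers in your logging settings
logger = logging.getLogger('request_timing')


class RequestTimingMiddleware(object):
    # Old-style django middleware: no base class or django import is needed.

    def process_request(self, request):
        # Remember when the request started; returning None lets processing continue
        request._timing_start = time.time()
        return None

    def process_response(self, request, response):
        start = getattr(request, '_timing_start', None)
        if start is not None:
            elapsed_ms = (time.time() - start) * 1000.0
            logger.info('%s %s took %.1f ms',
                        request.method, request.path, elapsed_ms)
        return response
</pre>
<p>Sorting the resulting log lines by elapsed time is usually enough to produce a first list of suspicious views.</p>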
<h3>Profiling</h3>
<p>The next step is to profile django and see which requests consume a large amount of memory. I would suggest following the instructions on <a href="http://www.toofishes.net/blog/using-guppy-debug-django-memory-leaks/" title="Using Guppy to debug Django memory leaks">Using Guppy to debug Django memory leaks</a>. This should really help pinpoint the areas of code that consume large amounts of memory. Once those are identified, it is easier to try to optimize the code there. Don&#8217;t try to do everything at once; by optimizing one area of code you will learn a lot, and can then apply the same methods across the rest of the codebase.</p>
<h2>Quick-n-dirty improvements</h2>
<p>There are a few recommendations that generally help with reducing memory footprint. Be careful not to use them blindly though. They can have a knock-on effect on performance in other areas, so exercise some judgment. The most useful pointers are:</p>
<ul>
<li>Make sure DEBUG=False</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-iterator">.iterator()</a> &#8211; In many cases, there&#8217;s simply a need to retrieve a list of objects from the database, and return those in a certain format. If you go over the list more-or-less sequentially and use each object only once, it makes a lot of sense to use iterator() &#8211; this will eliminate django internal caching, which can save on memory. If, however, you might go back to the same object &#8211; this caching could really speed things up, so be aware of what you&#8217;re doing</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-queryset-update-and-delete">Using .update()/.delete()</a> &#8211; can be a huge saver if you perform a simple update of many objects, or delete a bunch of them. Rather than walking through objects one by one, you can perform this operation with one query</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-queryset-values-and-values-list">Using .values()</a> &#8211; As above, if all you need is a bunch of values from the database, you can save a lot of memory by fetching them as values instead of as complex queryset objects</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-queryset-defer-and-only">Using .only()</a> &#8211; can also save a lot of memory, if you only use a portion of data from each object.</li>
</ul>
<p>These are just a few pointers to the same page on the django documentation. Please spend time reading through the entire page, as it contains many very useful techniques and other information to help you better understand how the ORM works and how to use it more effectively. </p>
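<p>The ideas behind those pointers are not specific to django. The sketch below deliberately uses the standard library&#8217;s sqlite3 module instead of the ORM, with a made-up item table, just to show the underlying patterns: stream rows instead of holding them all, fetch only the columns you need, and let the database do bulk updates and deletes in a single statement.</p>
<pre class="brush: python; title: ; notranslate">
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE item '
             '(id INTEGER PRIMARY KEY, name TEXT, notes TEXT, active INTEGER)')
conn.executemany(
    'INSERT INTO item (name, notes, active) VALUES (?, ?, 1)',
    [('item-%d' % i, 'x' * 100) for i in range(1000)],
)

# Similar in spirit to .iterator() with .values_list('name'):
# rows are consumed one at a time and the bulky notes column is never fetched
name_count = 0
for (name,) in conn.execute('SELECT name FROM item'):
    name_count += 1

# Similar in spirit to .update(): one statement, no objects loaded into memory
updated = conn.execute('UPDATE item SET active = 0 WHERE id % 2 = 0').rowcount

# Similar in spirit to .delete(): again a single statement
deleted = conn.execute('DELETE FROM item WHERE active = 0').rowcount

remaining = conn.execute('SELECT COUNT(*) FROM item').fetchone()[0]
print(name_count, updated, deleted, remaining)
</pre>
<p>With the ORM the same trade-offs apply, only expressed through queryset methods rather than raw SQL.</p>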
<h2>External Monitoring</h2>
<p>Even if you manage to optimize your code, tweak the maxrequests parameter to ensure memory is cleared regularly, and brush your teeth morning and night, there are still cases when it simply won&#8217;t be enough. One user will make too many requests, the data-set will just be too big, or some query will stay sub-optimal, and you&#8217;d still end up with bloated processes eating through your memory. If that happens, you&#8217;re almost back to where we started. This is where some external tools can work wonders. They can:</p>
<ol>
<li>Let you know when this happens, perhaps even before memory is completely depleted.</li>
<li>Take (automatic) action, and more gracefully restart django, to avoid a full-scale crash.</li>
</ol>
<h3>Monit</h3>
<p>My weapon of choice in this case (and many others) is <a href="http://mmonit.com/monit/">Monit</a>. It&#8217;s lightweight, powerful and has a very easy and intuitive configuration syntax, which makes it a snap to use. There are so many uses of this little devil, but in this case I will focus on monitoring process memory. It only takes a few lines to let monit watch over django, and make sure things are running smoothly:</p>
<pre class="brush: plain; title: ; notranslate">
check process your-django-process with pidfile /path/to/your/django.pid
    start program &quot;/etc/init.d/your-django reload&quot;
    stop program &quot;/etc/init.d/your-django stop&quot;
    if totalmem &gt; 70% then exec &quot;/usr/local/bin/highmem&quot; # more about this highmem later...
    if totalmem &gt; 85% for 2 cycles then restart
</pre>
<p>Let&#8217;s go over this monit config snippet quickly. What it does is very simple:</p>
<ol>
<li>Monit checks our django process. Make sure you specify the correct pid file (which we set when invoking django using manage.py)</li>
<li>start and stop program directives should point to your django daemon script</li>
<li>If the total memory of the django process (including children) exceeds 70% of available memory, then it executes a custom &#8216;highmem&#8217; command. The highmem command will get django to clear some memory by restarting its internal processes. We will cover this command shortly. This will also automatically send an email to alert you (make sure you configure your monit alert settings correctly)</li>
<li>The second check is a &#8220;safety-belt&#8221;, to restart django if memory stays high for too long despite all our efforts</li>
</ol>
<h3>highmem</h3>
<p>The highmem command is a very simple one-line bash script:</p>
<pre class="brush: plain; title: ; notranslate">
#!/bin/bash
kill -SIGUSR1 `cat /path/to/your/django.pid`
</pre>
<p>All it does is send a SIGUSR1 to our django process. So what does SIGUSR1 do? When django runs in fastcgi, it uses <a href="http://trac.saddi.com/flup">flup</a> for process handling. Since version 1.0.3, flup uses the SIGUSR1 signal to safely respawn those django processes. Popping those balloons. If a request is in-progress, it will wait until it finishes, which is a very nice feature. Just hope that this process waiting to complete won&#8217;t take out ALL memory left&#8230; You can read more about it on <a href="http://rambleon.usebox.net/post/3279121000/how-to-gracefully-restart-django-running-fastcgi">How to Gracefully Restart Django Running FastCGI</a> (look for the comments section in particular). Please note that you&#8217;d need flup version 1.0.3. If you&#8217;re using an older version, it might just kill your django processes instead of respawning them safely. It&#8217;s easy to get flup, simply use <strong>sudo pip install flup</strong> (or <strong>sudo easy_install flup</strong>), and you&#8217;re done.</p>
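<p>If you would rather keep this kind of helper in Python, the same thing can be sketched as follows. It assumes a Unix-like system, the function name is made up, and the pid file path in the usage comment is the same placeholder as in the bash script.</p>
<pre class="brush: python; title: ; notranslate">
import os
import signal


def send_usr1(pidfile_path):
    # Read the process id from the pid file and send it SIGUSR1.
    # Returns the pid that was signalled.
    with open(pidfile_path) as pidfile:
        pid = int(pidfile.read().strip())
    os.kill(pid, signal.SIGUSR1)
    return pid


# Usage (placeholder path, replace with your own pid file):
# send_usr1('/path/to/your/django.pid')
</pre>
<p>Like the bash version, this only delivers the signal. Whether the processes are respawned gracefully still depends on the flup version, as described above.</p>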
<h3>The last resort</h3>
<p>The last tool in our arsenal, the last resort, is for when some process is eating our memory so fast that even the highmem command doesn&#8217;t manage to release it, and we simply have no choice but to restart django. However, even then, we&#8217;d probably want to do it as gracefully as possible. I&#8217;m not going to repeat it here, but the page on <a href="http://rambleon.usebox.net/post/3279121000/how-to-gracefully-restart-django-running-fastcgi">How to Gracefully Restart Django Running FastCGI</a> covers what you need to do to modify your manage.py in order to give django just a few seconds before it&#8217;s restarted. Then just make sure that the scripts starting/restarting django send a <strong>kill -HUP</strong> to your django process to restart it nicely.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/django-memory-leaks-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>django memory leaks, part I</title>
		<link>http://blog.gingerlime.com/django-memory-leaks-part-i/</link>
		<comments>http://blog.gingerlime.com/django-memory-leaks-part-i/#comments</comments>
		<pubDate>Sun, 11 Dec 2011 12:36:43 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=398</guid>
		<description><![CDATA[A while ago I was working on optimizing memory use for some django instances. During that process, I managed to better understand memory management within django, and thought it would be nice to share some of those insights. This is &#8230; <a href="http://blog.gingerlime.com/django-memory-leaks-part-i/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>A while ago I was working on optimizing memory use for some django instances. During that process, I managed to better understand memory management within django, and thought it would be nice to share some of those insights. This is by no means a definitive guide. It&#8217;s likely to have some mistakes, but I think it helped me grasp the configuration options better, and allowed easier optimization.</p>
<h2>Does django leak memory?</h2>
<p>In actual fact, No. It doesn&#8217;t. The title is therefore misleading. I know. However, if you&#8217;re not careful, your memory usage or configuration can easily lead to exhausting all memory and crashing django. So whilst django itself doesn&#8217;t leak memory, the end result is very similar.</p>
<h2>Memory management in Django &#8211; with (bad) illustrations</h2>
<p>Let&#8217;s start with the basics. Let&#8217;s look at a django process. A django process is a basic unit that handles requests from users. We have several of those on the server, to allow handling more than one request at a time. Each process, however, handles only one request at any given time.</p>
<p>But let&#8217;s look at just one.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide1.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide1-150x150.png" alt="" title="django process" width="150" height="150" class="aligncenter size-medium wp-image-400" /></a>
</div>
<p>Cute, isn&#8217;t it? It&#8217;s a little like a balloon actually (and balloons are generally cute). The balloon has a certain initial size to allow the process to do all the stuff it needs to. Let&#8217;s say this is balloon size 1.<br />
<span id="more-398"></span><br />
Now every request that comes to the server gets sent to one of those (cute) django processes. Then to serve the request, the process loads objects into memory. Like this</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide2.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide2.png" alt="" title="django process with objects" width="149" height="142" class="aligncenter size-thumbnail wp-image-405" /></a>
</div>
<p>Those little bubbles are the objects loaded into memory. Once the process finishes processing a request it will clear all the objects from memory and go back to being &#8216;empty&#8217;. It is still size 1 since all the objects fitted within the space.</p>
<p>But sometimes a request is a bit heavier. It needs to load more objects than the process has room for.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide5.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide5-e1323604194844.png" alt="" title="django process full" width="149" height="143" class="aligncenter size-full wp-image-411" /></a>
</div>
<p>So the process simply inflates itself and grows a little. Easy. Now it&#8217;s size 2. More space for bubbles.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide6.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide6-150x150.png" alt="" title="django process after inflation" width="150" height="150" class="aligncenter size-thumbnail wp-image-413" /></a>
</div>
<p>and of course, once the request finishes, it clears all those bubbles and there&#8217;s space for the next ones.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide8.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide8-150x150.png" alt="" title="django process with cleared objects" width="150" height="150" class="aligncenter size-thumbnail wp-image-419" /></a>
</div>
<p>An important thing to note: the balloon (process) never shrinks. It can only grow. But this is (kind-of) ok, since it will never grow bigger than the biggest request we can get. So even a very big request (let&#8217;s say one that uses 1GB of memory), we can probably handle. Right??</p>
<p>Not quite. So what&#8217;s the problem?</p>
<p>Well, like this cute little process, we have other processes. Remember we have to serve more than one user at a time, so we must keep a few of those balloons running. If more than one BIG request comes in at roughly the same time, they will inflate not just one balloon, but a few of them. And these balloons compete for space on the server (which is like a big room that contains the balloons, but the room does not grow).<br />
This is our room:</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide9.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide9-212x300.png" alt="" title="django processes on the server" width="212" height="300" class="aligncenter size-medium wp-image-422" /></a>
</div>
<p>Of course we can clear the room and start empty &#8211; this is what we do when we reboot the server or even just restart django. This is what we have to do when the server crashes. When in fact the balloons grew so big that other balloons couldn&#8217;t grow any more. So rebooting all the time is not an option. When we do that, everything stops. Including requests that are being processed. Even if they&#8217;re half-way through.</p>
<p>So &#8211; how about we &#8216;pop&#8217; those balloons every now and then &#8211; when they&#8217;re NOT processing a request (other balloons do it), and start from a small balloon? That&#8217;s actually possible. However, there are two limitations to be aware of:</p>
<ul>
<li>To create a new balloon takes some effort. We have to &#8216;make&#8217; the balloon. While we make a balloon the others are responding slower.</li>
<li>We cannot just &#8216;pop&#8217; a balloon based on its size. Instead we can only create an &#8216;automatic balloon popper&#8217; that pops the balloon after X requests.</li>
</ul>
<p>Our degree of control is as follows:</p>
<ol>
<li><strong>minspare</strong> &#8211; How many empty balloons we start with. This will potentially save us effort later by having a few ready. The &#8216;cost&#8217; in terms of memory is {number of balloons} X {balloon initial size}. The benefit is saving the time it takes to create a new process for simultaneous requests. However, this parameter is not very helpful for our problem.</li>
<li><strong>maxchildren</strong>/<strong>maxspare</strong> &#8211; What is the maximum number of balloons/processes we want to have on the system. This determines the maximum number of simultaneous requests we can deal with. The &#8216;cost&#8217; is the {number of balloons} X {balloon size}. The balloon size can obviously grow over time!</li>
<li><strong>maxrequests</strong> &#8211; this is the &#8216;auto-popper&#8217;. We can decide after how many requests we &#8216;pop&#8217; a balloon and start a new one.</li>
</ol>
<p>So if we set <strong>maxrequests</strong> too low, say 1 &#8211; then the system will work very hard to create a new process/balloon for every request. This is silly if the request is very small and doesn&#8217;t need a big balloon. With too high a value, however, the balloons might grow too much before they&#8217;re popped. And even if maxrequests is 1, if we get a few requests at the same time, each causing our balloons to grow too much, we might still run out of space!</p>
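<p>The balloon behaviour can be mimicked with a toy simulation. This is only a model of the analogy, not of how django or flup actually manage memory: the function name, the size units and the sample requests are all invented for illustration.</p>
<pre class="brush: python; title: ; notranslate">
def simulate(request_sizes, maxrequests, initial_size=1):
    # Return the balloon size observed while serving each request.
    # The balloon only ever grows, and is popped (reset to its
    # initial size) after serving maxrequests requests.
    size = initial_size
    served = 0
    history = []
    for needed in request_sizes:
        size = max(size, needed)   # inflate if the request needs more room
        served += 1
        history.append(size)
        if served == maxrequests:  # the automatic balloon popper
            size = initial_size
            served = 0
    return history


requests = [1, 5, 1, 1]            # one heavy request among small ones
print(simulate(requests, maxrequests=2))
print(simulate(requests, maxrequests=100))
</pre>
<p>With the low setting the balloon is back to size 1 right after the heavy request, while with the high setting it stays at size 5 for every request that follows.</p>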
<p>Our worst-case scenario is calculated as: {number of simultaneous requests} X {size of the request}. Let&#8217;s say our server has 4GB of memory in total, which probably leaves about 3GB for django itself. However, with requests that might take ~1GB of memory each (the worst-case scenario), we can only serve a maximum of 3 such requests. Not even simultaneously. Just in proximity to each other, before the server runs out of memory&#8230; </p>
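<p>The same back-of-the-envelope calculation, written out as a tiny Python sketch. The function names are made up, and the 1GB reserved for everything other than django is the rough assumption from the paragraph above.</p>
<pre class="brush: python; title: ; notranslate">
def worst_case_memory_gb(simultaneous_requests, request_gb):
    # {number of simultaneous requests} X {size of the request}
    return simultaneous_requests * request_gb


def max_heavy_requests(total_gb, reserved_gb, request_gb):
    # How many worst-case requests fit into what is left for django
    available_gb = total_gb - reserved_gb
    return int(available_gb // request_gb)


print(max_heavy_requests(total_gb=4, reserved_gb=1, request_gb=1))
print(worst_case_memory_gb(simultaneous_requests=5, request_gb=1))
</pre>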
<h2>Conclusion</h2>
<p>One of the core issues I wasn&#8217;t addressing here is obviously how to prevent high-memory usage within the django process. I hope to cover this on the next part. There are certainly some recommendations and best-practices when it comes to memory usage. However, with some types of requests, it might be impossible to avoid high-memory usage. Given enough simultaneous requests, even with optimization that leads to &#8216;reasonable&#8217; memory utilization, django might <em>still</em> run out of memory. The minspare, maxchildren, maxspare, and most importantly maxrequests parameters are therefore crucial to having a more stable django service. It&#8217;s not a bullet-proof solution, but from my experience it helps a lot. </p>
<h2>Sweet-Spot settings</h2>
<p>So what are my recommended settings? I found that setting maxrequests=100 seems to give a reasonably good performance overall. Simply run django in prefork mode with something like this:</p>
<pre class="brush: plain; title: ; notranslate">
manage.py runfcgi method=prefork host=$DAEMON_HOST port=$BACKUP_DAEMON_PORT pidfile=$PIDFILE maxrequests=100
</pre>
<p>I didn&#8217;t see any need to change the default minspare, maxchildren, or maxspare parameters however.</p>
<h2>What&#8217;s next?</h2>
<p>In Part II I am going to cover some more advanced tweaks. Those are designed to detect and recover from situations where django runs out of memory. Using the balloon popping analogy, those tweaks/methods allow &#8216;popping balloons&#8217; when memory runs out, rather than only after 100 requests. This gives another layer of protection against memory-related crashes. However, these require monitoring tools outside django. In addition, I hope to give at least some pointers on how to better utilize memory within the code.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/django-memory-leaks-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>timthumb vulnerability</title>
		<link>http://blog.gingerlime.com/timthumb-vulnerability/</link>
		<comments>http://blog.gingerlime.com/timthumb-vulnerability/#comments</comments>
		<pubDate>Thu, 04 Aug 2011 23:37:48 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=358</guid>
		<description><![CDATA[About a month ago I posted about tweaking timthumb to work with CDN. Timthumb is a great script, but great scripts also have bugs. A recently discovered one is a rather serious bug. It can allow attackers to inject arbitrary &#8230; <a href="http://blog.gingerlime.com/timthumb-vulnerability/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>About a month ago I <a href="http://blog.gingerlime.com/thumbs-up">posted</a> about tweaking timthumb to work with CDN. Timthumb is a great script, but great scripts also have bugs. A <a href="http://markmaunder.com/2011/zero-day-vulnerability-in-many-wordpress-themes/">recently discovered</a> one is a rather serious bug. It can allow attackers to inject arbitrary php code onto your site, and from there onwards, pretty much take control over it.</p>
<p>Luckily no websites I know or maintain were affected, possibly since the htaccess change I used shouldn&#8217;t allow using remote URLs in the first place (and also it renamed timthumb.php from the url string, making it slightly obfuscated). I still very strongly advise anybody using timthumb to upgrade to the <a href="https://code.google.com/p/timthumb/source/browse/trunk/timthumb.php">latest version</a> to avoid risks.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/timthumb-vulnerability/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ajaxizing</title>
		<link>http://blog.gingerlime.com/ajaxizing/</link>
		<comments>http://blog.gingerlime.com/ajaxizing/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 13:48:30 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=313</guid>
		<description><![CDATA[Following from my previous post, I&#8217;ve come across another issue related to caching in wordpress: dynamic content. There&#8217;s a constant trade-off between caching and dynamic content. If you want your content to be truly dynamic, you can&#8217;t cache it properly. &#8230; <a href="http://blog.gingerlime.com/ajaxizing/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>Following on from my <a href="/thumbs-up">previous post</a>, I&#8217;ve come across another issue related to caching in WordPress: dynamic content. There&#8217;s a constant trade-off between caching and dynamic content. If you want your content to be truly dynamic, you can&#8217;t cache it properly. If you cache the whole page, it won&#8217;t show the latest update. W3 Total Cache, WP Super Cache and others have some workarounds for this. For example, W3TC has something called <a href="http://wordpress.stackexchange.com/questions/7112/w3-total-cache-cache-refresh-programmatically">fragment caching</a>. So if you have a widget that displays dynamic content, you can use fragment caching to keep it from being cached. However, from what I worked out, all it does is essentially prevent the page with the fragment from being fully cached, which defeats the purpose of caching (especially if this widget is on the sidebar of all pages).<br />
<br />
The best solution for these cases is to use Ajax, pulling dynamic content from the server asynchronously with JavaScript. Many plugins already support Ajax and can load data dynamically for you, but many others don&#8217;t. So what can you do if you want to &#8216;ajaxize&#8217; a plugin you use? Well, there are a few solutions out there. For example, <a href="http://omninoggin.com/wordpress-posts/make-any-plugin-work-with-wp-super-cache/">this post</a> shows you how to do it, and it works quite well.<br />
<br />
The thing is, I wanted to take it a step further. If I can do it by following this manual process, why can&#8217;t I use a plugin that, erm, &#8216;ajaxizes&#8217; other plugins? I searched for solutions but found none, so I decided to write one myself. It&#8217;s my first &#8216;proper&#8217; plugin, but I think it works pretty well. <span id="more-313"></span><br />
</p>
<h3 class="storytitle"><a href="http://wordpress.org/extend/plugins/ajaxize/">Ajaxize</a></h3>
<p>The plugin allows you to take any WordPress function and &#8216;ajaxize&#8217; it into a special div. Typically, all plugins and core WordPress functionality boil down to a number of PHP functions. If you can figure out which function your plugin uses to output content, you can ajaxize it. How do you find the function name? It&#8217;s not that complicated. Many plugins will actually tell you which function to use: if you want to embed the output in one of your templates, for example, they will instruct you to use something like this:</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php echo plugin_function_name(); ?&gt;
</pre>
<p>So all you have to do is take this &#8216;plugin_function_name&#8217;, and ajaxize it using my plugin. The output is a div which looks like this:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;div id=&quot;ajaxize_this:plugin_function_name:68e46660f7ce3bc77a51465219df5743879544bc&quot;&gt;&lt;/div&gt;
</pre>
<p>Then place this div inside your page, post, widget, header, anywhere really. The ajaxize plugin adds a small piece of JavaScript that will find this div and automatically convert it into an Ajax call for you!<br />
<br/><br />
There are a few limitations though:</p>
<ul>
<li>Functions must return valid HTML &#8211; the function is called in PHP and its output is returned via the Ajax call</li>
<li>Functions cannot accept any parameters (at least at the moment)</li>
<li><del>Functions that work within a context (e.g. of a post, page, category), will most likely lose the context information</del></li>
</ul>
<p>UPDATE: version 1.1 of the plugin now handles context much better. Ajaxize now hooks in the right place, so the Ajax call is made exactly where the div element is placed. This means plugins that use post/category/taxonomy context information can now also be ajaxized. Special thanks to <a href="http://digitalnature.eu/">One Trick Pony</a> for helping me <a href="http://wordpress.stackexchange.com/questions/21526/how-to-get-context-information-inside-my-funcion/21529#21529">figure out</a> how to hook this correctly.<br />
<br/><br />
This was a perfect solution for mixing caching with dynamic content. I can convert almost any plugin or widget into a div. I can also write very simple PHP functions that will show dynamic content on my pages, with zero extra JavaScript code. The div itself can be cached, but the content will be pulled automatically by the browser when the page loads. I also found it useful for loading plugin buttons like the Facebook Like and Twitter Tweet buttons. Those can take a while to load and slow the page down. When converted via ajaxize, they still take a while to load, but they don&#8217;t seem to hold up the rest of the page content.<br />
</p>
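<p>As a rough illustration of such a &#8216;very simple PHP function&#8217;, the sketch below returns a small piece of HTML and takes no parameters, in line with the limitations listed earlier. The function name &#8216;my_server_time&#8217; is hypothetical and not part of the plugin.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
// Hypothetical example function; the name is made up for illustration.
// It takes no parameters and returns (rather than echoes) an HTML string.
function my_server_time() {
    return '&lt;span class=&quot;server-time&quot;&gt;' . date('H:i:s') . '&lt;/span&gt;';
}
</pre>
<p>Because the value is produced on every Ajax call, the surrounding page can stay fully cached while the time shown stays current.</p>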
<h3> What about security? </h3>
<p>Some of you may have already started thinking&#8230; &#8220;but hang on a minute. If you can ajaxize one function, what stops somebody from calling other functions on my WordPress site?!&#8221; Very true. This is why ajaxize was built with security in mind. It uses a keyed message authentication algorithm called <a href="http://en.wikipedia.org/wiki/HMAC">HMAC</a>, with a secret key, so you can use Ajax on any function you choose, but only on those functions and not others. This also means zero configuration. The plugin stores only one value in the database &#8211; your secret key! I might cover the security aspects of the plugin in a future post. I encourage people to look through the code and validate the security I&#8217;ve implemented.<br />
<br />
Feel free to try it out and let me know what you think!<br />
<br/></p>
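<p>To make the HMAC idea a little more concrete, here is a minimal sketch of how a signature can tie a div to one function name. It is not the plugin&#8217;s actual code: the function names &#8216;sketch_div_id&#8217; and &#8216;sketch_verify_div_id&#8217; are made up, and HMAC-SHA1 is only a guess based on the 40 hex digits in the example div id shown earlier. It also assumes PHP 5.6 or later for hash_equals.</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php
// Minimal sketch only; all names are hypothetical and the choice of
// HMAC-SHA1 is an assumption, not taken from the plugin source.

// Build a div id from a prefix, the function name and its signature.
function sketch_div_id($function_name, $secret_key) {
    $signature = hash_hmac('sha1', $function_name, $secret_key);
    return 'ajaxize_this:' . $function_name . ':' . $signature;
}

// Return the function name if the id carries a valid signature, otherwise false.
function sketch_verify_div_id($div_id, $secret_key) {
    $parts = explode(':', $div_id);
    if (count($parts) !== 3 || $parts[0] !== 'ajaxize_this') {
        return false;
    }
    $expected = hash_hmac('sha1', $parts[1], $secret_key);
    // hash_equals compares the two strings in constant time.
    return hash_equals($expected, $parts[2]) ? $parts[1] : false;
}
</pre>
<p>With a check like this, an id whose signature was not produced with the secret key is rejected, so a visitor cannot simply swap in another function name.</p>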
<div class="buttons">
<a id="download_plugin" class="button_big big_green float-left" href="http://wordpress.org/extend/plugins/ajaxize/">Get it here<i> </i></a>
</div>
<p><br/></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/ajaxizing/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

