<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Gingerlime &#187; Technology</title>
	<atom:link href="http://blog.gingerlime.com/category/technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gingerlime.com</link>
	<description></description>
	<lastBuildDate>Sun, 29 Jan 2012 14:44:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>encryption is not the right solution</title>
		<link>http://blog.gingerlime.com/encryption-is-not-the-right-solution/</link>
		<comments>http://blog.gingerlime.com/encryption-is-not-the-right-solution/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 19:40:53 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=372</guid>
		<description><![CDATA[When talking about security, the first thing that usually comes to mind is encryption. Spies secretly coding (or de-coding) some secret message that should not be revealed to the enemy. Encryption is this mysterious thing that turns all text into &#8230; <a href="http://blog.gingerlime.com/encryption-is-not-the-right-solution/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>When talking about security, the first thing that usually comes to mind is encryption. Spies secretly coding (or de-coding) some secret message that should not be revealed to the enemy. Encryption is this mysterious thing that turns all text into a part of the matrix. Developers generally like encryption. It&#8217;s kinda cool. You pass stuff into a function, get some completely scrambled output. Nobody can tell what&#8217;s in there. You pass it back through another function &#8211; the text is clear again. Magic.<br />
<br/><br />
Encryption is cool. It is fundamental to doing lots of things on the Internet. How could you pay with your credit card on Amazon without encryption? How can you check your bank balance? How can MI5 pass their secret messages without Al-Qaida intercepting them?<br />
<br />
But encryption is actually not as useful as people think. It is often used in the wrong place, and it can easily give a false sense of security. Why? People forget that encryption, by itself, is usually not sufficient. You <strong>cannot read</strong> the encrypted data. But nothing stops you from <strong>changing it</strong>. In many cases, it is very easy to change encrypted data, <strong>without</strong> knowledge of the encryption key. <span id="more-372"></span>This seemingly small &#8216;misconception&#8217; can lead to some really serious security holes. I think it is primarily a perception issue. As soon as something becomes obfuscated, scrambled, illegible (as encrypted data does), it&#8217;s hard to intuitively figure out the risks to it, and people tend to make assumptions about it that are not always valid.<br />
</p>
<h2>WEP</h2>
<p>There must be plenty of examples of encryption being used incorrectly and leading to unforeseen consequences. The most famous example I am aware of is <a href="http://en.wikipedia.org/wiki/Wired_Equivalent_Privacy">WEP</a>. I won&#8217;t go into the details; there are plenty of resources online describing the flaws in great detail. The essence of it, however, was that WEP used <strong>relatively good</strong> ciphers <strong>incorrectly</strong>, or insufficiently. The encryption algorithm it used (RC4) was not &#8216;broken&#8217;. The algorithm itself is quite robust (no algorithm is perfect, but this wasn&#8217;t the major flaw in WEP). It was the design of WEP, not the core algorithm, that was broken.</p>
<h2>Some illustration</h2>
<p>Let&#8217;s walk through a simplified, imaginary scenario to illustrate when NOT to use encryption. It goes something like this:<br />
<br />
Developer: &#8220;We need to give more people access to the X section of the website&#8221;<br />
Me: &#8220;OK, then ask them to log in before they can have access&#8221;<br />
Developer: &#8220;No. We want to send them a secure link with an expiry date, and not all users will have an account&#8221;<br />
Me: &#8220;In that case, we should use something like HMAC or OAuth. They&#8217;re designed for this purpose&#8221;<br />
Developer: &#8220;No. I don&#8217;t have time to read about OAuth, it&#8217;s too confusing. But I&#8217;ve written a real cool function that encrypts part of the URL, and we only allow access if we can decrypt it properly&#8230; It uses Triple-DES, so it&#8217;s super-secure. Wikipedia says that <a href="http://en.wikipedia.org/wiki/Data_Encryption_Standard">The algorithm is believed to be practically secure in the form of Triple DES, although there are theoretical attacks</a>&#8221;<br />
Me: &#8220;Let&#8217;s have a look at your &#8216;secure&#8217; solution then&#8230;&#8221;<br />
<br />
The solution was something along those lines (hugely simplified for the sake of illustration):</p>
<pre class="brush: plain; title: ; notranslate">

http://site.com/secure-access/{encrypted_string}
</pre>
<p>where encrypted_string contained the expiry date of the page, e.g. 20120315 (15th March 2012).<br />
Can you see where this is flawed?<br />
<br/><br />
There are at least a couple of possible attacks here that don&#8217;t involve breaking the encryption or trying to get the key:</p>
<ol>
<li>Randomly change the encrypted string &#8211; perhaps not very sophisticated, but since all we need is a date later than today, there is still a fair chance we might get lucky. Even without calculating the entire search space, it&#8217;s easy to see it is quite narrow.</li>
<li>Replace it with another encrypted string &#8211; this is even easier. All you need is another URL with a known-to-work encrypted string. Copy this string, paste it, and you&#8217;re good to go!</li>
</ol>
<p>
Another flaw the developer hadn&#8217;t noticed was that he was using Triple-DES in ECB mode. This is the most basic mode: each block is encrypted independently, so identical plaintext blocks produce identical ciphertext blocks, and whole blocks can be cut and pasted between ciphertexts. That makes the attacks I described much easier. So even if the algorithm itself is robust, the way it is applied can matter much more. In addition, under some circumstances, if decryption failed (which it easily could, since nothing stopped anyone from manipulating the encrypted string), the code was kind enough to output the decrypted string inside the error message&#8230; back to the user (or wannabe hacker, as the case may be).<br />
<br />
Of course, the real solution was a little more complicated. It contained not just an encrypted date, but some other data. Nevertheless, the same principles of attack apply. There are many <a href="http://en.wikipedia.org/wiki/Bit-flipping_attack">bit-flipping</a> attacks that can alter encrypted data and produce a predictable output. The bottom line is this: <strong>Encrypting data does not prevent modification</strong>.</p>
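<p>To make the bit-flipping point concrete, here is a minimal Python sketch. It does <strong>not</strong> reproduce the developer&#8217;s Triple-DES/ECB setup (ECB is more exposed to swapping whole blocks); instead it assumes a toy stream cipher, built here from SHA-256 in counter mode purely for illustration, because stream ciphers (and block ciphers in CTR mode) are the classic bit-flipping target. The attacker only needs to guess the plaintext, never the key. All function names are hypothetical.</p>

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key + nonce + counter) blocks. Illustration only, not a real cipher."""
    blocks = []
    while length > 32 * len(blocks):
        counter = len(blocks).to_bytes(8, "big")
        blocks.append(hashlib.sha256(key + nonce + counter).digest())
    return b"".join(blocks)[:length]


def xor(data: bytes, other: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, other))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


decrypt = encrypt  # XOR with the same keystream undoes itself


def flip(ciphertext: bytes, known: bytes, wanted: bytes) -> bytes:
    """Attacker side: turn an encryption of `known` into one of `wanted`, without the key."""
    return xor(ciphertext, xor(known, wanted))


if __name__ == "__main__":
    key, nonce = os.urandom(16), os.urandom(8)    # only the server knows these
    token = encrypt(key, nonce, b"20120315")       # expires 15th March 2012
    forged = flip(token, b"20120315", b"20991231")
    print(decrypt(key, nonce, forged))             # b'20991231'
```

<p>The server happily decrypts the forged token to a date in 2099, even though the attacker never saw the key.</p>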
<h2>The alternative</h2>
<p>In this case, all we want to achieve is authentication. We want to verify that the request is legitimate, and that nobody else can fake such a request. There&#8217;s nothing in particular that needs hiding; the expiry date of the access is not a big secret. All we need is some kind of signature to validate the legitimacy of the request. This is where hash functions, and HMAC / OAuth, come into play. For some strange reason, those mechanisms are less appealing to many developers. I&#8217;m not entirely sure why, but maybe it&#8217;s just not as fun to see an extra hash at the end of the URL as it is to encrypt a string. Yet these mechanisms are much more effective in this case. This is what they&#8217;re designed to do.<br />
<br/><br />
So how does this work? Again, for the sake of simplicity I&#8217;m not going to cover the detailed aspects of those algorithms. But the principle is quite simple: you take the string you want to &#8216;sign&#8217;, plus a secret key, and compute a keyed hash value from both of them. This value will be completely different after even the slightest modification to the original string (or the key). The key is never published or shown; only the string and the generated &#8216;signature&#8217; (the hash value we produced) are. How is this more secure? As I mentioned, even the slightest modification of the string will produce a completely different signature, so any modification of the URL will be detected. And producing a valid signature is practically infeasible without knowledge of the secret key. Voila! Simple. Secure. Elegant. Of course, as always, God is in the details, so even these algorithms can be used wrongly, producing an insecure outcome. However, in my experience, following the guidelines and using the OAuth/HMAC libraries is far easier and less error-prone than using any encryption algorithm. </p>
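<p>Here is a minimal sketch of the signed-link idea using Python&#8217;s standard <strong>hmac</strong> module. It assumes a hypothetical URL layout of <strong>/secure-access/{expiry}/{signature}</strong> and a server-side secret; it is not OAuth and not the developer&#8217;s code. It just shows that the expiry can stay perfectly readable while any change to it is detected.</p>

```python
import hashlib
import hmac
from datetime import date

SECRET_KEY = b"change-me-to-a-long-random-value"  # hypothetical; never sent to clients


def sign(expiry: str) -> str:
    """Hex HMAC-SHA256 of the expiry string under the server's secret key."""
    return hmac.new(SECRET_KEY, expiry.encode("utf-8"), hashlib.sha256).hexdigest()


def make_link(expiry: str) -> str:
    # The expiry (YYYYMMDD) stays readable; the signature is what protects it.
    return "http://site.com/secure-access/%s/%s" % (expiry, sign(expiry))


def allow_access(expiry: str, signature: str, today: date) -> bool:
    expected = sign(expiry).encode("ascii")
    # compare_digest avoids short-circuiting on the first mismatch (resists timing attacks).
    if not hmac.compare_digest(expected, signature.encode("utf-8")):
        return False  # expiry was modified, or the signature was forged
    expires = date(int(expiry[0:4]), int(expiry[4:6]), int(expiry[6:8]))
    return expires >= today


if __name__ == "__main__":
    link = make_link("20120315")
    expiry, signature = link.rsplit("/", 2)[1:]
    print(allow_access(expiry, signature, date(2012, 3, 1)))      # True
    print(allow_access("20991231", signature, date(2012, 3, 1)))  # False
```

<p>Pasting a later date into the URL no longer helps: without the secret key, the attacker cannot produce the matching signature.</p>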
<h2>Size doesn&#8217;t matter, it&#8217;s how you use it</h2>
<p>The <a href="http://www.isaac.cs.berkeley.edu/isaac/wep-faq.html">Security of the WEP algorithm</a> page sums it up quite nicely: &#8220;The [WEP] protocol&#8217;s problems are a result of misunderstanding of some cryptographic primitives and therefore combining them in insecure ways&#8221;. Even if you pick the best and most secure encryption algorithm, it might not be enough to make your solution secure. In fact, as I tried to illustrate, encryption might not be necessary at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/encryption-is-not-the-right-solution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>dynamic goal values in google analytics</title>
		<link>http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/</link>
		<comments>http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 00:05:38 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=511</guid>
		<description><![CDATA[Scoring a goal against google is never easy. Google analytics allows you to do some strange and wonderful things, but not without some teeth grinding. I was struggling with this for a little while, and it was a great source &#8230; <a href="http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>Scoring a goal against google is never easy. Google analytics allows you to do some strange and wonderful things, but not without some teeth grinding. I was struggling with this for a little while, and it was a great source of frustration, since there&#8217;s hardly any info out there about it. Or maybe there is lots of info, but no solution to this particular problem. I think I finally nailed it. </p>
<h2>Dynamic Goal Conversion Values</h2>
<p>I was trying to get some <strong>dynamic</strong> goal conversion values into Analytics. I ended up reading about <a href="http://code.google.com/apis/analytics/docs/tracking/gaTrackingEcommerce.html#Guidelines">Ecommerce tracking</a> and it seemed like the way to go. Not only would I be able to set the goal conversion value dynamically, I would also get a breakdown of each and every transaction. Very nice. After implementing it, I was quite impressed to see each transaction, product, SKU etc. appear neatly on the ecommerce reports. So far so good. But somehow, goals &#8211; which were set on the very same page as the ecommerce tracking code &#8211; failed to add the transaction value. The goals were tracked just fine, I could see them adding up, but not the goal <strong>value</strong>. grrrr&#8230;<br />
<span id="more-511"></span><br />
I was particularly frustrated after meticulously reading through the <a href="http://support.google.com/analytics/bin/answer.py?hl=en&#038;answer=1116091#ecommerce">Ecommerce Transaction Page Goals</a> section, which is very clear that this is possible, as long as you set the conversion value to zero.</p>
<h2>The solution</h2>
<p>There&#8217;s a small thing that doesn&#8217;t seem to get mentioned anywhere, and I&#8217;m still not sure why it causes a problem. If it doesn&#8217;t work for you, I have a few alternative approaches worth trying too&#8230;</p>
<p>I discovered that, in my case, the Google Analytics code was included on the goal conversion page twice: once, as on every page, as the standard tracking code in the header, and again in the body, together with the ecommerce tracking values. Both sections push <strong>_trackPageview</strong>, so the page view was presumably being sent twice. Both were running fine, and as I mentioned the results appeared, but somehow this seems to create a conflict. Removing the &#8216;standard&#8217; tracking section seemed to solve this mysterious problem.</p>
<p>Here&#8217;s what the HTML looked like &#8211; remove the first part:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;head&gt;
...
&lt;!-- FIRST SECTION ON THE HEADER : TO REMOVE --&gt;
&lt;script&gt;

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-xxxxxxxx-x']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

&lt;/script&gt;
&lt;!-- DON'T REMOVE ANY FURTHER --&gt;
&lt;/head&gt;
&lt;body&gt;
...
&lt;!-- SECOND SECTION - KEEP THIS ! --&gt;
&lt;script&gt;
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);
  _gaq.push(['_trackPageview']); // this tracks the page view
  _gaq.push(['_addTrans',
    '1234',           // order ID - required
    'Acme Clothing',  // affiliation or store name
    '11.99',          // total - required
    '1.29',           // tax
    '5',              // shipping
    'San Jose',       // city
    'California',     // state or province
    'USA'             // country
  ]);

   // add item might be called for every item in the shopping cart
   // where your ecommerce engine loops through each item in the cart and
   // prints out _addItem for each
  _gaq.push(['_addItem',
    '1234',           // order ID - required
    'DD44',           // SKU/code - required
    'T-Shirt',        // product name
    'Green Medium',   // category or variation
    '11.99',          // unit price - required
    '1'               // quantity - required
  ]);
  _gaq.push(['_trackTrans']); //submits transaction to the Analytics servers

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
&lt;/script&gt;
...
&lt;/body&gt;
</pre>
<h2>Alternative Method</h2>
<p>Since Analytics seems to involve a lot of black magic, I&#8217;m still not 100% sure that&#8217;s the only possible solution. However, along the way I discovered some other interesting things&#8230;</p>
<h3>Event Tracking</h3>
<p>Instead of goal tracking by URL, you can actually <a href="http://code.google.com/apis/analytics/docs/tracking/eventTrackerGuide.html">track an event</a>. If you&#8217;ve already coded the ecommerce tracking section anyway, adding event tracking to it is a doddle. Simply add this line somewhere before or after the ecommerce tracking code:</p>
<pre class="brush: jscript; title: ; notranslate">
_gaq.push(['_trackEvent', 'Cart', 'Checkout', 'Success', 39.99]);
</pre>
<p>In this case I defined the event category to be &#8216;Cart&#8217;, the action to be &#8216;Checkout&#8217;, the label to be &#8216;Success&#8217;, and the value to be 39.99 (the ga.js documentation describes the event value as an integer, so you may need to round it). Once you do that, you can set your goal to use an Event instead of a URL. Here&#8217;s more info on <a href="http://analytics.blogspot.com/2011/04/new-google-analytics-events-goals.html">setting the event goals</a> and using the event value as the goal conversion value.</p>
<p>If your main concern is tracking conversion values, and you&#8217;re not too fussed about the whole ecommerce tracking, then this is actually a much easier way: all you need is one line of code. It also works perfectly well alongside your existing ecommerce tracking. </p>
<h2>Wet paint &#8211; slow updates</h2>
<p>Analytics seems to update rather slowly. It also seems to update and un-update some data if you keep refreshing. You see a new transaction. You refresh. It&#8217;s gone. Refresh again, it&#8217;s back&#8230; This is normally not a huge problem, just confusing, but when you&#8217;re trying to debug something, it can make things much more difficult. For example, say you <strong>update your Goal configuration</strong>: you change the URL or the tracking event, or add a new goal. It <strong>might take a good few minutes</strong> for the change to take effect. If you trigger the goal in the meantime, e.g. by visiting the goal URL, <strong>it might not register</strong> &#8211; because Analytics is still using the old goal settings. You simply have no way to find out whether the goal is active or not. You must wait for the paint to dry before you try anything new.</p>
<h2>Other stuff</h2>
<p>A couple more things to watch out for:</p>
<ul>
<li>The Ecommerce page says that<br />
<blockquote><p>For URL: Supply the URL for your shopping cart. For example: http://www.we-sell-for-you.com/mysite/myCart.asp</p></blockquote>
<p>In most cases, you only need to supply the relative path, e.g. /mysite/myCart.asp
</li>
<li>
I&#8217;m not sure whether it makes a difference, but if you&#8217;re using <strong>_gaq.push(['_setDomainName', '.domain.com']);</strong>, this might also cause some issues (I didn&#8217;t test this explicitly, though).
</li>
</ul>
<h2>Final grunt</h2>
<p>I hope this helps someone. I was pulling my hair out over this, so hopefully I can save some other people&#8217;s hairlines from receding. The time it takes for goal/ecommerce tracking to appear in Analytics also makes testing slow and problems much harder to resolve. Having tested a few too many different combos, I&#8217;m still not 100% sure this is a bullet-proof solution, but I&#8217;m definitely seeing some goal conversion values now.</p>
<p>UPDATE: it looks like this only works if you also use event tracking. With URL tracking alone, it doesn&#8217;t seem to pick up the goal values.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/dynamic-goal-values-in-google-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>unicode url double-encoding 404 redirect trick</title>
		<link>http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/</link>
		<comments>http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 08:39:33 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=486</guid>
		<description><![CDATA[I&#8217;ve come across a small nuisance that seemed to appear occasionally with unicode urls. Some websites seem to encode/escape/quote urls as soon as they see any symbol (particularly % sign). They appear to assume it needs to be encoded, and &#8230; <a href="http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve come across a small nuisance that seemed to appear occasionally with unicode urls. Some websites seem to encode/escape/quote urls as soon as they see any symbol (particularly % sign). They appear to assume it needs to be encoded, and convert any such character to its URL-Encoded form. For example, percent (%) symbol will convert to %25, ampersand (&#038;) to %26 and so on.</p>
<p>This is not normally a problem, unless the URL is <strong>already</strong> encoded. Since all unicode-based urls use this encoding, they are more prone to these errors. What happens then is that a URL that looks like this:<br />
<a href="http://www.frau-vintage.com/2011/%E3%81%95%E3%81%8F%E3%82%89%E3%82%93%E3%81%BC%E3%81%AE%E3%82%AD%E3%83%83%E3%83%81%E3%83%B3%E3%82%AF%E3%83%AD%E3%82%B9/">http://www.frau-vintage.com/2011/%E3%81%95%E3%81%8F%E3%82%89 &#8230;</a></p>
<p>will be encoded again to this:<br />
<a href="http://www.frau-vintage.com/2011/%25E3%2581%2595%25E3%2581%258F%25E3%2582%2589%25E3%2582%2593%25E3%2581%25BC%25E3%2581%25AE%25E3%2582%25AD%25E3%2583%2583%25E3%2583%2581%25E3%2583%25B3%25E3%2582%25AF%25E3%2583%25AD%25E3%2582%25B9">http://www.frau-vintage.com/2011/%25E3%2581%2595%25E3%25 &#8230;</a></p>
<p>So clicking on such a double-encoded link will unfortunately lead to a 404 page (don&#8217;t try it with the links above, because the workaround was already applied there).</p>
<h2>A workaround</h2>
<p>This workaround is specific to the WordPress 404.php template, but it can be applied quite easily in other frameworks like Django or Drupal, and possibly even with an Apache .htaccess rewrite rule.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
/* Detect 'double-encoded' URLs:
 * if the request URI contains %25 (the URL-encoded form of the '%' symbol)
 * within the first few characters, decode it once and redirect.
 */
$pos = strpos($_SERVER['REQUEST_URI'], '%25');
if ($pos !== false &amp;&amp; $pos &lt; 10) :
    // The third argument sets the 301 status under any server API; a bare
    // Location header would otherwise default to a 302 redirect.
    // rawurldecode() decodes %XX but leaves '+' alone, as a path requires.
    header(&quot;Location: &quot; . rawurldecode($_SERVER['REQUEST_URI']), true, 301);
    exit;
else:
    get_header(); ?&gt;
    &lt;h2&gt;Error 404 - Page Not Found&lt;/h2&gt;
    &lt;?php get_sidebar(); ?&gt;
    &lt;?php get_footer();
endif; ?&gt;
</pre>
<p>This code goes only in the 404 template. It grabs the request URI and checks whether it contains the string &#8216;%25&#8217; within the first 10 characters (you can modify the check to suit your needs). If it does, it redirects to a <strong>URL-decoded</strong> version of the same page&#8230;</p>
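<p>The same check could be sketched in Python, for example for a Django 404 handler. This is a minimal illustration using only the standard <strong>urllib.parse</strong> module; the function name and the 10-character window (mirroring the PHP above) are assumptions.</p>

```python
from urllib.parse import quote, unquote


def fix_double_encoded(path: str, window: int = 10):
    """Return `path` decoded once if '%25' appears within its first `window` characters, else None."""
    pos = path.find("%25")
    if pos == -1 or pos >= window:
        return None
    # One round of decoding turns %25E3 back into %E3, undoing the extra encoding.
    return unquote(path)


if __name__ == "__main__":
    original = "/2011/%E3%81%95"
    doubled = quote(original, safe="/")   # what an over-eager site produces
    print(doubled)                        # /2011/%25E3%2581%2595
    print(fix_double_encoded(doubled))    # /2011/%E3%81%95
    print(fix_double_encoded(original))   # None
```

<p>In a Django project, a custom 404 handler could call this and return <strong>HttpResponsePermanentRedirect(fixed)</strong> when it gets a path back, and render the normal 404 page otherwise.</p>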
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/unicode-url-double-encoding-404-redirect-trick/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>django memory leaks, part II</title>
		<link>http://blog.gingerlime.com/django-memory-leaks-part-ii/</link>
		<comments>http://blog.gingerlime.com/django-memory-leaks-part-ii/#comments</comments>
		<pubDate>Fri, 16 Dec 2011 13:55:30 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=449</guid>
		<description><![CDATA[On my previous post I talked about django memory management, the little-known maxrequests parameter in particular, and how it can help &#8216;pop&#8217; some balloons, i.e. kill and restart some django processes in order to release some memory. On this post &#8230; <a href="http://blog.gingerlime.com/django-memory-leaks-part-ii/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>On my <a href="/django-memory-leaks-part-i">previous post</a> I talked about django memory management, the little-known <strong>maxrequests</strong> parameter in particular, and how it can help &#8216;pop&#8217; some balloons, i.e. kill and restart some django processes in order to release some memory. On this post I&#8217;m going to cover some of the things to do or avoid in order to keep memory usage low from within your code. In addition, I am going to show at least one method to monitor (and act automatically!) when memory usage shoots through the roof.<br />
<span id="more-449"></span></p>
<h2>Efficient code</h2>
<p>Django makes a lot of things very easy. One of its most prominent features is its ORM (Object Relational Mapper). The django ORM makes it easy to write complex database queries without a single line of SQL. This, however, can come at the price of sub-optimal queries, which can take a very long time to process and also consume more memory. If you hit django memory issues, they are very likely related to heavy data being loaded from the database, consuming lots and lots of memory. Luckily, there are a few rules that, if followed, can hugely improve both the queries and memory usage. Most of those are documented clearly on the <a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/" title="Database access optimization">Database access optimization</a> page. I won&#8217;t repeat everything documented there, but rather focus on a few key points. </p>
<h2>Where do you start?</h2>
<p>First of all, you might already see crashes in your django application or get reports from users about pages loading slowly, and you&#8217;ve hopefully checked your server (e.g. using the <em>top</em>/<a href="http://htop.sourceforge.net/" title="htop">htop</a> or <em>free</em> commands) and noticed high memory utilization. The recommendations in part I should give you a little stability until you manage to figure out where the problem lies, but they are by no means a sufficient solution.</p>
<h3>Monitor log files</h3>
<p>The easiest place to start is your log files. If you&#8217;re not logging requests already, you seriously should. One helpful addition that doesn&#8217;t add much overhead is logging the time each request took. This can be achieved with a simple <a href="http://djangosnippets.org/snippets/2624/">logging middleware</a>. The time each request takes usually gives a fair indication of what&#8217;s going on, and in my experience there&#8217;s usually a close link between execution time and memory footprint. The same logging middleware can be used with DEBUG=True to view how many SQL queries are executed for each request. This can give another indication of &#8216;hot-spots&#8217; to look for.</p>
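<p>I haven&#8217;t reproduced the linked snippet here, but the idea can be sketched as generic WSGI middleware in plain Python. The class name and log format are hypothetical; in Django versions that provide <strong>django.core.wsgi.get_wsgi_application()</strong>, you could wrap that application object. Note that it only times the call into the app, not the streaming of a lazily generated response body.</p>

```python
import logging
import time

logger = logging.getLogger("request.timing")


class RequestTimingMiddleware:
    """Wrap a WSGI application and log the method, path and duration of each request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        started = time.monotonic()
        try:
            return self.app(environ, start_response)
        finally:
            # Runs even if the view raises, so slow failures are logged too.
            elapsed_ms = (time.monotonic() - started) * 1000
            logger.info("%s %s took %.1f ms",
                        environ.get("REQUEST_METHOD", "-"),
                        environ.get("PATH_INFO", "-"),
                        elapsed_ms)


if __name__ == "__main__":
    def hello_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello"]

    logging.basicConfig(level=logging.INFO)
    app = RequestTimingMiddleware(hello_app)
    app({"REQUEST_METHOD": "GET", "PATH_INFO": "/report"}, lambda status, headers: None)
```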
<h3>Profiling</h3>
<p>The next step is to profile django and see which requests consume a high amount of memory. I would suggest following the instructions in <a href="http://www.toofishes.net/blog/using-guppy-debug-django-memory-leaks/" title="Using Guppy to debug Django memory leaks">Using Guppy to debug Django memory leaks</a>. This should really help pinpoint areas of code that consume large amounts of memory. Once those are identified, it is easier to optimize the code there. Don&#8217;t try to do everything at once; by optimizing one area of code you will learn a lot, and you can then apply the same methods across the entire codebase.</p>
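<p>Guppy is what the linked article uses. As a swapped-in alternative on Python 3.4+, the standard library&#8217;s <strong>tracemalloc</strong> module can give a similar &#8216;which lines allocated the most&#8217; view. A minimal sketch, where <strong>top_allocations</strong> and <strong>build_report</strong> are hypothetical names (the latter a stand-in for a memory-hungry view):</p>

```python
import tracemalloc


def top_allocations(func, *args, limit=5, **kwargs):
    """Call func(*args, **kwargs); return its result and the lines that allocated the most new memory."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func(*args, **kwargs)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() sorts by the absolute size difference, biggest first.
    return result, after.compare_to(before, "lineno")[:limit]


def build_report(n):
    # Hypothetical stand-in for a view that materialises a large result set.
    return [{"id": i, "name": "row %d" % i} for i in range(n)]


if __name__ == "__main__":
    rows, stats = top_allocations(build_report, 100000)
    for stat in stats:
        print(stat)
```

<p>Keeping the result alive until the second snapshot is deliberate: it makes the memory held by the returned objects show up in the diff.</p>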
<h2>Quick-n-dirty improvements</h2>
<p>There are a few recommendations that generally help reduce memory footprint. Be careful not to apply them blindly, though: they can have a knock-on effect on performance in other areas, so exercise some judgment. The most useful pointers are:</p>
<ul>
<li>Make sure DEBUG=False</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-iterator">.iterator()</a> &#8211; In many cases, there&#8217;s simply a need to retrieve a list of objects from the database and return them in a certain format. If you go over the list more or less sequentially and use each object only once, it makes a lot of sense to use iterator() &#8211; this bypasses django&#8217;s internal QuerySet caching, which can save memory. If, however, you might go back to the same objects, this caching could really speed things up, so be aware of what you&#8217;re doing</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-queryset-update-and-delete">Using .update()/.delete()</a> &#8211; can be a huge saver if you perform a simple update of many objects, or delete a bunch of them. Rather than walking through objects one by one, you can perform this operation with one query</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-queryset-values-and-values-list">Using .values()</a> &#8211; As above, if all you need is a bunch of values from the database, you can save a lot of memory by fetching them as dictionaries (or as tuples with .values_list()) instead of full model instances</li>
<li><a href="https://docs.djangoproject.com/en/1.2/topics/db/optimization/#use-queryset-defer-and-only">Using .only()</a> &#8211; can also save a lot of memory, if you only use a portion of data from each object.</li>
</ul>
<p>These are just a few pointers to the same page on the django documentation. Please spend time reading through the entire page, as it contains many very useful techniques and other information to help you better understand how the ORM works and how to use it more effectively. </p>
<h2>External Monitoring</h2>
<p>Even if you manage to optimize your code, tweak the maxrequests parameter to ensure memory is cleared regularly, and brush your teeth morning and night, there are still cases when it simply won&#8217;t be enough. One user will make too many requests, the data-set will just be too big, or some query will stay sub-optimal, and you&#8217;d still end up with bloated processes eating through your memory. If that happens, you&#8217;re almost back where you started. This is where some external tools can work wonders. They can:</p>
<ol>
<li>Let you know when this happens, perhaps even before memory is completely depleted.</li>
<li>Take (automatic) action, and more gracefully restart django, to avoid a full-scale crash.</li>
</ol>
<h3>Monit</h3>
<p>My weapon of choice in this case (and many others) is <a href="http://mmonit.com/monit/">Monit</a>. It&#8217;s lightweight, powerful and has an easy, intuitive configuration syntax, which makes it a snap to use. There are so many uses for this little devil, but in this case I will focus on monitoring process memory. It only takes a few lines to let monit watch over django and make sure things are running smoothly:</p>
<pre class="brush: plain; title: ; notranslate">
check process your-django-process with pidfile /path/to/your/django.pid
    start program &quot;/etc/init.d/your-django reload&quot;
    stop program &quot;/etc/init.d/your-django stop&quot;
    if totalmem &gt; 70% then exec &quot;/usr/local/bin/highmem&quot; # more about this highmem later...
    if totalmem &gt; 85% for 2 cycles then restart
</pre>
<p>Let&#8217;s go over this monit config snippet quickly. What it does is very simple:</p>
<ol>
<li>Monit checks our django process. Make sure you specify the correct pid file (which we set when invoking django using manage.py)</li>
<li>start and stop program directives should point to your django daemon script</li>
<li>If the total memory of the django process (including its children) exceeds 70% of the system&#8217;s memory, monit executes a custom &#8216;highmem&#8217; command. The highmem command gets django to free some memory by restarting its worker processes. We will cover this command shortly. Monit will also send an email to alert you (make sure you configure your monit alert settings correctly)</li>
<li>The second check is a &#8220;safety-belt&#8221;, to restart django if memory stays high for too long despite all our efforts</li>
</ol>
<h3>highmem</h3>
<p>The highmem command is a very simple one-line bash script:</p>
<pre class="brush: plain; title: ; notranslate">
#!/bin/bash
kill -USR1 &quot;$(cat /path/to/your/django.pid)&quot;
</pre>
<p>All it does is send a SIGUSR1 to our django process. So what does SIGUSR1 do? When django runs under FastCGI, it uses <a href="http://trac.saddi.com/flup">flup</a> for process handling. Since version 1.0.3, flup uses the SIGUSR1 signal to safely respawn those django processes, popping those balloons. If a request is in progress, it will wait until it finishes, which is a very nice feature. Just hope that the process being waited on won&#8217;t eat ALL the remaining memory&#8230; You can read more about it in <a href="http://rambleon.usebox.net/post/3279121000/how-to-gracefully-restart-django-running-fastcgi">How to Gracefully Restart Django Running FastCGI</a> (look at the comments section in particular). Please note that you need flup version 1.0.3 or later. With an older version, it might just kill your django processes instead of respawning them safely. It&#8217;s easy to get flup: simply use <strong>sudo pip install flup</strong> (or <strong>sudo easy_install flup</strong>), and you&#8217;re done.</p>
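<p>To illustrate the graceful-restart idea (this is a toy, not flup&#8217;s actual implementation), here is a minimal Python sketch of a worker loop that treats SIGUSR1 as &#8216;finish the current request, then stop taking new ones&#8217;. The class and method names are hypothetical, and it assumes a POSIX system, since SIGUSR1 does not exist on Windows.</p>

```python
import os
import signal


class GracefulWorker:
    """Toy request loop: SIGUSR1 lets the in-flight request finish, then stops the loop."""

    def __init__(self):
        self.respawn_requested = False
        # signal.signal() may only be called from the main thread.
        signal.signal(signal.SIGUSR1, self._on_sigusr1)

    def _on_sigusr1(self, signum, frame):
        # Just record the request; never abort a request half-way through.
        self.respawn_requested = True

    def handle(self, request):
        return request.upper()  # stand-in for real request processing

    def serve(self, requests):
        handled = []
        for request in requests:
            if self.respawn_requested:
                # A real server would exit here so its parent can start a fresh process.
                break
            handled.append(self.handle(request))
        return handled


if __name__ == "__main__":
    worker = GracefulWorker()
    print(worker.serve(["a", "b"]))       # ['A', 'B']
    os.kill(os.getpid(), signal.SIGUSR1)  # what the highmem script does, aimed at ourselves
    print(worker.serve(["c", "d"]))
```

<p>The key design choice is that the handler only sets a flag, and the flag is checked between requests, so a request that is already running is never cut off mid-way.</p>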
<h3>The last resort</h3>
<p>The last tool in our arsenal, the last resort, is for when some process eats memory so fast that even the highmem command cannot release it, and we simply have no choice but to restart django. Even then, we&#8217;d probably want to do it as gracefully as possible. I&#8217;m not going to repeat it here, but the page on <a href="http://rambleon.usebox.net/post/3279121000/how-to-gracefully-restart-django-running-fastcgi">How to Gracefully Restart Django Running FastCGI</a> covers how to modify your manage.py to give django a few seconds before it&#8217;s restarted. Then just make sure that the scripts starting/restarting django send a HUP signal (<strong>kill -HUP</strong>) to your django process, so it restarts nicely.</p>
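<p>The idea behind a graceful restart fits in a few lines of Python. The signal handler only sets a flag, and the worker checks that flag between requests, so a request that is already in progress is allowed to finish. This is a generic illustration using a hypothetical GracefulWorker class. It is not flup&#8217;s implementation, and it is not the manage.py change described in the linked article.</p>

```python
import signal


class GracefulWorker:
    """Hypothetical worker loop that stops between requests after SIGHUP."""

    def __init__(self):
        self.stop_requested = False
        signal.signal(signal.SIGHUP, self._on_hup)

    def _on_hup(self, signum, frame):
        # Only record the request to stop; the current request keeps running.
        self.stop_requested = True

    def serve(self, requests):
        """Run each request callable to completion until asked to stop."""
        handled = []
        for request in requests:
            if self.stop_requested:
                break  # stop only at a request boundary
            handled.append(request())
        return handled
```

<p>The key design choice is keeping the handler tiny. Doing real work inside a signal handler could interrupt a request half-way through, which is exactly what we are trying to avoid.</p>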
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/django-memory-leaks-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>django memory leaks, part I</title>
		<link>http://blog.gingerlime.com/django-memory-leaks-part-i/</link>
		<comments>http://blog.gingerlime.com/django-memory-leaks-part-i/#comments</comments>
		<pubDate>Sun, 11 Dec 2011 12:36:43 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=398</guid>
		<description><![CDATA[A while ago I was working on optimizing memory use for some django instances. During that process, I managed to better understand memory management within django, and thought it would be nice to share some of those insights. This is &#8230; <a href="http://blog.gingerlime.com/django-memory-leaks-part-i/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>A while ago I was working on optimizing memory use for some django instances. During that process, I managed to better understand memory management within django, and thought it would be nice to share some of those insights. This is by no means a definitive guide. It&#8217;s likely to have some mistakes, but I think it helped me grasp the configuration options better, and allowed easier optimization.</p>
<h2>Does django leak memory?</h2>
<p>In fact, no, it doesn&#8217;t. The title is therefore misleading, I know. However, if you&#8217;re not careful, your memory usage or configuration can easily exhaust all available memory and crash django. So whilst django itself doesn&#8217;t leak memory, the end result is very similar.</p>
<h2>Memory management in Django &#8211; with (bad) illustrations</h2>
<p>Let&#8217;s start with the basics and look at a django process. A django process is the basic unit that handles requests from users. We have several of those on the server, to allow handling more than one request at a time. Each process, however, handles only one request at any given time.</p>
<p>But let&#8217;s look at just one.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide1.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide1-150x150.png" alt="" title="django process" width="150" height="150" class="aligncenter size-medium wp-image-400" /></a>
</div>
<p>Cute, isn&#8217;t it? It&#8217;s a little like a balloon actually (and balloons are generally cute). The balloon has a certain initial size, enough for the process to do all the stuff it needs to. Let&#8217;s say this is balloon size 1.<br />
<span id="more-398"></span><br />
Now every request that comes to the server gets sent to one of those (cute) django processes. To serve the request, the process loads objects into memory, like this:</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide2.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide2.png" alt="" title="django process with objects" width="149" height="142" class="aligncenter size-thumbnail wp-image-405" /></a>
</div>
<p>Those little bubbles are the objects loaded into memory. Once the process finishes processing a request it will clear all the objects from memory and go back to being &#8216;empty&#8217;. It is still size 1 since all the objects fitted within the space.</p>
<p>But sometimes a request is a bit heavier: it needs to load more objects than the balloon can hold.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide5.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide5-e1323604194844.png" alt="" title="django process full" width="149" height="143" class="aligncenter size-full wp-image-411" /></a>
</div>
<p>So the process simply inflates itself and grows a little. Easy. Now it&#8217;s size 2. More space for bubbles.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide6.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide6-150x150.png" alt="" title="django process after inflation" width="150" height="150" class="aligncenter size-thumbnail wp-image-413" /></a>
</div>
<p>And of course, once the request finishes, it clears all those bubbles and there&#8217;s space for the next ones.</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide8.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide8-150x150.png" alt="" title="django process with cleared objects" width="150" height="150" class="aligncenter size-thumbnail wp-image-419" /></a>
</div>
<p>An important thing to note: the balloon (process) never shrinks. It can only grow. But this is (kind of) OK, since it will never grow bigger than the biggest request we get. So even a very big request (let&#8217;s say one that uses 1GB of memory) is something we can probably handle. Right??</p>
<p>Not quite. So what&#8217;s the problem?</p>
<p>Well, besides this cute little process, we have other processes. Remember we have to serve more than one user at a time, so we must keep a few of those balloons running. If more than one BIG request comes in at roughly the same time, they will inflate not just one balloon, but several. And these balloons compete for space on the server (which is like a big room that contains the balloons, except the room itself does not grow).<br />
This is our room:</p>
<div class="illustration">
<a href="http://dyt9j4djd5di6.cloudfront.net/assets/slide9.png"><img src="http://dyt9j4djd5di6.cloudfront.net/assets/slide9-212x300.png" alt="" title="django processes on the server" width="212" height="300" class="aligncenter size-medium wp-image-422" /></a>
</div>
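<p>The balloon analogy translates into a small toy model. Each process grows to fit the biggest request it has served and never shrinks, and the room overflows when all the processes together need more than the server has. Sizes are in arbitrary units, and the Process and room_overflows names are made up for illustration. Real Python memory behaviour is more complicated than this.</p>

```python
class Process:
    """A django process in the balloon model: it grows but never shrinks."""

    def __init__(self, initial_size=1):
        self.size = initial_size

    def handle(self, request_size):
        # The request's objects are freed afterwards, but the process keeps
        # the larger footprint it inflated to.
        self.size = max(self.size, request_size)


def room_overflows(processes, room_size):
    """True when the processes together need more memory than the server has."""
    return sum(p.size for p in processes) > room_size
```

<p>For example, four size-1 processes fit comfortably in a room of size 8. Once three of them have each served a size-3 request, they need 3 + 3 + 3 + 1 = 10 units and room_overflows returns True, even though no request is running any more.</p>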
<p>Of course we can clear the room and start empty &#8211; this is what we do when we reboot the server, or even just restart django. It is also what we have to do when the server crashes, which happens when the balloons grow so big that other balloons can&#8217;t grow any more. But rebooting all the time is not an option. When we do that, everything stops, including requests that are being processed, even if they&#8217;re half-way through.</p>
<p>So &#8211; how about we &#8216;pop&#8217; those balloons every now and then, when they&#8217;re NOT processing a request (while other balloons keep serving), and start again from a small balloon? That&#8217;s actually possible. However, there are two limitations to be aware of:</p>
<ul>
<li>Creating a new balloon takes some effort. We have to &#8216;make&#8217; the balloon, and while we make it, the others respond more slowly.</li>
<li>We cannot just &#8216;pop&#8217; a balloon based on its size. Instead we can only create an &#8216;automatic balloon popper&#8217; that pops the balloon after X requests.</li>
</ul>
<p>Our degree of control is as follows:</p>
<ol>
<li><strong>minspare</strong> &#8211; How many empty balloons we start with. Having a few ready can save us effort later. The &#8216;cost&#8217; in terms of memory is {number of balloons} X {balloon initial size}. The benefit is saving the time it takes to create a new process when simultaneous requests arrive. However, this parameter doesn&#8217;t help much with our problem.</li>
<li><strong>maxchildren</strong>/<strong>maxspare</strong> &#8211; The maximum number of balloons/processes we want to have on the system (strictly speaking, maxchildren is the hard limit on the total, while maxspare caps how many idle ones are kept around). This determines the maximum number of simultaneous requests we can deal with. The &#8216;cost&#8217; is {number of balloons} X {balloon size}, and the balloon size can obviously grow over time!</li>
<li><strong>maxrequests</strong> &#8211; This is the &#8216;auto-popper&#8217;. We can decide after how many requests we &#8216;pop&#8217; a balloon and start a new one (0, the default, means never).</li>
</ol>
<p>So if we set <strong>maxrequests</strong> too low, say 1, then the system will work very hard to create a new process/balloon for every request. This is wasteful if the request is very small and doesn&#8217;t need a big balloon. With too high a value, however, the balloons might grow too much before they&#8217;re popped. And even with maxrequests set to 1, if we get a few requests at the same time, each causing its balloon to grow too much, we might still run out of space!</p>
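<p>The same toy balloon model can show what maxrequests buys us. The hypothetical simulate function below tracks one process&#8217;s size over a sequence of requests. After every maxrequests requests it replaces the process with a fresh size-1 one. Sizes are arbitrary units, as before.</p>

```python
def simulate(request_sizes, maxrequests, initial_size=1):
    """Return a process's size after each request in the balloon model.

    The process grows to fit its largest request and only shrinks when it
    is replaced by a fresh one after `maxrequests` requests.
    """
    sizes = []
    size, served = initial_size, 0
    for request in request_sizes:
        size = max(size, request)
        served += 1
        sizes.append(size)
        if served == maxrequests:
            # 'Pop' the balloon: start a fresh process at the initial size.
            size, served = initial_size, 0
    return sizes
```

<p>With a single heavy request in the middle, a low maxrequests lets the process drop back to size 1 soon afterwards, while a high value keeps it inflated: simulate([1, 4, 1, 1, 1], maxrequests=2) gives [1, 4, 1, 1, 1], whereas maxrequests=100 gives [1, 4, 4, 4, 4].</p>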
<p>Our worst-case scenario is {number of simultaneous requests} X {size of the request}. Let&#8217;s say our server has 4GB of memory in total, which probably leaves about 3GB for django itself. With requests that might take ~1GB of memory each (the worst case), we can only serve a maximum of 3 such requests. And they don&#8217;t even have to be simultaneous. Since a balloon stays inflated after its request finishes, three such requests landing on three different processes, close enough together that none of those processes has been popped yet, are enough to use up the server&#8217;s memory&#8230;</p>
<h2>Conclusion</h2>
<p>One of the core issues I haven&#8217;t addressed here is how to prevent high memory usage within the django process in the first place. I hope to cover this in the next part. There are certainly some recommendations and best practices when it comes to memory usage. However, with some types of requests, high memory usage might be impossible to avoid. Given enough simultaneous requests, even with optimizations that lead to &#8216;reasonable&#8217; memory utilization, django might <em>still</em> run out of memory. The minspare, maxchildren, maxspare, and most importantly maxrequests parameters are therefore crucial to having a more stable django service. It&#8217;s not a bullet-proof solution, but in my experience it helps a lot.</p>
<h2>Sweet-Spot settings</h2>
<p>So what are my recommended settings? I found that setting maxrequests=100 seems to give reasonably good performance overall. Simply run django in prefork mode with something like this:</p>
<pre class="brush: plain; title: ; notranslate">
manage.py runfcgi method=prefork host=$DAEMON_HOST port=$BACKUP_DAEMON_PORT pidfile=$PIDFILE maxrequests=100
</pre>
<p>I didn&#8217;t see any need to change the default minspare, maxchildren, or maxspare parameters however.</p>
<h2>What&#8217;s next?</h2>
<p>In Part II I am going to cover some more advanced tweaks, designed to detect and recover from situations where django runs out of memory. Using the balloon analogy, these tweaks allow &#8216;popping balloons&#8217; when memory runs out, rather than only after 100 requests. This gives another layer of protection against memory-related crashes. However, they require monitoring tools outside django. I also hope to give at least some pointers on how to make better use of memory within the code.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/django-memory-leaks-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>timthumb vulnerability</title>
		<link>http://blog.gingerlime.com/timthumb-vulnerability/</link>
		<comments>http://blog.gingerlime.com/timthumb-vulnerability/#comments</comments>
		<pubDate>Thu, 04 Aug 2011 23:37:48 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=358</guid>
		<description><![CDATA[About a month ago I posted about tweaking timthumb to work with CDN. Timthumb is a great script, but great scripts also have bugs. A recently discovered one is a rather serious bug. It can allow attackers to inject arbitrary &#8230; <a href="http://blog.gingerlime.com/timthumb-vulnerability/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>About a month ago I <a href="http://blog.gingerlime.com/thumbs-up">posted</a> about tweaking timthumb to work with CDN. Timthumb is a great script, but great scripts also have bugs. A <a href="http://markmaunder.com/2011/zero-day-vulnerability-in-many-wordpress-themes/">recently discovered</a> one is a rather serious bug. It can allow attackers to inject arbitrary php code onto your site, and from there onwards, pretty much take control over it.</p>
<p>Luckily no websites I know or maintain were affected, possibly because the htaccess change I used shouldn&#8217;t allow using remote URLs in the first place (it also hid timthumb.php from the URL string, making it slightly obfuscated). I still very strongly advise anybody using timthumb to upgrade to the <a href="https://code.google.com/p/timthumb/source/browse/trunk/timthumb.php">latest version</a> to avoid the risk.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/timthumb-vulnerability/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ajaxizing</title>
		<link>http://blog.gingerlime.com/ajaxizing/</link>
		<comments>http://blog.gingerlime.com/ajaxizing/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 13:48:30 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=313</guid>
		<description><![CDATA[Following from my previous post, I&#8217;ve come across another issue related to caching in wordpress: dynamic content. There&#8217;s a constant trade-off between caching and dynamic content. If you want your content to be truly dynamic, you can&#8217;t cache it properly. &#8230; <a href="http://blog.gingerlime.com/ajaxizing/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>Following from my <a href="/thumbs-up">previous post</a>, I&#8217;ve come across another issue related to caching in wordpress: dynamic content. There&#8217;s a constant trade-off between caching and dynamic content. If you want your content to be truly dynamic, you can&#8217;t cache it properly. If you cache the whole page, it won&#8217;t show the latest update. W3 Total Cache, WP Super Cache and others have some workarounds for this. For example, W3TC has something called <a href="http://wordpress.stackexchange.com/questions/7112/w3-total-cache-cache-refresh-programmatically">fragment caching</a>. So if you have a widget that displays dynamic content, you can use fragment caching to prevent caching. However, from what I worked out, all it does is essentially prevent the page with the fragment from being fully cached, which defeats the purpose of caching (especially if this widget is on the sidebar of all pages).<br />
<br />
The best solution for these cases is using ajax, to asynchronously pull dynamic content from the server using Javascript. Many plugins already support ajax and can load data dynamically for you, but many others don&#8217;t. So what can you do if you use a plugin that doesn&#8217;t, and you want to &#8216;ajaxize&#8217; it?? Well, there are a few solutions out there. For example, <a href="http://omninoggin.com/wordpress-posts/make-any-plugin-work-with-wp-super-cache/">this post</a> shows you how to do it, and works quite well.<br />
<br />
The thing is, I wanted to take it a step further. If I can do it by following this manual process, why can&#8217;t I use a plugin that, erm, &#8216;ajaxizes&#8217; other plugins?? I tried to search for solutions, but found none. So I decided to write one myself. It&#8217;s my first &#8216;proper&#8217; plugin, but I think it works pretty well. <span id="more-313"></span><br />
</p>
<h3 class="storytitle"><a href="http://wordpress.org/extend/plugins/ajaxize/">Ajaxize</a></h3>
<p>The plugin allows you to take any wordpress function and &#8216;ajaxize&#8217; it into a special div. Typically, all plugins and core wordpress functionality boil down to a number of php functions. If you can figure out which function your plugin uses to output content, you can ajaxize it. How do you find the function name? It&#8217;s not that complicated. Many plugins will actually tell you which function to use, for example if you want to embed their output in one of your templates. They will instruct you to use something like this:</p>
<pre class="brush: php; title: ; notranslate">
&lt;?php echo plugin_function_name(); ?&gt;
</pre>
<p>So all you have to do is take this &#8216;plugin_function_name&#8217;, and ajaxize it using my plugin. The output is a div which looks like this:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;div id=&quot;ajaxize_this:plugin_function_name:68e46660f7ce3bc77a51465219df5743879544bc&quot;&gt;&lt;/div&gt;
</pre>
<p>Then place this div inside your page, post, widget, header, anywhere really. The ajaxize plugin adds a small piece of javascript that finds this div and automatically converts it into an ajax call for you!<br />
<br/><br />
There are a few limitations though:</p>
<ul>
<li>Functions must return valid HTML &#8211; this will be called in php and returned via the Ajax call</li>
<li>Functions cannot accept any parameters (at least at the moment)</li>
<li><del>Functions that work within a context (e.g. of a post, page, category), will most likely lose the context information</del></li>
</ul>
<p>UPDATE: version 1.1 of the plugin now handles context much better. Ajaxize now hooks in at the right place, so the ajax call is made exactly where the div element is placed. This means plugins that use post/category/taxonomy context information can now also be ajaxized. Special thanks to <a href="http://digitalnature.eu/">One Trick Pony</a> for helping me <a href="http://wordpress.stackexchange.com/questions/21526/how-to-get-context-information-inside-my-funcion/21529#21529">figure out</a> how to hook this correctly.<br />
<br/><br />
This was a perfect solution for mixing caching with dynamic content. I can convert almost any plugin or widget into a div. I can also write very simple PHP functions that will show dynamic content on my pages, with zero extra javascript code. The div itself can be cached, but the content will be pulled automatically by the browser when the page loads. I also found it useful for loading plugin buttons like the Facebook Like and Twitter Tweet buttons. Those can take a while to load and slow the page down. When converted via ajaxize, they still take a while to load, but they no longer seem to hold up the rest of the page content.<br />
</p>
<h3> What about security? </h3>
<p>Some of you may have already started thinking&#8230; &#8220;but hang on a minute. If you can ajaxize one function, what stops somebody from calling other functions on my wordpress??!!&#8221;. Very true. This is why ajaxize was built with security in mind. It uses a very powerful algorithm called <a href="http://en.wikipedia.org/wiki/HMAC">HMAC</a> with a secret key, so you can use ajax on any function you choose to ajaxize, but only on those functions and not others. This also means zero configuration. The plugin only stores one value in the database: your secret key! I might cover the security aspects of the plugin in a future post. I encourage people to look through the code and validate the security I&#8217;ve implemented.<br />
<br />
Feel free to try it out and let me know what you think!<br />
<br/></p>
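<p>For readers curious about how an HMAC can restrict which functions are callable, here is a Python sketch of signing and verifying a div id in the ajaxize_this:function_name:signature format shown above. The plugin itself is written in PHP, and the exact message it signs and the hash function it uses are assumptions here. HMAC-SHA1 over the function name happens to produce the 40 hex characters seen in the example id, but the real scheme may differ. The function names are hypothetical.</p>

```python
import hashlib
import hmac

PREFIX = "ajaxize_this"


def _sign(function_name: str, key: bytes) -> str:
    # Assumption: HMAC-SHA1 over the function name (40 hex characters).
    return hmac.new(key, function_name.encode(), hashlib.sha1).hexdigest()


def make_div_id(function_name: str, key: bytes) -> str:
    """Build an id like ajaxize_this:function_name:signature."""
    return f"{PREFIX}:{function_name}:{_sign(function_name, key)}"


def verify_div_id(div_id: str, key: bytes):
    """Return the function name if the signature is valid, else None."""
    parts = div_id.split(":")
    if len(parts) != 3 or parts[0] != PREFIX:
        return None
    _, function_name, signature = parts
    expected = _sign(function_name, key)
    # Constant-time comparison, so timing does not reveal the signature.
    if hmac.compare_digest(expected.encode(), signature.encode()):
        return function_name
    return None
```

<p>Without the secret key, nobody can produce a valid signature for a different function name. Swapping plugin_function_name for another function makes verification fail. In practice the key would be a long random value generated once, for example with Python&#8217;s secrets.token_bytes(32), much like the single value the plugin keeps in the database.</p>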
<div class="buttons">
<a id="download_plugin" class="button_big big_green float-left" href="http://wordpress.org/extend/plugins/ajaxize/">Get it here<i> </i></a>
</div>
<p><br/></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/ajaxizing/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>thumbs up</title>
		<link>http://blog.gingerlime.com/thumbs-up/</link>
		<comments>http://blog.gingerlime.com/thumbs-up/#comments</comments>
		<pubDate>Sun, 12 Jun 2011 19:36:45 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=297</guid>
		<description><![CDATA[[IMPORTANT: please check that you have the latest version of timthumb! older versions might have a serious security vulnerability. A little more about it here] I&#8217;ve been recently trying to optimize a wordpress based site. It was running fine, but &#8230; <a href="http://blog.gingerlime.com/thumbs-up/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>[IMPORTANT: please check that you have the latest version of timthumb! older versions might have a serious security vulnerability. A little more about it <a href="http://blog.gingerlime.com/timthumb-vulnerability">here</a>]<br />
<br />
I&#8217;ve been recently trying to optimize a wordpress based site. It was running fine, but I wanted to run it even faster, and make the best use of resources. So I ended up picking <a href="http://www.w3-edge.com/wordpress-plugins/w3-total-cache/">W3 Total Cache (W3TC)</a>. It&#8217;s very robust and highly configurable, if perhaps a bit complicated to fully figure out. So eventually things were running fine, and my next task was to boost it even further by using a Content Delivery Network (CDN). In this case, the choice was <a href="http://aws.amazon.com/cloudfront/">Amazon Cloudfront</a>. The <a href="http://aws.amazon.com/releasenotes/CloudFront/3318264461288616">recent release</a> allowed managing custom origin from the console, which made things even easier. One of the remaining issues however, was trying to optimize <a href="http://www.binarymoon.co.uk/projects/timthumb/">timthumb</a>.<br />
<br />
Timthumb was already included with the theme, and I liked the way it works. It allowed some neat features, like fitting screenshots nicely, and also fitting company logos well within a fixed size (with <a href="http://www.binarymoon.co.uk/2011/03/timthumb-proportional-scaling-security-improvements/">zc=2</a> option). A Google search led me to a couple of sources. However, for some reason none of them worked, so I ended up using a slightly different solution&#8230; <span id="more-297"></span><br />
<br />
The <a href="http://www.binarymoon.co.uk/2010/11/timthumb-cdn-amazon-s3-good/">first article</a> was about using timthumb with a CDN, and it&#8217;s on the timthumb website itself. I tried following the advice there, and even some of the comments that followed the post*, but either Amazon CloudFront wouldn&#8217;t accept a php file or, as I suspect, it removed the query string from the request, which produced this error:</p>
<pre class="brush: plain; title: ; notranslate">
no image specified
Query String :
TimThumb version : 1.28
</pre>
<p>
The <a href="http://return-true.com/2010/05/optimizing-wordpress-for-shared-hosting/">second article</a> actually gave me the idea for a solution. It talks about optimizing timthumb with apache rewrite rules, such that once a thumbnail is generated, it can be served directly by apache, bypassing any php processing on subsequent requests for that thumbnail. Nice idea, and it looks like a good solution too, but it had a couple of disadvantages for me:<br />
1. It required modifying timthumb.php. It&#8217;s not a huge issue, but for future compatibility (e.g. to allow easily updating timthumb to the next version) I didn&#8217;t like patching the code<br />
2. It didn&#8217;t support CDN. (Well, it didn&#8217;t intend to do that I guess).<br />
<br />
My solution is, in a way, a cocktail of both suggestions, which somehow produced an even simpler solution. The idea in principle was very simple: change the <strong>timthumb.php?src=&#8230;</strong> URL to a &#8220;CDN-friendly&#8221; URL, i.e. something without php and without a real query string, something like <strong>http://my.host.com/cdn-thumb/src=&#8230;</strong> &#8211; which will in turn also work on <strong>http://my.cdn.com/cdn-thumb/src=&#8230;</strong><br />
<br />
All it took was a fairly simple Apache rewrite rule. I don&#8217;t really like rewrite rules and however many times I try to figure it out, I always get confused and decide to abandon it. This time it was simple enough even for me to work it out. </p>
<pre class="brush: bash; title: ; notranslate">
&lt;IfModule mod_rewrite.c&gt;
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^cdn-thumb/(.*)$ /relative/path/to/thumb.php?$1 [L]
&lt;/IfModule&gt;
</pre>
<p>Place this in an .htaccess file, then change <strong>thumb.php?src=&#8230;</strong> to <strong>cdn-thumb/src=&#8230;</strong> in your templates.<br />
<br />
The .htaccess file can go in any folder within your wordpress installation, for example the root or your theme sub-folder. The path to the &#8216;real&#8217; php file, however, is relative to the root of the site, not the OS path. For example:<br />
<br />
If your root wordpress installation is in <strong>/home/joe/wordpress</strong> and thumb.php is in <strong>/home/joe/wordpress/wp-content/themes/xyz/thumb.php</strong>, then use:
<pre class="brush: bash; title: ; notranslate">RewriteRule ^cdn-thumb/(.*)$ /wp-content/themes/xyz/thumb.php?$1 [L]</pre>
<p></p>
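<p>To check the mapping before touching Apache, the rule can be mimicked in Python with a regular expression. This is only an approximation for testing URLs. It ignores the two RewriteCond lines, which let existing files and directories through, and it uses the example theme path from above, which you would adjust for your own installation. The function names are hypothetical.</p>

```python
import re

# Mirrors: RewriteRule ^cdn-thumb/(.*)$ /wp-content/themes/xyz/thumb.php?$1 [L]
RULE = re.compile(r"^cdn-thumb/(.*)$")
TARGET = "/wp-content/themes/xyz/thumb.php?"  # example theme path from above


def rewrite(path: str):
    """Return the internal URL the rule maps a request path to, or None."""
    # In a root .htaccess the leading slash is not part of the matched path.
    match = RULE.match(path.lstrip("/"))
    return TARGET + match.group(1) if match else None


def to_cdn_url(thumb_url: str, host: str) -> str:
    """Turn a '...thumb.php?src=...' URL into 'http://host/cdn-thumb/src=...'."""
    query = thumb_url.split("?", 1)[1]
    return f"http://{host}/cdn-thumb/{query}"
```

<p>For example, to_cdn_url("/wp-content/themes/xyz/thumb.php?src=/images/logo.png", "my.cdn.com") gives http://my.cdn.com/cdn-thumb/src=/images/logo.png. When that request reaches the origin, rewrite("/cdn-thumb/src=/images/logo.png") maps it back to the thumb.php?src=/images/logo.png URL.</p>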
<p>* There was a <a href="http://www.binarymoon.co.uk/2010/11/timthumb-cdn-amazon-s3-good/comment-page-1/#comment-33563">similar solution</a> on timthumb&#8217;s website. However it also required changes to timthumb.php&#8230;<br />
<br/></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/thumbs-up/feed/</wfw:commentRss>
		<slash:comments>45</slash:comments>
		</item>
		<item>
		<title>timing is everything</title>
		<link>http://blog.gingerlime.com/timing-is-everything/</link>
		<comments>http://blog.gingerlime.com/timing-is-everything/#comments</comments>
		<pubDate>Sun, 08 May 2011 11:40:22 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=290</guid>
		<description><![CDATA[A quick-tip on the importance of timestamps and making sure your time zone is set correctly. I was recently playing around with fail2ban. It&#8217;s a really cool little tool that monitors your log files, matches certain patterns, and can act &#8230; <a href="http://blog.gingerlime.com/timing-is-everything/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>A quick-tip on the importance of timestamps and making sure your time zone is set correctly. </p>
<p>I was recently playing around with <a href="http://www.fail2ban.org/">fail2ban</a>. It&#8217;s a really cool little tool that monitors your log files, matches certain patterns, and can act on them. Fail2ban would typically monitor your authentication log file, and if, for example, it spots 5 or more consecutive failures, it simply adds a filter to your iptables to block that IP address for a certain amount of time. I like fail2ban because it&#8217;s simple and effective. It does not try to be too sophisticated, or have too many features. It does one thing, and does it very well.</p>
<p>I was trying to build a custom rule to watch a specific application log file. I had a reasonably simple regular expression, and I was able to test it successfully using <a href="http://linux.die.net/man/1/fail2ban-regex">fail2ban-regex</a>. It matched the lines in the log file and gave me a successful result:</p>
<pre class="brush: plain; title: ; notranslate">
Success, the total number of match is 6
</pre>
<p>However, when running fail2ban, even though it loaded the configuration file correctly and detected changes in the log files, fail2ban, erm, failed to ban&#8230; I couldn&#8217;t work out what the problem was.</p>
<p>As it turns out, the timestamps in my log file were set to a different time zone, so fail2ban treated those log entries as too old and did not take action. Make sure your timestamps are correct and in the same time zone as your system!! Once the time zone was fixed, fail2ban worked just fine.</p>
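<p>A simplified Python sketch shows how this goes wrong. Suppose an application writes local times without a zone, and those times are read as if they were in the server&#8217;s zone. A failure from a few minutes ago can then look hours old and fall outside the look-back window. The 10-minute window, the UTC-5 application zone in the example and the function names are all hypothetical. This is not fail2ban&#8217;s code, and its real window is its configurable findtime setting.</p>

```python
from datetime import datetime, timedelta, timezone

# Hypothetical look-back window; fail2ban's real one is its findtime setting.
FINDTIME = timedelta(minutes=10)


def parse_stamp(stamp: str, assumed_tz: timezone) -> datetime:
    """Parse a zone-less log timestamp, assuming it was written in assumed_tz."""
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=assumed_tz)


def counts_as_recent(logged: datetime, now: datetime) -> bool:
    """Only failures that happened within FINDTIME of now are acted upon."""
    age = now - logged
    return FINDTIME >= age >= timedelta(0)
```

<p>A stamp of 06:40:22 written by an application running at UTC-5 is misread as UTC. At 11:45 UTC it then appears more than five hours old and is ignored. Read in the right zone, it is only a few minutes old and counts.</p>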
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/timing-is-everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>passwordless password manager</title>
		<link>http://blog.gingerlime.com/passwordless-password-manager/</link>
		<comments>http://blog.gingerlime.com/passwordless-password-manager/#comments</comments>
		<pubDate>Thu, 03 Mar 2011 17:44:59 +0000</pubDate>
		<dc:creator>Yoav Aner</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://blog.gingerlime.com/?p=274</guid>
		<description><![CDATA[[Also published on testuff.com] Most people I know tend to simply use the same password on ALL websites. Email, Paypal, Amazon, Ebay, Facebook, Twitter. This is obviously a very bad idea. Passwords are always a problem. Difficult to remember, hard &#8230; <a href="http://blog.gingerlime.com/passwordless-password-manager/">Continued</a>]]></description>
			<content:encoded><![CDATA[<p>[Also published on <a href="http://www.testuff.com/blog/2011/03/passwordless-password-manager/">testuff.com</a>]</p>
<p>Most people I know tend to simply use the same password on ALL websites. Email, Paypal, Amazon, Ebay, Facebook, Twitter. This is obviously a very bad idea. </p>
<p>Passwords are always a problem. Difficult to remember, hard to think of a good one when you need a new one, tricky to keep safe. For the moderately paranoid and the sufficiently techie there are many good solutions out there: password managers. Online, offline, commercial, free. So I usually suggest that my friends and colleagues use a password manager.<br />
<span id="more-274"></span><br />
I personally like to use <a href="http://clipperz.com/">clipperz</a> (online). I have also used <a href="http://keepass.info/">keepass</a> (multi-platform). Both are free, open source tools and do a good job.</p>
<p>However, I doubt many of my friends actually follow my advice. They&#8217;d have to install something, or log on somewhere, JUST FOR THAT. It&#8217;s a little annoying to use and makes every login more complicated. It might not be available when they&#8217;re using a computer at work or at a friend&#8217;s house. So they end up doing the same thing and simply use one password.</p>
<p>So what&#8217;s the solution?? Well, let&#8217;s refine the problem. The main concern for me is what happens IF I use the same password and it gets compromised. Even if it&#8217;s super-strong, ALL my &#8216;online assets&#8217; almost immediately get compromised too. By the time I log in to change the password on ALL those websites, it&#8217;s probably too late. And that&#8217;s assuming I even know it was compromised on one of those websites. Now, I tend to trust the security of Amazon and Paypal (not that they are 100% immune to attacks and leaks), but what about this <a href="http://www.hasbean.co.uk/">website I order coffee from</a> (great coffee and a great website, and I do not imply that their security is not good; it&#8217;s probably as good as their coffee), or that other website I ordered some computer parts from 2 years ago&#8230; The thing is, it only takes ONE. If someone grabs the password from there, the first thing they&#8217;re going to try is logging in to paypal, amazon, ebay etc.</p>
<p>It got me thinking. What if I carried on using the same (super-strong) password, but instead of using my usual email, used a different one for each website??! You must be thinking now &#8220;How can I use a different email for each website?? Sign up for a gazillion hotmail accounts??&#8221; No, there&#8217;s a simpler way, but let&#8217;s leave it aside for now; I&#8217;ll show you how in a moment. So what are the benefits? Even if someone grabs my (super-strong) password from this one website, the email address won&#8217;t work on any other website. And they won&#8217;t be able to guess my other email addresses, because each one is different and hard to guess!!</p>
<p>How do I get those gazillion email addresses without signing up for an email account a gazillion times, then? Do you have a gmail account? Hotmail? Yahoo?! No??!! Well, you should probably get one of those (although other providers may allow the same thing). All these webmail accounts allow you to create <strong>aliases</strong> (here&#8217;s a quick overview for <a href="http://mail.google.com/support/bin/answer.py?hl=en&#038;answer=12096">gmail</a>, <a href="http://windowsteamblog.com/windows_live/b/windowslive/archive/2011/02/03/hotmail-delivers-aliases-to-help-you-manage-and-secure-your-email-account.aspx">hotmail</a> and <a href="http://help.yahoo.com/tutorials/prof/prof/prof_identity3.html">yahoo</a>). Essentially, an alias is another email address that is linked to your main email. So instead of john.smith@gmail.com you can use john.smith+f9230382@gmail.com. Not the most friendly address, but virtually impossible to guess. All you need to do now is sign up for an account using this alias. Don&#8217;t forget to create a new alias for every online account you create, though! And make sure the alias is hard to guess: just stick a bunch of random characters and digits at the end. The longer the better (size DOES matter).</p>
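<p>To make the &#8220;random characters and digits&#8221; part concrete, here is a minimal Python sketch that builds such an alias. It assumes Gmail-style plus-addressing (mail sent to user+anything@gmail.com lands in user@gmail.com). Hotmail and Yahoo create aliases through their own account settings, as their linked help pages describe, so it doesn&#8217;t apply to them as-is. The function name <code>make_alias</code> and the 12-character default are illustrative choices, not anything the providers require.</p>

```python
import secrets
import string

# Characters for the random suffix: lowercase letters and digits,
# similar to the examples in the post (e.g. "kjdi23982ndsa").
ALPHABET = string.ascii_lowercase + string.digits


def make_alias(address, length=12):
    """Return a plus-address alias such as john.smith+hgqozcmn21kf@gmail.com.

    Hypothetical helper. It assumes the provider delivers user+anything@domain
    to user@domain, which is how Gmail's plus-addressing works.
    """
    local, domain = address.rsplit("@", 1)
    # secrets.choice draws from a cryptographically strong random source,
    # so one alias gives no hint about any other alias.
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{local}+{suffix}@{domain}"


if __name__ == "__main__":
    print(make_alias("john.smith@gmail.com"))
```

<p>With 36 possible characters per position, a 12-character suffix gives 36<sup>12</sup> (roughly 2<sup>62</sup>) combinations. That is why a long random suffix is effectively impossible to guess, while a short or meaningful one (like the site name) is not.</p>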
<p>The only remaining question is therefore &#8220;how do I know which alias I used for &lt;insert name of website&gt;?&#8221;. My suggestion is relatively simple: keep the list inside your email account. Save a draft, send an email to yourself, add a task or note, or use whatever else your webmail account offers. The list would look something like this:</p>
<blockquote><p>
facebook &#8211; john.smith+kjdi23982ndsa@gmail.com<br />
ebay &#8211; john.smith+484jqcqwl2@gmail.com<br />
amazon &#8211; john.smith+hgqozcmn21kf@gmail.com<br />
&#8230;
</p></blockquote>
<p>It&#8217;s not super-secure, but:<br />
1. Attackers would only see this list if your email account gets hacked (and even then, they would need to know to look for it).<br />
2. The list of aliases would NEVER include your super-secure password. That is something you still have to remember.<br />
3. If you use a <strong>different</strong> (super-secure) <strong>password</strong> for your email account, then even if your regular (super-secure) password gets compromised, nobody can get into your email account and find this list.</p>
<p>So you end up with two super-secure passwords you have to remember, and a (not super secure) list of email aliases inside your email account. That&#8217;s the passwordless password-manager.</p>
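<p>Once the list of aliases grows long, finding the right entry by eye gets tedious. The Python sketch below reads a list in the same &#8220;site &#8211; alias&#8221; format as the example above and returns the alias for a given site. The names <code>parse_aliases</code> and <code>lookup</code> are hypothetical. The parser assumes one entry per line, with spaces around the dash (either an en dash or a plain hyphen). It is only a convenience and is not part of the scheme itself.</p>

```python
import re

# Hypothetical notes text, in the same "site - alias" format as the list above
# (\u2013 is the en dash used in the post).
NOTES = """\
facebook \u2013 john.smith+kjdi23982ndsa@gmail.com
ebay \u2013 john.smith+484jqcqwl2@gmail.com
amazon \u2013 john.smith+hgqozcmn21kf@gmail.com
"""

# Group 1: site name. Then an en dash or hyphen with spaces around it.
# Group 2: the alias (anything without spaces that contains an @).
LINE = re.compile(r"^\s*(.+?)\s+[\u2013-]\s+(\S+@\S+)\s*$")


def parse_aliases(text):
    """Map lower-cased site names to the alias registered on that site."""
    aliases = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:  # lines that don't look like "site - alias" are skipped
            aliases[m.group(1).lower()] = m.group(2)
    return aliases


def lookup(text, site):
    """Return the alias for a site, or None if the site isn't in the list."""
    return parse_aliases(text).get(site.lower())


print(lookup(NOTES, "eBay"))  # john.smith+484jqcqwl2@gmail.com
```

<p>Lower-casing the site names means &#8220;eBay&#8221; and &#8220;ebay&#8221; find the same entry.</p>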
<p>NOTE:<br />
* Be <strong>very careful</strong> not to <strong>lose this list</strong>. Without it, you won&#8217;t be able to log in to your online accounts. So if you do plan on using this, make sure you keep a few copies of the list elsewhere. A simple solution: every time you use an email alias, send yourself an email with a note about it. Use the same subject each time so you can find the notes later, and if you have more than one email account, send the note to several of them. Then you can simply search through your email account(s). I suggest something like</p>
<blockquote><p>
From: john.smith@gmail.com<br />
To: john.smith@gmail.com, john.smith@hotmail.com<br />
Subject: new alias for &lt;website name&gt;</p>
<p>john.smith+fkfjdl93823@gmail.com
</p></blockquote>
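<p>If you&#8217;d rather script this note than type it each time, Python&#8217;s standard <code>email.message.EmailMessage</code> class can build the same message. This is only a sketch. <code>alias_note</code> is a hypothetical helper, and it builds the message without sending it. Actually sending it would need something like <code>smtplib</code> together with your provider&#8217;s SMTP server and credentials, which are not covered here.</p>

```python
from email.message import EmailMessage


def alias_note(sender, recipients, site, alias):
    """Build the 'new alias for ...' reminder email (hypothetical helper, not sent)."""
    msg = EmailMessage()
    msg["From"] = sender
    # Sending to more than one mailbox keeps a backup copy of every note.
    msg["To"] = ", ".join(recipients)
    # A fixed subject prefix means a single search finds every alias note.
    msg["Subject"] = f"new alias for {site}"
    # The body is just the alias, exactly as in the template above.
    msg.set_content(alias)
    return msg


note = alias_note(
    "john.smith@gmail.com",
    ["john.smith@gmail.com", "john.smith@hotmail.com"],
    "amazon",
    "john.smith+hgqozcmn21kf@gmail.com",
)
print(note["Subject"])  # new alias for amazon
print(note.get_content())
```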
]]></content:encoded>
			<wfw:commentRss>http://blog.gingerlime.com/passwordless-password-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

