On my previous post, I described the architecture of Gimel – an A/B testing backend using AWS Lambda and redis HyperLogLog. One of the commenters suggested looking into Google BigQuery as a potential alternative backend.
It looked quite promising, with the potential of increasing result accuracy even further. HyperLogLog is pretty awesome, but trades space for accuracy. Google BigQuery offers a very affordable analytics data storage with an SQL query interface.
There was one more thing I wanted to look into and could also improve the redis backend – batching writes. The current gimel architecture writes every event directly to redis. Whilst redis itself is fast and offers low latency, the AWS Lambda architecture means we might have lots of active simultaneous connections to redis. As another commenter noted, this can become a bottleneck, particularly on lower-end redis hosting plans. In addition, any other backend that does not offer low-latency writes could benefit from batching. Even before trying out BigQuery, I knew I’d be looking at much higher latency and needed to queue and batch writes.
tip-toeing on the shoulders of giants
Before I dive into the reasons for writing Gimel in the first place, I’d like to cover what it’s based on. Clearly, 100 lines of code won’t get you that far on their own. There are two (or three) essential components this backend is running on, which makes it scalable and also light-weight in terms of actual code:
- AWS Lambda (and Amazon API Gateway) – handle the requests to both store experiment data and to return the experiment results.
- Redis – using Sets and HyperLogLog data structures to store the experiment data. It provides an extremely efficient memory footprint and great performance.
I haven’t noticed it much before, but it’s becoming a pet peeve once I started paying attention to it.
We LOVE homepages. Like eyes being the key to our souls, our homepage shows who we really are. What we stand for. They turn random visitors to loyal customers. They inspire trust, build an emotional connection, they bind us together… ok ok. You got the picture. Homepages are great.
But once I’m sold. I’m in. I gave you my email. I’m a loyal customer. I go to your site every. single. day. Do I really need to see your homepage again??! Do I actually care that you changed the photo on the frontpage and highlighted another benefit to potential customers? Or most important – do I really have to click the ‘Login’, ‘Go to my app’, ‘Dashboard’ or whatever other link you give me to get started?
This is the final post on this series. I started by covering the method for A/B testing coffee, as well as the motivation and approach. I later wrote about the first test session using Hario V60, comparing those beans by making Espresso and the last post described two preparation methods Aeropress and Cappucino.
I repeated a similar process using various combinations of
E coffee beans. This post will be more brief, with the “results” based on my personal preferences and how I ended up scoring all 5 types of beans.
On previous posts I covered the method for A/B testing coffee, as well as the motivation and approach. I later wrote about the first test session using Hario V60. The last post was comparing those beans by making Espresso.
This post will cover two tasting sessions of the same mysterious
B beans: Aeropress and Cappuccino.
On my previous post, I covered the first blind A/B tasting session using the “Gingerlime Tasting Technique” ™. You can read some more background about the motivation and method, as well as a full list of coffees I’m comparing on the first post in the series.
After the first taste using pour-over Hario V60 filter, I was anxious to find out whether both
B coffees will show similar characteristics using other preparation methods. Namely: Espresso, Aeroproess and Cappuccino. Would
B stay my favourite when served with milk? Would the Aeropress extract different flavours out of
A than I managed with the Hario?
This is the second post in a series, exploring the “Gingerlime Tasting Technique” ™. You can read some background on the previous post, where I explain the motivation, testing method and how I started exploring A/B testing for coffee. Different tasting sessions comparing two types of beans and trying to choose the best out of the two.
A taste test
The first tasting was between coffee A and B (still unknown to me at this point in time). The test was actually a series of 4 different tasting sessions. Each session used a different method of making coffee: Hario V60 filter, Espresso, Aeropress and a Cappuccino.
I do quite a bit of A/B testing and find it to be a great tool for experimenting and ultimately improving things.
But what’s “Coffee A/B testing”?
The idea came to me when I was visiting my wife’s family in Japan. We went to a restaurant and my father and brother in-law ordered two types of Sake. They let me taste both and decide which one I liked the most. It was a simple task, but an interesting one. The tastes were subtly different, but enough that I could clearly pick my personal favourite.
It then occurred to me that as much as I love coffee, and tend to pick some beans over others, I don’t quite know what makes me like a certain type, or what it is that I’m looking for for my “ultimate” coffee.
What if I could A/B test coffees? Try two types of beans (or blends), and pick the one I like. Then repeating the process I could gradually find the one I like the most. And in doing that, I can also figure out what it is that I like, and pay more attention to the difference. I rarely compare coffees. Well, not any more!
It’s not all about the technology. Stripe does one thing that makes it light-years better than its competition: Time to market. Or in simpler terms, its activation process to allow you to receive actual payments.
My wife and I run a small website selling vintage items from Germany to Japan. So far, my wife was asking all her customers to pay via bank-transfer. This is naturally time-consuming and for most of her customers, Japanese housewives and arts & crafts lovers, inconvenient. A few months ago I suggested to her to introduce credit-card payments on her website. How difficult could this be to implement?
My wife and I recently had a baby. Amongst the toys and cloths we received as gifts, there were a few CDs and DVDs with music for the little one. We then realised that we no longer have a CD or DVD drive in our computers. So we bought an external USB DVD/CD. When playing the DVDs, the region-selection menu appeared. I nearly forgot about it. Oh, the good ol’ copy-protection of the 90’s. So I chalked it up as one of those oddities of life, and thought how silly it seems today in the Internet age and all that. My wife is japanese, and I’m Israeli. And we live in Berlin. Naturally each side of the family wanted to send us Music in their own language, so there you go.
Only a few days later, my wife asked for my help with her Nexus 7. She bought a few eBooks from a Japanese site. Those work fine on her iPhone and Mac. But somehow the Play store won’t install the app (never mind the question why someone needs a bespoke app to read books).
“This item is not available in your country”.
This time I was determined to work around this.
Here’s a quick howto which does not require a rooted android.
UPDATE: AWS recently introduced SSL Health checks. So the method in this post should no longer be necessary.
Amazon Route53 offers a DNS healthcheck that allows you to failover to another host / region if one IP is not responsive. This works great if you want to create a secondary site, or even a simple maintenance page to give your users a little more info than just an empty browser window.
There are some limitations to the healthchecks currently. Route53 allows you to choose between TCP and HTTP. However, there’s no HTTPS / SSL support for URLs.
So what can you do if your site is running only with SSL?
Just a quick&dirty guide on setting up SSL tunnelling in your development environment. This is written for Rails, but can be easily used for Django, Node, or any other web development.
Why SSL in development?
There’s no important reason to use SSL for development, but some times, you just seem to have to. I was trying to build an integration with helpscout, using their dynamic custom app. For some reason, helpscout forces you to use SSL for the external URL. Even for development. I won’t go into details why I think it’s unnecessary, but rather focus on how to set it up. After all, it might be something else that requires SSL within development, so here’s one quick way to do so.
I spend a lot of time working with monitoring solutions, and like to measure and track things. The information we collect from our apps tells us a lot about what’s going on. Who’s using it. How frequently they access it. Where they are from. How much time they spend accessing the app etc. And then there’s a lot we can do as app owners with this data. We can measure it, trend it, slice and dice and produce nice reports. We can also action on this info. Offer people stuff based on their behaviour. Use those ‘lifecycle’ emails to improve conversion. Increase our sales. Bring people back to using our products.
I’m getting used to those supposedly-personal email from Matt, the founder of Widgets inc. who’s “just checking if I need any help using the product”, or Stuart from Rackspace who has “only one question”. I know it’s automated, but it’s fine. As long as I can hit reply and actually reach a person, that’s ok with me. I pretend to not notice.
However, I’m feeling recently that some of those emails get a little creepy. A couple of random examples:
Just a quick link to my recent talk at Ruby User Group Berlin
Slides are available on github
“Russian doll Caching” gained some popularity recently, I suspect in part due to its catchy (or cachie?) name and how easy it is to visualize the concept. Rails 4 should have this improved caching available by default. With Rails 3 you need to install the cache-digests gem. It’s pretty easy to get started with it, and the documentation is clear. It makes a lot of sense to start using it in your Rails app. I won’t attempt to cover the basics and will assume you are already familiar with it. I want to talk about a specific aspect of fragment caching surrounding the generation of the cache keys.
If a request using django tastypie is not authorized, please make sure to
raise Unauthorized() exception in your
_detail authorization methods in Tastypie v0.9.12.
The longer version
On one of my previous posts I wrote at length about django-tastypie authorization and gave some tips and tricks on how to work more flexibly and securely with this framework. A lot has happened since, and it was hard to keep track of all the various changes and updates to Tastypie.
Since version 0.9.12, the authorization mechanisms in tastypie changed rather radically, and that’s a very good improvement. It plugged some holes with nested resources and authorization, and made authorization decisions more granular. From a simple
apply_limits, now each operation can be authorized, broken down to CRUD elements (
delete). Each element is authorized for
_detail operations (I’ll try to cover this in more depth on a follow-up post at some stage).
For now, I just wanted to highlight an important pitfall you might want to avoid when using the new tastypie authorization that could leave you exposed. There’s a fix in the pipeline very soon, but until then, you should protect yourself by making a small change to your authorization methods, and the
_detail ones in particular
The crux of the issue is that the
_detail authorization methods should make a binary decision – is this authorized? (yes/no). If the method returns
True, or does nothing, the request is authorized. If the method returns
False or raises an
Unauthorized exception, the request should be blocked.
The glitch is that if your authorization
_detail functions return False, the request still goes through and is effectively authorized. Until the fix is in place, please make sure to
raise Unauthorized() exception if you’re using Tastypie v0.9.12.
I’ve had a strange conversation with my wife this morning.
She told me that google reader is closing down.
She’s using it much more than I do. So I said to her something like “I’m sure you can install some other RSS reader software to replace Google”.
Her response was a bit of a surprise for me: “Software?! eugh!”.
Then I said “Ok then, or an app”, and she seemed rather pleased.
How did software become such a dirty word?!
I love Graphite. It’s the most robust, flexible, kick-ass monitoring tool out there. But when I say
monitoring, I’m actually not describing what graphite really does. In fact, it does almost anything but monitoring. It collects metrics via carbon, it stores them using whisper, and it provides a front-end (both API and web-based), via graphite-web. It does not however monitor anything, and certainly does not alert when certain things happen (or fail to happen).
So graphite is great for collecting, viewing and analyzing data, particularly with the multitude of dashboard front-ends, my favourite being giraffe ;-). But what can you do when you want to get an email or a text message when, say, carbon throws some errors, or your web server starts to bleed with 500’s like there’s no tomorrow? Even better – do you want to get an email when your conversion signup rates drops below a certain mark??
So what can you use if you want to monitor stuff using graphite? And what kind of stuff can you monitor? I’ve come across a really great approach using nagios. In fact, I ‘borrowed’ the method the author was using for alerting on 500 errors for my own approach. So I wanted to do something very similar, but I really didn’t want nagios. It’s an overkill for me, if all I want is to get an email (or run a script) when something goes wrong.
I’ve recently bumped into an interesting post about a stackoverflow vulnerability discovered by Anthony Ferrara. I didn’t think too much about it. I’ve come across similar issues before, where the application relies on a piece of information that might be easy to forge. Telephony systems are vulnerable to Caller ID spoofing, which becomes increasingly easier with Voice-Over-IP providers. Web based applications can also be fooled if they rely on header information, such as the X-Forwarded-For, typically used by Proxy servers.
I was experimenting with switching rails from Phusion Passenger to Unicorn, when I suddenly came across a strange error message:
ActionDispatch::RemoteIp::IpSpoofAttackError (IP spoofing attack?!HTTP_CLIENT_IP="192.168.0.131"HTTP_X_FORWARDED_FOR="192.168.0.131"): app/controllers/application_controller.rb:138:in `append_info_to_payload'
That looked quite impressive. Rails is trying to identify spoofing attacks and raise an exception when it happens? Nice.
However, after digging a little deeper, trying to figure out what’s actually happening, it seems that Rails might actually be vulnerable to spoofing attacks under certain setups. I will try to describe those scenarios and suggest a few workarounds to avoid any pitfalls.
What I observed applies to Rails latest stable (3.2.9 at the time of writing), previous versions and potentially future versions as well (including 4.0).
Your rails application might be vulnerable to IP spoofing. To test it, try to add a fake
X-Forwarded-For header and check which IP address appears in your log files.
curl -H "X-Forwarded-For: 184.108.40.206" http://your.website.com
You can try to implement one of the workarounds mentioned below.
Just a quick rant this time.
I recently signed-up for pinterest. I wasn’t actually interested in signing-up, but wanted to see what their sign-up process looks like. If you’ve read one of my previous posts, you’d know I nearly always use unique, unpredictable email addresses for new services I sign-up to. Pinterest registration is quite nice, and only asks for a few details and an email address (that is, if you prefer a username and password instead of using Facebook or Twitter to login). Once you enter the details, pinterest sends you a Please verify your email message to your inbox. So far, so good.
However, what happens if you don’t verify your email? As was the case here. I wasn’t actually interested in creating an account. I assumed that I won’t hear from pinterest again. Wrong. I just received an email from pinterest, announcing their new secret boards. So much for confirming my account. According to Spamhaus, this is considered unconfirmed opt-in which is categorized as spam.
To add insult to injury, if I try to opt-out from the email I just received, Pinterest asks me to login to my (unconfirmed) account. These are all small annoyances, I know. But is it really that difficult to do things right? An unconfirmed account should not receive any messages. Opt-out links should just be one click and that’s it.
I’ve written about installing and using Graphite and it’s a really great tool for measuring lots of kinds of metrics. Most of the guides online don’t touch on the security aspects of this setup, and there was at least one thing that I thought should be worth writing about.
How are we measuring
Metrics we gather from our applications have the current characteristics / requirements:
- We want to gather lots of data over time.
- Any single data-point isn’t significant on its own. Only in aggregate.
- Measuring is important, but not if it slows down our application in any way.
For those who followed my previous post, I thought I should post a quick update.
I was naturally quite surprised to be contacted rather quickly by Rackspace shortly after posting. This was a nice surprise, and the contact afterwards were somehow more understanding. At least I could sense they are feeling sorry for my situation.
As expected, there was no way to recover the lost image. I received a follow-up message on the original ticket confirming this quite clearly. They then rather swiftly changed the tone into legal-speak and referred me to their terms of service, which I quote here for the benefit of the world at large.
One of the greatest promises of cloud computing is resilliency. Store your data ‘in the cloud’ and access it from anywhere, enjoy high durability and speed. You know the marketing spiel already. A recent incident reminded me the importance of backups. In fact, the importance of backups of backups. Sounds strange? of course. This is the tale of a missing server image.
Coming from Django, I was a little surprised/disappointed that permissions aren’t very tightly integrated with the Rails ActiveAdmin as they are with the django admin. Luckily, my search for better authorization for ActiveAdmin has led me to this very informative post by Chad Boyd. It makes things much easier so we can authorize resources more flexibly.
However, there were a couple of aspects that I still wasn’t 100% happy with:
- When an unauthorized action is attempted, the user is simply redirected with an error message. I personally like to return a 403 response / page. Yes, I’m nitpicking. I know.
- Default actions like Edit, View and Delete still appear. They are not dependent on the permission the user has. Clicking on those won’t actually allow you to do anything, but why have some option on the screen if they are not actually allowed??
So with my rather poor Ruby/Rails skill, and together with my much more experienced colleague, we’ve made a few tweaks to the proposal on Chad’s post to make it happen.
It’s always nice to be able to get some feedback, or for users to make a contact via a simple Contact form. However, it didn’t take too long before spammers started hitting those forms too. It was quite interesting to see the kind of messages we started receiving. In a way, most of those submissions were more like stories, or snippets from an email to a friend. They didn’t have any of those very much expected keywords for fake watches or erectile dysfunction enhancers. Many didn’t even have any links either. So what were these messages then? My personal guess was that these were some kind of a reconnaissance attempts. The bots were sending innocent messages first to various online forms. Then I imagine they will crawl the site more, trying to see if those submissions appear elsewhere. If/when they do, they will hit those forms hard with the real spam content. In any case, these were all speculations that I didn’t really care to prove right or wrong. I just wanted to get rid of this junk. Fast.
A recent comment by Martyn on my cloud performance shoot-out post prompted me to do another round of testing. As the bootstrap process I described on the last post evolved, it’s always a good idea to test it anyway, so why not kill two birds with one stone? The comment suggested that the Amazon EC2
micro instance is CPU throttled, and that after a long period (in computer terms: about 20 seconds according to the comment), you could lose up to 99% of CPU power. Whereas on a
small instance, this shouldn’t happen. So is EC2
small going to perform way-better than the
micro instance? How is it going to perform against Linode or Rackspace equivalent VPS?
This post starts as a rant about webfaction, but somehow turns into a rave. I recently discovered (the hard way) that I can failover almost any site to a secondary host in a different data centre, all with a few scripts on a webfaction shared hosting account.
fabric-graphite is a fabric script to install Graphite, Nginx, uwsgi and all dependencies on a debian-based host.
I was reading a few interesting posts about graphite. When I tried to install it however, I couldn’t find anything that really covered all the steps. Some covered it well for Apache, others covered Nginx, but had steps missing or assumed the reader knows about them etc.
I’m a big fan of fabric, and try to do all deployments and installations using it. This way I can re-run the process, and also better document what needs to be done. So instead of writing another guide, I created this fabric script.
One of my primary aims when building a resillient cloud architecture, is being able to spawn instances quickly. Many cloud providers give you tools to create images or snapshots of existing cloud instances and launch them. This is great, but not particularly portable. If I have one instance on Linode and I want to clone it to Rackspace, I can’t easily do that.
That’s one of the reasons I am creating bootstrap scripts that completely automate a server (re)build process. Given an IP address and root password, the script should connect to the instance, install all necessary packages, pull the code from the repository, initialize the database, configure the web server and get the server ready for restore of user-data.
I’m primarily using fabric for automating this process, and use a standard operating system across different cloud providers. This allows a fairly consistent deployments across different providers. This also means the architecture is not dependent on a single provider, which in my opinion gives a huge benefit. Not only can my architecture run on different data centres or geographic locations, but I can also be flxeible in the choice of hosting providers.
All that aside however, building and refining this bootstrapping process allowed me to run it across different cloud providers, namely: Rackspace, Linode, and EC2. Whilst running the bootrstrapping process many times, I thought it might be a great opportunity to compare performance of those providers side-by-side. My bootstrap process runs the same commands in order, and covers quite a variety of operations. This should give an interesting indication on how each of the cloud providers performs.
One of the best rules of thumb I know is the 80/20 rule. I can’t think of a more practical rule in almost any situation. Combined with the law of diminishing returns, it pretty much sums up how the universe works. One case-study that hopes to illustrate both of these, if only a little, is a short experiment in optimization I carried out recently. I was reading so many posts about optimizing wordpress using nginx, varnish, W3-Total-Cache and php-fpm. The results on some of them were staggering in terms of improvements, and I was inspired to try to come up with a similar setup that will really push the boundary of how fast I can serve a wordpress site.
Spoiler – Conclusion
So I know there isn’t such a thing as too much cash, but does the same apply to cache?
It’s always nice to discover a new tool or service that does things differently. Even if just a little. I remember when someone first told me about hipmunk. Just when I thought all flight search websites are pretty much the same, here’s one example of something different.
Perhaps this wasn’t as obviously different as hipmunk is, but one of the tools I came across recently within the security testing world is Arachni. A number of things made it stand out a little. First of all, it is written in Ruby. That already sparked some curiosity. I’m not entirely sure why, but I guess I’m naturally more interested in programs and tools in Ruby and Python. The next thing that was evidently different from other web scanners was the fact that Arachni seems to be very pluggable and interface-able. Arachni appears to be geared towards interfacing with external scripts or programs though an API. One of its core features is its distributed architecture, allowing to launch many modules independently and control them programmatically.
After playing around with it, I came across some issues and couldn’t make it work as I expected. Most of them out of my own lack of knowledge or being lazy reading through the extensive documentation. Luckily, it didn’t take more than a few minutes after posting a question on github, that I received a response from Arachni’s creator, Tasos Laskos, aka Zapotek.
After chatting with Tasos a few times via email, I became even more intrigued about him and the project. I then decided it would be interesting to interview him for my blog. I have no experience interviewing people, but what the heck.
Tasos accepted my invitation for an interview, with the condition that it must be a text-based interview. So this interview was carried out via email alone. I personally suspect his voice is funny, but he (obviously) denied it :)
Tasos is certainly not an ordinary person. It becomes apparent when you read his blog, or even the documentation for Arachni. As you could see from the interview, Tasos appears to have very strong and clear opinions. He doesn’t mince his words, and very directly expresses what he thinks. Nevertheless, Tasos and Arachni seem to be doing something a little different, and there’s definitely more to wait for.
If you’re updating your debian stable (squeeze) and using Lighttpd as your web server, you might come across a security notice on how to fix your lighttpd against the BEAST attack.
lighttpd (1.4.28-2+squeeze1) stable-security; urgency=high
This releases includes an option to force Lighttpd to honor the cipher order
in ssl.cipher-list. This mitigates the effects of a SSL CBC attack commonly
referred to as "BEAST attack". See  and CVE-2011-3389 for more details.
To minimze the risk of this attack it is recommended either to disable all CBC
ciphers (beware: this will break older clients), or pursue clients to use safe
ciphers where possible at least. To do so, set
ssl.ciphers = "ECDHE-RSA-AES256-SHA384:AES256-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:!AESGCM"
ssl.honor-cipher-order = "enable"
in your /etc/lighttpd/conf-available/10-ssl.conf file or on any SSL enabled
host you configured. If you did not change this file previously, this upgrade
will update it automatically.
There’s a mistake on this note however. Instead of
Please note that since Tastypie v0.9.12 the authorization framework was rewritten. Lots of information on this post no longer applies. I’m hoping to write a follow-up post at some stage.
I’ve been using tastypie, the very awesome django REST API framework for a little while now (btw, that’s not the official title, but it might as well be). I’m not going to write yet another comparison between tastypie and django-piston. My reasons for choosing tastypie were that its code looked nicer, and it seemed a much more active project.
One of the things that I immediately liked about tastypie, being a security-geek and all, was the security framework built into it. Primarily the authentication and authorization classes. They make it very easy to extend, and almost a no-brainer to apply to any resource. This means that providing resource-level authorization is also very easy and clean.
However, whilst working with tastypie and applying some authorization rules to my resources, I noticed a couple of pitfalls. Those are quite easy to miss if you’re not very familiar with the tastypie codebase. I wouldn’t say it’s a vulnerability or a bug as such, perhaps more of a (sub-optimal) design choice from a security-perspective. That said, if you use tastypie incorrectly, or unaware of those pitfalls, you might create a security vulnerability on your otherwise delicious API.
When talking about security, the first thing that usually comes to mind is encryption. Spies secretly coding (or de-coding) some secret message that should not be revealed to the enemy. Encryption is this mysterious thing that turns all text into a part of the matrix. Developers generally like encryption. It’s kinda cool. You pass stuff into a function, get some completely scrambled output. Nobody can tell what’s in there. You pass it back through another function – the text is clear again. Magic.
Encryption is cool. It is fundamental to doing lots of things on the Internet. How could you pay with your credit card on Amazon without encryption? How can you check your bank balance? How can MI5 pass their secret messages without Al-Qaida intercepting it?
But encryption is actually not as useful as people think. It is often used in the wrong place. It can easily give a false sense of security. Why? People forget that encryption, by itself, is usually not sufficient. You cannot read the encrypted data. But nothing stops you from changing it. In many cases, it is very easy to change encrypted data, without knowledge of the encryption key. (more…)
Scoring a goal against google is never easy. Google analytics allows you to do some strange and wonderful things, but not without some teeth grinding. I was struggling with this for a little while, and it was a great source of frustration, since there’s hardly any info out there about it. Or maybe there is lots of info, but no solution to this particular problem. I think I finally nailed it.
Dynamic Goal Conversion Values
I was trying to get some dynamic goal conversion values into Analytics. I ended up reading about Ecommerce tracking and it seemed like the way to go. Not only would I be able to pick the goal conversion value dynamically, it gives you a breakdown of each and every transaction. Very nice. After implementing it, I was quite impressed to see each transaction, product, sku etc appear neatly on the ecommerce reports. So far so good. But somehow, goals – which were set on the very same page as the ecommerce tracking code – failed to add the transaction value. The goals were tracked just fine, I could see them adding up, but not the goal value. grrrr…
I’ve come across a small nuisance that seemed to appear occasionally with unicode urls. Some websites seem to encode/escape/quote urls as soon as they see any symbol (particularly % sign). They appear to assume it needs to be encoded, and convert any such character to its URL-Encoded form. For example, percent (%) symbol will convert to %25, ampersand (&) to %26 and so on.
This is not normally a problem, unless the URL is already encoded. Since all unicode-based urls use this encoding, they are more prone to these errors. What happens then is that a URL that looks like this:
will be encoded again to this:
So clicking on such a double-encoded link will unfortunately lead to a 404 page (don’t try it with the links above, because the workaround was already applied there).
This workaround is specific to wordpress 404.php, but can be applied quite easily in other frameworks like django, drupal, and maybe even using apache htaccess rule(?).
/* detecting 'double-encoded' urls
* if the request uri contain %25 (the urlncoded form of '%' symbol)
* within the first few characeters, we try to decode the url and redirect
$pos = strpos($_SERVER['REQUEST_URI'],'%25');
if ($pos!==false && $pos < 10) :
header("Status: 301 Moved Permanently");
header("Location:" . urldecode($_SERVER['REQUEST_URI']));
<h2>Error 404 - Page Not Found</h2>
<?php get_sidebar(); ?>
This is placed only in the 404 page. It then grabs the request URI and checks if it contains the string ‘%25’ within the first 10 characters (you can modify the check to suit your needs). If it finds it, it redirects to a urldecoded version of the same page…
On my previous post I talked about django memory management, the little-known maxrequests parameter in particular, and how it can help ‘pop’ some balloons, i.e. kill and restart some django processes in order to release some memory. On this post I’m going to cover some of the things to do or avoid in order to keep memory usage low from within your code. In addition, I am going to show at least one method to monitor (and act automatically!) when memory usage shoots through the roof.
A while ago I was working on optimizing memory use for some django instances. During that process, I managed to better understand memory management within django, and thought it would be nice to share some of those insights. This is by no means a definitive guide. It’s likely to have some mistakes, but I think it helped me grasp the configuration options better, and allowed easier optimization.
Does django leak memory?
In actual fact, No. It doesn’t. The title is therefore misleading. I know. However, if you’re not careful, your memory usage or configuration can easily lead to exhausting all memory and crashing django. So whilst django itself doesn’t leak memory, the end result is very similar.
Memory management in Django – with (bad) illustrations
Lets start with the basics. Lets look at a django process. A django process is a basic unit that handles requests from users. We have several of those on the server, to allow handling more than one request at the time. Each process however handles one request at any given time.
But lets look at just one.
cute, isn’t it? it’s a little like a balloon actually (and balloons are generally cute). The balloon has a certain initial size to allow the process to do all the stuff it needs to. Lets say this is balloon size 1.
About a month ago I posted about tweaking timthumb to work with CDN. Timthumb is a great script, but great scripts also have bugs. A recently discovered one is a rather serious bug. It can allow attackers to inject arbitrary php code onto your site, and from there onwards, pretty much take control over it.
Luckily no websites I know or maintain were affected, possibly since the htaccess change I used shouldn’t allow using remote URLs in the first place (and also it renamed timthumb.php from the url string, making it slightly obfuscated). I still very strongly advise anybody using timthumb to upgrade to the latest version to avoid risks.
Following from my previous post, I’ve come across another issue related to caching in wordpress: dynamic content. There’s a constant trade-off between caching and dynamic content. If you want your content to be truly dynamic, you can’t cache it properly. If you cache the whole page, it won’t show the latest update. W3 Total Cache, WP Super Cache and others have some workarounds for this. For example, W3TC has something called fragment caching. So if you have a widget that displays dynamic content, you can use fragment caching to prevent caching. However, from what I worked out, all it does is essentially prevent the page with the fragment from being fully cached, which defeats the purpose of caching (especially if this widget is on the sidebar of all pages).
The thing is, I wanted to take it a step further. If I can do it by following this manual process, why can’t I use a plugin that, erm, ‘ajaxizes’ other plugins?? I tried to search for solutions, but found none. So I decided to write one myself. It’s my first ‘proper’ plugin, but I think it works pretty well. (more…)
[IMPORTANT: please check that you have the latest version of timthumb! older versions might have a serious security vulnerability. A little more about it here]
I’ve been recently trying to optimize a wordpress based site. It was running fine, but I wanted to run it even faster, and make the best use of resources. So I ended up picking W3 Total Cache (W3TC). It’s very robust and highly configurable, if perhaps a bit complicated to fully figure out. So eventually things were running fine, and my next task was to boost it even further by using a Content Delivery Network (CDN). In this case, the choice was Amazon Cloudfront. The recent release allowed managing custom origin from the console, which made things even easier. One of the remaining issues however, was trying to optimize timthumb.
Timthumb was already included with the theme, and I liked the way it works. It allowed some neat features, like fitting screenshots nicely, and also fitting company logos well within a fixed size (with zc=2 option). Google search has led me to a couple of sources. However, for some reason none of them worked, so I ended using a slightly different solution… (more…)
A quick-tip on the importance of timestamps and making sure your time zone is set correctly.
I was recently playing around with fail2ban. It’s a really cool little tool that monitors your log files, matches certain patterns, and can act on it. Fail2ban would typically monitor your authentication log file, and if for example it spots 5 or more consecutive failures, it would simply add a filter to your iptables to block this IP address for a certain amount of time. I like fail2ban because it’s simple and effective. It does not try to be too sophisticated, or have too many features. It does one thing, and does it very well.
I was trying to build a custom-rule to watch a specific application log-file. I had a reasonably simple regular expression and I was able to test it successfully using fail2ban-regex. It matched the lines in the log file, and gave me a successful result
Success, the total number of match is 6
However, when running fail2ban, even though it loaded the configuration file correctly, and detected changes in the log files, fail2ban, erm, failed to ban… I couldn’t work out what was the problem.
As it turns-out, the timestamps on my log file was set to a different time-zone, so fail2ban treated those log entries as too old and did not take action. Make sure your timestamps are correct and on the same timezone as your system!! Once the timezone was set, fail2ban was working just fine.
[Also published on testuff.com]
Most people I know tend to simply use the same password on ALL websites. Email, Paypal, Amazon, Ebay, Facebook, Twitter. This is obviously a very bad idea.
Passwords are always a problem. Difficult to remember, hard to think of a good one when you need a new one, tricky to keep safe. For the moderately-paranoid and the sufficiently-techie there are many good solutions out there. Password managers. Online, offline, commercial, free. So I usually suggest to my friends and colleagues to use a password manager.
This saturday, 8th January 2011 I’m running a small geeky arts project at Madame Lillie’s gallry in Stoke Newington.
SMILE – a temporary exhibition
The smile project attempts to capture snapshots within the exhibition space. The audience takes an active role as part of thework and passively or actively affects it. The exhibition space is a number of webcams, each captures still-image snapshots at random. Some cameras are hidden, whilst others are visible. Those snapshots are then randomly layed-out and printed onto a photographs every few minutes. The audience is invited to take those snapshots home, as a souvenir and a piece of the artwork. Each snapshot is unique and cannot be reproduced. The images are deleted immediately after being processed and printed out.
Influenced by thoughts about the London surveillance network, the smile project looks at the proliferation of cameras that capture parts of our lives, and the knowledge that we all, willingly or unknowingly appear in images captured by others. With the advances in technology it is becoming increasingly easy to take photos and videos. It is also cheap and easy to keep those on file for a long period of time, perhaps indefinitely. Photos and videos that we take these days are instant and perishable: they appear briefly on our facebook page and get immediate attention until quickly replaced by others. Yet at the same time we cannot truly delete them. Once posted online, they are beyond our control. They are stored electronically, archived and backed-up. They are searchable and indexed. Whether we are the subjects of the images or those who create them, we have little control over them.
smile is attempting to both make use of and question the technology that dominates our modern lives. It is meant to be a fun and humorous experience, involving the audience and rewarding it. It uses digital imaging technology, but produces a tangible, unique output. The creation process involves programming in various scripting languages, using a mix of digital tools, primarily open-source, all form a part of a random montage.
I was really pleased when my good friend chris asked me to help him with his edition of 1 project. I guess it was exciting working on an arts project. I also liked his project because randomness is an interesting concept, particularly when it comes to computers. Put very simply: computers have trouble picking stuff at random.
One-Time-Passwords always fascinated me. Long long time ago in a land far far away I suddenly had this idea. The idea was simple and in today’s terms pretty common, perhaps trivial. One-Time-Password without the need for an extra token. After the user keys in their username and password, they get sent a random password via SMS. Ten years ago there wasn’t anything that did that. I created a basic RADIUS implementation with support for different SMS gateways, all in Java. Sadly however, with no funding, no clue how to turn it into a business, and just finishing my computer science degree, it had to be abandoned for an easier day job.
I was recently pulled into looking at two-factor-authentication (2FA) solutions. I used SecurID at a previous job, and know of several solutions in this area. I was quite pleased to discover many soft-token solutions working on mobile phones (iphone, blackberry, HTC, Nokia) and USB-based ideas like Yubikey. I was even more pleased to discover open source initiatives in this area, and OATH HOTP in particular.
I recently noticed my iphone clock wasn’t accurate. I’m not exactly sure why. It was only a few minutes behind, but it still annoyed me. Why couldn’t my iphone sync its time with an internet time server?? I know it is supposed to sync with my mobile network operator, but I think mine doesn’t sync… Maybe it’s my operator?
For jailbroken iphones, there’s a neat app on cydia called NTPDate. It’s a great app and I recommend installing it. All you need is specify the ntp server, and click ‘set’ and it will sync your clock for you. However, I wanted to go a step further. I wanted my iphone to sync itself automatically for me, using a cron job. Well, not quite using cron, but it can be done automatically.
On my last post I described how I get my asterisk box to know the caller name from a csv data file. The thing is, my address book keeps changing on my iphone. People change their phone numbers, I meet new people (can you believe it? I don’t let it happen too often though)… I wanted to be able to sync it automatically to my asterisk. This synchronisation also doubles up as a backup for my address book.
Caller ID is a wonderful feature. Don’t we love getting a call from someone we like, and perhaps more importantly, ignore those annoying callers who we really don’t want to talk to.
But this is yesterday’s news. We all have caller IDs. It just works. Well, yes. It does. But what if we get a call on our landline? We get the caller ID there too, but do we know who it is?? All our contacts are on our mobile phones. Standard phones don’t usually have the capacity to hold more than 10 names on average. And even if they did. Who’s got the energy to key in those numbers?