unicode url double-encoding 404 redirect trick

I’ve come across a small nuisance that seemed to appear occasionally with unicode urls. Some websites seem to encode/escape/quote urls as soon as they see any symbol (particularly % sign). They appear to assume it needs to be encoded, and convert any such character to its URL-Encoded form. For example, percent (%) symbol will convert to %25, ampersand (&) to %26 and so on.

This is not normally a problem, unless the URL is already encoded. Since all unicode-based urls use this encoding, they are more prone to these errors. What happens then is that a URL that looks like this:
http://www.frau-vintage.com/2011/%E3%81%95%E3%81%8F%E3%82%89 …

will be encoded again to this:
http://www.frau-vintage.com/2011/%25E3%2581%2595%25E3%25 …

So clicking on such a double-encoded link will unfortunately lead to a 404 page (don’t try it with the links above, because the workaround was already applied there).

A workaround

This workaround is specific to wordpress 404.php, but can be applied quite easily in other frameworks like django, drupal, and maybe even using apache htaccess rule(?).

/* detecting 'double-encoded' urls
 *  if the request uri contain %25 (the urlncoded form of '%' symbol)
 *  within the first few characeters, we try to decode the url and redirect
$pos = strpos($_SERVER&#91;'REQUEST_URI'&#93;,'%25');
if ($pos!==false && $pos < 10) :
    header("Status: 301 Moved Permanently");
    header("Location:" . urldecode($_SERVER&#91;'REQUEST_URI'&#93;)); 
    get_header(); ?>
    <h2>Error 404 - Page Not Found</h2>
    <?php get_sidebar(); ?>
    <?php get_footer(); 
endif; ?>

This is placed only in the 404 page. It then grabs the request URI and checks if it contains the string ‘%25’ within the first 10 characters (you can modify the check to suit your needs). If it finds it, it redirects to a urldecoded version of the same page…

2 Responses to “unicode url double-encoding 404 redirect trick”

  1. Lu

    Unfortunately didn’t work for me, still get 404…

Leave a Reply