I’ve come across a small nuisance that shows up occasionally with Unicode URLs. Some websites encode (escape, quote) a URL as soon as they see any special symbol in it, particularly the % sign. They assume the character needs encoding and convert it to its URL-encoded form: the percent symbol (%) becomes %25, the ampersand (&) becomes %26, and so on.
This is not normally a problem, unless the URL is already encoded. Since URLs containing non-ASCII characters always rely on this encoding, they are more prone to these errors. What happens then is that a URL that looks like this:
will be encoded again to this:
So clicking on such a double-encoded link will unfortunately lead to a 404 page (don’t try it with the links above, because the workaround was already applied there).
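To make the effect concrete, here is a minimal sketch of double encoding. The path is hypothetical and chosen only for illustration (it is not one of the links from this post), and the second pass simply mimics what a misbehaving site might do to an already-encoded URL.

```php
<?php
// Hypothetical path for illustration: "/café/", with "é" written as its
// two UTF-8 bytes so the example does not depend on the file's encoding.
$segment = "caf\xC3\xA9";

// First pass: the correct encoding. Each non-ASCII byte becomes a %XX escape.
$once = '/' . rawurlencode($segment) . '/';

// Second pass: a site re-encodes the already-encoded URL,
// so every '%' turns into '%25'.
$twice = str_replace('%', '%25', $once);

echo $once, "\n";   // /caf%C3%A9/
echo $twice, "\n";  // /caf%25C3%25A9/

// Decoding the double-encoded form once gives back the correct URL.
$restored = urldecode($twice);
echo $restored === $once ? "restored\n" : "mismatch\n";
```

The server receives the second form, decodes it once, and ends up looking for a page literally named with percent signs in it, which is why the request falls through to the 404 page.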
This workaround is specific to WordPress’s 404.php, but it can be applied quite easily in other frameworks like Django or Drupal, and maybe even with an Apache .htaccess rule(?).
<?php
/* Detect 'double-encoded' URLs:
 * if the request URI contains %25 (the URL-encoded form of the '%' symbol)
 * within the first few characters, decode the URL once and redirect.
 */
$pos = strpos($_SERVER['REQUEST_URI'], '%25');
if ($pos !== false && $pos < 10) :
    // Passing 301 as the third argument sets the response status together
    // with the Location header.
    header('Location: ' . urldecode($_SERVER['REQUEST_URI']), true, 301);
    exit;
else :
    get_header(); ?>
    <h2>Error 404 - Page Not Found</h2>
    <?php get_sidebar(); ?>
    <?php get_footer();
endif; ?>
This code goes only in the 404 template. It grabs the request URI and checks whether it contains the string ‘%25’ within the first 10 characters (you can modify the check to suit your needs). If it does, it redirects to a urldecoded version of the same URL…
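If you want to try the check outside of WordPress, the same logic can be pulled into a small function. This is only a sketch: the function name `double_encoded_redirect_target` and the `$maxPos` parameter are made up for this example, and the default of 10 simply mirrors the check described above.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns the URL to redirect
 * to if the request URI looks double-encoded, or null if it does not.
 */
function double_encoded_redirect_target(string $requestUri, int $maxPos = 10): ?string
{
    // Look for the encoded form of '%' near the start of the URI.
    $pos = strpos($requestUri, '%25');
    if ($pos === false || $pos >= $maxPos) {
        return null;
    }
    // Decode exactly one level, so %25C3 becomes %C3.
    return urldecode($requestUri);
}

// Example with a hypothetical double-encoded path.
var_dump(double_encoded_redirect_target('/caf%25C3%25A9/'));
// A normal path is left alone.
var_dump(double_encoded_redirect_target('/about/'));
```

In the 404 template you would call it with `$_SERVER['REQUEST_URI']` and send the 301 redirect only when the result is not null. Since only one level is decoded per request, a URL that was encoded more than twice would presumably hit the 404 page again and be decoded once more on the next round.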