URL Correction

I’m a big fan of mod_rewrite, Apache’s URL rewriting engine. It lets us hide the inner workings of our websites, present a page’s URL as an extra description in SERPs, and it just looks more professional. When I first used it, the great advantage was that search engine bots found it much easier to navigate a website without falling over several ampersands in our URLs. That advantage has since mostly disappeared, as the bots have become much better at handling query strings.

A lesson I learned shortly after implementing mod_rewrite on one of our sites was the danger of relying solely on mod_rewrite to interpret your URLs. Herein lies the lesson…

Start with a simple PHP page that takes a single variable:

http://www.domain.com/page.php?var=1

Use mod_rewrite to create nice, friendly, descriptive URLs, something along these lines:

http://www.domain.com/CityName-HotelName-1.html

It works just the way you want, so what could go wrong?

After a few weeks the pages got indexed, and all seemed well with the search engine world. A little while later, some odd URLs started appearing in the search engines, such as these:

http://www.domain.com/cityname-hotelname-1.html

http://www.domain.com/AnotherCityName-HotelName-1.html

http://www.domain.com/-HotelName-1.html

Basically, these URLs were all still passing through the mod_rewrite engine and serving the same page, producing four pages of identical content as far as the search engines were concerned. While some fancy regular expression work could have solved the problem, the solution I took was much simpler.
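To see why every variant lands on the same page, here is a rough sketch in PHP. The post does not show the actual RewriteRule, so the pattern below is purely an assumption: it supposes the rule ignores the descriptive part of the URL and only captures the trailing number to pass on as var.

```php
<?php
// Hypothetical pattern standing in for the RewriteRule (not shown in the post).
// It assumes only the trailing number is captured and passed on as var.
$pattern = '#^/.*-([0-9]+)\.html$#';

$paths = [
    '/CityName-HotelName-1.html',
    '/cityname-hotelname-1.html',
    '/AnotherCityName-HotelName-1.html',
    '/-HotelName-1.html',
];

$vars = [];
foreach ($paths as $path) {
    if (preg_match($pattern, $path, $m)) {
        // Every path yields the same var, so every path serves the same content.
        $vars[$path] = $m[1];
        echo $path, ' => page.php?var=', $m[1], "\n";
    }
}
```

Under that assumption, all four paths map to page.php?var=1, which is exactly the duplication the search engines picked up.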

Using the techniques commonly applied to implement canonical URLs, a “URL Correction” function is added to every page call. This checks the requested URL against the actual URL that the page should have. If it is not exactly the correct version, the visitor gets a 301 redirect to the “proper” URL.

  • Build $correcturl using the passed var
  • Collate $requestedurl as 'http://www.domain.com' . $_SERVER['REQUEST_URI']
  • Compare to see if $correcturl === $requestedurl
  • If not then 301 to $correcturl
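The steps above could be sketched in PHP roughly as follows. This is a minimal illustration rather than the original code: the function names build_correct_url and url_correction are made up for the example, and the hard-coded lookup array stands in for wherever the site really stores its city and hotel names (most likely a database).

```php
<?php
// Hypothetical lookup: a real site would read these names from its database.
function build_correct_url(int $id): ?string
{
    $hotels = [1 => ['CityName', 'HotelName']];
    if (!isset($hotels[$id])) {
        return null; // unknown id: nothing to redirect to, handle as a 404 elsewhere
    }
    [$city, $hotel] = $hotels[$id];
    return "http://www.domain.com/{$city}-{$hotel}-{$id}.html";
}

// Returns the URL to 301 to, or null when no redirect is needed.
function url_correction(string $requestUri, int $id): ?string
{
    $correcturl = build_correct_url($id);
    $requestedurl = 'http://www.domain.com' . $requestUri;
    if ($correcturl === null || $correcturl === $requestedurl) {
        return null;
    }
    return $correcturl;
}

// Usage at the top of page.php, before any output is sent:
// $target = url_correction($_SERVER['REQUEST_URI'], (int) ($_GET['var'] ?? 0));
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Keeping the comparison in a function that only returns the target URL, and sending the header separately, makes the check easy to test without triggering a real redirect. The comparison is deliberately strict and case-sensitive, so the lowercase variant is treated as incorrect.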

As well as preventing duplication and directing traffic to the proper page, this also has the advantage of stopping anyone poking around your un-rewritten URLs! If someone entered http://www.domain.com/page.php?var=1 the site would redirect them to http://www.domain.com/CityName-HotelName-1.html

I hope you find this information useful, and that it saves you from making the same mistake I did!

3 Comments »

  1. Fraser Edwards said,

    December 12, 2006 @ 11:52

    It’s a great point and a really good post. There is a plugin for WordPress which helps prevent this kind of thing too, if you are concerned about it on your blog.

    http://fucoder.com/code/permalink-redirect/

    It’s handy since WordPress generates all sorts of alternative URLs which reach the same page

    i.e.

    http://www.hw techie.co.uk/2006/12/11/url-correction/
    http://www.hw techie.co.uk/?p=20/
    http://www.hw techie.co.uk/?p=20

    etc

    Spaces inserted to make sure the links don’t get picked up ;)

  2. HW Techie said,

    December 12, 2006 @ 11:57

    LOL

    Cheers Fraser, I guess I shouldn’t post about such things ’til I make sure it applies to my blog as well! oops…

    ETA: Plugin added :)

  3. Darren Cronian said,

    December 30, 2006 @ 11:18

    I’ve used this mod_rewrite tool, which has helped me understand how the code works a little.. it still baffles me.

    I’ve not used this for the blog, because WordPress has an inbuilt permalinks feature.
