Tuesday, September 14, 2010

Scalable Site Structure

If you've been building web sites for a few years, you've probably created plenty of them with a structure similar to the following:

  • /css/
  • /images/
  • /scripts/
  • index.php
  • about.php
  • products.php

One of my colleagues likes to keep the root directory as clean as possible, so his site structures look like this:

  • /content/about.php
  • /content/products.php
  • /common/
  • /images/
  • index.php

I find this slightly irritating, because I prefer to keep my CSS separate from my JavaScript, and the /content/ folder just moves the mess from one place to another.

Of course, the Holy Grail of site structures is to have clean URLs like this:

  • /about/
  • /css/
  • /images/
  • /products/
  • /scripts/
  • index.php

If you have a dynamic site, this is easy to achieve with Apache's mod_rewrite, which maps a request for /about/ to index.php?page=about behind the scenes. Sometimes, though, the client doesn't want to pay for a dynamic site. In that case we use the Poor Man's Clean URLs™: we actually create the /about/ and /products/ folders and put an index.php file containing the static content in each one. The PHP is then only really used to include a header, footer, navigation, etc.
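
In practice that mapping lives in Apache's configuration (a RewriteRule in httpd.conf or an .htaccess file), not in application code, but the logic is simple enough to sketch. The Python below only illustrates what such a rule does; the `rewrite` function, the single-segment page-name pattern, and the `existing` set (standing in for the usual "leave real files and folders alone" conditions) are assumptions made for the example.

```python
import re
from typing import AbstractSet, Optional

# Assumption: a "page" is one path segment of lowercase letters, digits or hyphens,
# with an optional trailing slash, e.g. "/about/" or "/products".
PAGE_PATTERN = re.compile(r"^/([a-z0-9-]+)/?$")


def rewrite(path: str, existing: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Map a clean URL to the internal script URL, or return None to serve it as-is."""
    if path == "/":
        return "index.php"
    # Real files and folders (stylesheets, images, scripts) bypass the rewrite.
    if path in existing:
        return None
    match = PAGE_PATTERN.match(path)
    if match is None:
        return None
    return "index.php?page=" + match.group(1)


if __name__ == "__main__":
    on_disk = {"/css/", "/images/", "/scripts/"}
    for requested in ["/", "/about/", "/products", "/css/", "/css/site.css"]:
        print(requested, "->", rewrite(requested, on_disk))
```

With /css/, /images/ and /scripts/ on disk, a request for /about/ is handed to index.php?page=about, while the stylesheet folder is served normally.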

As a site grows, performance and scalability become greater concerns. Popular site speed analysis tools often recommend using a content distribution network (CDN). Whether you outsource this or try to do it in-house, it sounds like a lot of work.

Until earlier this year, I hadn't had the privilege of working on a site with enough traffic to worry about scalability. The company was running a national television commercial. They had hired another company to develop a micro-site on a separate server for the campaign, but viewers were being directed to the regular corporate site, where a gigantic ad enticed them to visit the micro-site. If you're shaking your head right now, so am I. The corporate web server was going down almost daily.

The web statistics package indicated that the most downloaded files were the product videos. I still don't know why they aren't using YouTube or Vimeo for this purpose. On a hunch, I copied the video files to the micro-site's web server and changed all the links. They experienced zero downtime after that.

This experience made me think about how to build scalability into a site from the beginning. It also twigged a memory of reading about serving static content from a cookieless domain. Cookies aside, serving content from several subdomains has performance advantages of its own: browsers limit the number of simultaneous connections they open to any one hostname, so spreading files across hostnames lets more of them download in parallel.
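
A cookieless domain works because a browser only sends a cookie back to hosts that the cookie's domain matches. Here is a simplified Python sketch of that check, assuming a hypothetical `Cookie` record and `is_sent_to` helper; it deliberately ignores paths, expiry, the Secure flag, IP addresses and public-suffix rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cookie:
    name: str
    set_by: str                   # host that issued the cookie
    domain: Optional[str] = None  # Domain attribute; None means a host-only cookie


def is_sent_to(cookie: Cookie, host: str) -> bool:
    """Simplified domain check: ignores paths, expiry, Secure, IP hosts and public suffixes."""
    host = host.lower()
    if cookie.domain is None:
        # Host-only cookie: returned only to the exact host that set it.
        return host == cookie.set_by.lower()
    # Under RFC 6265 a leading dot in the Domain attribute is ignored.
    domain = cookie.domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


session = Cookie("session", set_by="www.example.com")
prefs = Cookie("prefs", set_by="www.example.com", domain="example.com")
for host in ["www.example.com", "images.example.com"]:
    print(host, is_sent_to(session, host), is_sent_to(prefs, host))
```

A session cookie set by www.example.com without a Domain attribute never reaches images.example.com, but one set with Domain=example.com rides along on every request to every subdomain, so scoping cookies to www.example.com is what keeps the static subdomains cookie-free.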

Any discussion of clean URLs would not be complete without mentioning www.example.com vs. example.com. Dropping the www seems to be the "cool" thing to do, but my personal feeling is that www.example.com is more descriptive of what you will find there. The mail server is mail.example.com, so the web server should be www.example.com.

So here's the site structure I'm planning to use for future projects:

  • /css/ - css.example.com
  • /images/ - images.example.com
  • /images/logos/ - images.example.com/logos/
  • /images/products/ - images.example.com/products/
  • /js/ - js.example.com
  • /www/ - www.example.com
  • /www/about/ - www.example.com/about/
  • /www/products/ - www.example.com/products/
  • /www/index.php - the actual home page
  • /index.php - file which redirects to www.example.com
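
To make the layout concrete, here is a Python sketch of how a server might resolve requests against it. The site root /var/www/example, the `DOCUMENT_ROOTS` table and the `resolve` function are hypothetical; the sketch assumes each subdomain's document root is the matching folder, that folder URLs are served by that folder's index.php, and that the redirect from example.com keeps the requested path (a single /index.php file would only catch the home page).

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple

# Hypothetical site root; each subdomain's document root is a folder inside it.
SITE_ROOT = PurePosixPath("/var/www/example")
DOCUMENT_ROOTS: Dict[str, PurePosixPath] = {
    "www.example.com": SITE_ROOT / "www",
    "css.example.com": SITE_ROOT / "css",
    "images.example.com": SITE_ROOT / "images",
    "js.example.com": SITE_ROOT / "js",
}


def resolve(host: str, path: str) -> Tuple[str, str]:
    """Return ("redirect", url), ("file", filesystem path) or ("not found", "")."""
    host = host.lower()
    if host == "example.com":
        # The job of the root /index.php: send visitors to the canonical www host.
        return ("redirect", "http://www.example.com" + path)
    root = DOCUMENT_ROOTS.get(host)
    relative = PurePosixPath(path.lstrip("/"))
    # Refuse unknown hosts and any attempt to climb out of the document root.
    if root is None or ".." in relative.parts:
        return ("not found", "")
    target = root / relative
    # Folder URLs such as /about/ are served by that folder's index.php.
    if path.endswith("/"):
        target = target / "index.php"
    return ("file", str(target))


for host, path in [("example.com", "/"), ("www.example.com", "/about/"),
                   ("images.example.com", "/logos/acme.png")]:
    print(host + path, "->", resolve(host, path))
```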

The subdomains can be hosted on the primary web server while traffic is small. When traffic increases, each one can be moved to a dedicated server by copying its folder over and pointing its DNS record at the new machine. The URLs for the images, scripts, and stylesheets don't change, so the PHP files don't need to be updated. Use plenty of folders to keep things organized: you can see I made folders at images.example.com to keep the logos separate from the product pictures. If there will be a lot of pictures for each product, I would give each product its own folder within the products folder.
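
Generating asset URLs from one helper makes this layout easy to stick to. The Python sketch below assumes a hypothetical `ASSET_HOSTS` table and `asset_url` / `product_image_url` helpers (on the real site this would more likely be a small PHP include); because moving a subdomain is a DNS change, neither the table nor the pages that call it need to change.

```python
from typing import Dict
from urllib.parse import quote

# Hypothetical central list of asset hosts. While traffic is small these names can
# all point at the primary server; moving one to its own server is a DNS change,
# so the URLs built here (and the pages that use them) stay the same.
ASSET_HOSTS: Dict[str, str] = {
    "css": "css.example.com",
    "images": "images.example.com",
    "js": "js.example.com",
}


def asset_url(kind: str, *segments: str) -> str:
    """Build an absolute URL for a static file, e.g. asset_url("images", "logos", "acme.png")."""
    host = ASSET_HOSTS[kind]  # raises KeyError for an unknown asset type
    path = "/".join(quote(segment) for segment in segments)
    return f"http://{host}/{path}"


def product_image_url(product: str, filename: str) -> str:
    """Hypothetical convention: one folder per product inside images.example.com/products/."""
    return asset_url("images", "products", product, filename)


print(asset_url("css", "site.css"))
print(product_image_url("widget-3000", "front view.jpg"))
```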
