Clean URL

From DocForge

A clean URL is short and descriptive. For example, the relative URL of this page is /wiki/Clean_URL. The first part of the path designates the section or application of the site, and the second is the title of the article within that section. There's no reference to application files (such as index.php) or record identifiers.

Clean URLs are often preferable for several reasons. They're usually chosen for SEO (search engine optimization), they're easier for people to remember, and for developers they help organize the control flow of the web application.

At the start of development of a new web application it's often a good idea to come up with a URL scheme. Organization of URLs will simplify web development and aid in modularizing code. Often the tree of URLs will mimic the site's basic organization. For example, it would be preferable to use URLs like /blog/title and /group/discussion instead of /blog-title and /discussion. This way web application code can easily determine what type of page to generate, there won't be any accidental overlap of root URLs, and the tree will match the organization of modular code.

There are many ways to develop a web application with clean URLs. Here we describe one simple method for PHP running with Apache httpd. The general approach is to send all requests to one PHP file. That PHP script will decide which other scripts to execute to process and generate the appropriate page content. This setup is particularly useful when the web application has modules which register the URLs they will handle with core code.

Requirements

Apache Configuration

When requests for your application come into Apache, they all need to be sent to a single script. By default Apache looks in its configured DocumentRoot for the exact file requested in the URL, so we'll use the mod_rewrite module (which must be enabled) to turn any URL into a script request. The following can be placed in a .htaccess file or in a <Directory> block in httpd.conf; RewriteBase is only valid in that per-directory context. This simple method is taken from the Drupal project.

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
  1. RewriteEngine on enables rewriting; without it the remaining directives have no effect.
  2. RewriteBase is the URL path at which your URL scheme begins.
  3. The first condition skips the rule if the request is the exact path of an existing file. This leaves requests for CSS files, images, and other special scripts alone.
  4. The second condition skips the rule if the request is the exact path of an existing directory.
  5. The rule uses a regular expression to take the entire path and pass it to index.php as a query string parameter ("q"). The L flag stops processing further rules and QSA appends the original query string to the new one.
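
To make the effect of the rule concrete, here is a rough sketch in PHP of what index.php ends up seeing in $_GET for a given request. The function name simulate_rewrite is hypothetical and exists only for illustration. It assumes PHP 7 or later, and it ignores the two file and directory conditions, so it only models requests that the rule actually rewrites.

```php
<?php
// Hypothetical helper: approximates the $_GET array that index.php receives
// after "RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]" with "RewriteBase /".
function simulate_rewrite(string $request_uri): array
{
    $path  = (string) parse_url($request_uri, PHP_URL_PATH);
    $query = (string) parse_url($request_uri, PHP_URL_QUERY);

    // In per-directory context the leading slash is stripped before matching,
    // so "/wiki/Clean_URL" is captured as "wiki/Clean_URL".
    $rewritten = array('q' => ltrim($path, '/'));

    // QSA appends the original query string after the new "q" parameter.
    parse_str($query, $original);

    return array_merge($rewritten, $original);
}

print_r(simulate_rewrite('/wiki/Clean_URL?action=edit'));
```

Because the original query string is appended after q, a request that supplies its own q parameter would override the rewritten one, which is one reason to validate the value before using it.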

PHP

In index.php all requests will now come in as a simple parameter in the query string. For small web applications it may be sufficient to handle requests with a switch statement. This is an extremely simple example for demonstration purposes.

if (!empty($_GET['q'])) {
  // Break the request into its path parts
  $path_parts = explode('/', $_GET['q']);
  $section = $path_parts[0];
}
else {
  $section = 'home';
}

switch ($section) {
  case 'home':
    // For requests to the home page
    require('home.html');
    break;
  case 'about':
    // For URLs like http://www.example.com/about
    require('about.html');
    break;
  case 'wiki':
    // For URLs like http://www.example.com/wiki/Document
    $page = empty($path_parts[1]) ? '' : $path_parts[1];
    require('wiki.php');
    break;
  default:
    // Unexpected page requests should get a 404
    header("HTTP/1.0 404 Not Found");
    include('error.html');
    break;
}
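
The path splitting at the top of the example can be made a little more forgiving. The following is a minimal sketch, not part of the original method: parse_request is a hypothetical helper that ignores empty path segments (so a trailing or doubled slash does not matter) and falls back to 'home' when no section is given. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: turns the "q" parameter into a section name plus the
// remaining path parts. Pass it $_GET, or any array shaped like it.
function parse_request(array $get): array
{
    // Treat a missing or non-string "q" (e.g. ?q[]=x) as an empty path.
    $q = isset($get['q']) && is_string($get['q']) ? $get['q'] : '';

    // Drop empty segments so "/wiki//Clean_URL/" behaves like "wiki/Clean_URL".
    $parts = array_values(array_filter(explode('/', $q), 'strlen'));

    $section = isset($parts[0]) ? $parts[0] : 'home';

    return array($section, array_slice($parts, 1));
}

list($section, $args) = parse_request(array('q' => 'wiki/Clean_URL'));
// $section is "wiki" and $args is array("Clean_URL")
```

The switch statement can then branch on $section exactly as before, reading any further parts from $args.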

More complex and robust applications should be more flexible, such as having loadable modules which register the paths they handle (e.g. Drupal's menu API). It's also generally best to use as few global variables as possible as they can get out of hand when the application gets large.
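
As a rough illustration of path registration, here is a minimal sketch of a registry that modules could use to claim a top-level section. The Router class and its register and dispatch methods are hypothetical names chosen for this example; they are not Drupal's menu API. Handlers return the page body instead of printing it, which also avoids relying on global variables. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical path registry; names here are illustrative and are not Drupal's API.
class Router
{
    /** @var callable[] handlers keyed by the first path segment */
    private $handlers = array();

    // Modules call this to claim a top-level section such as "wiki".
    public function register(string $section, callable $handler): void
    {
        if (isset($this->handlers[$section])) {
            // Refuse accidental overlap between modules.
            throw new InvalidArgumentException("Section already registered: $section");
        }
        $this->handlers[$section] = $handler;
    }

    // Takes the "q" value and returns array(HTTP status code, body).
    public function dispatch(string $q): array
    {
        $parts = array_values(array_filter(explode('/', $q), 'strlen'));
        $section = isset($parts[0]) ? $parts[0] : 'home';
        if (!isset($this->handlers[$section])) {
            return array(404, 'Not Found');
        }
        $body = call_user_func($this->handlers[$section], array_slice($parts, 1));
        return array(200, $body);
    }
}

$router = new Router();
$router->register('home', function (array $args) {
    return 'Home page';
});
$router->register('wiki', function (array $args) {
    return 'Wiki page: ' . (isset($args[0]) ? $args[0] : '(index)');
});

list($status, $body) = $router->dispatch('wiki/Clean_URL');
// $status is 200 and $body is "Wiki page: Clean_URL"
```

In index.php the returned status would be sent with header() and the body echoed, so the core code stays the same no matter how many modules register paths.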
