Do you remember when the web was young and everything was about having a “Home Page”? The legacy is still there in our web browser’s “Home” button (and do you wonder why we are limited to one home?). In fact, when I started our web server in 1993, like others, I made our primary web entrance a file named… homePage.html

In fact, in those days (late 1993), the correct URL to get to the MCLI main entrance was:
http://hakatai.mcli.dist.maricopa.edu/homePage.html

Using a few little tricks of web “redirection”, this 11-year-old URL actually still works and gets you to the current main entrance at http://www.mcli.dist.maricopa.edu/. Is it magic? Not at all, and it is easier than you think. The biggest mystery is why so many sites, big and small, are willing to leave old links hanging in the breeze.

I will show later how I have been able to migrate a 70+ MB web directory to different servers three times, yet never produce a bad link message for that pile of content.

But I am not writing to be nostalgic about the web of the early 1990s (remember those ugly grey pages? Title tag animations? The Green Netscape logo?). Let’s take a look at scenarios that might happen among the people who create web content:

“We’ve improved our site to use interactive scripting so all web file pages will be changed from *.html to *.asp”

“That is an effective data analysis web site you built in your personal directory, Smithers, but since it gets so much traffic, we want to provide it a shorter URL by moving it to a top level on our main web site.”

“Most of the web content will be re-organized from a directory structure based on department names to ones based on services provided.”

“That is so old, let’s just delete it from the web server”

What do these have in common? Let’s talk about Linkrot…

A phenomenon of the web is that we never see the other sites out there that link to our content, and we do not get notifications when someone adds our site to their browser bookmarks or a similar service. Our audience is invisible, and even if there is a place for comments to be posted, a significant portion of our web audience is silent and we would never know they were there. So unless we are religious about analyzing our web server logs, we are blind to who is connected to our content and how.

So if we arbitrarily move files around, rename files or directories, or just delete files, we are quite possibly creating numerous “404 Not Found” errors. Sure, it makes sense to us, as we know where things have gone or why they have changed, but the web exists and operates far outside of our own realm.

Or think of it this way: if your city is doing construction on the scale of Boston’s “Big Dig”, imagine roads and bridges suddenly closed without any notice or detour route. Faced with barricades over our regular routes, we are left on our own to figure it out, or give up and go home.

Fortunately, there are a number of things you can do to avoid putting up dead ends on your web site. We’ll explore them from the simple, to more complex (but powerful).

1. The Simplest Redirection: A Simple Web Message. Best for a very small number of pages that have moved. Once the content has moved, create a simple HTML page with the same file name and location as each moved page, and say, “This page has moved to…”.

Is this not bluntly obvious? Is it that hard to do?

I have done this for a lot of our really old, crufty sites. I came up with a standard “This Page is Dead” message that provides a link to an archived version of the content for those that wish to speak to the dead. See how we set it up for the “WWWW InfoPage”, a startling view into what was up with the web in 1996 (see, everything was a WebPage!).

2. Still Simple Redirection: A Simple Web Message with a MetaRefresh Auto Redirect. This is the same approach as #1 (hang a “this page has moved” message in place of every moved page), but with a line of code added to the HEAD part of the file so that it automatically jumps to the new page after a specified number of seconds.

The META tag is standard HTML and should work fine for all browsers. You can read how to do this in the Writing HTML lessons on “What’s the META in your HEAD?” or follow this example I’ve used on a number of our pages.

One of the more common places I use it is where the default page for a URL, once a file named index.html, has been replaced by a more dynamic one named index.php. If I have a URL such as http://www.porcupinesRus.com/products/index.html, the web server provides the same content for a URL of http://www.porcupinesRus.com/products/ (shorter is better, right?). But let’s say I have upgraded my HTML skills to take advantage of PHP, so the new page is now referenced as http://www.porcupinesRus.com/products/index.php. The changes in my web server allow the directory-only URL to load the index.php file first, so I could just trash the index.html file, right?

Wrong. What if someone out there had linked to or bookmarked the URL with the index.html file? What if I have local links, say from my home page to href="products/index.html"? A simple web redirect with the META refresh can handle this in a snap. Here is an example I use as a transfer device for anyone who connects or links to our old URL http://www.mcli.dist.maricopa.edu/index.html, as it redirects you to the current home page http://www.mcli.dist.maricopa.edu/index.php

The key in the META REFRESH tag is the value of the attribute content. The number is how many seconds the browser waits before it transfers, and the value of URL= is the address it will jump to after that time. Note the semi-colon that separates the delay from the URL= part, and that both sit inside one set of quotes.
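As a rough sketch of the tag described above, here is a small Python helper that writes out a “this page has moved” stub. The function name `meta_refresh_page`, the five-second default delay, and the page wording are my own assumptions for illustration; only the shape of the `content` attribute (seconds, a semi-colon, then `URL=`) comes from the description above.

```python
from html import escape


def meta_refresh_page(new_url: str, delay_seconds: int = 5) -> str:
    """Build a small "this page has moved" stub that forwards the visitor.

    Hypothetical helper: the name and the page wording are illustrative.
    """
    # Escape the URL so characters like & or " cannot break the attribute.
    safe_url = escape(new_url, quote=True)
    return f"""<html>
<head>
<title>This page has moved</title>
<meta http-equiv="refresh" content="{delay_seconds}; URL={safe_url}">
</head>
<body>
<p>This page has moved to <a href="{safe_url}">{safe_url}</a>.</p>
</body>
</html>
"""


if __name__ == "__main__":
    # Save the output under the OLD file name (e.g. index.html).
    print(meta_refresh_page("http://www.mcli.dist.maricopa.edu/index.php"))
```

The visible link in the body is there for the rare visitor whose browser does not follow the refresh.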

This approach works nicely since it does not require anything fancy on the server side.

3. Lightweight Redirection: Server file Alias
A very transparent way to redirect files is (on unix servers at least) setting up what might be parallel to a desktop Alias or Shortcut: it looks like a file to all that request it, but it seamlessly transfers the request elsewhere. This approach works only to move between different files on the same web server.

In unix, this is a symbolic link. You will need to have terminal/command line access to your site (some web site hosts provide a web interface for setting this up). I did this about 9 years ago on our main URL when I decided an address of http://www.mcli.dist.maricopa.edu/homePage.html should be auto bounced to the true (at that time) home page at http://www.mcli.dist.maricopa.edu/index.html.

After logging in to my server and navigating to this directory, we create a symbolic link by typing:

or for the case I described above:

This kept alive any old requests for the older URL.
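For anyone who prefers a script to the command line, the same alias can be sketched in Python. This is only an illustration under assumptions: `make_alias` is a made-up helper name, the temporary directory stands in for a real web directory, and the web server must also be set up to follow symbolic links (on Apache that is usually the FollowSymLinks option).

```python
import os
import tempfile


def make_alias(directory: str, old_name: str, current_name: str) -> str:
    """Create old_name as a symbolic link that points at current_name.

    Roughly what `ln -s current_name old_name` does when run inside
    `directory`. Hypothetical helper for illustration.
    """
    link_path = os.path.join(directory, old_name)
    # A relative target keeps the link valid if the whole directory moves.
    os.symlink(current_name, link_path)
    return link_path


if __name__ == "__main__":
    # A throwaway directory stands in for the real web root.
    with tempfile.TemporaryDirectory() as web_root:
        with open(os.path.join(web_root, "index.html"), "w") as f:
            f.write("<h1>Current home page</h1>\n")
        alias = make_alias(web_root, "homePage.html", "index.html")
        # Reading the old name returns the content of the current page.
        with open(alias) as f:
            print(f.read())
```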

4. The Geekier Way- Server Name Redirection
If your domain name is changed, you may need to have your network admin create a special entry in their DNS (Domain Name System) setup. For example, in 1993 when I plugged that Mac SE/30 into our network, I picked a URL for the server of hakatai.mcli.dist.maricopa.edu, but within a few months, I was ready to move up to another server, and thought it would be better to follow suit and create a web address that begins with the 3 W’s, or http://www.mcli.dist.maricopa.edu/. Keeping the old name in the DNS also means that any of our URLs with “hakatai” in the first slot will still work, e.g. http://hakatai.mcli.dist.maricopa.edu/proj/mu/igas (Watch the URL display carefully!)

Having a duplicate entry in the DNS means that requests for the alpha string of either URL are sent to the appropriate numerical IP address.

5. The Elegant, Sweeping Redirection- htaccess
To me, this is one of the slickest, most powerful things you can do on a web server with a plain text file. You need to be running an Apache web server, and you need to have some settings made for the server to respect .htaccess files in directories. Ask your web master for info.

All you need to do is to create a plain text file named .htaccess and store it in the top level of your web directory. The way to write a web redirect is to have one line for each area of your web site that needs to be redirected as

This means that any URL that is under the /products/oldstuff directory (including sub directories and sub directories of sub directories) is automatically routed to another web server at the path indicated. The “301” is an HTTP response code that indicates that the change in the URL is permanent, so that search engines such as Google can even replace their outdated data with the newer URL.
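To make the matching behavior concrete, here is a small Python simulation of what a rule of the general form `Redirect 301 /products/oldstuff <new base URL>` does to an incoming path. It is a sketch, not Apache’s code: the function name `apply_redirect` and the target `http://archive.example.com/oldstuff` are placeholders I made up, and the segment-boundary check reflects my understanding that only complete path segments are matched.

```python
from typing import Optional


def apply_redirect(request_path: str, old_prefix: str, new_base: str) -> Optional[str]:
    """Return the redirect target for request_path, or None if no match.

    Mimics a rule of the form:  Redirect 301 <old_prefix> <new_base>
    (hypothetical helper; the real work is done by the web server).
    """
    old_prefix = old_prefix.rstrip("/")
    if request_path == old_prefix:
        return new_base
    if request_path.startswith(old_prefix + "/"):
        # Everything after the matched prefix is carried over unchanged,
        # which is why sub directories keep working.
        return new_base.rstrip("/") + request_path[len(old_prefix):]
    # /products/oldstuffing is a different directory, so it is left alone.
    return None


if __name__ == "__main__":
    target = "http://archive.example.com/oldstuff"  # placeholder address
    print(apply_redirect("/products/oldstuff/quills/index.html",
                         "/products/oldstuff", target))
```

The point of the sketch is the last step: the rule names only the top of the tree, and the rest of each path rides along for free.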

What happens? A request for

is automagically sent to:

Now before you click away in technical boredom, let me share how I have used this command. My old home page (last really edited in 1999 or so) is linked in the footer of almost every page on our site to:

There is some 70 MB of photos, software, and a lot of old shockwaves. About 18 months ago, our IT folks asked if we could archive any content as their disk space was getting full. I offered to move the 70 MB slug to one of my in-house Linux servers, so the whole directory was plunked over to our “realgar” server.

So any link to something under the former ends up in the same relative place in the latter. This was achieved with this .htaccess entry:

Thus, I could delete the entire 70 MB on the main server, and any request for files from within there was transferred to the realgar server. This means that everything under the root directory /alan would have its correct URL loaded automatically from the new server.

But that is not the end of the re-direction– I then thought it would be better to move it to our “zircon” server which is on a faster part of the network and is insured via a dedicated tape backup drive. So I moved the files again, updated the .htaccess file on the main web server to now read:

as well as loading a copy of this same file on the realgar server. This way, we are not leaving any rotten links.

But that still is not the end of the tale. Our District came out with a new “computer usage” policy that has some strong words about “personal” home pages and content not directly related to the work we do. Since among my 70 MB were a lot of my hobby photographs and horrific poetry, I decided to once more shift the files (and update and post new .htaccess files) so that all references on any of our servers to something under the alan directory are shifted to my personal web server, http://dommy.com/alan

What this means is that any reference to a URL buried deep in my site will be transferred to my own URL, e.g. all these URLs:

http://www.mcli.dist.maricopa.edu/alan/linear/moh2.html
http://realgar.mcli.dist.maricopa.edu/alan/linear/moh2.html
http://dommy.com/alan/linear/moh2.html

will all jump you to:
http://dommy.com/alan/linear/moh2.html
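The way several servers can all funnel to one final address can be sketched as a little Python simulation. The rule table below is my guess at the effect of the .htaccess files, not a copy of them, and `final_url` is a hypothetical helper; the host names and the example path are the ones listed above.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# Assumed per-host rules: host -> (old path prefix, new base URL).
RULES: Dict[str, Tuple[str, str]] = {
    "www.mcli.dist.maricopa.edu": ("/alan", "http://dommy.com/alan"),
    "realgar.mcli.dist.maricopa.edu": ("/alan", "http://dommy.com/alan"),
}


def final_url(url: str, max_hops: int = 5) -> str:
    """Follow simulated redirects until no rule matches or max_hops is hit."""
    for _ in range(max_hops):
        parts = urlsplit(url)
        rule = RULES.get(parts.netloc)
        if rule is None:
            return url  # this host has no redirect rule, so we have arrived
        prefix, new_base = rule
        if parts.path != prefix and not parts.path.startswith(prefix + "/"):
            return url  # the path is outside the redirected directory
        # Keep the part of the path below the prefix and swap the base.
        url = new_base + parts.path[len(prefix):]
    return url


if __name__ == "__main__":
    for host in ("www.mcli.dist.maricopa.edu",
                 "realgar.mcli.dist.maricopa.edu",
                 "dommy.com"):
        print(final_url(f"http://{host}/alan/linear/moh2.html"))
```

The hop limit is there only so a mistaken pair of rules that point at each other cannot loop forever in the simulation.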

This is just a small part of what this magical file can do– for more see:

Bottom line: if you do anything to change a public URL, whether it is house cleaning the server, changing web technologies, or even deleting pages, consider leaving something behind, a message or a web redirect, to reduce the number of tickets to Lab 404.

The post "This Old Home Page (and mastering web redirection)" was originally assembled from spare parts of a 1957 Chevy at CogDogBlog (http://cogdogblog.com/2005/02/this-old/) on February 3, 2005.
