URL-Hacking: Do-it-yourself Navigation

How many times has it happened to you? You click on a promising link, and you wind up on a strange, orphaned web page with the unhelpful message "Use the 'Go Back' button to return to the table of contents."  If you want to explore this web site, what do you do? Hack the URL!

Hacking a URL is the process of moving through a complex web site by editing the address directly. Simply lop off the end of the address to see whether the author has provided a table of contents page for a particular collection of web pages. (There's nothing illegal or even very technical about what I mean by hacking a URL -- but geeks will enjoy Jorn Barger's "Hacking URLs for Fun and Profit.")

If you surf your way to "www.mysite.com/features/1999/may/juggling.htm", you could chop off the end of the URL to see what's located at "www.mysite.com/features/". (The final "/" is optional, but the page will load slightly faster if you include it, since the server won't have to redirect your request to the proper directory address.) If the site designer has put a file named "index.html" in that directory, the server will find it by default.
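The chopping step itself is mechanical enough to automate. Here is a minimal Python sketch (the function name chop_filename and the example URL are my own, for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def chop_filename(url):
    """Return the URL of the directory that contains the given page.

    Drops everything after the last "/" in the path, so that
    ".../may/juggling.htm" becomes ".../may/" (keeping the trailing
    slash, which spares the server a redirect).
    """
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))

print(chop_filename("http://www.mysite.com/features/1999/may/juggling.htm"))
# http://www.mysite.com/features/1999/may/
```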

If you can make sense of a particular website's organizational pattern, you can bypass the site's navigation altogether by predicting the address of a web page you would like to see. If you liked the May 1999 article on juggling, you could "hack" the URL to check what kind of articles might be posted at "www.mysite.com/features/[current year]/[current month]/".
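Predicting addresses from a pattern can likewise be sketched in Python (a hypothetical helper; the base URL, years, and month names are guesses at a site's naming scheme, not anything the site promises):

```python
def predict_archive_urls(base, years, months):
    """Build candidate directory URLs from a site's visible naming pattern."""
    return [f"{base}/{year}/{month}/" for year in years for month in months]

# Guess where the summer 1999 feature archives might live:
candidates = predict_archive_urls("http://www.mysite.com/features",
                                  [1999], ["may", "june", "july"])
for url in candidates:
    print(url)
```

Whether any of these pages actually exist is up to the site; the point is only that a consistent scheme makes them guessable.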

In this section, I demonstrate how an Internet user might hack a URL in order to use (rather than simply read) a page I happened to find deep within the UWEC web site. This page lacks the navigation features that turn a static text document into a useful, living hypertext.

URL-hacking in Action 

Sometimes URL-hacking is simply a quick way for impatient power-users to jump around within a website. At other times, regular users who stumble upon internal web pages with incomplete navigation systems will need to hack a URL in order to get anywhere at all -- whether to determine whether a particular web page is worth citing in a research paper, or to figure out whom to contact for more information.

Let's imagine you have found the following "hit" on a search engine:

[Image: DIR1.GIF -- screenshot of search-engine results]

Perhaps you are intrigued by the title of the first hit ("UW-Eau Claire Summer Times"), and you want to learn more about the publication. 

Target URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/regents.html

[Image: URL0.JPG -- screenshot of the press-release page]

If you click on the link, what you get is a page formatted to look like a printed press release (which is exactly what this page is -- an ordinary printed press release, thrown into a web page template). The only links on this page (way, way down at the bottom) are to the UWEC home page, to the home page of the UWEC News Bureau, and to the current issue of the Summer Times. If for some reason you wanted to know what other news items were published in this same issue, you would have to do some hunting. That's where URL-hacking can help.


Hack off the end of the URL in order to climb up the directory tree, looking for a table of contents or general information page. 

Original URL: http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/regents.html
Hacked URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/

[Image: URL1.JPG -- screenshot of the table of contents page]

When we delete the very last part of this web address (the filename, "regents.html"), the web server looks for a default file, named "index.html", in the same directory (or "folder"). The author of this page has helpfully created a table of contents and saved it as "index.html". Problem solved!

Well, sort of.  Remember, we had to hack the URL to get to this information.   The site designers had not accounted for the fact that some people might find their way directly onto an internal web page, bypassing the table of contents.  (Another problem with this site: it uses frames.)

Now that we have found this issue's table of contents, what if we want to look at other issues of the same publication? There is no link to "next" or "previous" issues, and there is no link "up" to a list of all the issues that were published that summer. Because the designer of this site did not expect that a user would ever wander onto an internal page like this, we are pretty much stuck again.

Let's hack the URL once more, and see what we find.

Hacked URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/

www.uwec.edu - /Admin/NewsBureau/SummerTimes/STpast/Summer99/
[To Parent Directory]

    1/2/00  2:54 PM        <dir> 06-14-99
    1/2/00  2:54 PM        <dir> 06-21-99
    1/2/00  2:54 PM        <dir> 06-28-99
    1/2/00  2:54 PM        <dir> 07-05-99
    1/2/00  2:54 PM        <dir> 07-12-99
    1/2/00  2:54 PM        <dir> 07-19-99
    1/2/00  2:54 PM        <dir> 07-26-99
    1/2/00  2:54 PM        <dir> 08-02-99

In this case, the web developer has not written a special table of contents page for this directory.  When the web server found no file named "index.html", it generated a default index (by listing all the files that are in this directory).
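The server's decision can be sketched in a few lines of Python (a simplified, hypothetical model -- real servers such as Apache consult a configurable list of default filenames, and may forbid auto-generated listings entirely):

```python
def serve_directory(filenames, defaults=("index.html", "index.htm")):
    """Simplified model of how a server answers a request for a directory URL.

    If the directory contains one of the default index files, serve it;
    otherwise fall back to an auto-generated listing of the contents.
    """
    for name in defaults:
        if name in filenames:
            return "serve file: " + name
    return "auto-generated listing: " + ", ".join(sorted(filenames))

print(serve_directory(["index.html", "regents.html"]))  # serve file: index.html
print(serve_directory(["06-14-99", "06-21-99"]))
# auto-generated listing: 06-14-99, 06-21-99
```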

Here, all the entries are usefully named subdirectories. From this listing, a user would probably have no trouble locating all the other issues of the 1999 "Summer Times".

Still, many users who are unfamiliar with directory listings would find this page confusing -- the color scheme looks nothing like the previous pages, and the title at the top of the page looks like a mistake. 

Further, there is no mission statement or other general information page to be found here.

If we want to find out more about the publication "Summer Times" we shall have to hack the URL again.

Hacked URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/

[Image: URL3.JPG -- screenshot of the error page]

Uh-oh! We get a navigation menu on the left, but now there's an error message on the right: "The page cannot be found."

Most web surfers would probably give up at this point, but let's keep going, just to see how confusing things can get when webmasters don't consistently provide "index.html" files. Remember, we are simply looking for basic information about the Summer Times periodical.


Keep hacking:

Chopped URL:  http://www.uwec.edu/Admin/NewsBureau/

www.uwec.edu - /Admin/NewsBureau/
[To Parent Directory]

   4/27/00  1:53 PM         5407 Backup of nbureau.wbk
   4/24/00  9:31 AM        <dir> bulletin
   2/25/00  5:14 PM        <dir> calendars
   8/11/99  9:45 AM         5516 CopyOf_nbureau.html
    1/2/00  2:52 PM        <dir> experts
   4/20/00  2:34 PM         1816 guide.html
   5/12/00 12:54 PM        <dir> images
   4/20/00  9:22 AM        33280 NBstyle.doc
   4/27/00  2:19 PM         5595 nbureau.html
    2/3/00 12:07 PM        <dir> news_events
    5/3/98  1:27 PM         2471 newsperi.html
    1/2/00  2:52 PM        <dir> profile
    5/2/00  1:16 PM        <dir> release
    5/3/98  1:27 PM         1757 release.html
   5/12/00 12:55 PM        <dir> staff
    1/2/00  2:54 PM        <dir> SummerTimes
   2/23/00 12:38 PM          450 test.htm
   2/14/00 12:05 PM          472 test1.htm
   2/14/00 12:06 PM          471 test2.htm
   2/17/00  5:05 PM          472 test3.htm
   2/21/00 10:11 AM          450 test4.htm
   5/11/00 12:00 PM        <dir> View

Where to find information about the Summer Times?

In this case, we get another automatically generated directory list -- but look at the number of files in this directory! Nobody is going to click on all of these files at random in the hope of discovering where the table of contents is. Worse, the listing exposes backup copies of files and a series of five "test" files -- out-of-date material that is nevertheless online, where search engines may find and index it.

Aha!  There's a directory named "SummerTimes"!  That might be where we would expect to find an introduction page of some sort.  But if we click on it, we end up with the very same error message that we got in the previous step.

That's not helpful at all!

By the way, the news bureau home page is actually named "nbureau.html".  It is a perfectly acceptable "portal" style home page, designed to speed visitors on their way to the various subsections of the website.   From that page, it is easy to find the Summer Times home page, which happens to be named "STopen.html".  Nevertheless, as this exercise demonstrates, it is not easy to find that home page unless you know where to look.

If we chop the URL yet again, we get something even less useful on www.uwec.edu/Admin -- a much, much longer directory listing.
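The whole climb we just performed -- deleting one path segment at a time and trying each resulting directory URL -- can be written out as a short Python sketch (illustrative only; it builds the candidate URLs but does not fetch them):

```python
from urllib.parse import urlsplit

def ancestor_directories(url):
    """List every parent directory URL of a page, deepest first,
    ending at the server root -- the sequence a URL-hacker walks."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.path.endswith("/"):
        segments = segments[:-1]          # drop the filename
    base = f"{parts.scheme}://{parts.netloc}"
    urls = []
    while segments:
        urls.append(base + "/" + "/".join(segments) + "/")
        segments.pop()
    urls.append(base + "/")               # finally, the server root
    return urls

for u in ancestor_directories(
        "http://www.uwec.edu/Admin/NewsBureau/SummerTimes/"
        "STpast/Summer99/07-26-99/regents.html"):
    print(u)
```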

See also:

Navigation: An often neglected component of web authorship

URL Hacking and Ethics

Sometimes lawyers contact me about a case featuring URL hacking (or, as one such lawyer called it, "URL typing"). I haven't yet been interested enough in a case to offer to do any writing or testifying for free. But I'll summarize my position here.

  • If a company built a private warehouse, not intended to be accessed by the public, and I broke through the door and saw a secret, I would be in the wrong; the problem here is breaking and entering.
  • If a company built a gallery that was open to the public, and put its secrets out on the walls along with the material visitors are supposed to see, and I walked in when the gallery was open for business and happened to see a secret, I have done no wrong; the problem is the company's non-existent security.
  • If a company built an archive, where all visitors were expected to write down a catalog number and wait in the library while the clerk fetches it, and I ask the clerk to bring me "documents/2008/annual," the clerk will probably first go to the shelf and see if such a document exists.
    • If it does exist, the clerk will check to see whether the document has a "Top Secret" tag on it, or an "Embargo until Dec 2007" sign, or a note that says "Only Bill, Sally, and Freddy are permitted to read this document."
    • If the owner of the item has placed it in the archive without any restrictions whatsoever, the clerk would be expected to treat this request just like any other.
    • The problem is once again the company's non-existent security.

In the archive example above, if I bombarded the clerk with hundreds of random requests, hoping to come up with something unexpected, that's a very different matter from actually typing the URL out of a desire to get to a page that deductive reasoning suggests ought to exist.

Since some web pages are dynamically generated from URLs that include complex parameters, there is not a clear line between what counts as simply typing the URL and manipulating complex parameters in a deliberate attempt to alter the way the site's designers expected the site to behave.
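One rough way to see the distinction is to compare the two URLs mechanically (a crude, hypothetical classifier, sketched in Python purely as an illustration -- not a legal test):

```python
from urllib.parse import urlsplit

def classify_edit(original, hacked):
    """Crudely classify a URL edit: a simple truncation of the path,
    or a manipulation of the query parameters?"""
    o, h = urlsplit(original), urlsplit(hacked)
    if o.query != h.query:
        return "parameter manipulation"
    if o.path.startswith(h.path):
        return "simple truncation"
    return "other edit"

print(classify_edit("http://example.com/a/b/page.html",
                    "http://example.com/a/b/"))              # simple truncation
print(classify_edit("http://example.com/report?user=me",
                    "http://example.com/report?user=boss"))  # parameter manipulation
```

Real cases are messier, of course -- many dynamic sites encode parameters inside the path itself, which is exactly why the line blurs.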

Of course, manipulating a system may be against the terms of an end-user license, a student handbook, or an employment contract.

Just because a company's website permits a hack does not automatically excuse all the actions carried out by the hacker. Most hackers are simply curious, seeking a faster, more powerful way to do something that seems slowed down by an unnecessarily tedious newbie-friendly process. URL hacking won't help a user bypass a simple .htaccess password, and it won't let a user see sensitive material unless the webmaster has already placed that material on the website.

