URL-Hacking: Do-it-yourself Navigation

How many times has it happened to you? You click on a promising link, and you wind up on a strange, orphaned web page with the unhelpful message "Use the 'Go Back' button to return to the table of contents."  If you want to explore this web site, what do you do? Hack the URL!

Hacking a URL is the process of moving through a complex web site by editing the address directly. Simply lop off the end of the address to see whether the author has provided a table of contents page for a particular collection of web pages. (There's nothing illegal or even very technical about what I mean by hacking a URL -- but geeks will enjoy Jorn Barger's "Hacking URLs for Fun and Profit.")

If you surf your way to "www.mysite.com/features/1999/may/juggling.htm", you could chop off the end of the URL to see what's located at "www.mysite.com/features/". (The final "/" is optional, but the page will load slightly faster if you include it, since the server won't have to redirect your request to the proper directory address.) If the site designer has put a file named "index.html" in that directory, the server will find it by default.
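The chopping step itself is mechanical enough to automate. Here is a minimal Python sketch (the function name chop_filename and the example URL are my own, for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def chop_filename(url):
    """Return the URL of the directory that contains the given page.

    Drops everything after the last "/" in the path, so that
    ".../may/juggling.htm" becomes ".../may/" (keeping the trailing
    slash, which spares the server a redirect).
    """
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))

print(chop_filename("http://www.mysite.com/features/1999/may/juggling.htm"))
# http://www.mysite.com/features/1999/may/
```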

If you can make sense of a particular website's organizational pattern, you can bypass the site's navigation altogether by predicting the address of a web page you would like to see. If you liked the May 1999 article on juggling, you could "hack" the URL to check what kind of articles might be posted at "www.mysite.com/features/[current year]/[current month]/".
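Predicting addresses from a pattern can likewise be sketched in Python (a hypothetical helper; the base URL, years, and month names are guesses at a site's naming scheme, not anything the site promises):

```python
def predict_archive_urls(base, years, months):
    """Build candidate directory URLs from a site's visible naming pattern."""
    return [f"{base}/{year}/{month}/" for year in years for month in months]

# Guess where the summer 1999 feature archives might live:
candidates = predict_archive_urls("http://www.mysite.com/features",
                                  [1999], ["may", "june", "july"])
for url in candidates:
    print(url)
```

Whether any of these pages actually exist is up to the site; the point is only that a consistent scheme makes them guessable.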

In this section, I demonstrate how an Internet user might hack a URL in order to use (rather than simply read) a page I happened to find deep within the UWEC web site. This page lacks the navigation features that turn a static text document into a useful, living hypertext.

URL-hacking in Action 

Sometimes URL-hacking is simply a quick way for impatient power-users to jump around within a website. At other times, regular users who stumble upon internal web pages with incomplete navigation systems will need to hack a URL in order to get anywhere at all -- whether to determine whether a particular web page is worth citing in a research paper, or to figure out whom to contact for more information.

Let's imagine you have found the following "hit" on a search engine:

[Image: DIR1.GIF -- screenshot of search-engine results]

Perhaps you are intrigued by the title of the first hit ("UW-Eau Claire Summer Times"), and you want to learn more about the publication. 

Target URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/regents.html

[Image: URL0.JPG -- screenshot of the press-release page]

If you click on the link, what you get is a page formatted to look like a printed press release (which is exactly what this page is -- an ordinary printed press release, thrown into a web page template). The only links on this page (way, way down at the bottom) are to the UWEC home page, to the home page of the UWEC News Bureau, and to the current issue of the Summer Times. If for some reason you wanted to know what other news items were published in this same issue, you would have to do some hunting. That's where URL-hacking can help.


Hack off the end of the URL in order to climb up the directory tree, looking for a table of contents or general information page. 

Original URL: http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/regents.html
Hacked URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/

[Image: URL1.JPG -- screenshot of the table of contents page]

When we delete the very last part of this web address (the filename, "regents.html"), the web server looks for a default file, named "index.html", in the same directory (or "folder"). The author of this page has helpfully created a table of contents and saved it as "index.html". Problem solved!

Well, sort of.  Remember, we had to hack the URL to get to this information.   The site designers had not accounted for the fact that some people might find their way directly onto an internal web page, bypassing the table of contents.  (Another problem with this site: it uses frames.)

Now that we have found this issue's table of contents, what if we want to look at other issues of the same publication? There is no link to "next" or "previous" issues, and there is no link "up" to a list of all the issues that were published that summer. Because the designer of this site did not expect that a user would ever wander onto an internal page like this, we are pretty much stuck again.

Let's hack the URL once more, and see what we find.

Hacked URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/

www.uwec.edu - /Admin/NewsBureau/SummerTimes/STpast/Summer99/
[To Parent Directory]

    1/2/00  2:54 PM        <dir> 06-14-99
    1/2/00  2:54 PM        <dir> 06-21-99
    1/2/00  2:54 PM        <dir> 06-28-99
    1/2/00  2:54 PM        <dir> 07-05-99
    1/2/00  2:54 PM        <dir> 07-12-99
    1/2/00  2:54 PM        <dir> 07-19-99
    1/2/00  2:54 PM        <dir> 07-26-99
    1/2/00  2:54 PM        <dir> 08-02-99

In this case, the web developer has not written a special table of contents page for this directory.  When the web server found no file named "index.html", it generated a default index (by listing all the files that are in this directory).
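The server's decision can be sketched in a few lines of Python (a simplified, hypothetical model -- real servers such as Apache consult a configurable list of default filenames, and may forbid auto-generated listings entirely):

```python
def serve_directory(filenames, defaults=("index.html", "index.htm")):
    """Simplified model of how a server answers a request for a directory URL.

    If the directory contains one of the default index files, serve it;
    otherwise fall back to an auto-generated listing of the contents.
    """
    for name in defaults:
        if name in filenames:
            return "serve file: " + name
    return "auto-generated listing: " + ", ".join(sorted(filenames))

print(serve_directory(["index.html", "regents.html"]))  # serve file: index.html
print(serve_directory(["06-14-99", "06-21-99"]))
# auto-generated listing: 06-14-99, 06-21-99
```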

Here, all the entries are usefully named subdirectories. From this listing, a user would probably have no trouble locating all the other issues of the 1999 "Summer Times".

Still, many users who are unfamiliar with directory listings would find this page confusing -- the color scheme looks nothing like the previous pages, and the title at the top of the page looks like a mistake. 

Further, there is no mission statement or other general information page to be found here.

If we want to find out more about the publication "Summer Times" we shall have to hack the URL again.

Hacked URL:   http://www.uwec.edu/Admin/NewsBureau/SummerTimes/

[Image: URL3.JPG -- screenshot of the error page]

Uh-oh! We get a navigation menu on the left, but now there's an error message on the right: "The page cannot be found."

Most web surfers would probably give up at this point, but let's keep going, just to see how confusing things can get when webmasters don't consistently provide "index.html" files. Remember, we are simply looking for basic information about the Summer Times periodical.


Keep hacking:

Chopped URL:  http://www.uwec.edu/Admin/NewsBureau/

www.uwec.edu - /Admin/NewsBureau/
[To Parent Directory]

   4/27/00  1:53 PM         5407 Backup of nbureau.wbk
   4/24/00  9:31 AM        <dir> bulletin
   2/25/00  5:14 PM        <dir> calendars
   8/11/99  9:45 AM         5516 CopyOf_nbureau.html
    1/2/00  2:52 PM        <dir> experts
   4/20/00  2:34 PM         1816 guide.html
   5/12/00 12:54 PM        <dir> images
   4/20/00  9:22 AM        33280 NBstyle.doc
   4/27/00  2:19 PM         5595 nbureau.html
    2/3/00 12:07 PM        <dir> news_events
    5/3/98  1:27 PM         2471 newsperi.html
    1/2/00  2:52 PM        <dir> profile
    5/2/00  1:16 PM        <dir> release
    5/3/98  1:27 PM         1757 release.html
   5/12/00 12:55 PM        <dir> staff
    1/2/00  2:54 PM        <dir> SummerTimes
   2/23/00 12:38 PM          450 test.htm
   2/14/00 12:05 PM          472 test1.htm
   2/14/00 12:06 PM          471 test2.htm
   2/17/00  5:05 PM          472 test3.htm
   2/21/00 10:11 AM          450 test4.htm
   5/11/00 12:00 PM        <dir> View

Where to find information about the Summer Times?

In this case, we get another automatically generated directory list -- but look at the number of files in this directory! Nobody is going to click on all of these files at random in the hope of discovering where the table of contents is. Worse, the listing exposes backup copies of files and a series of five "test" files -- out-of-date material that is nevertheless online, where search engines may find and index it.

Aha!  There's a directory named "SummerTimes"!  That might be where we would expect to find an introduction page of some sort.  But if we click on it, we end up with the very same error message that we got in the previous step.

That's not helpful at all!

By the way, the news bureau home page is actually named "nbureau.html".  It is a perfectly acceptable "portal" style home page, designed to speed visitors on their way to the various subsections of the website.   From that page, it is easy to find the Summer Times home page, which happens to be named "STopen.html".  Nevertheless, as this exercise demonstrates, it is not easy to find that home page unless you know where to look.

If we chop the URL yet again, we get something even less useful on www.uwec.edu/Admin -- a much, much longer directory listing.
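The whole climb we just performed -- deleting one path segment at a time and trying each resulting directory URL -- can be written out as a short Python sketch (illustrative only; it builds the candidate URLs but does not fetch them):

```python
from urllib.parse import urlsplit

def ancestor_directories(url):
    """List every parent directory URL of a page, deepest first,
    ending at the server root -- the sequence a URL-hacker walks."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.path.endswith("/"):
        segments = segments[:-1]          # drop the filename
    base = f"{parts.scheme}://{parts.netloc}"
    urls = []
    while segments:
        urls.append(base + "/" + "/".join(segments) + "/")
        segments.pop()
    urls.append(base + "/")               # finally, the server root
    return urls

for u in ancestor_directories(
        "http://www.uwec.edu/Admin/NewsBureau/SummerTimes/"
        "STpast/Summer99/07-26-99/regents.html"):
    print(u)
```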

See also:

Navigation: An often neglected component of web authorship

URL Hacking and Ethics

Sometimes lawyers contact me about a case featuring URL hacking (or, as one such lawyer called it, "URL typing"). I haven't yet been interested enough in a case to offer to do any writing or testifying for free. But I'll summarize my position here.

  • If a company built a private warehouse, not intended to be accessed by the public, and I broke through the door and saw a secret, I would be in the wrong; the problem here is breaking and entering.
  • If a company built a gallery that was open to the public, and put its secrets out on the walls along with the material visitors are supposed to see, and I walked in when the gallery was open for business and happened to see a secret, I have done no wrong; the problem is the company's non-existent security.
  • If a company built an archive, where all visitors were expected to write down a catalog number and wait in the library while the clerk fetches it, and I ask the clerk to bring me "documents/2008/annual," the clerk will probably first go to the shelf and see if such a document exists.
    • If it does exist, the clerk will check to see whether the document has a "Top Secret" tag on it, or an "Embargo until Dec 2007" sign, or a note that says "Only Bill, Sally, and Freddy are permitted to read this document."
    • If the owner of the item has placed it in the archive without any restrictions whatsoever, the clerk would be expected to treat this request just like any other.
    • The problem is once again the company's non-existent security.

In the archive example above, if I bombarded the clerk with hundreds of random requests, hoping to come up with something unexpected, that's a very different matter from actually typing the URL out of a desire to get to a page that deductive reasoning suggests ought to exist.

Since some web pages are dynamically generated from URLs that include complex parameters, there is not a clear line between what counts as simply typing the URL and manipulating complex parameters in a deliberate attempt to alter the way the site's designers expected the site to behave.
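One rough way to see the distinction is to compare the two URLs mechanically (a crude, hypothetical classifier, sketched in Python purely as an illustration -- not a legal test):

```python
from urllib.parse import urlsplit

def classify_edit(original, hacked):
    """Crudely classify a URL edit: a simple truncation of the path,
    or a manipulation of the query parameters?"""
    o, h = urlsplit(original), urlsplit(hacked)
    if o.query != h.query:
        return "parameter manipulation"
    if o.path.startswith(h.path):
        return "simple truncation"
    return "other edit"

print(classify_edit("http://example.com/a/b/page.html",
                    "http://example.com/a/b/"))              # simple truncation
print(classify_edit("http://example.com/report?user=me",
                    "http://example.com/report?user=boss"))  # parameter manipulation
```

Real cases are messier, of course -- many dynamic sites encode parameters inside the path itself, which is exactly why the line blurs.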

Of course, manipulating a system may be against the terms of an end-user license, a student handbook, or an employment contract.

Just because a company's website permits a hack does not automatically excuse all the actions carried out by the hacker. Most hackers are simply curious, seeking a faster, more powerful way to do something that seems slowed down by an unnecessarily tedious newbie-friendly process. URL hacking won't help a user bypass a simple .htaccess password, and it won't let a user see sensitive material unless the webmaster has already placed that material on the website.

