How many times has it happened to you? You click on a promising link, and you get a 404 “File not found” error message, or an otherwise unhelpful page that offers no useful clues about where you should go next. If you’re not ready to give up, what do you do?
Hack the URL!
You can start by simply lopping off the end of a long address, in order to see whether the author has provided a table of contents page for a particular collection of web pages.
If you find a dead end at the URL “www.mysite.com/features/1999/may/juggling.htm”, you could chop off the end of the URL, in order to see what the website has posted at “www.mysite.com/features/”. If that directory has a file named “index.html,” the server should display it.
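The chopping described above can be sketched in a few lines of Python. This is only an illustration: the function name `parent_urls` is my own invention, and I have assumed an "http://" prefix on the example address so that the standard library can tell the host name from the path.

```python
from urllib.parse import urlsplit, urlunsplit

def parent_urls(url):
    """List the directory URLs above `url`, nearest first."""
    parts = urlsplit(url)
    # Break the path into its non-empty pieces, e.g. ["features", "1999", ...].
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Keep one fewer piece each time, ending at the site root ("/").
    for keep in range(len(segments) - 1, -1, -1):
        path = "/" + "".join(s + "/" for s in segments[:keep])
        # Drop any query string or fragment; we only want directories.
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents

for candidate in parent_urls("http://www.mysite.com/features/1999/may/juggling.htm"):
    print(candidate)
```

Run as written, this prints four addresses, from "http://www.mysite.com/features/1999/may/" up to "http://www.mysite.com/". Whether any of them leads to a useful page depends entirely on how the site was set up.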
(There’s nothing illegal or even very technical about what I mean by hacking a URL — but geeks will enjoy Jorn Barger’s “Hacking URLs for Fun and Profit.”)
If you can make sense of a particular website’s organizational pattern, you can bypass the site’s navigation altogether by trying to predict the address of a web page you would like to see. If you liked the May, 1999 article on juggling, you could “hack” the URL to check what kind of articles might be posted at “www.mysite.com/features/[current year]/[current month]/”.
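Predicting an address from a pattern can be sketched the same way. The function name `guess_archive_url` is hypothetical, and the sketch assumes the site spells out full lowercase English month names ("may", "december"); a real site might abbreviate them or use numbers instead, so treat the result as a guess to try, not a guarantee.

```python
from datetime import date

# Assumed naming scheme: full lowercase English month names.
MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

def guess_archive_url(base_url, when):
    """Build a [base]/[year]/[month]/ address for the given date."""
    month_name = MONTHS[when.month - 1]
    return f"{base_url.rstrip('/')}/{when.year}/{month_name}/"

# Rebuild the directory of the May 1999 juggling article ...
print(guess_archive_url("http://www.mysite.com/features/", date(1999, 5, 1)))
# ... then guess where this month's articles might live.
print(guess_archive_url("http://www.mysite.com/features/", date.today()))
```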
In this section, I demonstrate how an Internet user might hack a URL in order to use (rather than simply read) a page I happened to find deep within a university web site. The page I found lacked some of the important navigation features that turn a static text document into a useful, living hypertext.
URL-hacking in Action
Sometimes URL-hacking is simply a quick way for impatient power-users to jump around within a website. At other times, regular users who stumble upon internal web pages with incomplete navigation systems will need to hack a URL in order to get anywhere at all (in order to determine whether a particular web page is worth citing in a research paper, or to figure out whom to contact for more information).
Let’s imagine you have found the following “hit” on a search engine:
Perhaps you are intrigued by the title of the first hit (“UW-Eau Claire Summer Times”), and you want to learn more about the publication.
If you click on the link, what you get is a page that is formatted to look like a printed press release (which is exactly what this page is — an ordinary printed press release, thrown into a web page template). The only links on this page (way, way down at the bottom) are to the UWEC home page, to the home page of the UWEC News Bureau, and to the current issue of the Summer Times. If for some reason you wanted to know what other news items were published in this same issue, you would have to do some hunting. That’s where URL-hacking can help.
Hack off the end of the URL in order to climb up the directory tree, looking for a table of contents or general information page.
When we delete the very last part of this web address (the filename, “regents.html”), the web server delivers a default file, called “index.html,” located in the same directory (or “folder”). The author of this page (see image at left) has helpfully created a table of contents, in a file named “index.html”. Problem solved!
Well, sort of. Remember, we had to hack the URL to get to this information. The site designers had not accounted for the fact that some people might find their way directly onto an internal web page, bypassing the table of contents. (Another problem with this site: it uses frames.)
Now that we have found this issue’s table of contents, what if we want to look at other issues of the same publication? There is no link to “next” or “previous” issues, and there is no link “up” to a list of all the issues that were published that summer. Because the designer of this site did not expect that a user would ever wander onto an internal page like this, we are pretty much stuck again.
Let’s hack the URL once more, and see what we find.
|www.uwec.edu – |
[To Parent Directory]
1/2/00 2:54 PM <dir> 06-14-99
1/2/00 2:54 PM <dir> 06-21-99
1/2/00 2:54 PM <dir> 06-28-99
1/2/00 2:54 PM <dir> 07-05-99
1/2/00 2:54 PM <dir> 07-12-99
1/2/00 2:54 PM <dir> 07-19-99
1/2/00 2:54 PM <dir> 07-26-99
1/2/00 2:54 PM <dir> 08-02-99
In this case, the web developer has not written a special table of contents page for this directory. When the web server found no file named “index.html”, it generated a default index (by listing all the files that are in this directory).
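The decision the server makes here can be modeled roughly as follows. This is a simplified sketch, not the code of any particular web server: the function name `serve_directory` is made up, and real servers can be configured to use a different default filename or to refuse to generate a listing at all.

```python
import tempfile
from pathlib import Path

def serve_directory(directory, default_name="index.html"):
    """Model what a server might send back when asked for a directory."""
    directory = Path(directory)
    if (directory / default_name).is_file():
        # The author supplied a default page, so deliver it.
        return ("file", default_name)
    # No default page: fall back to an automatically generated listing.
    entries = sorted(
        p.name + ("/" if p.is_dir() else "") for p in directory.iterdir()
    )
    return ("listing", entries)

# Try it on a throwaway directory that mimics two of the issue folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "06-14-99").mkdir()
    (root / "06-21-99").mkdir()
    without_index = serve_directory(root)
    (root / "index.html").write_text("<h1>Contents</h1>")
    with_index = serve_directory(root)

print(without_index)  # ('listing', ['06-14-99/', '06-21-99/'])
print(with_index)     # ('file', 'index.html')
```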
In this case, all the files are usefully-named subdirectories. From the page displayed to the left, a user would probably have no trouble locating all the other issues of the 1999 “Summer Times”.
Still, many users who are unfamiliar with directory listings would find this page confusing — the color scheme looks nothing like the previous pages, and the title at the top of the page looks like a mistake.
Further, if we want to find a mission statement or other general information page, we won’t find it here.
If we want to find out more about the publication “Summer Times” we shall have to hack the URL again.
Uh-oh! We get a navigation menu on the left, but now there’s an error message on the right: “The page cannot be found.”
Most web surfers would probably give up by this point, but let’s keep going, just to see how confusing it can be if webmasters don’t use “index.html” on a regular basis. Remember, we are simply looking for basic information about the Summer Times periodical.
Chopped URL: http://www.uwec.edu/Admin/NewsBureau/
|www.uwec.edu – /Admin/NewsBureau/|
[To Parent Directory]
4/27/00 1:53 PM 5407 Backup of nbureau.wbk
4/24/00 9:31 AM <dir> bulletin
2/25/00 5:14 PM <dir> calendars
8/11/99 9:45 AM 5516 CopyOf_nbureau.html
1/2/00 2:52 PM <dir> experts
4/20/00 2:34 PM 1816 guide.html
5/12/00 12:54 PM <dir> images
4/20/00 9:22 AM 33280 NBstyle.doc
4/27/00 2:19 PM 5595 nbureau.html
2/3/00 12:07 PM <dir> news_events
5/3/98 1:27 PM 2471 newsperi.html
1/2/00 2:52 PM <dir> profile
5/2/00 1:16 PM <dir> release
5/3/98 1:27 PM 1757 release.html
5/12/00 12:55 PM <dir> staff
1/2/00 2:54 PM <dir> SummerTimes
2/23/00 12:38 PM 450 test.htm
2/14/00 12:05 PM 472 test1.htm
2/14/00 12:06 PM 471 test2.htm
2/17/00 5:05 PM 472 test3.htm
2/21/00 10:11 AM 450 test4.htm
5/11/00 12:00 PM <dir> View
Where to find information about the Summer Times?
In this case, we get another automatically-generated directory list, but look at the number of files in this directory! Nobody is going to want to click all of these files randomly, in the hopes of discovering where the table of contents is. This page displays links to backup copies of files, and a series of five “test” files. There they are online, even though they are out of date. Search engines may be able to find them.
Aha! There’s a directory named “SummerTimes”! That might be where we would expect to find an introduction page of some sort. But if we click on it, we end up with the very same error message that we got in the previous step.
That’s not helpful at all!
By the way, the news bureau home page is actually named “nbureau.html”. It is a perfectly acceptable “portal” style home page, designed to speed visitors on their way to the various subsections of the website. From that page, it is easy to find the Summer Times home page, which happens to be named “STopen.html”. Nevertheless, as this exercise demonstrates, it is not easy to find that home page unless you know where to look.
URL Hacking and Ethics
Sometimes lawyers contact me about a case featuring URL hacking (or, as one such lawyer called it, “URL typing”). I haven’t yet been interested enough in a case to offer to do any writing or testifying for free. But I’ll summarize my position here.
- If a company built a private warehouse, not intended to be accessed by the public, and I broke through the door and saw a secret, I would be in the wrong; the problem here is breaking and entering.
- If a company built a gallery that was open to the public, and put its secrets out on the walls along with the material visitors are supposed to see, and I walked in when the gallery was open for business and happened to see a secret, I have done no wrong; the problem is the company’s non-existent security.
- If a company built an archive, where all visitors are expected to write down a catalog number and wait in the library while the clerk fetches the item, and I ask the clerk to bring me “documents/2008/annual,” the clerk will probably first go to the shelf and see whether such a document exists.
- If it does exist, the clerk will check to see whether the document has a “Top Secret” tag on it, or an “Embargo until Dec 2007” sign, or a note that says “Only Bill, Sally, and Freddy are permitted to read this document.”
- If the owner of the item has placed it in the archive without any restrictions whatsoever, the clerk would be expected to treat this request just like any other.
- If I request a document that the company hasn’t told the clerk to restrict, then the problem is once again the company’s non-existent security.
If, instead of submitting a request that follows a clear pattern, I bombarded the clerk with hundreds of random requests, hoping to come up with something unexpected, that would be a very different matter. Because it’s common practice to type URLs to facilitate navigation on a website, typing out a URL isn’t devious or illegal. It’s simply a way to get to a page that deductive reasoning suggests ought to exist.
Since some web pages are dynamically generated from URLs that include complex parameters, there is not a clear line between what counts as simply typing the URL and manipulating complex parameters in a deliberate attempt to alter the way the site’s designers expected the site to behave.
Of course, manipulating a system may be against the terms of an end-user license, student handbook, or employment contract.
Just because a company’s website permits a hack does not automatically excuse all the actions carried out by the hacker. Most hackers are simply curious, seeking a faster, more powerful way to do something that seems slowed down by an unnecessarily tedious newbie-friendly process. URL hacking won’t help a user bypass a simple .htaccess password, and it won’t let a user see sensitive material unless the webmaster has already placed that material on the website.
by Dennis G. Jerz
18 May 2000 — first posted
23 May 2000 — “Digression” box added
06 Aug 2001 — last modified
13 Feb 2007 — “URL Hacking and Ethics” section added
18 Jun 2013 — minor edits; “Digression” box removed
21 Sep 2021 — minor edits
29 Mar 2022 — added new graphic; redirected relevant, historical links to Internet Archive caches
|In the News|
|28 Oct 2002: URL Hacking Leads to Web Privacy Case (Salon/Reuters)|
A small Swedish company posted its quarterly earnings on its website, without any password protection, but without publicizing the URL. A Reuters reporter guessed the URL and found the report before it was supposed to be released. Now the reporter is being sued for Hacking a URL.
(Link broken; see contemporary coverage by Slashdot, The Register, and freedom-to-tinker.com)
|Make Your URLs Hackable|
|If your website exists completely within a single directory (“www.geocities.com/hectopus”, for example), then all you need to do is name your home page “index.html”. If your site is more complex, create homepages named “index.html” in each subdirectory. In addition, each page on a particular level should link to the local “index.html” page, and also provide a link to the next level “up”. (See Navigation: An often neglected component of web authorship).|
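If you keep a copy of your site on your own computer, a short script can point out the subdirectories that still lack a home page. The sketch below is only one way to do it: the function name `directories_missing_index` is hypothetical, and it assumes your server's default filename really is “index.html”. The demonstration builds a tiny throwaway site so that it can run anywhere.

```python
import os
import tempfile

def directories_missing_index(site_root, default_name="index.html"):
    """Return the directories under `site_root` that have no default page."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(site_root):
        if default_name not in filenames:
            # Report paths relative to the site root; "." is the root itself.
            missing.append(os.path.relpath(dirpath, site_root))
    return sorted(missing)

# Build a throwaway site: the root and "features" have home pages,
# but "features/1999" does not.
with tempfile.TemporaryDirectory() as site:
    os.makedirs(os.path.join(site, "features", "1999"))
    open(os.path.join(site, "index.html"), "w").close()
    open(os.path.join(site, "features", "index.html"), "w").close()
    report = directories_missing_index(site)

print(report)  # only the "features/1999" directory is reported
```

A check like this covers only the first piece of advice in the box above; it cannot tell you whether each page links to its local “index.html” or to the next level “up”.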
|The Golden Rule|
|Like It or Not, Your Website Will Talk to Strangers|
You say your website isn’t that complex… you say the people who use it already know how to find what they need… you say you don’t have enough time… so why should you bother making your website accessible to strangers? I can offer several reasons.