Hack the URL!
You can start by simply lopping off the end of a long address to see whether the author has provided a table of contents page for a particular collection of web pages.
If you find a dead end at the URL “www.mysite.com/features/1999/may/juggling.htm”, you could chop off the end of the URL to see what the website has posted at “www.mysite.com/features/”. If that directory contains a file named “index.html,” the server should display it.
(There’s nothing illegal or even very technical about what I mean by hacking a URL — but geeks will enjoy Jorn Barger’s “Hacking URLs for Fun and Profit.”)
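If you wanted to automate the chopping, a few lines of Python would do it. This is only a sketch. It assumes the address includes its “http://” prefix (the example above leaves it off), and the function name up_one_level is my own invention, not part of any tool.

```python
from urllib.parse import urlsplit, urlunsplit

def up_one_level(url: str) -> str:
    """Return the directory one level above `url` (query and fragment dropped).

    For a file URL this is the folder containing the file; for a URL that
    already ends in "/", it is that folder's parent. The root stays the root.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    segments = path.rstrip("/").split("/")  # "/a/b.htm" -> ["", "a", "b.htm"]
    parent = "/".join(segments[:-1]) + "/"
    return urlunsplit((scheme, netloc, parent, "", ""))


url = "http://www.mysite.com/features/1999/may/juggling.htm"
for _ in range(3):
    url = up_one_level(url)
print(url)  # http://www.mysite.com/features/
```

The function only computes an address; it doesn’t fetch anything, so whether a page actually lives there is still up to the server.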
If you can make sense of a particular website’s organizational pattern, you can bypass the site’s navigation altogether by trying to predict the address of a web page you would like to see. If you liked the May 1999 article on juggling, you could “hack” the URL to check what kind of articles might be posted at “www.mysite.com/features/[current year]/[current month]/”.
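Predicting an address can be done in code, too. The sketch below assumes the hypothetical mysite.com keeps using the “/features/[year]/[month]/” pattern with lowercase English month names; the function name predicted_issue_url is made up for illustration. I spell out the month names rather than relying on date formatting, because formatted month names depend on the computer’s locale settings.

```python
from datetime import date

# Assumption: the hypothetical site files articles under
# /features/<year>/<lowercase English month name>/.
MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november", "december"]

def predicted_issue_url(site: str, when: date) -> str:
    """Guess the directory that should hold articles from the month of `when`."""
    return f"{site.rstrip('/')}/features/{when.year}/{MONTH_NAMES[when.month - 1]}/"


print(predicted_issue_url("http://www.mysite.com", date(1999, 5, 1)))
# http://www.mysite.com/features/1999/may/
print(predicted_issue_url("http://www.mysite.com", date.today()))
```

A guessed address is only a guess; if the site changes its naming scheme, the prediction simply leads to an error page.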
In this section, I demonstrate how an Internet user might hack a URL in order to use (rather than simply read) a page I happened to find deep within a university web site. The page I found lacked some of the important navigation features that turn a static text document into a useful, living hypertext.
URL-hacking in Action
Sometimes URL-hacking is simply a quick way for impatient power-users to jump around within a website. At other times, regular users who stumble upon internal web pages with incomplete navigation systems will need to hack a URL in order to get anywhere at all (for instance, to determine whether a particular web page is worth citing in a research paper, or to figure out whom to contact for more information).
Let’s imagine you have found the following “hit” on a search engine:
Perhaps you are intrigued by the title of the first hit (“UW-Eau Claire Summer Times”), and you want to learn more about the publication.
Target URL: http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/regents.html
Hack off the end of the URL in order to climb up the directory tree, looking for a table of contents or general information page.
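Climbing the tree one level at a time visits a predictable series of addresses, so you can list them all up front. This Python sketch (the function name ancestor_urls is hypothetical) treats the last piece of the path as the page itself and returns every directory above it, nearest first.

```python
from urllib.parse import urlsplit

def ancestor_urls(url: str) -> list[str]:
    """Return every directory URL above `url`, nearest first, ending at the root.

    The last path segment is treated as the page itself, so it is not listed.
    """
    parts = urlsplit(url)
    root = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]  # ignore empty pieces
    # Rebuild the path from the first i segments, for i = len-1 down to 0.
    return [root + "/" + "".join(s + "/" for s in segments[:i])
            for i in range(len(segments) - 1, -1, -1)]


target = ("http://www.uwec.edu/Admin/NewsBureau/SummerTimes/"
          "STpast/Summer99/07-26-99/regents.html")
for step in ancestor_urls(target):
    print(step)
```

For the Summer Times URL, the list starts at “.../Summer99/07-26-99/” and ends at “http://www.uwec.edu/”, the same ladder of addresses this walkthrough climbs by hand.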
Original URL:
http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/regents.html
Hacked URL:
http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/07-26-99/
Well, sort of. Remember, we had to hack the URL to get to this information. The site designers had not accounted for the fact that some people might find their way directly onto an internal web page, bypassing the table of contents. (Another problem with this site: it uses frames.)
Now that we have found this issue’s table of contents, what if we want to look at other issues of the same publication? There is no link to “next” or “previous” issues, and there is no link “up” to a list of all the issues that were published that summer. Because the designer of this site did not expect that a user would ever wander onto an internal page like this, we are pretty much stuck again.
Let’s hack the URL once more, and see what we find.
Hacked URL: http://www.uwec.edu/Admin/NewsBureau/SummerTimes/STpast/Summer99/
www.uwec.edu – /Admin/NewsBureau/SummerTimes/STpast/Summer99/ [To Parent Directory]
1/2/00 2:54 PM <dir> 06-14-99
1/2/00 2:54 PM <dir> 06-21-99
1/2/00 2:54 PM <dir> 06-28-99
1/2/00 2:54 PM <dir> 07-05-99
1/2/00 2:54 PM <dir> 07-12-99
1/2/00 2:54 PM <dir> 07-19-99
1/2/00 2:54 PM <dir> 07-26-99
1/2/00 2:54 PM <dir> 08-02-99
In this case, the web developer has not written a special table of contents page for this directory. When the web server found no file named “index.html”, it generated a default index (by listing all the files that are in this directory).
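Roughly speaking, the server’s decision looks like the sketch below. This is a simplified model, not how any particular server is written. The list of default file names is an assumption (servers let the webmaster configure it, and many servers can also be set to refuse to list a directory at all), and respond_to_directory is a made-up name.

```python
from pathlib import Path

# Assumption: an illustrative list of "default document" names; real servers
# let the webmaster configure this list, and may refuse to list directories.
DEFAULT_DOCUMENTS = ["index.html", "index.htm", "default.htm"]

def respond_to_directory(directory: Path) -> str:
    """Simplified model of a server answering a request for a directory URL."""
    for name in DEFAULT_DOCUMENTS:
        page = directory / name
        if page.is_file():
            return page.read_text(encoding="utf-8")  # the author's own page wins
    # No default document found: fall back to a generated listing.
    lines = [f"Index of /{directory.name}/"]
    for entry in sorted(directory.iterdir(), key=lambda p: p.name.lower()):
        lines.append(f"<dir> {entry.name}" if entry.is_dir() else entry.name)
    return "\n".join(lines)
```

The point of the model is the order of operations: a single “index.html” in the directory is enough to replace the raw listing with a page the author actually designed.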
In this case, all the files are usefully named subdirectories. From this listing, a user would probably have no trouble locating all the other issues of the 1999 “Summer Times”.
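Because the issue directories follow a date pattern, the “next” and “previous” links the site lacks can be worked out from the listing. In this sketch I assume the names are month-day-year dates (an inference from the listing, not something the site documents), and previous_and_next is a hypothetical helper. Sorting by the parsed date rather than by the raw text keeps the order right even when a run of issues crosses into a new year.

```python
from datetime import datetime
from typing import Optional

# Issue directories copied from the listing above; the month-day-year reading
# of these names is an inference, not something the site documents.
ISSUES = ["06-14-99", "06-21-99", "06-28-99", "07-05-99",
          "07-12-99", "07-19-99", "07-26-99", "08-02-99"]

def previous_and_next(current: str, issues: list[str]) -> tuple[Optional[str], Optional[str]]:
    """Return the (previous, next) issue around `current`; None at either end."""
    ordered = sorted(issues, key=lambda name: datetime.strptime(name, "%m-%d-%y"))
    i = ordered.index(current)
    previous_issue = ordered[i - 1] if i > 0 else None
    next_issue = ordered[i + 1] if i + 1 < len(ordered) else None
    return previous_issue, next_issue


print(previous_and_next("07-26-99", ISSUES))  # ('07-19-99', '08-02-99')
```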
Still, many users who are unfamiliar with directory listings would find this page confusing — the color scheme looks nothing like the previous pages, and the title at the top of the page looks like a mistake.
Further, if we want to find a mission statement or other general information page, we won’t find it here.
If we want to find out more about the publication “Summer Times” we shall have to hack the URL again.
Hacked URL: http://www.uwec.edu/Admin/NewsBureau/SummerTimes/
Uh-oh! We get a navigation menu on the left, but now there’s an error message on the right: “The page cannot be found.”
Most web surfers would probably give up by this point, but let’s keep going, just to see how confusing it can be if webmasters don’t use “index.html” on a regular basis. Remember, we are simply looking for basic information about the Summer Times periodical.
Keep hacking:
Hacked URL: http://www.uwec.edu/Admin/NewsBureau/
www.uwec.edu – /Admin/NewsBureau/ [To Parent Directory]
4/27/00 1:53 PM 5407 Backup of nbureau.wbk
4/24/00 9:31 AM <dir> bulletin
2/25/00 5:14 PM <dir> calendars
8/11/99 9:45 AM 5516 CopyOf_nbureau.html
1/2/00 2:52 PM <dir> experts
4/20/00 2:34 PM 1816 guide.html
5/12/00 12:54 PM <dir> images
4/20/00 9:22 AM 33280 NBstyle.doc
4/27/00 2:19 PM 5595 nbureau.html
2/3/00 12:07 PM <dir> news_events
5/3/98 1:27 PM 2471 newsperi.html
1/2/00 2:52 PM <dir> profile
5/2/00 1:16 PM <dir> release
5/3/98 1:27 PM 1757 release.html
5/12/00 12:55 PM <dir> staff
1/2/00 2:54 PM <dir> SummerTimes
2/23/00 12:38 PM 450 test.htm
2/14/00 12:05 PM 472 test1.htm
2/14/00 12:06 PM 471 test2.htm
2/17/00 5:05 PM 472 test3.htm
2/21/00 10:11 AM 450 test4.htm
5/11/00 12:00 PM <dir> View
Where to find information about the Summer Times?
In this case, we get another automatically generated directory list, but look at the number of files in this directory! Nobody is going to want to click on all of these files at random, hoping to discover where the table of contents is. This page displays links to backup copies of files and a series of five “test” files. There they are, online, even though they are out of date, and search engines may be able to find them.
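A webmaster could catch leftovers like these before a search engine does. The pattern below is a rough heuristic I tuned to this particular listing (names starting with “Backup of” or “CopyOf_”, numbered “test” pages, and .wbk/.bak backup extensions); a real site would need its own rules, and find_leftovers is a hypothetical name.

```python
import re

# Rough heuristic tuned to the listing above (an assumption, not a standard):
# "Backup of ..." / "CopyOf_..." names, numbered test pages, .wbk/.bak backups.
LEFTOVER = re.compile(r"^(backup of |copyof_)|^test\d*\.html?$|\.(bak|wbk)$",
                      re.IGNORECASE)

def find_leftovers(filenames: list[str]) -> list[str]:
    """Return the names that look like backups or test pages."""
    return [name for name in filenames if LEFTOVER.search(name)]


files = ["Backup of nbureau.wbk", "CopyOf_nbureau.html", "guide.html",
         "NBstyle.doc", "nbureau.html", "newsperi.html", "release.html",
         "test.htm", "test1.htm", "test2.htm", "test3.htm", "test4.htm"]
print(find_leftovers(files))
```

Flagging a file is only the first step; someone still has to decide whether it should come down.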
Aha! There’s a directory named “SummerTimes”! That might be where we would expect to find an introduction page of some sort. But if we click on it, we end up with the very same error message that we got in the previous step.
That’s not helpful at all!
By the way, the news bureau home page is actually named “nbureau.html”. It is a perfectly acceptable “portal” style home page, designed to speed visitors on their way to the various subsections of the website. From that page, it is easy to find the Summer Times home page, which happens to be named “STopen.html”. Nevertheless, as this exercise demonstrates, it is not easy to find that home page unless you know where to look.
If we chop the URL yet again, we get something even less useful on www.uwec.edu/Admin — a much, much longer directory listing.
Navigation: An often neglected component of web authorship
Sometimes lawyers contact me about a case featuring URL hacking (or, as one such lawyer called it, “URL typing”). I haven’t yet been interested enough in a case to offer to do any writing or testifying for free. But I’ll summarize my position here.
If, rather than submitting a request that follows a clear pattern, I bombarded the clerk with hundreds of random requests, hoping to come up with something unexpected, that would be a very different matter. Because it’s common practice to type URLs to navigate a website, typing out a URL isn’t devious or illegal. It’s simply a way to get to a page that deductive reasoning suggests ought to exist.
Since some web pages are dynamically generated from URLs that include complex parameters, there is no clear line between simply typing a URL and deliberately manipulating those parameters in an attempt to make the site behave in ways its designers did not expect.
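Here is what that distinction looks like in practice. A dynamic URL has a path, which you can chop like a directory name, and a query string holding named parameters. The URL in this sketch is hypothetical, as is the helper name split_dynamic_url; the standard urllib.parse module does the actual separating.

```python
from urllib.parse import urlsplit, parse_qs

def split_dynamic_url(url: str) -> tuple[str, dict[str, list[str]]]:
    """Separate the directory-like path from the query-string parameters."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


# Hypothetical URL for a dynamically generated page.
path, params = split_dynamic_url("http://www.example.com/reports/view.asp?year=2002&quarter=3")
print(path)    # /reports/view.asp
print(params)  # {'year': ['2002'], 'quarter': ['3']}
```

Chopping “view.asp” off the path is the same move described earlier; changing “quarter=3” to “quarter=4” is the kind of parameter-tweaking where the line gets harder to draw.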
Of course, manipulating a system may be against the terms of an end-user license, a student handbook, or an employment contract.
Just because a company’s website permits a hack does not automatically excuse all the actions carried out by the hacker. Most hackers are simply curious, seeking a faster, more powerful way to do something that seems slowed down by an unnecessarily tedious, newbie-friendly process. URL hacking won’t help a user bypass even a simple .htaccess password, and it won’t let a user see sensitive material unless the webmaster has already placed that material on the website.
by Dennis G. Jerz
18 May 2000 — first posted
23 May 2000 — “Digression” box added
06 Aug 2001 — last modified
13 Feb 2007 — “URL Hacking and Ethics” section added
18 Jun 2013 — minor edits; “Digression” box removed
21 Sep 2021 — minor edits
29 Mar 2022 — added new graphic; redirected relevant, historical links to Internet Archive caches
In the News
28 Oct 2002: URL Hacking Leads to Web Privacy Case (Salon/Reuters). A small Swedish company posted its quarterly earnings on its website without any password protection, but also without publicizing the URL. A Reuters reporter guessed the URL and found the report before it was supposed to be released. Now the reporter is being sued for hacking a URL. (Link broken; see contemporary coverage by Slashdot, The Register, and freedom-to-tinker.com.)
Make Your URLs Hackable
If your website exists completely within a single directory (“www.geocities.com/hectopus”, for example), then all you need to do is name your home page “index.html”. If your site is more complex, create home pages named “index.html” in each subdirectory. In addition, each page on a particular level should link to the local “index.html” page, and also provide a link to the next level “up”. (See Navigation: An often neglected component of web authorship.)
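If you keep a copy of your site in a folder on your own computer before uploading it, a short script can find the directories that still need an “index.html”. This sketch assumes such a local copy exists; the function name directories_missing_index is my own.

```python
from pathlib import Path

def directories_missing_index(site_root: Path, index_name: str = "index.html") -> list[Path]:
    """List directories (relative to site_root, root included) with no index page."""
    directories = [site_root, *(p for p in site_root.rglob("*") if p.is_dir())]
    missing = [d.relative_to(site_root) for d in directories
               if not (d / index_name).is_file()]
    return sorted(missing)
```

Calling directories_missing_index(Path("my_site")), where “my_site” is a placeholder for your local folder, returns relative paths such as Path("features"); each one is a spot where a visitor who hacks the URL would land on a raw listing or an error page.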
The Golden Rule
Like It or Not, Your Website Will Talk to Strangers
You say your website isn’t that complex… you say the people who use it already know how to find what they need… you say you don’t have enough time… so why should you bother making your website accessible to strangers? I can offer several reasons.