

The Little Engine That Could?

The Inside Scoop on Search Engines

by Catherine Deely

If you're reading this issue of WWWiz, you know the routine. You wake up and throw back your morning coffee, intent on starting a no-nonsense, fully productive day. As soon as the breakfast dishes are cleared, however, you furtively glance around, engage in some last-minute rationalization -- and jump on the Net. You promise yourself you'll be logged on only briefly -- a few minutes, max -- just to check your email, the weather, and maybe how your stocks are doing. But that, you swear vehemently, is it.

So how is it that, three hours later, you're completely immersed in a zany, bizarre, utterly absorbing site: maybe the latest horoscope for your parakeet, or perhaps the Tiki Bar Review Page? You alone know why you came, but that leaves one question unanswered: how did you manage to navigate your way into such an "Alice in Wonderland"-esque corner of your friendly neighborhood Net?

The answer, of course, comes in the form of that beloved pilot of cyberspace: the search engine. We use search engines every day, in the boardroom and the living room, and yet when we pause to reflect on the matter, there's a good chance that we really don't know all that much about this most prominent feature of the Net.

Most people who consider themselves relatively Web-savvy are acquainted with the loose definition of a search engine: a software mechanism that, when prompted by a keyword or phrase entered by a user, searches an index compiled from pages on the Net and returns all of the matches it can locate.

The matches returned by a search usually range from the most accurate and scientific to the decidedly irrelevant and, occasionally, the irreverent; hence the unfortunate tendency of wildly unrelated pornographic sites to pop up on even the most innocuous search. But that's another story for later.

Search engines work with the assistance of a smaller component known as the robot, which also goes by an assortment of other eclectic nicknames such as "spider" and "crawler." No, nothing quite as glamorous as R2D2 or a tarantula traversing the Web, but robots are intriguing concepts in their own right. Simply defined, a robot is a program that automatically traverses the massive hypertext structure of the Web. It does so by retrieving a document, then recursively retrieving the documents that the first one links to, the documents those link to, and so on. The creators of the vastly informative Web Robots Pages emphasize that names such as "crawler" misrepresent the benign nature of robots: "These names are a bit misleading because they give the impression the software itself moves between sites like a virus; this is not the case. A robot simply visits sites by requesting documents from them."
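To make the "retrieve a document, then follow its links" idea concrete, here is a minimal sketch of a robot's traversal loop in Python. It is an illustration under stated assumptions, not how any particular engine's robot works: instead of making real HTTP requests, it "requests" pages from a small hypothetical in-memory web (the `TOY_WEB` dictionary and its page names are invented for this example), and it visits each page once, following links breadth-first.

```python
from collections import deque

# Hypothetical miniature Web: each page name maps to the pages it links to.
# A real robot would request each document over HTTP and parse out its links.
TOY_WEB = {
    "home": ["news", "weather"],
    "news": ["stocks", "home"],
    "weather": ["home"],
    "stocks": ["tiki-bar-review"],
    "tiki-bar-review": [],
}


def crawl(start, web, max_pages=100):
    """Visit pages breadth-first from `start`, requesting each page only once.

    Returns the page names in the order they were visited.
    """
    visited = []
    seen = {start}              # pages already queued, so loops are not re-followed
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)    # "request" the document
        for link in web.get(page, []):  # follow each link it contains
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("home", TOY_WEB))
# ['home', 'news', 'weather', 'stocks', 'tiki-bar-review']
```

The `seen` set is what keeps the robot from circling forever when pages link back to each other, as "news" and "home" do here; `max_pages` is an arbitrary, hypothetical cap standing in for the limits a real robot would impose on itself.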

Picture this example to better visualize the search engine/robot relationship: the robot is the hardworking, nose-to-the-grindstone "gofer" to the search engine's Big Corporate Boss. (Imagine a scene out of Dilbert.) Long before you ever show up, the Boss (search engine) sends the company Gofer (the robot) out to retrieve every document it can find and file it away in a giant index. When the Customer (you, the Net user) asks for sites related to a certain topic, such as "magazines on the Web," the Boss consults those files and hands you a list of matching documents to peruse. Like any worker, the robot has its limits. It's quite likely that, while many of the sites you are directed to will be close or exact matches to what you're looking for, you'll also get back some links pertaining to the Costa Rican rain forest or something else completely unrelated.
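The division of labor between robot and engine can be sketched in a few lines of Python. This is a minimal illustration, not how any real engine is built: `build_index` stands in for the robot's work done ahead of time (an "inverted index" mapping each word to the pages that contain it), and `search` stands in for the engine answering a query by returning the pages that contain every query word. The page texts, the `example.*` URLs, and both function names are hypothetical, and real engines rank their results rather than simply sorting them.

```python
import re
from collections import defaultdict

# Hypothetical pages the robot has already collected (URL -> page text).
PAGES = {
    "example.com/zines": "Magazines on the Web: reviews of online magazines",
    "example.org/rainforest": "Spiders of the Costa Rican rain forest spin a web",
    "example.net/wwwiz": "WWWiz, a magazine about the Web",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Done in advance: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Done at query time: return the sorted URLs that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    # index.get() does not add missing words to the defaultdict.
    results = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(results)


index = build_index(PAGES)
print(search(index, "magazines web"))  # ['example.com/zines']
print(search(index, "web"))            # all three pages, rain forest included
```

The rain-forest page turns up in a search for "web" for no better reason than that the word appears in it, which is exactly the kind of unrelated hit the Gofer can't help delivering.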

Why does this happen? If a robot isn't human, why does it make "human" mistakes? The reasons vary. A misspelled word within a site may be indexed as a keyword by the robot, which simply doesn't have the capacity to tell the difference. Another common occurrence: directories, or search engines dependent upon human listings (Yahoo! is a very popular example), ask people submitting Web sites to list the search keywords most applicable to their site before adding that site's URL to the directory. Most directory editors carefully check submitted URLs to ensure that the pages contain exactly what they're supposed to contain, but cunning troublemakers with misleading keywords do manage to slip through.

Another problem occurs when a site's URL suggests one type of site, but the site is actually devoted to another sort altogether. A prime example that made headlines recently was the mix-up between the actual White House Web site and a cleverly orchestrated site with a deceptively similar URL. Visitors to the second site soon found, however, that there were no friendly photos of Buddy the First Dog or historical trivia to be shared; instead, the site was devoted to decidedly adults-only material. (After an angry response from the real White House, whitehouse.org was forced to limit access to its site to viewers age 18 or older, and was required to post clear labels identifying the nature of its content.) If this is confusing for human beings, it's utterly bewildering to the robot, which can interpret only at the most literal level.
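The robot's literal-mindedness is easy to demonstrate. In this hypothetical Python sketch (the pages, URLs, and the `literal_matches` helper are invented for illustration, and real engines may handle spelling differently), a keyword matches only when that exact word appears in a page, so a page that misspells "restaurant" is found only by someone who makes the same typo.

```python
import re

# Hypothetical pages; the second one misspells "restaurant".
PAGES = {
    "example.com/tiki": "Tiki bar and restaurant reviews",
    "example.net/luau": "The best luau restaraunt in town",
}


def literal_matches(pages, keyword):
    """Return the sorted URLs whose text contains `keyword` as an exact,
    case-insensitive word. No spelling correction is attempted."""
    keyword = keyword.lower()
    return sorted(
        url
        for url, text in pages.items()
        if keyword in re.findall(r"[a-z0-9]+", text.lower())
    )


print(literal_matches(PAGES, "restaurant"))  # ['example.com/tiki']
print(literal_matches(PAGES, "restaraunt"))  # ['example.net/luau']
```

To the program, "restaurant" and "restaraunt" are simply two different strings; it has no notion that one is a mistake for the other.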

Now that you understand a little about how search engines work, you may be wondering how to sort through the huge array of engines on the Net to find the best one for you. Search engines can be divided into several categories. First, there are the "biggies," the major engines used most frequently by individual Web surfers and companies alike. In addition to Yahoo!, popular engines include Lycos, which offers online guides and filters for better searching; Excite, a global conglomerate of Internet-focused services; AltaVista, which seeks out newsgroups as well as individual sites; and the well-received upstart newcomer, HotBot, whose colorful commercials well suit its flashy, high-tech style.

You don't have to do all your searches through the biggies, however; there are specialty search engines on the Net for just about every imaginable topic. You can search for humor, music, or sites emanating only from the Seattle area. That's not all; search engines have the potential to tap into the most obscure, and specific, portals of information in cyberspace. There's Aqueous, a search engine dealing only with, as it professes, "sites with water-related content"; Petseek, which turns up online animal adoption and rescue sites throughout the U.S.; The James Kirk Search Engine, for Star Trek devotees (you knew there had to be at least one!); Ahoy!, which locates personal home pages; and Scour.Net, which tracks sites that offer multimedia, such as graphics, sound files, and streaming video. It's all out there!

For more information on search engines, visit The Search Engine Watch, which offers everything from up-to-the-minute industry news to fun facts and little-known trivia on the subject.

So there you have it -- the surface scoop on the search engine phenomenon. Chances are, if you've found a site you particularly enjoy but figure it's all on its lonesome in cyberspace, a quick trip via a search engine will uncover an entire world of other pages perfect for your visiting enjoyment.

But first, remember -- you still have your productive day to begin!

Catherine Deely is currently completing her junior year at Boston College, where she is a Communication major specializing in WWW and Digital Media. In addition to being a certified Net addict, she holds high hopes of finding "THE dream job" -- combining writing, media, and research for an online publication. Catherine can be reached at


Copyright (C) 1998 WWWiz Corporation - All Rights Reserved
Phone: 714.848.9600 FAX: 714.375.2493
WWWiz Web site developed and maintained by
GRAFX Digital Studio
