| The grandfather
of all search engines was called Archie. It was created in 1990 by Alan
Emtage, a student at McGill University in Montreal. At this early date there
was no World Wide Web. However, there was an Internet, and many files were
scattered over the vast network.
The primary method of storing and retrieving files was FTP (File Transfer Protocol). This was (and still is) a system that specifies a common way for computers to exchange files over the Internet. This is how it works: someone who wants to make files available from a computer sets up a program on it called an FTP server. Anyone on the Internet who wants to retrieve a file from that computer connects to it via another program called an FTP client. Any FTP client program can connect with any FTP server program as long as both programs fully follow the specifications set forth in the FTP protocol.
Initially, anyone who wanted to share a file had to set up an FTP server in order to make the file available to others. Later, "anonymous" FTP sites became stores for files, allowing all users to post and retrieve them.
Even with archive sites, many important files were still scattered on small FTP servers. Archie changed all that. It combined a script-based data gatherer, which gathered site listings of anonymous FTP files, with a regular expression matcher for retrieving file names matching a user query. In other words, Archie's gatherer searched FTP sites across the Internet and indexed all of the files it found. Its regular expression matcher provided users with access to its database.
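The two halves described above, a gatherer that builds an index of file listings and a regular expression matcher that answers queries, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Archie's actual code: the host names, paths, and the function names build_index and search are all hypothetical, and the listings are hard-coded rather than gathered over FTP.

```python
import re

# Hypothetical site listings, standing in for what a gatherer might collect
# from anonymous FTP sites. Host names and paths are made up for illustration.
SITE_LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/docs/README"],
    "ftp.example.org": ["/mirrors/tex/latex.tar.Z", "/pub/gnu/gcc-1.40.tar.Z"],
}


def build_index(listings):
    """Flatten per-site listings into (file name, host, path) records."""
    index = []
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # keep only the file name
            index.append((name, host, path))
    return index


def search(index, pattern):
    """Return sorted (host, path) pairs whose file name matches the regex."""
    regex = re.compile(pattern)
    return sorted((host, path) for name, host, path in index if regex.search(name))


INDEX = build_index(SITE_LISTINGS)
```

Calling search(INDEX, r"^g.*\.tar\.Z$") on this sample data returns the single gcc archive on ftp.example.org. The query is matched against file names only, which is why such an index helps only when the user can guess part of the name.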
Archie’s popularity grew so much that in 1993, the University of Nevada System Computing Services developed another search tool similar to Archie, but one that could search for single documents as well as files.
The term robot has special significance to programmers. Computer robots are programs that automatically perform repetitive tasks at speeds that could not be matched by humans. On the Internet, the term mostly refers to programs that explore the network for some sort of information. Web robots search the Internet for web pages, usually for the purpose of creating a large searchable database. This sort of robot, or bot, is often called a spider.
Matthew Gray’s World Wide Web Wanderer was the first robot on the web and was designed to track the web’s growth. Originally, it counted only web servers, but eventually it began to capture URLs too.
In response to the Wanderer, Martijn Koster created ALIWEB in October 1993. It was the HTTP equivalent of Archie. ALIWEB does not have a web-searching robot. Instead, webmasters of participating sites post their own index information for each page they want listed. The advantage of this method is that users get to describe their own site, and a robot doesn't run about eating up Net bandwidth. ALIWEB is still used today but does not have the mass appeal of the likes of Yahoo! or Lycos.
As the web grew, it became more and more difficult to sort through all of the new web pages added each day. Matthew Gray’s Wanderer inspired a number of programmers to follow up on the idea of web robots, or spiders, as they are now called. These programs systematically scour the web for pages by exploring all of the links on a starter site, which is a page that contains many links to other pages. The concept was that, by definition, every page on the web must be linked to another page. By searching through a large number of pages and following all of the links, a spider discovers new pages that have their own collections of links. The hope is that most of the web can be explored through the continuous repetition of this process.
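The link-following process described above amounts to a graph traversal. The following Python sketch shows one way to express it, assuming a small hard-coded link graph in place of real page fetches. The page names and the crawl function are hypothetical, and breadth-first order is a choice made here for simplicity, not something the early spiders are known to have shared.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "start.html": ["a.html", "b.html"],
    "a.html": ["b.html", "c.html"],
    "b.html": ["start.html"],
    "c.html": [],
    "orphan.html": ["a.html"],  # no page links here, so a crawl cannot find it
}


def crawl(links, start):
    """Visit pages breadth-first from start, handling each page only once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # skip pages already queued or visited
                seen.add(target)
                queue.append(target)
    return order
```

With this sample graph, crawl(LINKS, "start.html") visits start.html, a.html, b.html, and c.html, in that order. The seen set keeps the sketch from requesting the same page twice, and orphan.html is never reached because nothing links to it, which shows why link-following can only hope to cover most of the web rather than all of it.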
This process caused a great deal of controversy because some poorly written spiders were creating huge loads on the network by repeatedly accessing the same series of pages. By December 1993 there were three search engines powered by robots that had made their debut: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider.
JumpStation’s web bot gathered information about the title and header from Web pages and used a very simple search and retrieval system for its web interface. The system searched a database linearly, matching keywords as it went. Needless to say, as the web grew larger, JumpStation became slower and slower, finally grinding to a halt.
The WWW Worm indexed only the titles and URLs of the pages it visited. It used regular expressions to search the index. Results from JumpStation and the Worm came out in the order that the search found them, meaning that the order of the results had nothing to do with their relevance. The RBSE spider was the first to improve on this process by implementing a ranking system based on relevance to the keyword string.
Excite (www.excite.com), a popular search engine today, has roots that extend back to February 1993, when it was started by six Stanford undergraduates. Their idea was to use an analysis of word relationships to provide more efficient searches through the large amount of information on the Internet. By mid-1993 it was a fully funded project, and they released a version of their software for webmasters to use on their own websites, now called Excite for Web Servers.
In April 1994, two Stanford University Ph.D. candidates, David Filo and Jerry Yang, created something that has continued to rise in popularity since then. They called their collection of pages Yahoo!.
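The difference between results in discovery order and results ranked by relevance can be made concrete with a short Python sketch. Everything in it is an assumption for illustration: the sample pages are invented, and the scoring rule (count the title words that match a query word) is a stand-in, not the method the RBSE spider actually used.

```python
# Hypothetical index: (URL, title) pairs in the order a robot found them.
PAGES = [
    ("http://example.com/1", "Weather"),
    ("http://example.com/2", "Tokyo weather and Tokyo travel"),
    ("http://example.com/3", "Tokyo weather"),
]


def linear_search(pages, keyword):
    """Scan every record in order and return matches in discovery order."""
    keyword = keyword.lower()
    return [url for url, title in pages if keyword in title.lower().split()]


def ranked_search(pages, query):
    """Order matching pages by how many title words match a query word."""
    terms = set(query.lower().split())
    scored = []
    for position, (url, title) in enumerate(pages):
        score = sum(1 for word in title.lower().split() if word in terms)
        if score:
            # Negative score sorts best first; position breaks ties.
            scored.append((-score, position, url))
    return [url for _, _, url in sorted(scored)]
```

Here linear_search(PAGES, "weather") returns pages 1, 2, and 3 in the order they were found, while ranked_search(PAGES, "tokyo weather") puts page 2 first, then page 3, then page 1. Both functions still read every record on every query, which is the cost that made a purely linear system slow down as its database grew.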
As the number of links grew and their pages began to receive thousands of hits a day, the team created ways to better organize the data. In order to aid in data retrieval, Yahoo! (www.yahoo.com) became a searchable directory. The search feature was a simple database search engine. Because Yahoo! entries were entered and categorized manually, Yahoo! was not really classified as a search engine. Instead, it was generally considered to be a searchable directory. Yahoo! has since automated some aspects of the gathering and classification process, blurring the distinction between engine and directory.
The Wanderer captured only URLs, which made it difficult to find things that weren’t explicitly described by their URL. Because URLs are rather cryptic to begin with, this didn’t help the average user. Searching Yahoo! was much more effective because its entries contained additional descriptive information about the indexed sites.
Lycos was the next major development, having been designed at Carnegie Mellon University around July of 1994. Michael Mauldin was responsible for this search engine and remains the chief scientist at Lycos Inc.
On July 20, 1994, Lycos went public with a catalog of 54,000 documents. In addition to providing ranked relevance retrieval, Lycos provided prefix matching and word proximity bonuses. But Lycos' main difference was the sheer size of its catalog: by August 1994, Lycos had identified 394,000 documents; by January 1995, the catalog had reached 1.5 million documents; and by November 1996, Lycos had indexed over 60 million documents -- more than any other Web search engine. In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word ‘surf.’
Representatives of Infoseek, another major search engine, say that they founded their corporation in January 1994. Although this may be true, the search engine itself was not accessible until much later that year.
Initially, Infoseek was just another search engine. It borrowed conceptually from Yahoo! and Lycos, not really innovating in any particular way. Yet the history of Infoseek and its current critical acclaim show that being the first or most original isn’t always that important. Infoseek’s user-friendly interface and the numerous additional services (such as UPS tracking, News, a directory, and the like) have garnered kudos, but it was Infoseek’s strategic deal with Netscape in December 1995 that brought it to the forefront of the search engine line. Infoseek convinced Netscape (with the help of quite a bit of cash) to have its engine pop up as the default when people hit the Net Search button on the Netscape browser. Prior to this, Yahoo! was Netscape’s default search service.
Digital Equipment Corporation’s (DEC) AltaVista was a latecomer to the scene; it had its online debut in December 1995. Nonetheless, it had a number of innovative features that quickly catapulted it to the top. The least of the features was its speed. Run on a bunch of DEC Alphas, it had the horsepower to handle millions of hits per day without slowing down in the slightest.
The rest of its features, all available from introduction, changed the face of search engines forever. AltaVista was the first to use natural language queries, meaning a user could type in a sentence like "What is the weather like in Tokyo?" and not get a million pages containing the word "What." Additionally, it was the first to implement advanced searching techniques, such as the use of Boolean operators (AND, OR, NOT, etc.). Furthermore, a user could search newsgroup articles and retrieve them via the web as well as specifically search for text in image names, titles, Java applets, and ActiveX objects. Additionally, AltaVista claims to be the first search engine to allow users to add to and delete their own URLs from the index, placing them online within 24 hours. One of the most interesting new features AltaVista provided was the ability to search for all of the sites that link to a particular URL. This was very useful for web designers who were trying to get some popularity for their pages; they could frequently check to see how many other pages were referencing them.
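Two of the features described above, Boolean operators and the search for pages that link to a given URL, can be sketched in Python. This is a minimal illustration, not AltaVista's implementation: the sample pages, the URLs, and every function name are hypothetical, and the inverted index (a map from each word to the pages containing it) is a common technique assumed here for the example.

```python
# Hypothetical pages: URL -> (text, outgoing links).
PAGES = {
    "http://a.example/": ("tokyo weather forecast", ["http://b.example/"]),
    "http://b.example/": (
        "tokyo travel guide",
        ["http://a.example/", "http://c.example/"],
    ),
    "http://c.example/": ("paris weather report", ["http://a.example/"]),
}


def build_inverted_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = {}
    for url, (text, _links) in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index


def query_and(index, a, b):
    """Pages containing both words (set intersection)."""
    return index.get(a, set()) & index.get(b, set())


def query_or(index, a, b):
    """Pages containing either word (set union)."""
    return index.get(a, set()) | index.get(b, set())


def query_and_not(index, a, b):
    """Pages containing the first word but not the second (set difference)."""
    return index.get(a, set()) - index.get(b, set())


def linking_to(pages, target):
    """Return the sorted URLs of pages that link to target."""
    return sorted(url for url, (_text, links) in pages.items() if target in links)


INDEX = build_inverted_index(PAGES)
```

On this sample data, query_and(INDEX, "tokyo", "weather") yields only http://a.example/, and linking_to(PAGES, "http://a.example/") lists the two pages that reference it. Because each Boolean operator maps onto a set operation, a query touches only the pages listed under its words rather than scanning every document.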
On the user interface end, AltaVista made a number of innovations. It put "tips" below the search field to help the user better formulate a search. These tips constantly change, so that after using the search a few times, users see a number of interesting features that they possibly did not know about. This system became widely adopted by the other search engines. Later, Google.com became the main search engine, searching over 3,000,000,000 web pages. |