Friday, October 30, 2009

Search Engine History - How Search Engines Operate

The Internet is a vast collection of ideas and information, and searching it for the right Web page can be a cumbersome task. Search engines explore the Internet on the basis of keywords or combinations of words that appear on different websites. They maintain an index of words that helps them locate a website quickly. Search engines have become an integral part of every Internet user's routine.

The earliest search engines indexed only a few thousand pages. With advances in technology, search engines have improved to the point that they can list a site for many different combinations of words, and they can respond to millions of queries a day.
How does a Search Engine Work?

Internet search engines answer queries by searching a database of words and word combinations that they have stored (cached) over time. A search engine uses a computer program called a spider to index the words found on different websites. The spider then follows the links on each site to reach other sites and indexes their words as well.

Only sites that the search engine has harvested can appear in its results. A search runs against the engine's stored copy of each site; the actual site opens only when the user clicks a link in the results. Spiders are programmed to skip articles (words such as "a", "an" and "the") and to detect terms that appear in titles, subtitles and meta tags.
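
As a rough illustration of that last point, the sketch below pulls terms out of a page's title, headings and meta tags while skipping articles. It uses Python's standard html.parser module. The class name TermExtractor, the ARTICLES list and the choice of tags are assumptions made for this example, not a description of any real spider.

```python
from html.parser import HTMLParser
import re

# Hypothetical stop list: the articles a spider might skip.
ARTICLES = {"a", "an", "the"}
# Tags whose text this sketch treats as important (title and subtitles).
IMPORTANT_TAGS = {"title", "h1", "h2", "h3"}


class TermExtractor(HTMLParser):
    """Collect words from titles, headings and meta tags, skipping articles."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of important tags currently open
        self.terms = []       # extracted terms, in document order

    def _add(self, text):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in ARTICLES:
                self.terms.append(word)

    def handle_starttag(self, tag, attrs):
        if tag in IMPORTANT_TAGS:
            self.open_tags.append(tag)
        elif tag == "meta":
            attr = dict(attrs)
            if attr.get("name") in ("keywords", "description"):
                self._add(attr.get("content") or "")

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        # Only text inside an important tag is kept; body text is ignored.
        if self.open_tags:
            self._add(data)


def extract_terms(html):
    parser = TermExtractor()
    parser.feed(html)
    parser.close()
    return parser.terms
```

Calling extract_terms on a page returns the list of terms found in those places only; ordinary body text is ignored here purely to keep the example short.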
Timeline of Search Engine Evolution

The first search engine, called Archie, was developed in 1990 by Alan Emtage, a student at McGill University. He created the program to search file names on the Internet. In 1993, another program called Aliweb was launched. It was a manually constructed web directory that had several limitations. In the same year, JumpStation was developed using spider technology. It allowed users to search for keywords in the titles and headers of Web pages.

As the number of Web pages increased, searching became slow, and JumpStation stopped operating in 1994. JumpStation can, however, be considered the first modern search engine. In the same year, WebCrawler was launched. It had better features than JumpStation and searched the entire contents of a Web page. WebCrawler was later sold to Excite.

The next one was Lycos, which added a few more features to the conventional search engine, such as proximity matching and listing results in order of relevance. In 1996, the first meta search engine, MetaCrawler, was developed. This tool could search other search engines and compile their results. David Filo and Jerry Yang, two Stanford University students, had launched the famous Yahoo in 1994. Around this time, AltaVista was gaining popularity. It used spider technology, indexed millions of pages and answered millions of queries per day.

Towards the end of 1997, another search engine named Google was launched. The entry of Google marked a major milestone in the history of search engines. Google uses a “page ranking” system based on the number of links pointing to a particular site.
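
The link-based ranking idea can be illustrated with a simplified PageRank-style calculation. This is a sketch, not Google's actual implementation: the function name pagerank, the toy link graph in the usage note, the damping factor of 0.85 and the fixed number of iterations are all assumptions chosen for illustration. In this formulation a link counts for more when the linking page itself has a high score, so it is not purely a count of links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from their links.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For example, with links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}, page "c" receives the most links and ends up with the highest score.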

At present, Google is a very popular search engine with a huge list of websites in its index. It is so popular that it gave rise to the term “Googling,” which means searching for information using Google. Bing is a search engine developed by Microsoft that categorizes searches, allowing for improved image and video searches along with search previews.
Some Popular Search Engines

Following are some of the popular search engines: Baidu (Chinese), Bing (formerly MSN Search and Live Search), Cuil, Duck Duck Go, Sogou (Chinese), Sohu (Chinese), and Yandex (Russian).
Future of Search Engines

At present, most search engines work on the basis of exact matches for the keywords entered. This can be confusing, as a single word can have different meanings. In the future, search engines will be built around concept-based searching and natural language queries, a phase of evolution that users around the world keenly await.

How Search Engines Operate

Search engines perform a short list of critical operations that allow them to provide relevant web results when searchers use their system to find information.

1. Crawling the Web
Search engines run automated programs, called "bots" or "spiders", that use the hyperlink structure of the web to "crawl" the pages and documents that make up the World Wide Web. It is estimated that, of the approximately 20 billion existing pages, search engines have crawled between 8 and 10 billion.
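
A minimal sketch of link-following, assuming a tiny in-memory "web" instead of real network requests. The TOY_WEB dictionary, the function name crawl and the max_pages limit are hypothetical and exist only for this example. A real crawler would also fetch pages over HTTP and parse out their links.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
    "d.example/": ["a.example/"],  # nothing links here, so it is never found
}


def crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first by following links, starting from seed."""
    seen = {seed}          # URLs already queued, to avoid visiting twice
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("a.example/", lambda url: TOY_WEB.get(url, []))
```

Here order contains the four pages reachable from the seed. The page "d.example/" is never visited because no crawled page links to it, which is one reason a crawler can miss part of the web.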

2. Indexing Documents
Once a page has been crawled, its contents can be "indexed", that is, stored in a giant database of documents that makes up a search engine's "index". This index needs to be tightly managed so that requests which must search and sort billions of documents can be completed in fractions of a second.
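
One common way to make such lookups fast is an inverted index, which maps each word to the documents (and positions) where it occurs. The sketch below is an assumption about how an index could be organised, not a description of any particular engine; the names tokenize and build_index are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to {doc_id: [positions]} (an inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return index
```

With this structure, finding every document that contains a word is a single dictionary lookup rather than a scan of every document.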

3. Processing Queries
When a request for information comes into the search engine (hundreds of millions do each day), the engine retrieves from its index all the documents that match the query. A document matches if the terms or phrase are found on the page in the manner specified by the user. For example, a search for car and driver magazine at Google returns 8.25 million results, but a search for the same phrase in quotes ("car and driver magazine") returns only 166 thousand results. In the first search, commonly called "Findall" mode, Google returned all documents that contained the terms "car", "driver" and "magazine" (it ignores the term "and" because it does not help narrow the results), while in the second search, only those pages with the exact phrase "car and driver magazine" were returned. Other advanced operators (Google has a list of 11) can change which results a search engine will consider a match for a given query.
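
The difference between the two kinds of match can be sketched as follows. The IGNORED word list, the function names match_all and match_phrase, and the sample documents are assumptions for illustration; real engines work against an index rather than rescanning every document.

```python
import re

# Hypothetical list of words too common to help narrow a search.
IGNORED = {"and", "the", "a", "an", "of"}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_all(query, documents):
    """Return ids of documents containing every significant query word."""
    wanted = {w for w in tokenize(query) if w not in IGNORED}
    return [doc_id for doc_id, text in documents.items()
            if wanted <= set(tokenize(text))]


def match_phrase(query, documents):
    """Return ids of documents containing the query words as an exact run."""
    phrase = tokenize(query)
    n = len(phrase)
    hits = []
    for doc_id, text in documents.items():
        words = tokenize(text)
        if n and any(words[i:i + n] == phrase
                     for i in range(len(words) - n + 1)):
            hits.append(doc_id)
    return hits


DOCS = {
    1: "Car and Driver magazine reviews new cars",
    2: "A magazine for every car driver",
    3: "Driver updates for your printer",
}
```

With these sample documents, match_all("car and driver magazine", DOCS) finds documents 1 and 2, while match_phrase with the same query finds only document 1, mirroring the smaller result count for the quoted search.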

4. Ranking Results
Once the search engine has determined which results are a match for the query, the engine's algorithm (a mathematical equation commonly used for sorting) runs calculations on each of the results to determine which is most relevant to the given query. The engine lists the results on the results pages in order from most relevant to least so that users can make a choice about which to select.
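
As a deliberately simple illustration of scoring and sorting, the sketch below ranks documents by the share of their words that are query words. This scoring rule is an assumption made for the example; actual ranking algorithms combine many more signals and are not described in this article. The names score and rank are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, text):
    """Hypothetical relevance: fraction of the document's words that are query words."""
    words = tokenize(text)
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[w] for w in set(tokenize(query))) / len(words)


def rank(query, documents):
    """Return document ids sorted from most to least relevant."""
    return sorted(documents,
                  key=lambda doc_id: score(query, documents[doc_id]),
                  reverse=True)
```

A short page that is mostly about the query terms scores higher here than a long page that mentions them once, which is the intuition behind listing the most relevant results first.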

