How Search Engines Work
A search engine works through the coordination of a set of programs that perform three tasks:
- Web crawling
- Indexing
- Searching
Web crawling
A web search engine stores information about a large number of web pages, which it retrieves from the WWW itself. These pages are retrieved by a web crawler (sometimes also known as a spider), an automated web browser that follows every link it sees, subject to exclusions that site owners can specify in robots.txt.
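The crawl loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `fetch` callable, the `ExampleBot` user agent, and the in-memory `SITE` dictionary are hypothetical stand-ins for real HTTP requests, and a production crawler would also need politeness delays, error handling, and URL normalization. The robots.txt check uses the standard library's `urllib.robotparser`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, robots_txt, user_agent="ExampleBot"):
    """Breadth-first crawl; returns {url: html} for every page retrieved.

    `fetch` is a hypothetical callable mapping a URL to its HTML (or None).
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    pages = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if not rules.can_fetch(user_agent, url):
            continue  # excluded by robots.txt
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# A tiny in-memory "web" (assumed data) standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="/private/x.html">X</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.test/b.html": "<p>No links here.</p>",
    "http://example.test/private/x.html": "<p>Hidden.</p>",
}
ROBOTS = "User-agent: *\nDisallow: /private/\n"

crawled = crawl("http://example.test/", SITE.get, ROBOTS)
print(sorted(crawled))
```

In this sketch the page under /private/ is linked from the start page but never fetched, because the robots.txt rule excludes it.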
Indexing
The contents of each page are then analyzed to determine how it
should be indexed; for example, words may be extracted from the
titles, headings, or special fields called meta tags. Data about
web pages is then stored in an index database for use in later
queries. Some search engines, such as Google, store all or part of
the source page (referred to as a cache) as well as information
about the web pages, whereas others, such as AltaVista, store every
word of every page they find.
Because it is the page that was actually indexed, the cached page
always holds the actual search text, which can be very useful when
the content of the current page has been updated and the search
terms are no longer in it. This problem may be considered a mild
form of linkrot, and Google's handling of it increases usability by
satisfying the user's expectation that the search terms will be on
the returned web page. This satisfies the principle of least
astonishment. Improved search relevance makes these cached pages
more useful, even beyond the fact that they may contain data that
is not available elsewhere.
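As a rough illustration of the indexing step, the sketch below extracts words from the title, headings, and meta tags of each page, stores them in an inverted index (word to pages), and keeps a cached copy of the source as it was indexed. The names `FieldExtractor`, `build_index`, and the sample `PAGES` are assumptions made for this example; real engines index far more than these three fields.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Pulls text from <title>, <h1>-<h6>, and keywords/description meta tags."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.fields = defaultdict(list)  # field name -> list of text chunks
        self._current = None             # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._current = "title"
        elif tag in self.HEADINGS:
            self._current = "heading"
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.fields["meta"].append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title" or tag in self.HEADINGS:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current].append(data)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Return (index, cache); index maps word -> {url: set of field names}."""
    index = defaultdict(dict)
    cache = {}
    for url, html in pages.items():
        cache[url] = html  # keep the source exactly as it was indexed
        extractor = FieldExtractor()
        extractor.feed(html)
        for field, chunks in extractor.fields.items():
            for word in tokenize(" ".join(chunks)):
                index[word].setdefault(url, set()).add(field)
    return index, cache


# Assumed sample pages for illustration.
PAGES = {
    "http://example.test/tea": (
        "<html><head><title>Green Tea Guide</title>"
        '<meta name="keywords" content="tea, brewing"></head>'
        "<body><h1>Brewing basics</h1><p>Body text is ignored here.</p></body></html>"
    ),
    "http://example.test/coffee": (
        "<html><head><title>Coffee Guide</title></head>"
        "<body><h2>Brewing espresso</h2></body></html>"
    ),
}

index, cache = build_index(PAGES)
print(sorted(index["brewing"]))                       # pages containing "brewing"
print(sorted(index["tea"]["http://example.test/tea"]))  # fields where "tea" occurs
```

Because the cache holds the page as indexed, a later lookup can still show the text that matched even if the live page has since changed.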

[Figure: High-level architecture of a standard web crawler]
Searching
A user starts a search by entering a query, typically as keywords. The search engine then examines the index and returns a list of the best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of its text. Most search engines support the Boolean operators AND, OR and NOT to further specify the query. A more advanced feature is proximity search, which lets the user define the maximum distance between keywords.
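Boolean operators and proximity search can both be answered from an index that records word positions. The following is a minimal sketch, assuming a small in-memory collection; the function names (`search_and`, `search_near`, and so on) and the sample `DOCS` are hypothetical, and distance is measured here in whole words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def docs_with(index, word):
    """Set of documents containing the word."""
    return set(index.get(word.lower(), {}))


def search_and(index, a, b):
    return docs_with(index, a) & docs_with(index, b)


def search_or(index, a, b):
    return docs_with(index, a) | docs_with(index, b)


def search_and_not(index, a, b):
    """Documents containing a but NOT b."""
    return docs_with(index, a) - docs_with(index, b)


def search_near(index, a, b, max_distance):
    """Documents where a and b occur within max_distance words of each other."""
    hits = set()
    for doc_id in search_and(index, a, b):
        positions_a = index[a.lower()][doc_id]
        positions_b = index[b.lower()][doc_id]
        if any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b):
            hits.add(doc_id)
    return hits


# Assumed sample documents for illustration.
DOCS = {
    1: "web search engines rank pages",
    2: "a crawler visits web pages before the search index is built",
    3: "library catalogue search",
}
index = build_positional_index(DOCS)

print(search_and(index, "web", "search"))       # both words present
print(search_or(index, "crawler", "library"))   # either word present
print(search_and_not(index, "search", "web"))   # "search" without "web"
print(search_near(index, "web", "search", 1))   # adjacent words only
```

In document 2 the words "web" and "search" are four positions apart, so it matches the AND query but not the proximity query with a distance of 1.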
The usefulness of a search engine depends on the relevance of the result set it returns. There may be millions of web pages that include a particular word or phrase, but some pages may be more relevant, popular, or authoritative than others. Most search engines therefore employ methods to rank the results so that the "best" results appear first. How a search engine decides which pages are the best matches, and in what order the results should be shown, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.
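Since ranking criteria differ from engine to engine, no single formula represents them all. As one simple and widely taught possibility, the sketch below scores documents with a smoothed TF-IDF measure: a term counts for more when it is frequent in a document and rare in the collection. The `rank` function and the sample `DOCS` are assumptions for illustration only, not the method of any particular engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return ids of matching documents, best TF-IDF score first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(words)            # term frequency
                idf = math.log(1 + n_docs / doc_freq[term])  # smoothed rarity
                score += tf * idf
        if score > 0:
            scores[doc_id] = score  # documents with no query term are dropped
    return sorted(scores, key=scores.get, reverse=True)


# Assumed sample documents for illustration.
DOCS = {
    "a": "search engines rank search results",
    "b": "engines power cars and planes",
    "c": "cooking with cast iron pans",
}

ranked = rank("search engines", DOCS)
print(ranked)
```

Document "a" ranks above "b" because it contains both query terms and repeats the rarer one, while "c" contains neither term and is left out of the results.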
Most web search engines are commercial ventures supported by advertising revenue; as a result, some employ the controversial practice of allowing advertisers to pay to have their listings ranked higher in search results.