Larry Page, one of Google's founders, described the perfect search engine as one that understands exactly what you are looking for and returns exactly that as a result. To meet this demand, Google developed a technology based on the following three components: crawling, indexing, and query processing.
The finishing touch comes from ranking, which presents the relevant documents to the user in a convenient order. The foundations for this are summarized below under the original ranking criteria.
Crawling
Google uses so-called web crawlers (often also called spiders) to find websites. Google's crawler is named Googlebot. In general, web pages are not accessed at random; instead, the crawler works systematically through the links of a website. The hyperlinks of a retrieved web page are extracted and stored in a queue, which is then processed step by step. To conserve resources, however, new links are compared against the web pages the crawler has already retrieved.
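The following minimal sketch illustrates this queue-based procedure. It is not Googlebot's actual implementation; the fetch_links helper and all names are assumptions for illustration.

    from collections import deque

    def crawl(seed_urls, fetch_links, max_pages=100):
        # fetch_links(url) is a hypothetical helper that downloads a page
        # and returns the hyperlinks extracted from it.
        queue = deque(seed_urls)   # links waiting to be retrieved
        visited = set()            # pages already crawled
        while queue and len(visited) < max_pages:
            url = queue.popleft()
            if url in visited:     # conserve resources: skip known pages
                continue
            visited.add(url)
            for link in fetch_links(url):
                if link not in visited:
                    queue.append(link)   # store new hyperlinks in the queue
        return visited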
At present, two different crawling processes can be distinguished: deep crawling and fresh crawling. Deep crawling is the process explained above, while fresh crawling is responsible for the timeliness of the retrieved pages: well-known sites are re-crawled at short intervals so that the latest changes are recognized.
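A hedged sketch of how such a fresh-crawl schedule could look: the intervals and site names below are purely illustrative assumptions, not Google's actual policy.

    import time

    # Assumed recrawl intervals in seconds: frequently changing, well-known
    # sites are fresh-crawled often; everything else waits for the deep crawl.
    RECRAWL_INTERVAL = {"news-site.example": 3600, "static-site.example": 604800}

    def pages_due_for_fresh_crawl(last_crawled, now=None):
        # last_crawled maps URL -> timestamp of the last retrieval.
        now = now or time.time()
        return [url for url, ts in last_crawled.items()
                if now - ts >= RECRAWL_INTERVAL.get(url, 604800)]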
The results of crawling are passed to the so-called indexer, which is explained below.
Indexing
Merely collecting pages initially offers nothing more than an archive of information. The main purpose of search engines, however, is the searching (and finding) of documents. Since the duration of this process grows with the number of documents, a technique must be found to make it as efficient as possible. For this reason, Google builds, from the crawled web pages, an index consisting of the individual words of each document. The index associates each word with its documents and can be searched in parallel by many distributed servers. This standard technique is also referred to as an "inverted index".
The index itself is optimized for search requests (for example, words are stored only in lowercase and in alphabetical order). The efficient application of this method allows Google to answer queries in a fraction of a second, although theoretically several billion web pages would have to be searched.
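A minimal sketch of such an inverted index, assuming simple whitespace tokenization; real indexers also handle punctuation, word positions, and distribution across servers.

    from collections import defaultdict

    def build_index(documents):
        # documents maps doc_id -> text; the result maps each word to the
        # set of documents containing it.
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in text.lower().split():   # store words in lowercase only
                index[word].add(doc_id)
        return dict(sorted(index.items()))      # keep words in alphabetical order

    docs = {1: "The quick brown fox", 2: "The lazy dog"}
    index = build_index(docs)
    # index["the"] == {1, 2}, index["fox"] == {1}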
Query Processing
Query processing provides the interface to the users of the Google search engine: an entered set of search terms is prepared by Google and sent to the database. This preparation includes, for example, the removal of stop words.
The request to the index database then delivers all documents that contain the search terms. This document set is also referred to as a "posting list". The real power lies in sorting this posting list so that the most relevant results appear at the beginning. For this, Google applies more than 200 ranking factors that evaluate, on the one hand, the relevance and, on the other, the reputation of a page. The result is what is generally summarized under the term SERP, the results page displayed to the searching user.
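Combining the steps above, a query could be answered roughly as follows. The stop-word list and the score function are illustrative assumptions; Google's actual factors are not public in this form.

    STOP_WORDS = {"the", "a", "and", "of"}   # illustrative stop-word list

    def search(query, index, score):
        # Remove stop words, intersect the posting lists of the remaining
        # terms, and sort by a hypothetical relevance score (best first).
        terms = [t for t in query.lower().split() if t not in STOP_WORDS]
        if not terms:
            return []
        postings = [index.get(t, set()) for t in terms]
        results = set.intersection(*postings)   # documents with all terms
        return sorted(results, key=score, reverse=True)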
Original ranking criteria
Some of Google's ranking factors are described in The Anatomy of a Large-Scale Hypertextual Web Search Engine. These are explained below, together with an assessment of their current relevance. The factors are PageRank, anchor text, and a number of other features.
PageRank and anchor text fall into the field of off-page optimization, while the "other features" relate to on-page optimization.
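PageRank itself can be computed by a simple iteration, as sketched in the cited paper. The following is a simplified version; the damping factor and iteration count are conventional assumptions, and dangling pages are ignored for brevity.

    def pagerank(links, d=0.85, iterations=50):
        # links maps every page to the list of pages it links to.
        pages = list(links)
        n = len(pages)
        pr = {p: 1.0 / n for p in pages}          # start with a uniform rank
        for _ in range(iterations):
            new = {p: (1 - d) / n for p in pages}
            for p, outgoing in links.items():
                for q in outgoing:                # a page passes its rank on,
                    if q in new:                  # split among its outlinks
                        new[q] += d * pr[p] / len(outgoing)
            pr = new
        return pr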
Other features
Under "Other Features" are mentioned in the original version of Google 1998 Factors "keyword proximity" and "HTML markup". Under keyword proxmity is understood to be close to each other, the search terms within a document. The index is compared to the source code of the first search term with the other terms. HTML markup refers to the syntactic markup such as font size and color.