The file robots.txt is aimed specifically at web crawlers. Its primary purpose is to control the behavior of these crawlers, which, more precisely, consists of allowing or forbidding them to crawl certain pages (or entire directories). The directives Allow and Disallow are used for this.
The defined rules can apply either to all crawlers or only to a particular one. As a website operator, you can thereby save traffic, for example, by locking out the crawlers of unimportant search engines. It also makes it possible, for example, to exclude individual directories from inclusion in the Google index.
Another field of application of this file is specifying the location of a domain's sitemap, which is explained in detail under Sitemap. In the excerpt shown below, crawling of all pages and directories is allowed except for the paths /admin/ and /statistics/. Furthermore, the sitemap is located at http://www.example.com/sitemap.xml. These rules apply to all crawlers (specified by "*"). If special rules are to apply only to certain crawlers, these must be marked with the User-agent directive. A list of known robots is compiled at http://www.robotstxt.org/db.html.
User-agent: *
Disallow: /admin/
Disallow: /statistics/
Sitemap: http://www.example.com/sitemap.xml
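If rules are to apply to one particular crawler only, a separate record is added with its own User-agent line. The following is merely an illustrative sketch: Googlebot is a real user agent name, but the path /drafts/ is an assumed example and not part of the rules above.

User-agent: Googlebot
Disallow: /drafts/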
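To see how a well-behaved crawler interprets such rules, they can be checked with the urllib.robotparser module from Python's standard library. The following minimal sketch parses the excerpt above locally; the test URLs are assumptions chosen only for illustration.

import urllib.robotparser

# Rules from the robots.txt excerpt above, parsed locally
# instead of being fetched over the network.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /statistics/",
    "Sitemap: http://www.example.com/sitemap.xml",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# A compliant crawler must skip the blocked directories ...
print(parser.can_fetch("*", "http://www.example.com/admin/index.html"))  # False
# ... but may fetch all other pages.
print(parser.can_fetch("*", "http://www.example.com/contact.html"))      # True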