Robot Guidelines

If you would like to prevent Mirago from indexing your site, or would like to limit robot activity to certain areas of it, you can use the following mechanisms.

Meta tags

Mirago supports the "noindex" and/or "nofollow" values in robots META tags.
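To use them, place a tag such as the following in the HEAD part of the page:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">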
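As a rough illustration of how a robot might read these tags, the sketch below collects the directives from any robots META tag in a page using Python's standard html.parser module. The names RobotsMetaParser and robots_directives are hypothetical, and treating the content value as a case-insensitive, comma-separated list is an assumption, not a description of Mirago's own implementation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() != "robots":
            return
        # Assumption: directives are comma-separated and case-insensitive.
        for directive in values.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

A robot honouring these tags would skip indexing the page when "noindex" is present and skip following its links when "nofollow" is present.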
Robots exclusion standard

Mirago supports the Standard for Robot Exclusion, which specifies a format for robots.txt files. When placed in a server's root directory, this text file lets a webmaster deny access to all robots or to certain robots, and specify which areas of the site (if any) robots may index. Mirago checks the file periodically and updates its permissions for the site accordingly.

The robots.txt file must be located at the root of a site; it will not be read from a subdirectory. If you also operate subdomains, a robots.txt file should be present in each subdomain's root directory.

Note: If no robots.txt file is present, robots assume they may index the entire domain or subdomain, on the premise that you have 'published' the site on the Internet for general access.

You can indicate to well-behaved robots such as Mirago that certain parts of your server should not be indexed by some or all robots. The following example illustrates the possible contents of a robots.txt file:

    # robots.txt file for http://mywebsite.co.uk/
    User-agent: Mirago-Test-Robot (http://www.miragorobot.com)
    Disallow:

    User-agent: naughtyrobot
    Disallow: /

    User-agent: *
    Disallow: /stay_out
    Disallow: /devproject

The first line, starting with '#', is a comment. The next two lines specify that the Mirago robot has nothing disallowed, which grants it permission to go anywhere on the site. This record is optional, as a robot assumes it has permission to access your site if it is not excluded by any Disallow directive.

The next two lines indicate that the robot called 'naughtyrobot' has all relative URLs starting with '/' disallowed. As every relative URL on a server starts with '/', this robot should not access any part of the site. Note: Don't put more than one path on a Disallow line.

The final record indicates that all other robots should not visit URLs starting with /stay_out or /devproject.
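The record rules above can be sketched in Python. This is a hypothetical, minimal parser, not Mirago's code: it assumes a robot is identified by the first token of its User-agent value (so "Mirago-Test-Robot (http://www.miragorobot.com)" becomes mirago-test-robot), that Disallow and Allow values are plain path prefixes, and that the first matching rule in a record decides the outcome. Consecutive User-agent lines share one record, as in the guidelines' multi-robot example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """One robots.txt record: the robot names it covers and its rules in file order."""
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[bool, str]] = field(default_factory=list)  # (is_allow, path prefix)


def parse_robots(text: str) -> list[Record]:
    """Split robots.txt text into records (hypothetical helper)."""
    records: list[Record] = []
    current: Record | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue  # blank or malformed line
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            # A User-agent line that follows rule lines starts a new record;
            # consecutive User-agent lines share the same record.
            if current is None or current.rules:
                current = Record()
                records.append(current)
            # Assumption: compare only the first token, e.g. "mirago-test-robot".
            current.agents.append(value.split()[0].lower() if value else "")
        elif key in ("allow", "disallow") and current is not None:
            # Empty values are kept so that they still close the record.
            current.rules.append((key == "allow", value))
    return records


def is_allowed(records: list[Record], robot: str, path: str) -> bool:
    """Return True if `robot` may fetch `path` (hypothetical helper)."""
    name = robot.lower()
    # Specific user-agents are searched before the default "*" record.
    record = next((r for r in records if name in r.agents), None)
    if record is None:
        record = next((r for r in records if "*" in r.agents), None)
    if record is None:
        return True  # no record excludes this robot
    for is_allow, prefix in record.rules:
        # An empty value ("Disallow:") restricts nothing, so skip it.
        if prefix and path.startswith(prefix):
            return is_allow  # assumption: the first matching rule wins
    return True


ROBOTS_TXT = """\
# robots.txt file for http://mywebsite.co.uk/
User-agent: Mirago-Test-Robot (http://www.miragorobot.com)
Disallow:

User-agent: naughtyrobot
Disallow: /

User-agent: *
Disallow: /stay_out
Disallow: /devproject
"""

records = parse_robots(ROBOTS_TXT)
print(is_allowed(records, "Mirago-Test-Robot", "/stay_out/a.htm"))  # True
print(is_allowed(records, "naughtyrobot", "/index.htm"))            # False
print(is_allowed(records, "otherbot", "/devproject/a.htm"))         # False
```

Because rules are prefixes, "Disallow: /devproject" in this sketch also blocks paths such as /devproject2/, which is why the guidelines advise writing the bare directory path rather than a wildcard.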
Note that '*' in a User-agent line is a special token meaning 'all robots'; it is not a regular expression. Disallow values are matched as path prefixes, so instead of "Disallow: /myproject/*" just write "Disallow: /myproject". The * record may appear before or after any specific user-agent record, because specific user-agents are always searched before the default * user-agent.

For more complex access restrictions, Mirago supports multiple User-agent lines per record and the Allow directive. For example:

    User-agent: robot1
    Disallow: /stayout
    Disallow: /devproject

    User-agent: robot2
    User-agent: robot3
    User-agent: robot4
    Disallow: /stayout
    Allow: /devproject/beta
    Disallow: /devproject

    User-agent: robot5
    Disallow: /

    User-agent: Mirago-Test-Robot (http://www.miragorobot.com)
    Disallow:

In this case robot2, robot3 and robot4 share one record and behave identically: they may not visit /stayout or /devproject, except for /devproject/beta, which the Allow line opens to them. The Disallow entry after robot5 is required so that the Mirago User-agent line starts a new record; otherwise Mirago would share robot5's record, in the same way that robot2, robot3 and robot4 share a record.

Where the record's User-agent is Mirago, * and $ can also be used in paths to control access to specific documents. * matches any sequence of characters, so it can identify collections of entries (e.g. /devproject/client*.htm), and a line may contain multiple *'s. $ marks the end of the path, so the entry matches only that exact URL. For example, "Disallow: /devproject/text$" blocks access to the file /devproject/text but still allows access to /devproject/text.doc and /devproject/text/home.htm. The entry must match exactly for this to take effect.

Note: Mirago must be named in the User-agent line of the record in which these extensions are used. Most other robots won't interpret them this way.

Password-protecting parts of your site

Mirago robots use similar protocols to a browser and have no special means of access, so Mirago cannot visit documents in an area protected by a password.
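The Mirago-only * and $ path extensions described in the robots exclusion section can be illustrated by translating each pattern into a regular expression. This is a sketch under assumptions: the function name is hypothetical, and treating * as "any sequence of characters" and a trailing $ as an end-of-path anchor is an interpretation of the examples given, not Mirago's documented implementation.

```python
import re


def mirago_pattern_matches(pattern: str, path: str) -> bool:
    """Match a URL path against a Disallow/Allow value using the Mirago-only
    extensions (hypothetical helper): '*' matches any run of characters, and a
    trailing '$' requires the path to end exactly where the pattern ends.
    Without '$', the pattern is matched as a prefix, as in the plain standard."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Treat every character literally except '*', which becomes '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    # re.match only anchors at the start, giving prefix semantics.
    return re.match(regex, path) is not None


print(mirago_pattern_matches("/devproject/text$", "/devproject/text"))              # True
print(mirago_pattern_matches("/devproject/text$", "/devproject/text.doc"))          # False
print(mirago_pattern_matches("/devproject/client*.htm", "/devproject/client7.htm"))  # True
```

Escaping each literal piece before joining keeps characters such as '.' from acting as regular-expression wildcards, so only the * token widens a match.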
Removing your site from the Mirago index

We hope that inclusion in the Mirago index brings more visitors to your site, but we will of course remove your site's entry upon request. To request removal, email remove@mirago.com.
©2006 Mirago. All Rights Reserved.