Robot Guidelines

If you would like to prevent Mirago from indexing your site, or to limit robot activity to certain areas, the following mechanisms are available:
 
Meta tags

Mirago supports the use of "noindex" and/or "nofollow" META tags.
  • noindex will prevent Mirago from indexing anything on your page
  • nofollow will prevent Mirago from following any of the links on your page
To activate them, just include this tag in the HEAD part of the page:

<META NAME="robots" CONTENT="noindex,nofollow">

Note: The Mirago robot does not index keyword and description META tags.
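
To check which directives a page actually carries, the tag can be read back with a few lines of Python. This is only a sketch using the standard-library HTML parser; the names RobotsMetaParser and robots_directives are made up for this illustration and say nothing about how Mirago itself reads the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<HTML><HEAD><META NAME="robots" CONTENT="noindex,nofollow"></HEAD></HTML>'
found = robots_directives(page)
print("noindex" in found, "nofollow" in found)
```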
 
Robots exclusion standard

Mirago supports the Standard for Robot Exclusion which specifies a format for robots.txt files. When placed in a server's root directory, this text file allows a webmaster to deny access to all robots or certain robots and specify which areas of the site (if any) robots can index. The file is checked periodically by Mirago and permissions for the site are modified accordingly. The robots.txt file must be located at the root of a site. It will not be read from a subdirectory.
 
Note: If a robots.txt file is not present, robots assume they can index the entire domain or subdomain, on the premise that you have 'published' the site on the Internet for general access. If you also operate subdomains, a robots.txt file should be present in the root directory of each subdomain.
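
Because the file is only read from the root, the location a robot will look at can be derived from any page address on the same host. The following Python sketch shows the idea; the function name robots_txt_url and the subdomain shop.mywebsite.co.uk are invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt address for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Each host (including each subdomain) has its own robots.txt.
print(robots_txt_url("http://mywebsite.co.uk/devproject/index.htm"))
print(robots_txt_url("http://shop.mywebsite.co.uk/a/b.htm?x=1"))
```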
 
You can indicate to well-behaved robots such as Mirago that certain parts of your server should not be indexed by some or all robots.
 
The following example illustrates the possible contents of a robots.txt file:
 
# robots.txt file for http://mywebsite.co.uk/
 
User-agent: Mirago-Test-Robot (http://www.miragorobot.com)
Disallow:
 
User-agent: naughtyrobot
Disallow: /
 
User-agent: *
Disallow: /stay_out
Disallow: /devproject

 
The first line, starting with '#', specifies a comment.
 
The next two lines specify that the Mirago robot has nothing disallowed, which grants it permission to go anywhere on the site. This block is optional, as a robot will assume it has permission to access your site unless a Disallow directive excludes it.
 
The next two lines indicate that the robot called 'naughtyrobot' has all relative URLs starting with '/' disallowed. As all relative URLs on a server start with '/', this means the entire site should not be accessed by that robot.
Note: Don't put more than one path on a Disallow line.
 
The third block indicates that all other robots should not visit URLs starting with /stay_out or /devproject. Note that in a User-agent line '*' is a special token meaning 'all robots'; it is not a regular expression. Wildcards are likewise not part of the standard Disallow syntax: instead of Disallow: /myproject/* just put Disallow: /myproject. The * user-agent block can appear before or after any specific user-agent block, because specific user-agents are searched before the default * user-agent.
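
A quick way to sanity-check a file like this before publishing it is to run it through a parser. The sketch below uses Python's standard urllib.robotparser as a stand-in; it is not Mirago's own parser and may differ from it in detail. The robot name somebot and the paths are made up, and the Mirago entry is deliberately not queried because the standard-library matcher treats agent names containing '/' in its own way.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# robots.txt file for http://mywebsite.co.uk/

User-agent: Mirago-Test-Robot (http://www.miragorobot.com)
Disallow:

User-agent: naughtyrobot
Disallow: /

User-agent: *
Disallow: /stay_out
Disallow: /devproject
"""


def load_rules(text):
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# naughtyrobot is shut out of the whole site.
print(rules.can_fetch("naughtyrobot", "/index.html"))      # False
# Any other robot falls back to the * block.
print(rules.can_fetch("somebot", "/stay_out/page.html"))   # False
print(rules.can_fetch("somebot", "/index.html"))           # True
```

Disallow values are matched as path prefixes, so in this parser Disallow: /devproject also covers a path such as /devproject_old/.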
 
For more complex access restrictions we support the use of multiple user-agents and the Allow directive.
 
For example:
 
User-agent: robot1
Disallow: /stayout
Disallow: /devproject
 
User-agent: robot2
User-agent: robot3
User-agent: robot4
Disallow: /stayout
Allow: /devproject/beta
Disallow: /devproject
 
User-agent: robot5
Disallow: /
 
User-agent: Mirago-Test-Robot (http://www.miragorobot.com)
Disallow:

 
In this case robot2, robot3 and robot4 all behave identically, because consecutive User-agent lines share the block that follows them. The Disallow entry after robot5 is required so that User-agent: Mirago starts a new block; otherwise Mirago would share the same block as robot5, in the same way that robot2, robot3 and robot4 share a block.
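
The grouping can be checked with the same kind of sketch, again using Python's standard urllib.robotparser rather than Mirago's parser. One assumption matters here: the Python parser applies the first rule that matches, which is why listing Allow: /devproject/beta ahead of Disallow: /devproject opens up the beta area. How Mirago orders Allow against Disallow is not stated on this page. The name robot9 and the file names are invented.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: robot1
Disallow: /stayout
Disallow: /devproject

User-agent: robot2
User-agent: robot3
User-agent: robot4
Disallow: /stayout
Allow: /devproject/beta
Disallow: /devproject

User-agent: robot5
Disallow: /

User-agent: Mirago-Test-Robot (http://www.miragorobot.com)
Disallow:
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# robot2, robot3 and robot4 share one block, so they get the same answers.
for name in ("robot2", "robot3", "robot4"):
    print(name,
          rules.can_fetch(name, "/devproject/beta/index.htm"),   # True
          rules.can_fetch(name, "/devproject/alpha.htm"))        # False

# robot1 has no Allow line, so the beta area stays closed to it.
print(rules.can_fetch("robot1", "/devproject/beta/index.htm"))   # False

# A robot with no block of its own, and no * block present, is unrestricted.
print(rules.can_fetch("robot9", "/stayout/page.htm"))            # True
```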
 
Where User-agent: Mirago is specified, '*' and '$' can be used to further control access to specific documents as follows:
 
'*' can be used to identify collections of entries (e.g. /devproject/client*.htm). More than one '*' may be included in any line.

'$' marks the end of a path, so it can be used to control access to one specific file or directory. For example, Disallow: /devproject/text$ will block access to the file /devproject/text but will still allow access to /devproject/text.doc and /devproject/text/home.htm. The path must match the entry exactly for this to take effect.
 
Note: Mirago must be specified as the User-agent: for the block in which these extensions are used. Most other robots won't interpret them in this way.
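
To see how such patterns behave, here is a small Python sketch of one plausible reading: '*' stands for any run of characters, and a trailing '$' requires the path to end there. This is an interpretation for illustration only, not Mirago's implementation, and the helper names pattern_to_regex and matches are made up.

```python
import re


def pattern_to_regex(pattern):
    """Translate a path pattern with '*' and an optional trailing '$' to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with "match anything".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without '$' the pattern is a prefix; with '$' the path must end there.
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    """Return True if path is covered by pattern (matched from the start)."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/devproject/text$", "/devproject/text"))                # True
print(matches("/devproject/text$", "/devproject/text.doc"))            # False
print(matches("/devproject/text$", "/devproject/text/home.htm"))       # False
print(matches("/devproject/client*.htm", "/devproject/client42.htm"))  # True
```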
 
Password-protecting parts of your site

Mirago robots use similar protocols to a browser and have no special means of access, so documents in a password-protected area cannot be visited by Mirago.
 
Removing your site from the Mirago index

We hope that inclusion within the Mirago index brings more visitors to your site, but we will of course remove your site's entry upon request. To request removal, email remove@mirago.com.