Alkaline Robots, HTML and Meta Tags

Alkaline Robot Support

Alkaline fully supports the robot directives described on the WebCrawler robots pages (http://info.webcrawler.com/mak/projects/robots/robots.html). Alkaline is a registered bot with the user-agent string AlkalineBOT/1.9.

This covers full compliance with /robots.txt, including the User-agent and Disallow directives.
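
As an illustration of how User-agent and Disallow rules apply to the AlkalineBOT user-agent, the following minimal sketch evaluates a robots.txt file with Python's standard urllib.robotparser module. It is not Alkaline's own code; the robots.txt contents, the paths, and the allowed() helper are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: AlkalineBOT
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def allowed(url, agent="AlkalineBOT/1.9"):
    """Return True if the given agent may fetch url under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.foo.com/index.html",
                "http://www.foo.com/private/report.html"):
        print(url, allowed(url))
```

A record that names AlkalineBOT takes precedence over the catch-all "*" record, so in this sketch only /private/ is closed to Alkaline while other robots are kept out of /tmp/.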

Alkaline will not follow links if a <meta name="robots" content="nofollow"> tag is found. Alkaline will not index document contents if a <meta name="robots" content="noindex"> tag is found.
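
The effect of the two robots meta tags can be sketched as a small parser that reports whether a page may be indexed and whether its links may be followed. This is an illustrative sketch built on Python's standard html.parser module, not Alkaline's implementation; the class and function names are hypothetical, and accepting comma-separated, case-insensitive values is an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects noindex / nofollow from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.index = True    # page contents may be indexed
        self.follow = True   # links on the page may be followed

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # Assumption: values may be separated by commas and/or spaces.
        content = (attr.get("content") or "").lower().replace(",", " ")
        for token in content.split():
            if token == "noindex":
                self.index = False
            elif token == "nofollow":
                self.follow = False


def robots_meta(html):
    """Return (index, follow) for the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.index, parser.follow


if __name__ == "__main__":
    print(robots_meta('<meta name="robots" content="nofollow">'))
```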

Alkaline robots support can be disabled for individual configurations by specifying Robots=N in the asearch.cnf file.

Alkaline Specific Meta Tags

Alkaline will look for specific meta tags in a document. Each such tag has the format <meta name="alkaline" content="...">. The content value can contain multiple elements separated by spaces; the following elements are recognized:

Table 7-3. Alkaline Specific Meta Tags
skip       skip indexing of the page, it will not be referenced
skipmeta   skip indexing of meta tags on the current page
skiplinks  do not gather links from the currently indexed page
skiptext   do not index free text on the current page

A <meta name="alkaline" content="skip"> tag instructs Alkaline not only to avoid indexing the page, but also not to gather links from it. If you want the links on a page to be gathered without the page itself being indexed, use <meta name="alkaline" content="skiptext skipmeta">.

If you wish to exclude a pattern of pages from indexing while still gathering their links, use the UrlIndex and/or the UrlSkip directives.

Example:
<meta name="alkaline" content="skipmeta skiplinks">
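
How the elements of the content value combine can be sketched as follows. The function below is a hypothetical illustration, not part of Alkaline: it splits the value on spaces and treats skip as implying all three of the other elements, as described above. The function name, the returned keys, and the case-sensitive matching are assumptions.

```python
def alkaline_flags(content):
    """Map an alkaline meta content value to the actions still performed.

    Assumption: elements are matched exactly (case-sensitive).
    """
    tokens = set(content.split())
    skip = "skip" in tokens  # skip disables text, meta and link handling
    return {
        "index_text": not (skip or "skiptext" in tokens),
        "index_meta": not (skip or "skipmeta" in tokens),
        "gather_links": not (skip or "skiplinks" in tokens),
    }


if __name__ == "__main__":
    # The example tag above: free text is indexed, meta tags and links are not.
    print(alkaline_flags("skipmeta skiplinks"))
    # Links are gathered, but nothing on the page is indexed.
    print(alkaline_flags("skiptext skipmeta"))
```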

Alkaline Specific HTML Tags

Alkaline will look for specific HTML tags in a document. These can be nested within one another. An Alkaline specific tag always has the form <alkaline ...> </alkaline>.

Table 7-4. Alkaline Specific HTML Tags
<alkaline skip>       skip the indexing of the section terminated by </alkaline>
<alkaline url="url">  add a link from the current page manually; useful for pages that generate, for example, JavaScript code that cannot be correctly interpreted by the parser; there's no need to terminate this tag with </alkaline>

Example:
<alkaline skip>John, please read this page!</alkaline> 
<alkaline url="http://www.foo.com">
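
The two tags can be illustrated with a small sketch that drops text inside <alkaline skip> sections and records the links given by <alkaline url="...">. It uses Python's standard html.parser module and is not Alkaline's parser; the class and function names are hypothetical. Counting nested skip sections with a depth counter, and recording url links even inside a skipped section, are assumptions of this sketch.

```python
from html.parser import HTMLParser


class AlkalineTagParser(HTMLParser):
    """Collects indexable text and manually added links."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an <alkaline skip> section
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "alkaline":
            return
        attr = dict(attrs)
        if "skip" in attr:
            self.skip_depth += 1
        if attr.get("url"):
            # The url form is not terminated, so it does not change the depth.
            self.links.append(attr["url"])

    def handle_endtag(self, tag):
        if tag == "alkaline" and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text.append(data.strip())


def parse_alkaline_tags(html):
    """Return (text_fragments, links) for the given HTML text."""
    parser = AlkalineTagParser()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links


if __name__ == "__main__":
    page = ('Welcome <alkaline skip>John, please read this page!</alkaline> '
            'everyone <alkaline url="http://www.foo.com">')
    print(parse_alkaline_tags(page))
```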