Software Documentation

UrlList

Name

UrlList — specify the root URLs to index

Synopsis

UrlList = url1 [,url2 ] [,url3 ... ]

Description

This parameter defines the root URLs to retrieve. Alkaline starts the indexing process with each of these URLs in sequential order. A URL can use either the http://server:port/path/file?arg format or the file:///path/name format. The file:// format was added in version 1.7.
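As an illustration of the two accepted formats, the following Python sketch splits a UrlList value on commas and classifies each entry by scheme. It is not Alkaline's own parser: the function name parse_url_list is hypothetical, and the sketch assumes that no URL in the list contains a comma and that only the http and file schemes named above are accepted.

```python
from urllib.parse import urlsplit


def parse_url_list(value):
    """Split a UrlList value on commas and tag each entry with its scheme.

    Hypothetical helper for illustration only; Alkaline's real parsing
    rules may differ (for example in how whitespace is handled).
    """
    roots = []
    for raw in value.split(","):
        url = raw.strip()
        if not url:
            continue  # tolerate a stray or trailing comma
        scheme = urlsplit(url).scheme
        if scheme not in ("http", "file"):
            raise ValueError(f"unsupported scheme in UrlList entry: {url!r}")
        roots.append((scheme, url))
    return roots


if __name__ == "__main__":
    for scheme, url in parse_url_list(
        "http://www.foo.com, file:///var/docs/index.html"
    ):
        print(scheme, url)
```

The entries come back in the order they were written, which matches the sequential processing described above.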

Unless the Robots directive is set to No, Alkaline first attempts to retrieve the http://server-name/robots.txt file. The contents of the robots.txt file are kept for the entire session. No robots.txt file is retrieved for local file:// URLs.
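The robots.txt behaviour can be sketched in Python with the standard urllib.robotparser module. This is an approximation under stated assumptions, not Alkaline's implementation: the RobotsCache class, the robots_url helper and the injected fetch callable are hypothetical names, and the per-server dictionary stands in for "stored for the entire session".

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(url):
    """Return the robots.txt location for a URL, or None for file:// URLs."""
    parts = urlsplit(url)
    if parts.scheme == "file":
        return None  # local files have no robots.txt
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


class RobotsCache:
    """Fetch each server's robots.txt once and reuse it afterwards."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed callable: robots.txt URL -> text
        self._parsers = {}   # robots.txt URL -> parsed rules

    def allowed(self, url, agent="*"):
        location = robots_url(url)
        if location is None:
            return True
        parser = self._parsers.get(location)
        if parser is None:
            parser = RobotFileParser()
            parser.parse(self._fetch(location).splitlines())
            self._parsers[location] = parser
        return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    cache = RobotsCache(lambda location: rules)
    print(cache.allowed("http://www.foo.com/index.html"))
    print(cache.allowed("http://www.foo.com/private/a.html"))
    print(cache.allowed("file:///var/docs/index.html"))
```

Passing the fetch function in keeps the sketch free of network access; a real spider would download the file over HTTP and would skip the check entirely when the Robots directive is set to No.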

After retrieving the first page defined by the UrlList directive, Alkaline extracts all links from it and schedules them for indexing in the order they were found. A thread from a thread pool picks up each of these URLs, matches it against the current asearch.cnf rules and, if required, restarts the indexing process for that URL. Local directory listings retrieved by following a file:// URL are treated as HTML content with links to each document and subdirectory.
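The link extraction step, and the treatment of a directory listing as HTML, might look roughly like the Python sketch below. The names LinkCollector, extract_links and listing_as_html are hypothetical, only anchor (a) tags are considered, and the listing format is an assumption; Alkaline's actual parser and generated listing are not documented here.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote, urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return absolute link URLs in the order they appear in the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def listing_as_html(names):
    """Render directory entry names as a page with one link per entry."""
    return "\n".join(
        f'<a href="{quote(name)}">{escape(name)}</a>' for name in names
    )


if __name__ == "__main__":
    page = listing_as_html(["readme.txt", "manual/"])
    for link in extract_links("file:///var/docs/", page):
        print(link)
```

Because a directory listing is rendered as ordinary HTML, the same extraction code serves both http:// pages and file:// directories.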

After emptying the queue of URLs retrieved from the first entry of the UrlList directive, Alkaline continues with the next URL in the list, if any. Otherwise, if more asearch.cnf files are specified, the spider proceeds with the next asearch.cnf configuration. Once the entire list of configurations has been processed, it stays idle for the period set by SleepRoundtrip and restarts with the first configuration file.
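The overall processing order can be summarised with a simplified Python sketch. It is single-threaded where Alkaline uses a thread pool, and the names crawl_root, run_pass, get_links and accept are hypothetical: get_links stands for "retrieve a page and extract its links" and accept stands for the asearch.cnf rules. Whether Alkaline remembers visited URLs across different root URLs is not stated, so the sketch tracks them per root only.

```python
from collections import deque


def crawl_root(root, get_links, accept):
    """Drain the queue seeded by one root URL; return URLs in visit order."""
    queue = deque([root])
    seen = {root}
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Schedule links in the order found, skipping repeats and
            # anything the configuration rules reject.
            if link not in seen and accept(link):
                seen.add(link)
                queue.append(link)
    return visited


def run_pass(configs, get_links):
    """Process every configuration once, root URL by root URL."""
    order = []
    for config in configs:
        for root in config["url_list"]:
            order.extend(crawl_root(root, get_links, config["accept"]))
    return order


if __name__ == "__main__":
    graph = {
        "http://www.foo.com/": ["http://www.foo.com/a", "http://other.org/"],
        "http://www.foo.com/a": ["http://www.foo.com/"],
    }
    config = {
        "url_list": ["http://www.foo.com/", "http://foo.bar.ch/"],
        "accept": lambda u: u.startswith(("http://www.foo.com/", "http://foo.bar.ch/")),
    }
    for url in run_pass([config], lambda u: graph.get(u, [])):
        print(url)
```

A continuous spider would wrap run_pass in a loop and sleep for the SleepRoundtrip period between passes before starting again with the first configuration.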

Example

UrlList=http://www.foo.com,http://foo.bar.ch