Description
This parameter defines the root URLs to retrieve. Alkaline starts the indexing process with each of these URLs
in sequential order. A URL can use either the http://server:port/path/file?arg format or the
file:///path/name format. The file:// format was added in version 1.7.
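
The two accepted formats can be told apart by their scheme. The following is a minimal Python sketch of that check, not Alkaline's own code; the function name classify_root_url is hypothetical, and rejecting every other scheme is an assumption made for illustration.

```python
from urllib.parse import urlsplit

def classify_root_url(url: str) -> str:
    """Return 'http' or 'file' for a root URL in one of the two documented formats.

    Hypothetical helper: raising on any other scheme is an assumption,
    not documented Alkaline behaviour.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme == "http":
        return "http"
    if scheme == "file":
        return "file"
    raise ValueError(f"unsupported root URL scheme: {url!r}")
```

For example, classify_root_url("file:///path/name") returns "file", while a URL in any other format raises ValueError.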
Unless the Robots directive is set to No, Alkaline attempts to retrieve the
http://server-name/robots.txt file first. The contents of the robots.txt file are stored for
the entire session. No robots.txt file is retrieved for local file:// URLs.
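
The robots.txt handling described above could be sketched as follows, using Python's standard urllib.robotparser module. This is an illustration under stated assumptions rather than Alkaline's implementation: the names robots_url_for, load_robots and fetch_lines are hypothetical, the cache is a plain dictionary standing in for "stored for the entire session", and the Robots directive itself is not modelled.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical session cache: robots.txt URL -> parsed rules.
_robots_cache = {}

def robots_url_for(url):
    """Return the robots.txt URL for an http URL, or None for a file:// URL."""
    parts = urlsplit(url)
    if parts.scheme.lower() == "file":
        return None  # local files have no robots.txt
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

def load_robots(url, fetch_lines):
    """Return the cached rules for the URL's server, fetching them at most once.

    fetch_lines is an assumed callable that takes the robots.txt URL and
    returns its contents as a list of text lines.
    """
    robots_url = robots_url_for(url)
    if robots_url is None:
        return None
    if robots_url not in _robots_cache:
        parser = RobotFileParser(robots_url)
        parser.parse(fetch_lines(robots_url))
        _robots_cache[robots_url] = parser
    return _robots_cache[robots_url]
```

Passing the fetch function in as a parameter keeps the sketch free of network access; a real spider would plug in its own HTTP client there.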
After retrieving the first page defined by the UrlList directive, Alkaline extracts all links from it and schedules
them for indexing in the order they were found. A thread from a thread pool picks up each of these URLs, matches it against
the current asearch.cnf rules and, if required, restarts the indexing process for that URL. Local directory listings
retrieved by following a file:// URL are treated as HTML content with links to each document and subdirectory.
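
The link scheduling step can be pictured with a short Python sketch. It is a simplified, single-threaded illustration, not Alkaline's code: the names LinkCollector and schedule_links are hypothetical, only anchor href attributes are considered, matching against the asearch.cnf rules is left out, and the seen set that skips already scheduled URLs is an assumption the text above does not state.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags in document order, as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def schedule_links(page_url, html, queue, seen):
    """Append the page's links to the FIFO queue in the order they were found."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    for link in collector.links:
        if link not in seen:  # assumed de-duplication, see lead-in
            seen.add(link)
            queue.append(link)
```

With a collections.deque as the queue, worker threads would take URLs from the left end, which preserves the order in which the links were found.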
After emptying the queue of URLs retrieved from the initial entry of the UrlList directive,
Alkaline continues with the next URL in the list, if any. Otherwise, if more asearch.cnf files are specified, the spider
proceeds with the next asearch.cnf configuration. Once the entire list of configurations is processed, it
observes an inactivity period of SleepRoundtrip and restarts with the first configuration file.
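
The overall round trip described above amounts to three nested loops. The Python sketch below is only an outline under stated assumptions: run_spider and index_root are hypothetical names, each configuration is represented as a dictionary with a "UrlList" key, the rounds parameter exists solely so that the example terminates (the spider itself restarts indefinitely), and the unit of SleepRoundtrip is not specified here.

```python
import time

def run_spider(configs, index_root, sleep_roundtrip, rounds=1, sleep=time.sleep):
    """Process every root URL of every configuration, pause, then start over.

    configs:         list of dicts, each with a "UrlList" list (assumed shape)
    index_root:      assumed callable that indexes one root URL and drains
                     the queue of links found from it
    sleep_roundtrip: value of the SleepRoundtrip directive
    """
    for _ in range(rounds):
        for config in configs:                # one asearch.cnf after another
            for root_url in config["UrlList"]:  # root URLs in sequential order
                index_root(config, root_url)
        sleep(sleep_roundtrip)                # inactivity period before restarting
```

The sleep function is passed in as a parameter so that the loop can be exercised without actually waiting.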