no theoretical limit on the number of indexed documents or sites
fully remote indexing, not limited to the local machine or local area network
remote URL(s) defined as a base of indexing
indexing of local file system
local directories defined as a base of indexing
true spider: follows links on web pages (A HREFs, MAPs, FRAMEs, META REFRESH, etc.)
deleted pages are automatically removed and newly created pages instantly added
grouping of multiple sites, each with individual options and parameters, within the same search group
automatic support for redirected URLs and relative Location: headers, with detection of deep circular redirections
multiple indexing bases for the same index/search database
highly configurable index/search paths, exclusion lists, index categories and file extensions
capable of using regular expressions to define which URLs to follow and which documents to index
file count, recursion and remote limits set on demand
automatic indexing of newer files only, using If-Modified-Since
intelligent HTML parsing with link and text retrieval, support for &...; style entities and simple error recovery
single indexing engine for multiple search/index groups
foreground dedicated indexing for first-time setup or fast reindexing
multithreaded architecture with background continuous indexing
textual cleanup, supporting accented characters (for example, searching French text with or without accents)
META tag support for KEYWORDS and DESCRIPTION, TITLE tag support for title
discarding of script, style and object code
full support for robots.txt and META ROBOTS directives, which can be disabled on demand
filters for indexing formats other than HTML and plain text (such as Adobe PDF)
external third-party command-line tools usable as filters through a documented interface
embedded object retrieval for indexing other formats, such as Shockwave Flash, using the filter interface
page preprocessing available through a published API before real indexing, using a filter
MD5 document signatures that identify and ignore symbolic links and duplicate documents (such as http://www.foo.com and http://www.foo.com/index.html)
persistent remote document retrieval with a configurable number of retries, etc.
retrieval of secured pages on password-protected sites (HTTP/1.0 Basic authentication, NTLM support for Windows NT, no support for SSL)
Alkaline-specific META tags to prevent indexing of individual pages or following of links, exclude text portions, index META data or index parts of a document
Alkaline memory-mapped file swap to minimize memory usage
Alkaline flat interval technology to stabilize the memory usage curve
external lists of words to be excluded from indexing, rules for page inclusion, stop words, etc., including regular expressions to define exclusions
statistics on requests and traffic
capable of adding/removing/reindexing URLs submitted online
native server-side includes (SSI)
full support for client-side, non-JavaScript cookies
fully parallel, configurable multithreaded retrieval with concurrent indexing
ability to run as a native Windows NT/2000 service
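
As an illustration of the MD5 document signature feature listed above, the following is a minimal Python sketch of how content signatures can be used to skip duplicate documents. It is not Alkaline's actual code: the function names signature and index_unique and the sample page bodies are hypothetical, and a real indexer would hash the body it retrieved from each URL.

```python
import hashlib


def signature(body: bytes) -> str:
    """Return the MD5 hex digest of a document body (hypothetical helper)."""
    return hashlib.md5(body).hexdigest()


def index_unique(documents):
    """Keep the first URL seen for each distinct body and skip later duplicates.

    `documents` is an iterable of (url, body) pairs.
    Returns two lists of URLs: (indexed, skipped).
    """
    seen = {}  # signature -> first URL that carried this content
    indexed, skipped = [], []
    for url, body in documents:
        sig = signature(body)
        if sig in seen:
            # Same bytes already indexed under another URL: ignore this one.
            skipped.append(url)
        else:
            seen[sig] = url
            indexed.append(url)
    return indexed, skipped


# Sample data (assumed): two URLs serving identical content, one distinct page.
docs = [
    ("http://www.foo.com", b"<html>home</html>"),
    ("http://www.foo.com/index.html", b"<html>home</html>"),
    ("http://www.foo.com/about.html", b"<html>about</html>"),
]
indexed, skipped = index_unique(docs)
```

With the sample data, http://www.foo.com/index.html lands in skipped because its body hashes to the same signature as http://www.foo.com. Comparing signatures rather than URLs is what lets identical content reached through different paths (including symbolic links) be recognized.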