Alkaline can be viewed as two distinct pieces: the indexer (or spider) and the search engine. Its capabilities, which are constantly growing, include but are not limited to those listed below.
no theoretical limit on the number of indexed documents or sites
fully remote indexing, not limited to the local machine or local area network
remote URLs defined as indexing bases
indexing of the local file system
local directories defined as indexing bases
true spider that follows links in web pages (A HREF, MAP, FRAME, META REFRESH, etc.)
deleted pages are automatically removed and newly created pages are instantly added
grouping of multiple sites, each with its own options and parameters, within the same search group
automatic handling of redirected URLs and relative Location: headers, with detection of circular and excessively deep redirections
multiple indexing bases for the same index/search database
highly configurable index/search paths, exclusion lists, index categories and file extensions
regular expressions to define which URLs to follow and which documents to index
on-demand limits on the number of files, recursion depth and remote retrieval
automatic indexing of newer files only, using the If-Modified-Since header
intelligent HTML parsing, link and text retrieval, with support for character entities of the form &name; (such as &amp;) and simple error recovery
single indexing engine for multiple search/index groups
dedicated foreground indexing for first-time setup or fast reindexing
multithreaded architecture with background continuous indexing
text cleanup with support for accented characters (for example, searching French text with or without accents)
support for the KEYWORDS and DESCRIPTION META tags, and for the TITLE tag as the page title
discarding of script, style and object code
full support for robots.txt and META ROBOTS directives, which can be disabled on demand
filters for indexing formats other than HTML and plain text (such as Adobe PDF)
external third-party command-line tools usable as filters through a documented interface
retrieval of embedded objects, so that formats such as Shockwave Flash can be indexed through the filter interface
page preprocessing before actual indexing, using a filter and a published API
MD5 document signatures that identify and skip symbolic links and duplicate documents (such as http://www.foo.com and http://www.foo.com/index.html)
persistent remote document retrieval, with a configurable number of retries and other parameters
retrieval of secured pages on password-protected sites (HTTP/1.0 BASIC authentication and NTLM for Windows NT; SSL is not supported)
Alkaline-specific META tags to prevent individual pages from being indexed or their links from being followed, to exclude portions of text, to skip META data, or to index only parts of a document
Alkaline memory-mapped swap files to minimize memory usage
Alkaline flat interval technology to stabilize the memory usage curve
external lists of words to exclude from indexing, page inclusion rules and stop words, including regular expressions to define exclusions
statistics on requests and traffic
adding, removing and reindexing of URLs submitted online
native support for server-side includes (SSI)
full support for non-JavaScript client cookies
fully parallel, configurable multithreaded retrieval and concurrent indexing
ability to run as a native Windows NT/2000 service
searching remote sites
searching any search group with a single search/index server
searching the local file system
substring and heuristic search, not just whole-keyword matching
fully configurable output (virtually any HTML layout) through user-defined templates and the MV4 expression mechanism, separately for each search group
multi-page results, with a configurable number of results per page for each search group
full web server pool architecture for immediate response to searches
denial-of-service and server-flood protection with automatic fall-off, and automatic restart on resource starvation
searching of accented and unaccented text, with automatic translation of accents (é, à, etc.)
searching in META tags
output of META DESCRIPTION and page TITLE if available
searching in the ALT attributes of image and applet tags
no searching in scripts
automatic selection of case-sensitive/case-insensitive search
automatic selection of heuristics/exact search for quoted sequences
boolean search using + and - signs
scope restriction by host, path, URL and file extension
sorting of results by date (ascending or descending), size (ascending or descending), title and URL
results grouping by domain name
results re-sorting and re-grouping by any of the above criteria
four-level expiring cache
user selection of the maximum number of results
numeric tags: a tag such as price=345 can be matched with the comparison operators <, = and > (queries of the form price<N, price=N or price>N)
ranking weight options for titles, meta tags and document body
weak words
support for GET and POST methods
support for WAP/WML 1.1 wireless devices
administration section restricted by BASIC authentication, with username/password pairs for different access levels
fully customizable administration section, using JavaScript and XML
extended co-branding options for resellers
extensive search statistics and performance counters
browsing of configurations and their individual parameters
per-configuration statistics for the four-level search cache
certification embedded in the admin section
restart the server from the admin section
refresh templates from the admin section
add, reindex and remove individual URLs from the admin section
produce MRTG-compliant statistics through XML queries and plot search/load averages using MRTG
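As a rough illustration of the accent handling and the + and - boolean syntax in the list above, the following Python sketch folds accents away by Unicode NFD decomposition and removal of combining marks, then filters a document on whole words. It is not Alkaline's implementation: the function names `fold`, `tokenize` and `matches` are hypothetical, and the treatment of unprefixed terms (optional, with at least one required when no +term is given) is an assumption.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lower-case text and strip accents (é -> e, à -> a)."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (Unicode category Mn), keeping base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tokenize(text: str) -> set[str]:
    """Return the set of accent-folded words in a document."""
    return set(re.findall(r"\w+", fold(text)))


def matches(query: str, document: str) -> bool:
    """Hypothetical +/- filter: +term must appear, -term must not appear.

    Assumption: unprefixed terms are optional, but when the query has no
    +term, at least one unprefixed term must appear in the document.
    """
    words = tokenize(document)
    required, forbidden, optional = [], [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(fold(token[1:]))
        elif token.startswith("-") and len(token) > 1:
            forbidden.append(fold(token[1:]))
        else:
            optional.append(fold(token))
    if any(term in words for term in forbidden):
        return False
    if not all(term in words for term in required):
        return False
    if not required and optional:
        return any(term in words for term in optional)
    return True
```

With the document "Le café de l'Été", the query `cafe` matches even though it is typed without an accent, `+ete -the` matches, and `+cafe -ete` is rejected because the folded word "ete" is present.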
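The MD5 signature used to skip duplicate documents (such as http://www.foo.com and http://www.foo.com/index.html serving the same page) can be sketched in Python, assuming the digest is taken over the raw bytes of each retrieved page. The `DuplicateFilter` class and its `is_duplicate` method are hypothetical; Alkaline's actual signature may be computed over normalized content instead.

```python
import hashlib


class DuplicateFilter:
    """Hypothetical helper that remembers the MD5 digest of each page seen.

    Assumption: the digest covers the raw document bytes, with no
    normalization of whitespace or markup.
    """

    def __init__(self) -> None:
        # Maps an MD5 hex digest to the first URL that produced it.
        self._seen: dict[str, str] = {}

    def is_duplicate(self, url: str, content: bytes) -> bool:
        """Return True if identical content was already seen under another URL."""
        digest = hashlib.md5(content).hexdigest()
        first_url = self._seen.setdefault(digest, url)
        # Re-fetching the same URL (e.g. when reindexing) is not a duplicate.
        return first_url != url


if __name__ == "__main__":
    page = b"<html><body>Welcome</body></html>"
    dedup = DuplicateFilter()
    print(dedup.is_duplicate("http://www.foo.com", page))             # False
    print(dedup.is_duplicate("http://www.foo.com/index.html", page))  # True
```

Keying the table on the digest rather than the URL is what lets two different addresses for the same content be detected with a single dictionary lookup.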