Welcome to Vestris Inc.
Internet Interactive Solutions Company



Software Documentation

Alkaline Features

Alkaline can be viewed as two distinct pieces: the indexer (or spider) and the search engine. Its capabilities, a non-exhaustive and constantly growing set, include those listed below.

Indexing

  • no theoretical limit on the number of indexed documents or sites

  • fully remote indexing, not just local machine or local area network

  • remote URL(s) defined as a base of indexing

  • indexing of local file system

  • local directories defined as a base of indexing

  • true spider that follows links on web pages: A HREFs, MAPs, FRAMEs, META REFRESH, etc.

  • deleted pages are automatically removed and newly created pages instantly added

  • grouping of multiple sites, each with individual options and parameters, within the same search group

  • automatic support for redirected URLs, relative Location: headers, detection of circular deep redirections

  • multiple indexing bases for the same index/search database

  • highly configurable index/search paths, exclusion lists, index categories and file extensions

  • capable of using regular expressions to define which URLs to follow and which documents to index

  • file count, recursion and remote limits that can be set on demand

  • automatic indexing of newer files only, using If-Modified-Since

  • intelligent HTML parsing, link and text retrieval, supporting &...; style character entities, simple error recovery

  • single indexing engine for multiple search/index groups

  • foreground dedicated indexing for first-time setup or fast reindexing

  • multithreaded architecture with background continuous indexing

  • textual cleanup, supporting accented characters (for example, searching French text with or without accents)

  • META tag support for KEYWORDS and DESCRIPTION, TITLE tag support for title

  • discarding of script, style and object code

  • full support for robots.txt and META ROBOTS directives, which can be disabled on demand

  • filters for indexing formats other than HTML and plain text (such as Adobe PDF)

  • using external third party command line tools as filters through a documented interface

  • retrieval of embedded objects, for indexing other formats such as Shockwave Flash through the filter interface

  • page preprocessing available through a published API before real indexing, using a filter

  • MD5 document signature that identifies and ignores symbolic links and duplicate documents (such as http://www.foo.com and http://www.foo.com/index.html)

  • persistent remote document retrieval, fully configurable in number of retries, etc.

  • supports retrieval of secured pages on password protected sites (HTTP/1.0 BASIC authentication, NTLM support for Windows NT, no support for SSL)

  • Alkaline-specific META tags to avoid indexing of individual pages, following links, excluding text portions, indexing META data or indexing parts of a document

  • using the Alkaline memory mapped files swap to minimize memory usage

  • using the Alkaline flat interval technology to stabilize the memory usage curve

  • external lists of words to be excluded from indexing, rules for page inclusion, stop words, including regular expressions to define exclusions, etc.

  • statistics on requests and traffic

  • capable of adding/removing/reindexing URLs submitted online

  • native server-side includes (SSI)

  • full support for client cookies (other than those set by JavaScript)

  • fully parallel, configurable multithreaded retrieval with concurrent indexing

  • ability to run as a native Windows NT/2000 service
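
As an illustration of the MD5 document signature item above, here is a minimal sketch of how a signature can be used to skip duplicate documents. It is not Alkaline's implementation: the names document_signature and DuplicateFilter are hypothetical, and the sketch assumes the signature is computed over the raw document body.

```python
import hashlib
from typing import Dict, Optional


def document_signature(body: bytes) -> str:
    """Return the MD5 hex digest of a document body (assumed signature input)."""
    return hashlib.md5(body).hexdigest()


class DuplicateFilter:
    """Hypothetical helper that remembers signatures of documents already indexed."""

    def __init__(self) -> None:
        # Maps a signature to the first URL that produced it.
        self._seen: Dict[str, str] = {}

    def check(self, url: str, body: bytes) -> Optional[str]:
        """Return the URL of an earlier identical document, or None if this one is new."""
        signature = document_signature(body)
        original = self._seen.get(signature)
        if original is None:
            self._seen[signature] = url
        return original


seen = DuplicateFilter()
page = b"<html><title>Home</title></html>"
first = seen.check("http://www.foo.com", page)              # new document
second = seen.check("http://www.foo.com/index.html", page)  # same content as the first
```

Here first is None, so the page would be indexed, while second names the earlier URL, so the second copy can be ignored. Comparing content rather than URLs is what lets two different addresses for the same page collapse into one entry.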

Searching

  • searching remote sites

  • searching any search group with a single search/index server

  • searching local file system

  • searching of word sub-strings and heuristics, not just full keywords

  • fully configurable output (virtually any HTML layout), using user-defined templates, with the MV4 expressions mechanism for each separate search group

  • multiple page results, with any number of results per page for each separate search group

  • full web server pool architecture for immediate response at search

  • protection against denial of service and server floods, automatic fall-off, automatic restart on resource starvation

  • searching of accented and non-accented text, full support for automatic translation of accents (é, à, etc.)

  • searching in META tags

  • output of META DESCRIPTION and page TITLE if available

  • searching in ALT image and applet tags

  • no searching in scripts

  • automatic selection of case-sensitive/case-insensitive search

  • automatic selection of heuristics/exact search for quoted sequences

  • boolean search using + and - signs

  • scope restriction to host, path, URL and file extension

  • results sorting by date (ascending and descending), size (ascending and descending), title and URL

  • results grouping by domain name

  • results re-sorting and re-grouping by any of the above criteria

  • four-level expiring cache

  • user selection of the maximum number of results

  • numeric tags: values such as price=345 are searchable with comparison operators, for example price<34, price=34 or price>345

  • ranking weight options for titles, meta tags and document body

  • weak words

  • support for GET and POST methods

  • WAP/WML 1.1 wireless device support
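
To make the boolean search item above concrete, the following is a small sketch of one way + and - prefixes can be interpreted. The semantics are assumptions rather than Alkaline's documented behavior: a + term must be present, a - term must be absent, unprefixed terms are optional, and matching is case-insensitive. The names parse_query and matches are hypothetical.

```python
from typing import NamedTuple, Set


class Query(NamedTuple):
    required: Set[str]  # terms prefixed with +
    excluded: Set[str]  # terms prefixed with -
    optional: Set[str]  # unprefixed terms


def parse_query(text: str) -> Query:
    """Split a query string into required, excluded and optional terms."""
    required, excluded, optional = set(), set(), set()
    for token in text.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return Query(required, excluded, optional)


def matches(query: Query, words: Set[str]) -> bool:
    """Decide whether a document, given as a set of lowercase words, satisfies the query."""
    if query.excluded & words:
        return False
    if not query.required <= words:
        return False
    # Assumption: with no required terms, at least one optional term must be present.
    if not query.required and query.optional:
        return bool(query.optional & words)
    return True


query = parse_query("+alkaline search -beta")
hit = matches(query, {"alkaline", "spider"})
miss = matches(query, {"alkaline", "beta"})
```

In this sketch hit is True because the required term is present and the excluded one is not, while miss is False because the document contains the excluded term.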

Online Administration

  • administration section restricted by BASIC authentication, with username/password pairs for various access levels

  • fully customizable administration section, using JavaScript and XML

  • extended co-branding options for resellers

  • extensive search statistics and performance counters

  • browsing of configurations and their individual parameters

  • four-level search cache statistics per configuration

  • certification embedded in the admin section

  • restart the server from the admin section

  • refresh templates from the admin section

  • add, reindex and remove individual URLs from the admin section

  • produce MRTG-compliant statistics through XML queries and plot search/load averages using MRTG
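
The MRTG item above can be pictured with a short sketch. MRTG can read values from an external program that prints four lines: the first value, the second value, an uptime string and a target name. The XML layout below is purely hypothetical (Alkaline's actual XML query format is not described here), as are the element names and the mrtg_lines function; only the four-line output shape follows MRTG's convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical statistics reply; element names are invented for illustration only.
SAMPLE_XML = """<stats>
  <searches>1280</searches>
  <load>3</load>
  <uptime>4 days, 02:15</uptime>
  <name>alkaline-search</name>
</stats>"""


def mrtg_lines(xml_text: str) -> str:
    """Convert the assumed XML statistics into MRTG's four-line external-program output."""
    root = ET.fromstring(xml_text)
    searches = int(root.findtext("searches", default="0"))
    load = int(root.findtext("load", default="0"))
    uptime = root.findtext("uptime", default="unknown")
    name = root.findtext("name", default="unknown")
    # Line order: first value, second value, uptime, target name.
    return "\n".join([str(searches), str(load), uptime, name])


report = mrtg_lines(SAMPLE_XML)
```

A wrapper script that prints report could then be named as the external program in an MRTG target, so that the two values are plotted over time.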