Welcome to Vestris Inc.
Internet Interactive Solutions Company


Alkaline latest release news and information.

Alkaline 1.3 - What's New

next release (currently RTM - 1.31.0909)
 (build 1.31.0909) added --enableping and removed --disableping; by default the ping thread is not launched
 (build 1.3.0904) corrected bug: incorrect $end value for last page of results
 (build 1.3.0903) corrected bug: search appeared to be locked for a few seconds (attempting to resolve remote hostname aliases on FreeBSD and other systems, which do not support reentrant hostname resolution)
 (build 1.3.0902.2) restarting Alkaline from the admin console did not re-enable swap (when run with --enableswap)
 (build 1.3.0902.1) added USFormats=Y/N (default N) option to global.cnf, which defines whether before: and after: dates should be treated as US dates (MMDDYY) rather than European dates (DDMMYY)
 (build 1.3.0902.1) corrected bug: year 00 did not translate to 2000 when specifying before: and/or after:
 (build 1.3.0831.1) corrected bug: corrupted output on the 404 verification step for auth settings
 (build 1.3.0831.1) corrected bug: urls such as http://foo:X showing as Upper from http://foo; now port values must match for the upper condition, otherwise the url is considered remote
 (build 1.3.0829) corrected bug: duplicate urls appearing in the queue resulting in an [md5-unch] message during the very first reindex
 (build 1.3.0823) corrected bug: cannot login to /admin with the alkaline-add password
 (build 1.3.0823) added binding to an alternative ip/hostname, the new command line syntax is asearch [host:]port ..., for example ./asearch foo.bar.com:9999 ...
 (build 1.3.0819) corrected bug: random failure to grant access to /admin for rare username/password pairs
 (build 1.3.0819) versioning now uses the standard numbering major.minor.build.dot; asearch.exe properties under Windows NT show the correct version number and the proper copyright information
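
The USFormats and year 00 entries in this release concern how six-digit before: and after: dates are read. A minimal Python sketch of that interpretation follows. The function name, the strict six-digit input and the two-digit-year pivot (years below 70 map to 20xx) are assumptions made for illustration, not Alkaline's actual code.

```python
from datetime import date

def parse_short_date(text, us_formats=False, pivot=70):
    """Parse a six-digit date: DDMMYY by default, MMDDYY when us_formats is set."""
    if len(text) != 6 or not text.isdigit():
        raise ValueError("expected six digits, got %r" % text)
    first, second, yy = int(text[0:2]), int(text[2:4]), int(text[4:6])
    # USFormats=Y swaps the meaning of the first two fields.
    day, month = (second, first) if us_formats else (first, second)
    # Assumed pivot (not from the release notes): 00-69 -> 20xx, 70-99 -> 19xx.
    year = 2000 + yy if yy < pivot else 1900 + yy
    return date(year, month, day)
```

With these assumptions, 010200 reads as 1 February 2000 by default and as 2 January 2000 when us_formats is set.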
12-Aug-2000
 added LowerCase option to asearch.cnf which forces the conversion of individual word forms to lowercase when indexing; case-sensitive search will make no sense, but the index size should be considerably smaller
 added native Windows NT service implementation; you do not need SrvAny to run Alkaline as a Windows NT/2000 service any more - added asearch.exe service install ..., asearch.exe service remove, start and stop
 added CacheTemplates option to the global.cnf file; default is Y, when set to N, templates are not cached and reloaded for each search request
 because more people worry about performance than about memory, swap is now disabled by default; run with --enableswap to enable it
 added command line excludewords that operates an ExcludeWords dictionary on an existing index
 added SET EXPIRED-COUNT, which specifies the number of days after which a document is considered expired. Added SET EXPIRED, which sets the value of $expired for the SET MAP section on the search template when a document is older than SET EXPIRED-COUNT days.
 added MaxLinks, which specifies the maximum number of urls to follow (documents to index) from the topmost url, typically defined by UrlList or in a file pointed to by UrlListFile
 corrected bug: a malformed &...; sign in the HTML causes an access violation and crashes the indexing threads (which resulted in a full crash on several platforms)
 corrected bug: searching for meta tags randomly returned incorrect results
 admin section uses BASIC authentication and not password cookies any more
 added a global.cnf file, which can contain Pass, LogPath and Proxy settings, to compensate for the lack of equiv/*.struct globals
 removed the ManageHeader and ManageFooter options
 totally new management section, now moved to /admin; using JavaScript menus, XML and more - you can now fully customize the admin pages
 legacy of .struct files: you can create directories with asearch.cnf files and launch Alkaline using a list of directory names
 legacy of .struct files: when searching, :port/path/filename[.aln]?search=... is equivalent to :port/?Alias+AliasHTML+search=...; the path replaces the alias and can be many levels deep, the filename is used as the template; filenames with special .aln extensions must contain an http link on a single line, which will be retrieved and used as a template
 optimized loading of large indexes, now faster and uses less memory
 corrected bug: Alkaline will not try to retrieve malformed urls nor will it retry a retrieval when an unrecoverable error, such as an unknown host name, occurred
 corrected bug: memory leaking when background indexing under Windows NT
 while indexing, acquired cookies will be cleared between configurations
 corrected bug: searching with opt:whole and opt:and always returns no results
 added output from UrlReplace and Redirect when running with -exv; regexp syntax error from these two directives will also be shown
 added UrlReplace to perform regexp and text replacements on urls before they are scheduled for indexing
 corrected bug: all expression functions involving limits and integers, such as LEFT, CLEFT, RIGHT, CRIGHT broken
 added UrlIndex and UrlSkip (UrlIndexFile and UrlSkipFile), directives to the asearch.cnf, which behave like UrlInclude and UrlExclude, except that the matching is done after the page is retrieved - if an url matches any entry in UrlSkip or if the url does not match any entry in UrlIndex, the links will still be followed, but the page contents discarded (it will not appear in search results)
 improved parallel indexing and search/index resource balancing; removed remaining global locks between the search and the indexing threads
 completely new swap, using memory mapped files; performance loss is very acceptable
 faster handling of Md5 duplicates; previous versions used to parse the entire list of urls to discover whether a duplicate exists, now using a fast character tree
 corrected bug: relative Location: headers (such as Location: foo.html, rather than Location: http://bar.com/foo.html) are now allowed with 301, 302, 303 and 307 redirects and are properly resolved
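
The relative Location: entry above can be illustrated with Python's standard library, which resolves a relative reference against the URL that was requested. This is a sketch of the general technique; the function name and the docs/old.html path are hypothetical, and whether Alkaline resolves every edge case the same way is an assumption.

```python
from urllib.parse import urljoin

def resolve_redirect(request_url, location):
    """Resolve a Location: header value against the URL that was requested."""
    # urljoin returns absolute values unchanged and resolves relative ones.
    return urljoin(request_url, location)

print(resolve_redirect("http://bar.com/docs/old.html", "foo.html"))
```

A relative value such as foo.html is resolved against the directory of the requested document, while an absolute value passes through untouched.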
06-Jul-2000
 handling denial of service: when the system runs out of socket resources and is unable to accept more connections, or when the queue is full, Alkaline will stop listening for new connections, wait for the search thread pool to clear all pending requests and start listening again; if it repeats this operation frequently and cannot recover, it will attempt to restart (or terminate if run with -d).
 corrected bug: filter command line mapping broken
04-Jul-2000
 corrected bug: ping thread might not restart Alkaline properly and hang
 corrected bug: temporary files not deleted on interrupt
 late binding - Alkaline will not start accepting connections before indexes are loaded - this avoids the initial server flood when many users attempt to access the search engine during the startup time
 corrected bug: Content-type: text/html was not properly returned in the server response
 added -ai=X/-acceptinterval=X accept delay parameter; Alkaline will sleep for X milliseconds after each new connection is accepted
 corrected bug: millisecond sleep intervals were not made on various platforms resulting in all kinds of server problems
 added RegExp=Y option that enables UrlInclude*, UrlExclude*, ExcludeWords* and IncludeWords* option to have regular expressions as parameters
 corrected bug: when two <TITLE> tags are found, only the first one appears as the document title
 corrected bug: indexing sites with FreeCharset=Y/&acute;-like characters was broken
 corrected bug: indexes not written when WriteIndex=-1 or at the end of indexing roundtrip during background indexing
  corrected bug: adding <meta name="alkaline" content="skip"> to an already indexed page won't remove it from an existing index.
  added $start to SEARCH-GENERAL, the index of the first result output on this page, and $end, the index of the last result output on this page; to output Displaying 21-30 out of 289 documents., the expression would be <!--SEARCH-GENERAL £total|NOT0~[Displaying $start-$end of $total documents.]-->.
  added string operators URLSCH (url scheme, such as http), URLHOST (server name), URLDIR (full path), URLFILE (file name), URLARG (parameters after the ?); for example:
<!--SET MAP--
 <ul>
  <li>prot: $url|URLSCH
  <li>host: $url|URLHOST
  <li>path: $url|URLDIR
  <li>file: $url|URLFILE
  <li>args: $url|URLARG
 </ul>
 £url|[URLDIR,IS/~foo/]~[Foo's Files]
-->
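
To make the five URL operators above concrete, here is a Python sketch that splits a URL into comparable pieces. It is an approximation: the function name is hypothetical, and the exact values Alkaline produces (for example whether URLDIR carries leading and trailing slashes, as the /~foo/ test in the template suggests) are assumptions.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the pieces named by the URLSCH..URLARG operators."""
    parts = urlsplit(url)
    # Everything up to the last slash is the directory, the rest is the file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "URLSCH": parts.scheme,           # url scheme, such as http
        "URLHOST": parts.hostname or "",  # server name
        "URLDIR": directory + "/",        # full path (assumed to end with /)
        "URLFILE": filename,              # file name
        "URLARG": parts.query,            # parameters after the ?
    }

for name, value in url_parts("http://www.vestris.com/~foo/index.html?lang=en").items():
    print(name, value)
```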
27-Jun-2000
  New Alkaline documentation. Also available in PDF.
  Fixed Alkaline for FreeBSD 4. Also changed the thread-unsafe way Alkaline creates and destroys temporary files, which had potentially disastrous results in multithread indexing with filters.
  corrected bug: circular multiple redirections made Alkaline index in an infinite loop
  Internet Service Manager under NT4 does not handle HTTP/1.0 GET correctly for binary files when specifying a server name and returns a [406/No Acceptable Object]; instead of returning an error on a 406, Alkaline will attempt an HTTP/0.9 GET.
 the remove command line parameter allows removing full subdirectories and sites:
./asearch path-to-asearch.cnf remove http://www.vestris.com/alkaline/*
 parallel 404/Not Found step; the initial verification for document existence can now take a few seconds for a 100'000 documents site with a large -mi parameter
 added -mi / --MaxIndexThreads=X parameter; the default is 5 and defines the thread pool size for indexing purposes
 parallel indexing; this greatly improves the speed of the indexing process, but it is just a first step to the right architecture - currently document retrieval is done in parallel and only the index database is interlocked; reindexing an already indexed site with little changes can be 10-20 times faster and a clean index can be up to 2-3 times faster
 failure to bind will result in a retry attempt loop of five seconds; this solves numerous cases when Alkaline is unable to restart properly because of a zombie thread from a previous session or simply kernel unbind latency
 corrected bug: statistics were showing the wrong requests-per-minute count when server was restarted; added a Server Pool Started line which shows when Alkaline has started accepting connections and which is used for the correct rpm count.
 added ping mechanism; Alkaline will create an additional thread demanding very little system resources, which tries to ping the engine and restarts it when Alkaline dies (UNIX versions running without -d only); also added --disableping
 considerably reduced heap contention and allocation counts; stress tests of this version show lower memory usage and considerably improved stability
 improved thread pooling mechanism for better responsiveness and stability on heavily loaded servers
 added -mt=X / --maxthreads=X options to command line, where X is the maximum number of threads that the search pool can accommodate; default is 100
 corrected bug: client connection abort while server was accepting it was handled incorrectly and caused Alkaline to restart
 improved handling of HTML errors for missing quotes and unclosed tags; if you see a progress bar with a # character, such as [*#****], this means that your HTML is not correctly formed and that Alkaline has attempted to recover from the error
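
The parallel indexing entry in this release describes retrieval running in parallel while the index database stays interlocked, with a default pool of 5 threads (-mi). The Python sketch below shows that general shape only. The function names, the word-to-URL dictionary used as the index and the caller-supplied fetch function are all hypothetical stand-ins, not Alkaline internals.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def index_parallel(urls, fetch, max_index_threads=5):
    """Fetch documents concurrently; serialize every update to the shared index."""
    index = {}
    index_lock = threading.Lock()

    def worker(url):
        body = fetch(url)        # retrieval runs in parallel across the pool
        with index_lock:         # only one thread updates the index at a time
            for word in body.lower().split():
                index.setdefault(word, set()).add(url)

    with ThreadPoolExecutor(max_workers=max_index_threads) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(worker, urls))
    return index

# Usage with an in-memory stand-in for remote documents.
pages = {"http://a/": "Hello world", "http://b/": "hello again"}
print(sorted(index_parallel(pages, pages.get)["hello"]))
```

Because only the short index update sits inside the lock, slow network retrievals overlap freely, which is where most of the speed-up would come from.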
13-Apr-2000
 changed behaviour: when evaluating a regexp, such as <a href="£url|URLENCODE">, the parser would consider as an operator URLENCODE", which is correct according to the regexp rules, but is misleading; the operators can now contain alphanumeric characters only, otherwise they must be wrapped inside [] - £url|HASftp://~[Ftp!] will not work any more, but £url|[HASftp://]~[Ftp!] will
  corrected bug: pdf2text would report a broken pdf when MaxSize setting is smaller than the retrieved PDF (PDF partially retrieved) - Alkaline will not retrieve a partial document which is smaller than the MaxSize setting any more; if the server returns Content-length: bigger than MaxSize, Alkaline will skip the document and report a 206 Partial Content error
 SSI tags can contain variables from the <!--SEARCH-GENERAL--regexp--> tag - the string is preprocessed as a whole, before being passed to the SSI parser
 made certification links slightly more explicit - separated the login to the management console with the certification
 corrected (rare) bug: Alkaline hangs trying to verify the existence of a page that has already been indexed
 corrected bug: attempted retrieval of WAIS, NNTP and other non-http protocols; also, HTTPS urls will be rewritten to HTTP, as many sites support both protocols and Alkaline does not support SSL
 corrected bug: infinite loop retrieving an URL with a recursive 302 redirect
  added Cookie setting to asearch.cnf
  corrected bug: URLs containing explicitly encoded elements such as &amp; were incorrectly decoded, eg. http://server/url?one=&amp;&two=2 was handled as http://server/url?one=&&two=2
  corrected bug: Alkaline META tags such as <META HTTP-EQUIV="ALKALINE" CONTENT="SKIP"> were case-sensitive and were ignored if lowercase
  added full support for persistent cookies set via a Set-Cookie: header (non JavaScript cookies) while indexing a remote site; implemented a persistent cookie storage compliant to the Netscape original cookie specification (http://www.netscape.com/newsref/std/cookie_spec.html), including expires, domain and path attributes
  corrected bug: indexing URLs containing spaces or non-ASCII characters did not encode the requested URL properly
  corrected bug: restricting scope to URLs with port numbers such as url:server:port/path did not work if both port and path were specified
  corrected doc: META tags syntax is <META NAME="tag name" CONTENT="tag value"> or <META HTTP-EQUIV="tag name" CONTENT="tag value">
  corrected bug: parsing HTML with single quoted elements did not strip the single quote from the url; links such as <a href='page.html'> could not be followed
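
The single-quote entry above is about link extraction accepting both quoting styles. As an illustration only, Python's standard HTML parser strips either kind of quote from attribute values; the class and function names here are hypothetical and this is not how Alkaline's own parser is written.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, whatever quoting style they use."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, values arrive unquoted.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

print(extract_links("<a href='page.html'>one</a> <A HREF=\"two.html\">two</A>"))
```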
26-Feb-2000
 added IndexWords [mime/ext] option to index specific words from every document (opposite behaviour of ExcludeWords)
 fixed end-of-line order in the HTTP headers and added an extra space in header variables after the ':' character; bugfix for WebStar 4.x web servers
 corrected bug: regexp processing of chained operators was done in the wrong order; in £variable|LEFT2,RIGHT1 the second operator was executed first
 added <!--SET SEARCH-BASE-HREF--url--> to the search template options; all links to search result pages use this url instead of being relative to the server root
 documents that produced a retrieval error without a valid status code will be requested from the server again; if the if-Modified-Since tag was used, it will be removed
 MD5 is checked for reindexed documents against previous content when the server does not reply to if-Modified-Since, such as for dynamic content; [md5-unch] will show if the document has not been altered
 corrected bug: indexing of sites using port values other than 80 stored URLs without the port value; search results pointed to the wrong url
 corrected bug: redirection for links to directories without a trailing slash
 added CLEFT and CRIGHT cut regexp operators
 words containing underscores, such as size_t are indexed as a whole
 -exv will verbose for URLs excluded because of the CGI, Exts(Add) settings and for malformed or unsupported URLs
 corrected (rare) bug: .ndx file not written properly or zeroed
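
The chained-operator fix in this release means that operators now run in the order they are written. The Python sketch below shows why the order matters. It assumes LEFTn keeps the first n characters and RIGHTn keeps the last n; the function names and the operator table are hypothetical.

```python
import re

def _left(value, n):
    return value[:n]

def _right(value, n):
    return value[-n:] if n > 0 else ""

OPERATORS = {"LEFT": _left, "RIGHT": _right}

def apply_operators(value, chain):
    """Apply a comma-separated operator chain such as 'LEFT2,RIGHT1' left to right."""
    for token in chain.split(","):
        match = re.fullmatch(r"([A-Z]+)(\d+)", token.strip())
        if match is None or match.group(1) not in OPERATORS:
            raise ValueError("unsupported operator: %r" % token)
        value = OPERATORS[match.group(1)](value, int(match.group(2)))
    return value

print(apply_operators("alkaline", "LEFT2,RIGHT1"))  # LEFT2 gives "al", then RIGHT1 gives "l"
print(apply_operators("alkaline", "RIGHT1,LEFT2"))  # RIGHT1 gives "e", then LEFT2 gives "e"
```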

26-Nov-1999
 added reloading of template pages from the online management (equivalent to the +nocache option)
 days and months in dates are mapped to a two digit number (1 will appear as 01)
 fixed META KEYWORDS appearing in document header text in search results
 fixed growing asearch.cnf when running Alkaline with -d
 fixed $modif bogus date
 fixed $age bogus field
 fixed a broken pipe fatal error retrieving data from disconnected or inaccessible remote sites
 added NoMd5=Y option to asearch.cnf which disables the MD5 mechanism (this option is very useful if you are indexing individual URLs from different sites)
 added string operators URLENCODE and URLDECODE for mapping results on the search results templates
 added NoMetaDescription=Y which will force the plain text to be used as the document header text in the search results instead of the META DESCRIPTION tag value
 the QUANT value for the amount of search results to show can now be -1 which will show all results
 corrected bug: META DESCRIPTION tag was cleaned from punctuation in search results
30-Sep-1999
 various multithread-related issues that led to frequent Alkaline crashes have been solved; finished porting Alkaline to IRIX IP32, the version has been tested fully stable; released a DEC OSF1 beta and a Linux Suse 6.1 Glib fully working version
 corrected bug: <META HTTP-EQUIV="REFRESH" CONTENT="0; URL=..."> now followed (with space before URL)
 Under UNIX, running Alkaline with -d will remove standard output and run the server as a daemon (note: restarting server from the /?manage section will not be possible with this option)
 fixed opt:XXX and before/after keyword combinations returning no results
 when logging search operations, Alkaline will output the search results page number requested
 reloading of configuration files will be made only if a modification has occurred
 online stats show the last processed (indexed) file
 variables of SEARCH-GENERAL are available in MAP (such as post.*)
 variables passed to Alkaline using a & or a + delimiter on the command line are available as POST variables inside the regexp (post.*)
  added <!--SET NEXT-INHERIT--regexp--> which allows passing values to the next and previous pages of results which have been generated with the <!--SEARCH-NEXT--> tag
 the apostrophe character is considered as part of the word (searching for bob's for example will work)
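
The META REFRESH entry in this release is about tolerating a space before URL= in the CONTENT value. A small Python sketch of such a tolerant parser follows; the function name and the regular expression are illustrative assumptions, not Alkaline's parser.

```python
import re

# Optional whitespace is accepted around the semicolon and the equals sign.
_REFRESH = re.compile(r"^\s*(\d+)\s*;\s*url\s*=\s*(.+?)\s*$", re.IGNORECASE)

def parse_meta_refresh(content):
    """Return (delay_seconds, url) from a META REFRESH CONTENT value, or None."""
    match = _REFRESH.match(content)
    if match is None:
        return None
    url = match.group(2).strip("'\"")
    return int(match.group(1)), url

print(parse_meta_refresh("0; URL=http://www.vestris.com/"))
print(parse_meta_refresh("0;URL=http://www.vestris.com/"))
```

Both spellings yield the same delay and target, and a value without a URL part yields None.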
16-Sep-1999
 all links to result pages are shown, by default, links to the first 10 pages are shown; if the user skips to the 11th page, links to the next 10 pages will be shown, etc.; the amount of links to show is defined by a new <!--SET NEXT-DIVISION--X--> option, where X is 10 by default
 added support, similar to filters, for embedded HTML <OBJECT> sections; Alkaline can now index documents such as Shockwave Flash

<OBJECT classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"   codebase=
   "http://active.macromedia.com/flash2/cabs/swflash.cab#version=3,0,0,0"   height="100%" id=navig width="100%">
  <PARAM NAME="movie" VALUE="navig.swf">
  <PARAM NAME="menu" VALUE="false">
</OBJECT>
 output of rating (in percent) for each result shown using $quality variable in SET MAP option
 added -exv command line option to monitor which URLs are included or excluded, which pages are indexed or skipped and for what humanly readable reason; output is for example:
[http://www.vestris.com/index-full.html] (-2) - [*****][304 Not Modified]
  {URL} http://db.infomaniak.ch/mrtg/httpaccess.html excluded
  because remote from http://www.vestris.com and Remote is set to No.
...
[http://www.vestris.com/] (-1) - [*****][1228 bytes][2079]
  [*****][inf][lnx][md5][*****][vix][*****][keys][mta][ndx]
  {EP} Implicitely included because no ExcludePages have been defined.
 corrected bug: indexing hung with the MaxFiles setting, which also made online statistics unavailable
 added ability to merge Alkaline database indexes using the merge command line directive

./asearch data/first/ merge data/second data/third
will merge the three indexes into data/first database (produce data/first = data/first + data/second + data/third)
 email and emailall command line directives will not output any copyright or information notice any more, but the URL where the email has been found will be shown; the output will look like this (elements separated by a tab):

webmaster@vestris.com	http://www.vestris.com/index-full.html
dblock@vestris.com	http://www.vestris.com/sti/company.html
admin@vestris.com	http://www.vestris.com/sti/company.html

You might find it useful to reprocess this output, for example:
./asearch data emailall | awk '{print "Email:("$1") Server:("$2")"}'
 added command line emailall directive to output all email found, including duplicates
 loading of indexes and of exclusion lists is much faster
 added IncludePagesAll=Y/N to choose whether to match all or any of the words if one or more IncludePages directive is present and dictionaries contain at least one word (default is N)
 added IncludePages [mime/ext] option to index pages containing words in dictionaries only
 added ExcludePagesAll=Y/N to choose whether to match all or any of the words if one or more ExcludePages directive is present and dictionaries contain at least one word (default is N)
 added ExcludePages [mime/ext] option to index pages NOT containing words in dictionaries only
 added ManageHeader and ManageFooter equivalences to the equiv/equiv.struct file - if ManageHeader is present, the first thing to be output at the /?manage section is the value of ManageHeader instead of the Alkaline copyright notice and the UTS node information; if the ManageFooter is present, the last thing to be output is the value of ManageFooter. Here's an example:

ManageHeader,<html><body>Alkaline Management:<hr>
ManageFooter,</body></html>


To reload the equiv/equiv.struct file without restarting Alkaline, call /?manage+nocache
 added WriteIndex which defines the number of files to index before writing a database (only modified files are counted); default is 100, -1 means to write index at the end of the indexing roundtrip only
 added ExactSize directive to asearch.cnf that defines the maximum length of a word to search exact only; for example searching in the tree with ExactSize=3 will be equivalent to searching "in" "the" tree; default value is 1
 added documentation on Alkaline regular expressions and how to use them inside search result templates
 added operators REVERSE, UPCASE, LCASE, LEFT, RIGHT, MORE and LESS to the regexp, for example $url|LEFT10 will leave a maximum of 10 characters from the contents of the $url variable
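
The ExactSize entry in this release can be read as a query rewrite: every word no longer than ExactSize is searched exactly, as if it had been quoted. The Python sketch below reproduces the documented example. The function name and the whitespace-based word splitting are assumptions.

```python
def apply_exact_size(query, exact_size=1):
    """Quote every bare word whose length is at most exact_size."""
    terms = []
    for word in query.split():
        already_quoted = word.startswith('"') and word.endswith('"')
        if not already_quoted and len(word) <= exact_size:
            terms.append('"%s"' % word)
        else:
            terms.append(word)
    return " ".join(terms)

print(apply_exact_size("in the tree", 3))
```

With ExactSize=3 the words in and the are quoted while tree is left alone; with the default of 1 the query is unchanged.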
06-Sep-1999
 corrected bug: links after a previous relative link skipped, could lead to partially indexed sites
 all variables from the POST method are defined with a post. prefix for the result templates (thus you can pass any variable to Alkaline's results mapper); for example $post.search has the same content as $search and can be mapped using the SEARCH-GENERAL tag
30-Aug-1999
 Digital OSF1 Alpha version of Alkaline
[distribution directory]
 light implementation of server-side includes (SSI) for Alkaline template pages
 corrected bug: words with dashes like coca-cola were not correctly indexed
 corrected bug: a &lt; or a &gt; indexed as "lt", "gt"
 punctuation now appears in document headers on search results
 password protected /?manage console; to log in, a root or an alkaline-manage password must be provided; instead of the password it is also possible to supply the server certification unlock key (for stats only, not for adding URLs, etc.) which is only available to Vestris Inc. and the server administrator; the password is stored in a cookie
 added $host, $path, $url, $other, $before, $after variables to SEARCH-GENERAL template tag; you can use this to inherit the case-sensitive check-box for example
 corrected bug: Remote=Y option without UrlInclude force parameter
 corrected bug: indexing URLs from a page containing a unique A HREF tag or an A HREF tag at the end of the page
 corrected bug: indexing URLs with parameters of form ?option=value truncated the list of options
 added SkipParseLinks=Y/N to asearch.cnf that forces to index URLs from the A HREF, FRAME, BASE HREF, etc. tags even inside an ALKALINE SKIP section
 corrected bug: indexing of character sequences such as &#183;
  added Replace Source=Target to the asearch.cnf that defines string replacement for URLs in the search results. This can be useful when indexing local intranet domains that are behind firewall. For example the fully qualified host name of my Linux server is www.vestris.com. This is behind a firewall, so it is not known to the DNS. What is being returned from the search engine is http://www.vestris.com/index.html. To return http://195.141.15.96/index.html, I would add
Replace www.vestris.com=195.141.15.96
 corrected bug: binary formats and filters under Windows NT (produced gzip errors with pdf2text for example)
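
The Replace Source=Target directive in this release rewrites URLs in the search results. The Python sketch below treats it as plain substring replacement, which is an assumption, as are both function names.

```python
def parse_replace(directive):
    """Split a 'Replace Source=Target' line into (source, target)."""
    keyword, _, rule = directive.strip().partition(" ")
    if keyword != "Replace" or "=" not in rule:
        raise ValueError("not a Replace directive: %r" % directive)
    source, _, target = rule.partition("=")
    return source, target

def rewrite_url(url, rules):
    """Apply each (source, target) pair to a result URL, in order."""
    for source, target in rules:
        url = url.replace(source, target)
    return url

rules = [parse_replace("Replace www.vestris.com=195.141.15.96")]
print(rewrite_url("http://www.vestris.com/index.html", rules))
```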
06-Aug-1999
 corrected bug: alias name shown in the log file for each search operation
 corrected bug: empty file error running external filters
 corrected indexing of intolerant servers (such as MacOS WebStar Starnine) that expect a double carriage return to terminate an HTTP request
 corrected bug: <!--SET C-BEFORE--...--> option
03-Aug-1999
 added SiteDepth option to asearch.cnf which defines the maximum number of path segments that an URL to index can have relative to the server name; default is -1; SiteDepth=0 would for example index only documents in the server root path.
 made minor improvements in memory usage during background reindexing
 Alkaline will exit when the front end cannot bind the listening socket
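
One way to read the SiteDepth option above is as a limit on the number of directories between the server name and the document. The Python sketch below follows that reading. Counting directories only (and not the file name) is an assumption, and both function names are hypothetical.

```python
from urllib.parse import urlsplit

def path_depth(url):
    """Count directory segments between the server name and the document."""
    directory = urlsplit(url).path.rpartition("/")[0]
    return len([segment for segment in directory.split("/") if segment])

def within_site_depth(url, site_depth=-1):
    """Return True when the URL is allowed; -1 (the documented default) means no limit."""
    return site_depth < 0 or path_depth(url) <= site_depth

print(within_site_depth("http://www.vestris.com/index.html", 0))
print(within_site_depth("http://www.vestris.com/alkaline/index.html", 0))
```

Under this reading, SiteDepth=0 keeps documents in the server root and rejects anything inside a subdirectory.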
02-Aug-1999
 Alkaline will now create siteidx1.urt instead of siteidx1.url when saving indexes; this is a faster and a more compact file format for loading the URL trees - the .url file will be automatically read when .urt file does not exist and will be deleted after the first successful load.

Note: it takes much more time to load the previous version .url file, than in older Alkaline releases, but this operation will be done once only. The .urt file format is documented in the
01-Aug-1999
 modified the algorithms for internal URL indexing, now using a very efficient URL tree; Alkaline indexing and searching both benefit in speed from this new technique, especially with a large amount of URLs
 improved NT multithreading for atomic operations and thread passive waiting
 corrected bug: Alkaline would crash with a bogus empty POST request (HTML form with a single submit button)
 corrected bug: thread handle remained unclosed under Windows NT
 improved inter-thread protection requesting server statistics
 added CPU usage and average CPU load for Alkaline server statistics under Windows NT
 reimplemented email command line operation parameter
 reimplemented -THREAD* set of options under Windows NT
29-Jul-1999
 released Alkaline for Linux Alpha platform
 corrected bug with skip tags ignored for the ALT information attached to images
 corrected bug with opt:whole
 Windows NT version now uses non-blocking sockets, timeout settings are working like on UNIX releases; error messages for unsuccessful connections and socket operations have been revised for the NT version as well, they now comply to the WSA definitions
 added Expire option to asearch.cnf to force all pages to be treated as out-of-date during the reindex process, the if-Modified-Since option will never be set (can be forced from command line with -expire)
 added NewOnly option to asearch.cnf to reindex only those documents that are not already in the index; this allows restarting indexing from the point where it was abandoned and indexing only documents newly added to the UrlListFile(s) (can be forced from command line with -newonly)
 added -newonly option to the command line, which forces indexing of only those files that are not already in the index; this also allows restarting indexing from the point where it was interrupted
 added -once option to the command line which forces the sites to be reindexed once only; if you add a new Alkaline group in the equiv/equiv.struct at the runtime, the new group will be indexed once too
28-Jul-1999
 added CPU user/system usage and average CPU load in server online statistics (currently working under Sun Solaris only)
 added a documentation page on the /?manage online management section
 added two new passwords to equiv/access.struct:
  • alkaline-add will be checked when attempting to add an URL
  • alkaline-restart will be checked when attempting to restart the server
The root password is valid for all operations. The password file is reread when it is modified, so there is no need to restart Alkaline when changing passwords.
 added the online URL addition and removal to/from the index; if the page exists, it will be reindexed - all server specific operations such as server restart are now at /?manage+alkaline - this adds individual pages only and writes the index files immediately
 added remove option to the command line followed by a list of URLs to remove from the index; for example ./asearch path_to_asearch_cnf remove http://www.vestris.com http://www.vestris.com/alkaline/ will remove the two URLs from the index; note that they are still present in the siteidx?.url file but they are effectively removed from cross-references.

To make more complex operations, use Alkaline in conjunction with other Unix commands, for example to remove all ".cpp" files from the index, run cat path_to_asearch_cnf/siteidx1.url | grep ".cpp" | xargs ./asearch path_to_asearch_cnf remove
 added $quant field showing the number of results shown
 users can choose the amount of results to show, for example
<select name="quant">
<option value="10"> 10
<option value="50"> 50
</select>
This parameter overrides the <!--SET QUANT--X--> option on the template page.
23-Jul-1999
 added verification of (apparently) valid cross-references while loading indexes, invalid siteidx*.* files will not be loaded and an error signaled
 added Timeout setting to asearch.cnf, number of seconds to wait during retrieval operations while indexing documents
 fixed timeout periods when connecting to nonexistent or unavailable servers; the Timeout setting is now used and the behaviour no longer depends on the operating system setting
21-Jul-1999
 added command line option -noreindex to force the background indexing option to false
 added command line option -expire to treat all documents as out-of-date (disable if-Modified-Since)
 added request traffic statistics (output traffic size, request count and requests per minute) in /?manage+stat
20-Jul-1999
 corrected resorting when searching words with quotes
 spider remembers the last authentication values for basic authentication and avoids querying the server twice
 fixed UrlListFile and similar when loading a file that does not exist
 added ExcludeWords [mime/ext] option to exclude dictionaries of words from being indexed, dictionaries are loaded when they are required only
 added a 4 kilobytes buffer when retrieving documents for faster processing (a keep-alive retrieval has been added to the TODO list)
 added a possibility to specify filters with a space after the Filter directive (for compliance with Auth and ExcludeWords), old syntax as "FilterExt" or "FilterMimeType" is preserved, but "Filter Ext" and "Filter MimeType" is advised.
 fixed bug with meta tags stored case-insensitively that led to wrong results searching for meta data
14-Jul-1999 - 19-Jul-1999
 a Distribution Directory has been set and will contain all latest releases starting from now
 the BSDI, BSD/OS release is now available and has been tested fully stable
[BSDI, BSD/OS distribution]
 an SGI IRIX 6.5 IP32 version is available but suffers from multithread problems. It should have background indexing disabled (NoReindex=Y)
an SGI guru is welcome to enlighten me on some specific points
[SGI IRIX distribution]
 all versions now support $time variable in results mapping
Release Note:
  This is a complete Alkaline rewrite. Alkaline 1.3 uses a new portable base library that replaces the old MV4 CGI. It is currently released as an Alpha version but it is proving much more stable than all previous releases.

Bugfixes number in the hundreds and mostly include hacks in multithreading and memory allocation, especially under Windows NT. I have used Rational Purify for Alkaline and it has led to impressive improvements in low level code. Major memory leaks were discovered and fixed.

Older What's New Pages

ToDo List

Server Configuration and Management
 online graphical statistics and monitoring via the web, documentation and interface with MRTG or similar software
 full search and index administration via the web
 online management of asearch.cnf files, addition of new groups, etc
 native Windows NT implementation as a service with interface and administration
 Windows and XWindows interface
Indexing
 NTLM authentication protocol support (Windows NT Server user access)
 local date storage and reindex rejection of scripts not responding to If-Modified-Since
 compressed indexes (siteidx files) for optimizing space usage
 common database (ODBC, Oracle, etc.) support for storing indexes
 index selected CGI scripts such as redirectors using a more straightforward include/exclude option
 round-robin spidering policy
 restriction of remotely indexed servers using IP numbers instead of host names
 keep-alive retrieval of documents to minimize network load
 forcing the last-modified date from a meta tag
Searching
 artificial intelligence driven queries (programmer mastering AI warmly welcome), example: "What's the weather in California?"
 fuzzy search. Example: ba%nana matches banana, bananna
 stemming. Example: apply~ matches apply, applies, applied
 search for synonyms of search terms
 phonic search. Example: #smith matches smith, smythe
 natural language search
 numeric range. Example: 12~~24 matches 18
 variable term weighting. Example: apple:4 w/5 pear:1
 better meta tag support, for example sorting results by meta tags
Results Output
 dates and error messages in any language other than English (already implemented dates in French and free date format)
 output of results per word searched

Distributed PVM version

Schedule: we have a functioning PVM version that suffers from several problems, thus it has not been released. There is no date for a public beta PVM test version for the moment, but we are testing it on a 32-PC Pentium II cluster at the University of Geneva.

Multiprocessor Parallel/Distributed Version: Alkaline 2.0

Schedule: no schedule available.
Things we plan to do:
  • Parallel version of Alkaline for clustering on a series of PCs (using TCP/IP for message passing).

  • Parallel version of Alkaline for IBM SP2 using MPI.


- © Vestris Inc.
1994 - 2004 - Switzerland - All Rights Reserved
Last modified: Tue Feb 06 08:52:59 Pacific Standard Time 2001