|
Alkaline latest release news and information.
Alkaline 1.3 - What's New
| next release (currently RTM - 1.31.0909) |
| | (build 1.31.0909) added --enableping and removed --disableping; by default the ping thread is not launched |
| | (build 1.3.0904) corrected bug: incorrect $end value for last page of results |
| | (build 1.3.0903) corrected bug: search appeared to be locked for a few seconds (attempting to resolve remote hostname aliases on FreeBSD and other systems, which do not support reentrant hostname resolution) |
| | (build 1.3.0902.2) restarting Alkaline from the admin console did not re-enable swap (when run with --enableswap) |
| | (build 1.3.0902.1) added USFormats=Y/N (default N) option to global.cnf, which defines whether before: and after: dates should be treated as US dates (MMDDYY) rather than European dates (DDMMYY) |
| | (build 1.3.0902.1) corrected bug: year 00 did not translate to 2000 when specifying before: and/or after: |
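The effect of USFormats on before: and after: dates can be illustrated with a minimal Python sketch. It is not Alkaline's code: the function name, the six-digit input without separators, and the pivot year for two-digit years are assumptions; the notes above only confirm the MMDDYY/DDMMYY distinction and that year 00 means 2000.

```python
from datetime import date


def parse_short_date(text: str, us_formats: bool = False) -> date:
    """Parse a six-digit date as MMDDYY (US) or DDMMYY (European)."""
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected six digits, got {text!r}")
    first, second, yy = int(text[0:2]), int(text[2:4]), int(text[4:6])
    # USFormats=Y puts the month first; the default (N) puts the day first.
    month, day = (first, second) if us_formats else (second, first)
    # Assumption: years 00-69 map to 20xx and 70-99 to 19xx. The release
    # note only confirms that 00 translates to 2000.
    year = 2000 + yy if yy < 70 else 1900 + yy
    return date(year, month, day)
```

With the default setting, 010200 is read as 1 February 2000; with us_formats=True it is read as 2 January 2000.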
| | (build 1.3.0831.1) corrected bug: corrupted output on the 404 verification step for auth settings |
| | (build 1.3.0831.1) corrected bug: urls such as http://foo:X showing as Upper from http://foo; now port values must match for the upper condition, otherwise the url is considered remote |
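The port rule can be sketched as a comparison of host and port together. This is an illustration only: the function name is hypothetical, and treating a missing port as 80 for http is an assumption not stated in the note above.

```python
from urllib.parse import urlsplit

# Assumption: a URL without an explicit port uses the scheme default.
DEFAULT_PORTS = {"http": 80}


def same_server(url: str, base: str) -> bool:
    """Return True only when both host name and port match."""
    a, b = urlsplit(url), urlsplit(base)
    port_a = a.port or DEFAULT_PORTS.get(a.scheme)
    port_b = b.port or DEFAULT_PORTS.get(b.scheme)
    return a.hostname == b.hostname and port_a == port_b
```

Under this sketch, http://foo:8080/x and http://foo are different servers, so the first would be considered remote relative to the second.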
| | (build 1.3.0829) corrected bug: duplicate urls appearing in the queue resulting in an [md5-unch] message during the very first reindex |
| | (build 1.3.0823) corrected bug: cannot login to /admin with the alkaline-add password |
| | (build 1.3.0823) added binding to an alternative ip/hostname, the new command line syntax is
asearch [host:]port ..., for example
./asearch foo.bar.com:9999 ... |
| | (build 1.3.0819) corrected bug: random failure to grant access to /admin for rare username/password pairs |
| | (build 1.3.0819) versioning now uses the standard numbering
major.minor.build.dot; asearch.exe properties under Windows NT show the correct version number and the proper copyright information |
| 12-Aug-2000 |
| | added LowerCase option to asearch.cnf which forces the conversion of
individual word forms to lowercase when indexing; case-sensitive search will make no sense, but
the index size should be considerably smaller |
| | added native Windows NT service implementation; you do not need
SrvAny to run Alkaline as a Windows NT/2000 service any more - added
asearch.exe service install ..., asearch.exe service remove, start and stop
|
| | added CacheTemplates option to the global.cnf file;
default is Y, when set to N, templates are not cached and reloaded for each
search request |
| | because more people worry about performance than
about memory, swap is now disabled by default; run with --enableswap
to enable it |
| | added command line excludewords that operates
an ExcludeWords dictionary on an existing index |
| | added SET EXPIRED-COUNT,
which specifies the number of days after which a document is considered expired.
Added SET EXPIRED, which sets the value of $expired for the SET MAP section
on the search template when a document is older than SET EXPIRED-COUNT days. |
| | added MaxLinks, which specifies the maximum number of urls to follow
(documents to index) from the topmost url, typically defined by UrlList or
in a file pointed to by UrlListFile |
| | corrected bug: a malformed &...; sign in the HTML causes an access violation
and crashes the indexing threads (which resulted in a full crash on several platforms) |
| | corrected bug: searching for meta tags randomly
returned incorrect results |
| | admin section uses BASIC authentication
and not password cookies any more |
| | added a global.cnf file to compensate the lack of
equiv/*.struct globals, which can contain Pass, LogPath and Proxy settings |
| | removed the ManageHeader and ManageFooter options |
| | totally new management section, now moved to /admin;
using JavaScript menus, XML and more - you can now fully customize
the admin pages |
| | legacy of .struct files:
you can create directories with asearch.cnf files and launch Alkaline using a list
of directory names |
| | legacy of .struct files:
when searching, :port/path/filename[.aln]?search=... is equivalent to
:port/?Alias+AliasHTML+search=...; the path replaces the alias and can be many levels deep,
the filename is used as the template; filenames with special .aln extensions must contain an
http link on a single line, which will be retrieved and used as a template |
| | optimized loading of large indexes, now faster and uses less memory |
| | corrected bug: Alkaline will not try to retrieve malformed urls nor will it retry
a retrieval when an unrecoverable error, such as an unknown host name, occurred |
| | corrected bug: memory leaking when background indexing under Windows NT |
| | while indexing, acquired cookies will be cleared between configurations |
| | corrected bug: searching with opt:whole and opt:and always returns no results |
| | added output from UrlReplace and Redirect when running with -exv; regexp syntax error
from these two directives will also be shown |
| | added UrlReplace to perform regexp and text replacements on urls before they
are scheduled for indexing |
| | corrected bug: all expression functions involving limits and integers, such
as LEFT, CLEFT, RIGHT, CRIGHT broken |
| | added UrlIndex and UrlSkip (UrlIndexFile
and UrlSkipFile), directives to the asearch.cnf,
which behave like UrlInclude and UrlExclude, except that the matching is done after the
page is retrieved - if an url matches any entry in UrlSkip or if
the url does not match any entry in UrlIndex, the links will still be followed, but
the page contents discarded (it will not appear in search results)
|
| | improved parallel indexing and search/index resource balancing;
removed remaining global locks between the search and the indexing threads |
| | completely new swap, using memory mapped files;
performance loss is very acceptable
|
| | faster handling of Md5 duplicates; previous versions
used to parse the entire list of urls to discover whether a duplicate
exists, now using a fast character tree |
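The idea of replacing a full list scan with a character tree can be sketched as a trie keyed by the hexadecimal MD5 digest. This is a hedged illustration, not Alkaline's data structure: the class and function names are hypothetical, and keying the tree by hex digest is an assumption.

```python
import hashlib


class DigestTrie:
    """Character tree storing digests; lookup cost depends on digest length only."""

    _END = "$"  # terminal marker; never occurs in a hex digest

    def __init__(self) -> None:
        self._root: dict = {}

    def add(self, digest: str) -> bool:
        """Insert a digest and return True if it was already present."""
        node = self._root
        for ch in digest:
            node = node.setdefault(ch, {})
        seen = self._END in node
        node[self._END] = True
        return seen


def is_duplicate(trie: DigestTrie, content: bytes) -> bool:
    """Record the content's MD5 and report whether it was seen before."""
    return trie.add(hashlib.md5(content).hexdigest())
```

Each check walks at most 32 characters, regardless of how many documents have been indexed.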
| | corrected bug: relative Location: headers (such as
Location: foo.html, rather than Location: http://bar.com/foo.html)
are now allowed with 301, 302, 303 and 307 redirects and are properly resolved |
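Resolving a relative Location: header against the requested URL can be sketched with Python's standard library. The function name and the dictionary used for headers are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307}


def redirect_target(status: int, request_url: str, headers: dict) -> Optional[str]:
    """Return the absolute redirect target, or None if this is not a redirect."""
    location = headers.get("Location")
    if status not in REDIRECT_CODES or not location:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the URL that was requested.
    return urljoin(request_url, location)
```

For a request to http://bar.com/dir/page.html, a header of Location: foo.html resolves to http://bar.com/dir/foo.html, while an absolute Location is returned unchanged.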
| 06-Jul-2000 |
| | handling denial of service: when the system runs out of socket resources
and is unable to accept more connections, or when the queue is full, Alkaline will stop listening
for new connections, wait for the search thread pool to clear all pending requests and start
listening again; in the case when it is repeating this operation frequently and cannot recover,
it will attempt to restart (or terminate if run with -d).
|
| | corrected bug: filter command line mapping broken |
| 04-Jul-2000 |
| | corrected bug: ping thread might not restart Alkaline properly and hang |
| | corrected bug: temporary files not deleted on interrupt |
| | late binding - Alkaline will not start accepting connections before
indexes are loaded - this avoids the initial server flood when many users attempt
to access the search engine during the startup time |
| | corrected bug: Content-type: text/html was not properly returned
in the server response |
| | added -ai=X/-acceptinterval=X accept delay parameter;
Alkaline will sleep for X milliseconds after each new connection is accepted |
| | corrected bug: millisecond sleep intervals were not made on various platforms resulting
in all kinds of server problems |
| | added RegExp=Y option that enables the UrlInclude*, UrlExclude*, ExcludeWords* and IncludeWords*
options to take regular expressions as parameters |
| | corrected bug: when two <TITLE> tags are found,
only the first one appears as the document title |
| | corrected bug: indexing sites with FreeCharset=Y/´-like characters was broken |
| | corrected bug: indexes not written when WriteIndex=-1 or at the end of indexing roundtrip during background indexing |
| |
corrected bug: adding <meta name="alkaline" content="skip"> to an already indexed page
won't remove it from an existing index. |
| |
added $start to SEARCH-GENERAL, the index of the first result output on this page, and
$end, the index of the last result output on this page; to output
Displaying 21-30 out of 289 documents., the expression would be
<!--SEARCH-GENERAL £total|NOT0~[Displaying $start-$end of $total documents.]-->.
|
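How $start and $end relate to the page and the number of results per page can be sketched as follows. This is an illustration under assumptions: the function name and the zero-based page index are hypothetical, and the clamping of $end on the last page reflects the behaviour described by the build 1.3.0904 fix rather than Alkaline's actual code.

```python
def page_bounds(page: int, quant: int, total: int) -> tuple:
    """Return (start, end) as 1-based result indexes for a zero-based page."""
    if total == 0:
        return (0, 0)
    start = page * quant + 1
    if start > total:
        raise ValueError("page is past the last result")
    # The last page may be short, so $end must not exceed the total.
    end = min(start + quant - 1, total)
    return (start, end)
```

With 10 results per page and 289 documents, the third page gives 21-30 and the last page gives 281-289.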
| |
added string operators URLSCH (url scheme, such as http), URLHOST (server name), URLDIR (full path), URLFILE
(file name), URLARG (parameters after the ?); for example:
<!--SET MAP--
<ul>
<li>prot: $url|URLSCH
<li>host: $url|URLHOST
<li>path: $url|URLDIR
<li>file: $url|URLFILE
<li>args: $url|URLARG
</ul>
£url|[URLDIR,IS/~foo/]~[Foo's Files]
--> |
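The five URL operators split a URL into the same pieces that Python's urlsplit exposes, so their results can be approximated with a short sketch. The exact boundaries are assumptions: here URLDIR is taken as the path up to and including the last slash (consistent with the /~foo/ comparison above) and URLFILE as whatever follows it.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Approximate URLSCH, URLHOST, URLDIR, URLFILE and URLARG for a URL."""
    parts = urlsplit(url)
    # Split the path at its last slash into directory and file name.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "URLSCH": parts.scheme,
        "URLHOST": parts.hostname or "",
        "URLDIR": directory + "/",
        "URLFILE": filename,
        "URLARG": parts.query,
    }
```

For http://www.vestris.com/~foo/index.html?a=1 this yields http, www.vestris.com, /~foo/, index.html and a=1.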
| 27-Jun-2000 |
| |
New Alkaline documentation. Also available in PDF.
|
| |
Fixed Alkaline for FreeBSD 4. Also changed the thread-unsafe way Alkaline creates and destroys temporary
files, which had potentially disastrous results in multithreaded indexing with filters.
|
| |
corrected bug: circular multiple redirections made Alkaline index in an infinite loop
|
| |
Internet Service Manager under NT4 does not handle HTTP/1.0 GET correctly for
binary files when specifying a server name and returns a [406/No Acceptable Object];
instead of returning an error on a 406, Alkaline will attempt an HTTP/0.9 GET.
|
| | remove command line parameter can now remove full subdirectories and sites:
./asearch path-to-asearch.cnf remove http://www.vestris.com/alkaline/*
|
| | parallel 404/Not Found step; the initial verification for document
existence can now take a few seconds for a 100'000 documents site with a large -mi parameter
|
| | added -mi / --MaxIndexThreads=X parameter;
the default is 5 and defines the thread pool size for indexing purposes
|
| | parallel indexing; this greatly improves the speed of the indexing process, but it
is just a first step toward the right architecture - currently only document retrieval is done in parallel,
while the index database is interlocked; reindexing an already indexed site with few changes
can be 10-20 times faster, and a clean index can be up to 2-3 times faster
|
| | failure to bind will result in a retry attempt loop of five seconds; this solves
numerous cases when Alkaline is unable to restart properly because of a zombie thread from a previous
session or simply kernel unbind latency
|
| | corrected bug: statistics were showing the wrong requests-per-minute
count when server was restarted; added a Server Pool Started line which shows when
Alkaline has started accepting connections and which is used for the correct rpm count.
|
| | added ping mechanism; Alkaline will create an additional thread demanding very
little system resources, which tries to ping the engine and restarts it when Alkaline
dies (UNIX versions running without -d only);
also added --disableping
|
| | considerably reduced heap contention and allocation counts;
stress tests of this version show lower memory usage and considerably improved stability |
| | improved thread pooling mechanism for better responsiveness and stability on heavily
loaded servers |
| | added -mt=X / --maxthreads=X options to command line, where X is the maximum
number of threads that the search pool can accommodate; default is 100
|
| | corrected bug: client connection abort while server was accepting it
was handled incorrectly and caused Alkaline to restart |
| | improved handling of HTML errors for missing quotes and unclosed tags; if
you see a progress bar with a # character, such as [*#****], this means that your HTML
is not correctly formed and that Alkaline has attempted to recover from the error |
| 13-Apr-2000 |
| | changed behaviour: when evaluating a regexp, such as
<a href="£url|URLENCODE">, the parser would consider as an operator URLENCODE",
which is correct according to the regexp rules, but is misleading; the operators can now contain alphanumeric
characters only, otherwise they must be wrapped inside [] -
£url|HASftp://~[Ftp!] will not work any more, but
£url|[HASftp://]~[Ftp!] will
|
| |
corrected bug: pdf2text would report a broken pdf when MaxSize setting is smaller than the
retrieved PDF (PDF partially retrieved) -
Alkaline will not retrieve a partial document which is smaller than the MaxSize setting any more;
if the server returns Content-length: bigger than MaxSize, Alkaline will skip the document and
report a 206 Partial Content error
|
| | SSI tags can contain variables from the <!--SEARCH-GENERAL--regexp-->
tag - the string is preprocessed as a whole, before being passed to the SSI parser |
| | made certification links slightly more explicit - separated the login
to the management console from the certification |
| | corrected (rare) bug: Alkaline hangs trying to verify the
existence of a page that has already been indexed |
| | corrected bug: Alkaline attempted to retrieve WAIS, NNTP and other
non-http protocols; also HTTPS urls will be rewritten to HTTP as many sites support
both protocols and Alkaline does not support SSL |
| | corrected bug: infinite loop retrieving an URL
with a recursive 302 redirect |
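One common way to avoid looping on recursive or circular redirects is to remember every URL already visited in the chain and to cap the number of hops. The sketch below shows that technique only; it is not Alkaline's implementation. The redirect table stands in for real HTTP responses, and the names and the hop limit are assumptions.

```python
class RedirectLoop(Exception):
    """Raised when a redirect chain revisits a URL or grows too long."""


def follow_redirects(start: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow a redirect table until a URL that does not redirect is reached."""
    seen = set()
    url = start
    while url in redirects:
        if url in seen or len(seen) >= max_hops:
            raise RedirectLoop(url)
        seen.add(url)
        url = redirects[url]
    return url
```

A chain a -> b -> c ends at c, while a -> b -> a raises RedirectLoop instead of retrieving forever.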
| |
added Cookie setting to asearch.cnf
|
| |
corrected bug: URLs containing explicitly encoded elements such as &amp;
were incorrectly decoded, eg. http://server/url?one=&amp;&two=2
was handled as http://server/url?one=&&two=2
|
| |
corrected bug: Alkaline META tags such as <META HTTP-EQUIV="ALKALINE" CONTENT="SKIP"> were
case-sensitive and were ignored if lowercase
|
| |
added full support for persistent cookies set via a Set-Cookie: header (non JavaScript cookies)
while indexing a remote site; implemented
a persistent cookie storage compliant to the Netscape original cookie specification
(http://www.netscape.com/newsref/std/cookie_spec.html), including
expires, domain and path attributes
|
| |
corrected bug: indexing URLs containing spaces or non-ASCII characters did not
encode the requested URL properly
|
| |
corrected bug: restricting scope to URLs with port numbers such as
url:server:port/path did not work if both port and path were specified
|
| |
corrected doc: META tags syntax is <META NAME="tag name" CONTENT="tag value">
or <META HTTP-EQUIV="tag name" CONTENT="tag value">
|
| |
corrected bug: parsing HTML with single quoted elements did not strip the single quote from the url;
links such as <a href='page.html'> could not be followed
|
| 26-Feb-2000 |
| | added IndexWords [mime/ext] option to index specific words from
every document (opposite behaviour of ExcludeWords)
|
| | fixed end-of-line order in the HTTP headers and added an extra
space in header variables after the ':' character; bugfix for WebStar 4.x web servers |
| | corrected bug: regexp processing of chained operators was done in the wrong order;
in £variable|LEFT2,RIGHT1 the second operator was executed first |
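The corrected left-to-right evaluation of chained operators can be sketched as follows. Only LEFT and RIGHT are modelled, and the sketch is an assumption about their meaning: LEFTn keeps at most the first n characters (as described elsewhere in these notes for LEFT10) and RIGHTn is assumed to keep the last n.

```python
import re


def apply_operators(value: str, chain: str) -> str:
    """Apply comma-separated LEFTn/RIGHTn operators from left to right."""
    for op in chain.split(","):
        match = re.fullmatch(r"(LEFT|RIGHT)(\d+)", op)
        if match is None:
            raise ValueError(f"unsupported operator: {op!r}")
        name, count = match.group(1), int(match.group(2))
        if name == "LEFT":
            value = value[:count]
        else:
            # value[-0:] would return the whole string, so handle 0 explicitly.
            value = value[-count:] if count else ""
    return value
```

For the value abcdef, LEFT2,RIGHT1 gives b when evaluated in the written order; evaluating RIGHT1 first, as the bug did, would give f.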
| | added <!--SET SEARCH-BASE-HREF--url--> to the search template
options; all links to search result pages use this url instead of being relative
to the server root |
| | documents that produced a retrieval error without a valid status code will
be requested from the server again; if the if-Modified-Since tag was used, it will be
removed |
| | MD5 is checked for reindexed documents against previous content when the
server does not reply to if-Modified-Since, such as for dynamic content; [md5-unch]
will show if the document has not been altered |
| | corrected bug: indexing of sites using port values other than 80
stored URLs without the port value; search results pointed to the wrong url |
| | corrected bug: redirection for links to directories
without a trailing slash |
| | added CLEFT and CRIGHT cut regexp operators
|
| | words containing underscores, such as size_t are indexed as a whole |
| | -exv will verbose for URLs excluded because of the CGI, Exts(Add) settings and for malformed or unsupported URLs |
| | corrected (rare) bug: .ndx file not written properly or zeroed |
| 26-Nov-1999 |
| | added reloading of template pages from the online management (equivalent to the +nocache option) |
| | days and months in dates are mapped to a two digit number (1 will appear as 01) |
| | fixed META KEYWORDS appearing in document header text in search results |
| | fixed growing asearch.cnf when running Alkaline with -d |
| | fixed $modif bogus date |
| | fixed $age bogus field |
| | fixed a broken pipe fatal error retrieving data from
disconnected or inaccessible remote sites |
| | added NoMd5=Y option to asearch.cnf which
disables the MD5 mechanism (this option is very useful if
you are indexing individual URLs from different sites) |
| | added string operators URLENCODE and
URLDECODE for mapping results on the search results
templates |
| | added NoMetaDescription=Y which will force
the plain text to be used as the document header text in the
search results instead of the META DESCRIPTION tag value |
| | the QUANT value for the amount of search results
to show can now be -1 which will show all results |
| | corrected bug: META DESCRIPTION tag was cleaned
from punctuation in search results |
| 30-Sep-1999 |
| | various multithread-related issues that led to frequent Alkaline crashes have been solved;
finished porting Alkaline to IRIX IP32, the version has been tested fully stable;
released a DEC OSF1 beta and a Linux Suse 6.1 Glib fully working version |
| | corrected bug: <META HTTP-EQUIV="REFRESH" CONTENT="0; URL=..."> now followed
(with space before URL) |
| | Under UNIX, running Alkaline with -d will remove standard output
and run the server as a daemon (note: restarting server from the /?manage section will not be
possible with this option)
|
| | fixed opt:XXX and before/after keyword combinations returning no results |
| | when logging search operations, Alkaline will output the search results page number requested |
| | reloading of configuration files will be made only if a modification has occurred |
| | online stats show the last processed (indexed) file |
| | variables of SEARCH-GENERAL are available in MAP (such as post.*) |
| | variables passed to Alkaline using a & or a + delimiter on the command line
are available as POST variables inside the regexp (post.*) |
| |
added <!--SET NEXT-INHERIT--regexp--> which
passes values to the next and previous pages of results which have been generated with
the <!--SEARCH-NEXT--> tag
|
| | the apostrophe character is considered as part of the word
(searching for bob's for example will work)
|
| 16-Sep-1999 |
| | links to result pages are shown in groups: by default, links to the first 10 pages
are shown; if the user skips to the 11th page, links to the next 10 pages will be shown, etc.;
the amount of links to show is defined by a new <!--SET NEXT-DIVISION--X--> option,
where X is 10 by default
|
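The grouping of page links controlled by NEXT-DIVISION can be sketched as a window calculation. The function name and the 1-based page numbering are assumptions, as is the choice of fixed groups (1-10, 11-20, ...) rather than a window that slides with the current page.

```python
def page_links(current_page: int, total_pages: int, division: int = 10) -> list:
    """Return the page numbers to link to for the group containing current_page."""
    if not 1 <= current_page <= total_pages:
        raise ValueError("current_page out of range")
    # Find the first page of the group that contains the current page.
    first = ((current_page - 1) // division) * division + 1
    last = min(first + division - 1, total_pages)
    return list(range(first, last + 1))
```

With 29 pages of results, page 1 shows links 1-10, page 11 shows 11-20, and page 25 shows 21-29.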
| | added support, similar to filters, for embedded HTML
<OBJECT> sections; Alkaline
can now index documents such as Shockwave Flash
<OBJECT classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
codebase= "http://active.macromedia.com/flash2/cabs/swflash.cab#version=3,0,0,0"
height="100%" id=navig width="100%">
<PARAM NAME="movie" VALUE="navig.swf">
<PARAM NAME="menu" VALUE="false">
</OBJECT>
|
| | output of rating (in percent) for each result shown using $quality variable
in SET MAP option |
| | added -exv command line option to monitor which URLs are included or excluded,
which pages are indexed or skipped and for what humanly readable reason; output is for example:
[http://www.vestris.com/index-full.html] (-2) - [*****][304 Not Modified]
{URL} http://db.infomaniak.ch/mrtg/httpaccess.html excluded because remote from http://www.vestris.com and Remote is set to No.
...
[http://www.vestris.com/] (-1) - [*****][1228 bytes][2079]
[*****][inf][lnx][md5][*****][vix][*****][keys][mta][ndx]
{EP} Implicitely included because no ExcludePages have been defined.
|
| | corrected bug: indexing hung with the MaxFiles setting, which also made
online statistics unavailable |
| | added ability to merge Alkaline database indexes using the merge command
line directive
./asearch data/first/ merge data/second data/third
will merge the three indexes into data/first database
(produce data/first = data/first + data/second + data/third)
|
| | email and emailall command line directives will not output
any copyright or information notice any more, but the URL where the email has
been found will be shown; the output will look like this (elements separated by a tab):
| webmaster@vestris.com | http://www.vestris.com/index-full.html |
| dblock@vestris.com | http://www.vestris.com/sti/company.html |
| admin@vestris.com | http://www.vestris.com/sti/company.html |
You might find it useful to reprocess this output, for example:
./asearch data emailall | awk '{print "Email:("$1") Server:("$2")"}'
|
| | added command line emailall directive to output all email
found, including duplicates |
| | loading of indexes and of exclusion lists is much faster |
| | added IncludePagesAll=Y/N to choose whether to match all or
any of the words if one or more IncludePages directive is present and dictionaries contain
at least one word (default is N)
|
| | added IncludePages [mime/ext] option to index pages containing words
in dictionaries only
|
| | added ExcludePagesAll=Y/N to choose whether to match all or
any of the words if one or more ExcludePages directive is present and dictionaries contain
at least one word (default is N)
|
| | added ExcludePages [mime/ext] option to index pages NOT containing words
in dictionaries only
|
| | added ManageHeader and ManageFooter equivalences to the
equiv/equiv.struct file - if ManageHeader is present, the first thing to be output
at the /?manage section is the value of ManageHeader instead of the Alkaline copyright notice
and the UTS node information; if the ManageFooter is present,
the last thing to be output is the value of ManageFooter. Here's an example:
ManageHeader,<html><body>Alkaline Management:<hr>
ManageFooter,</body></html>
To reload the equiv/equiv.struct file without restarting Alkaline, call /?manage+nocache
|
| | added WriteIndex
which defines the number of files to index before writing a database (only modified
files are counted); default is 100, -1 means to write index at the end of the indexing
roundtrip only
|
| | added ExactSize directive to asearch.cnf
that defines the maximum length of a word
to search exact only; for example searching in the tree with
ExactSize=3 will be equivalent to searching "in" "the" tree;
default value is 1
|
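The effect of ExactSize on a query can be sketched as a rewrite that quotes every short word. This is an illustration only: the function name is hypothetical, whitespace tokenisation is an assumption, and words that are already quoted are simply left alone.

```python
def apply_exact_size(query: str, exact_size: int = 1) -> str:
    """Quote every word whose length does not exceed exact_size."""
    rewritten = []
    for word in query.split():
        if len(word) <= exact_size and not word.startswith('"'):
            rewritten.append(f'"{word}"')
        else:
            rewritten.append(word)
    return " ".join(rewritten)
```

With exact_size=3, the query in the tree becomes "in" "the" tree, matching the example above; with the default of 1, only single-character words are quoted.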
| | added documentation on Alkaline regular expressions and how to use them
inside search result templates
|
| | added operators REVERSE, UPCASE, LCASE, LEFT, RIGHT, MORE and LESS
to the regexp, for example $url|LEFT10 will leave a maximum of 10 characters from the
contents of the $url variable |
| 06-Sep-1999 |
| | corrected bug: links after a previous relative link skipped, could lead
to partially indexed sites |
| | all variables from the POST method are defined with a post. prefix
for the result templates (thus you can pass any variable to Alkaline's results mapper);
for example $post.search has the same content as $search and can be mapped using the
SEARCH-GENERAL tag |
| 30-Aug-1999 |
| | Digital OSF1 Alpha version of Alkaline
[distribution directory]
|
| | light implementation of server-side includes (SSI) for Alkaline
template pages
|
| | corrected bug: words with dashes like coca-cola were not correctly indexed |
| | corrected bug: a < or a > indexed as "lt", "gt" |
| | punctuation now appears in document headers on search results |
| | password protected /?manage console; to log in, a root or
an alkaline-manage password must be provided; instead of the password it is
also possible to supply the server certification unlock key (for stats only, not for adding URLs, etc.)
which is only available to Vestris Inc. and the server administrator; the password is stored in
a cookie
|
| | added $host, $path, $url, $other, $before, $after variables to
SEARCH-GENERAL template tag; you can use this to inherit the case-sensitive check-box
for example
|
| | corrected bug: Remote=Y option without UrlInclude force parameter |
| | corrected bug: indexing URLs from a page containing a unique A HREF tag
or an A HREF tag at the end of the page |
| | corrected bug: indexing URLs with parameters of form ?option=value truncated
the list of options |
| | added SkipParseLinks=Y/N to asearch.cnf that forces indexing of URLs
from the A HREF, FRAME, BASE HREF, etc. tags even inside an ALKALINE SKIP section
|
| | corrected bug: indexing of character sequences such as · |
| |
added Replace Source=Target to the asearch.cnf that
defines string replacement for URLs in the search results. This can be useful when indexing
local intranet domains that are behind firewall. For example the fully qualified host name of
my Linux server is www.vestris.com. This is behind a firewall, so it is not known to the DNS.
What is being returned from the search engine is http://www.vestris.com/index.html.
To return http://195.141.15.96/index.html, I would add Replace www.vestris.com=195.141.15.96
|
| | corrected bug: binary formats and filters under Windows NT (produced
gzip errors with pdf2text for example)
|
| 06-Aug-1999 |
| | corrected bug: alias name shown in the log file for each search operation |
| | corrected bug: empty file error running external filters |
| | corrected indexing of intolerant servers
(such as MacOS WebStar Starnine)
that expect a double carriage return to terminate an HTTP request |
| | corrected bug: <!--SET C-BEFORE--...--> option |
| 03-Aug-1999 |
| | added SiteDepth option to asearch.cnf which defines
the maximum number of path segments that an URL to index can have relative to the server name;
default is -1; SiteDepth=0 would for example index only documents in the server root path.
|
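A SiteDepth check can be sketched by counting directory segments in the URL path. The sketch rests on assumptions: the function name is hypothetical, the file name is not counted as a segment, and a negative value (the default of -1) is taken to mean no limit.

```python
from urllib.parse import urlsplit


def within_site_depth(url: str, site_depth: int = -1) -> bool:
    """Return True if the URL's directory depth does not exceed site_depth."""
    if site_depth < 0:
        return True  # assumption: -1 disables the limit
    path = urlsplit(url).path
    # Drop the last component (file name or empty string) and empty segments.
    directories = [segment for segment in path.split("/")[:-1] if segment]
    return len(directories) <= site_depth
```

With SiteDepth=0, http://www.vestris.com/index.html is accepted and http://www.vestris.com/alkaline/index.html is rejected.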
| | made minor improvements in memory usage during background reindexing |
| | Alkaline will exit when the front end cannot bind the listening socket |
| 02-Aug-1999 |
| | Alkaline will now create siteidx1.urt instead of siteidx1.url
when saving indexes; this is a faster and a more compact file format for loading the URL trees -
the .url file will be automatically read when .urt file does not exist and will
be deleted after the first successful load.
Note: it takes much more time to load
the previous version .url file, than in older Alkaline releases,
but this operation will be done once only.
The .urt file format is documented in the
|
| 01-Aug-1999 |
| | modified the algorithms for internal URL indexing, now using
a very efficient URL tree; Alkaline indexing and searching both benefit in speed
from this new technique, especially with a large amount of URLs
|
| | improved NT multithreading for atomic operations and thread passive waiting |
| | corrected bug: Alkaline would crash with a bogus empty
POST request (HTML form with a single submit button) |
| | corrected bug: thread handle remained unclosed under Windows NT |
| | improved inter-thread protection requesting server statistics |
| | added CPU usage and average CPU load for Alkaline server statistics
under Windows NT |
| | reimplemented email command line operation parameter |
| | reimplemented -THREAD* set of options under Windows NT |
| 29-Jul-1999 |
| | released Alkaline for Linux Alpha platform |
| | corrected bug with skip tags ignored for the ALT information attached to images |
| | corrected bug with opt:whole |
| | Windows NT version now uses non-blocking sockets, timeout settings are working
like on UNIX releases; error messages for unsuccessful connections and socket operations
have been revised for the NT version as well, they now comply to the WSA definitions |
| | added Expire option to asearch.cnf to force all pages to be treated
as out-of-date during the reindex process, the if-Modified-Since option will never be set
(can be forced from command line with -expire) |
| | added NewOnly option to asearch.cnf
to reindex only those documents that are not already in the index,
this makes it possible to restart indexing from the abandoned point and to index only newly added documents
to the UrlListFile(s) (can be forced from command line with -newonly) |
| | added -newonly option to the command line which forces
indexing of only those files that are not already in the index;
this also makes it possible to restart indexing from the interrupted point
|
| | added -once option to the command line which forces the sites to be reindexed
once only; if you add a new Alkaline group in the equiv/equiv.struct at the runtime, the new
group will be indexed once too |
| 28-Jul-1999 |
| | added CPU user/system usage and average CPU load in
server online statistics (currently working under Sun Solaris only) |
| | added a documentation page on the /?manage online management section |
| | added two new passwords to equiv/access.struct:
- alkaline-add will be checked when attempting to add an URL
- alkaline-restart will be checked when attempting to restart the server
The root password is valid for all operations.
The password file is reread when it is modified, so there is no need to restart Alkaline when
changing passwords.
|
| | added the online URL addition and removal to/from the index; if the page
exists, it will be reindexed - all server specific operations such as
server restart are now at /?manage+alkaline - this adds individual pages only
and writes the index files immediately
|
| | added remove option to the command line
followed by a list of URLs to remove from the index; for example
./asearch path_to_asearch_cnf remove http://www.vestris.com http://www.vestris.com/alkaline/
will remove the two URLs from the index; note that they are still present in the siteidx?.url file
but they are effectively removed from cross-references.
To make more complex operations, use Alkaline in conjunction
with other Unix commands, for example to remove all ".cpp" files from the index, run
cat path_to_asearch_cnf/siteidx1.url | grep ".cpp" | xargs ./asearch path_to_asearch_cnf remove
|
| | added $quant field showing the number of results shown |
| | users can choose the amount of results to show,
for example <select name="quant">
<option value="10"> 10
<option value="50"> 50
</select>
This parameter overrides the <!--SET QUANT--X--> option on the template page.
|
| 23-Jul-1999 |
| | added verification of (apparently) valid cross-references
while loading indexes, invalid siteidx*.* files will not be loaded and an error
signaled |
| | added Timeout setting to asearch.cnf, number of seconds to wait during
retrieval operations while indexing documents |
| | fixed timeout periods when connecting to nonexistent or unavailable
servers; connections now use the Timeout setting and no longer depend on the operating
system setting |
| 21-Jul-1999 |
| | added command line option -noreindex to force the background indexing option to false |
| | added command line option -expire to treat all documents as out-of-date (disable if-Modified-Since) |
| | added request traffic statistics (output traffic size, request count and requests per minute) in /?manage+stat |
| 20-Jul-1999 |
| | corrected resorting when searching words with quotes |
| | spider remembers the last authentication values for basic authentication and avoids
querying the server twice |
| | fixed UrlListFile and similar when loading a file that does not exist |
| | added ExcludeWords [mime/ext] option to exclude dictionaries of words from
being indexed, dictionaries are loaded when they are required only
|
| | added a 4 kilobytes buffer when retrieving documents for faster processing
(a keep-alive retrieval has been added to the TODO list) |
| | added a possibility to specify filters with a space after the Filter directive
(for compliance with Auth and ExcludeWords), old syntax as "FilterExt" or "FilterMimeType" is preserved,
but "Filter Ext" and "Filter MimeType" is advised. |
| | fixed bug with meta tags stored case-insensitively that led to wrong
results searching for meta data |
| 14-Jul-1999 - 19-Jul-1999 |
| | a Distribution Directory has been
set and will contain all latest releases starting from now |
| | the BSDI, BSD/OS release is now available and has been tested fully stable
[BSDI, BSD/OS distribution]
|
| | an SGI IRIX 6.5 IP32 version is available but suffers from multithreading problems.
It should have background indexing disabled (NoReindex=Y);
an SGI guru is welcome to enlighten me on
some specific points
[SGI IRIX distribution]
|
| | all versions now support $time variable in results mapping |
| Release Note: |
| |
This is a complete Alkaline rewrite. Alkaline 1.3 uses a new portable base library, which replaces
the old MV4 CGI. It is currently released as an Alpha version, but it is proving
much more stable than all previous releases.
Bugfixes number in the hundreds and mostly include fixes in multithreading and memory allocation,
especially under Windows NT.
I have used Rational Purify on Alkaline and
it has led to impressive improvements in low-level code. Major memory leaks were discovered
and fixed.
|
Older What's New Pages
ToDo List
| Server Configuration and Management |
| | online graphical statistics and monitoring via the web, documentation and interface
with MRTG or similar software |
| | full search and index administration via the web |
| | online management of asearch.cnf files, addition of new groups, etc |
| | native Windows NT implementation as a service with interface and administration |
| | Windows and XWindows interface |
| Indexing |
| | NTLM authentication protocol support (Windows NT Server user access) |
| | local date storage and reindex rejection of scripts not responding to If-Modified-Since |
| | compressed indexes (siteidx files) for optimizing space usage |
| | common database (ODBC, Oracle, etc.) support for storing indexes |
| | index selected CGI scripts such as redirectors using a more straightforward include/exclude option |
| | round-robin spidering policy |
| | restriction of remotely indexed servers using IP numbers instead of host names |
| | keep-alive retrieval of documents to minimize network load |
| | forcing the last-modified date from a meta tag |
| Searching |
| | artificial intelligence driven queries (programmer
mastering AI warmly welcome), example: "What's the weather in California?" |
| | fuzzy search. Example: ba%nana matches banana, bananna |
| | stemming. Example: apply~ matches apply, applies, applied |
| | search for synonyms of search terms |
| | phonic search. Example: #smith matches smith, smythe |
| | natural language search |
| | numeric range. Example: 12~~24 matches 18 |
| | variable term weighting. Example: apple:4 w/5 pear:1 |
| | better meta tag support, for example sorting results by meta tags |
| Results Output |
| | dates and error messages in any language other than English (already implemented dates in French and free date format) |
| | output of results per word searched |
Distributed PVM version
Schedule: we have a functioning PVM version that suffers from several problems, thus it has not been released.
There is currently no date for a public beta PVM test version, but we are testing it on a
32-PC Pentium II cluster at the University of Geneva.
Multiprocessor Parallel/Distributed Version: Alkaline 2.0
Schedule: no schedule available.
Things we plan to do:
- Parallel version of Alkaline for clustering on a series of PCs (using TCP/IP for message passing).
- Parallel version of Alkaline for IBM SP2 using MPI.