Welcome to Vestris Inc.
Internet Interactive Solutions Company


Alkaline version 1.2 news and information.

Minor Updates - Alkaline 1.2

15.06.1999
 removed authentication information from /?manage+stat
 <!--SET C-BEFORE--> and <!--SET C-AFTER--> are inserted outside of the link generated by <!--SEARCH-NEXT-->
14.06.1999
 corrected bug: circular HTTP redirections infinite loop
 added NSF option to asearch.cnf to improve indexing of files generated by Lotus Notes (Domino); it forces replacement of URLs such as *.nsf*?OpenDocument* by *.nsf*?OpenDocument&ExpandView, so that all pages are indexed fully expanded
 NSF option enables cleanup of multiple views for Domino generated sites; this avoids duplicate pages that have different URLs but are just views of the same content
 added Insens option to asearch.cnf that treats document duplicates and URLs as case-insensitive (for Windows NT and Domino servers, which make no difference between /Internet and /internet, for example)
 added NoEmptyLinks option that enables skipping empty links, such as <A HREF="..."></A> - those links cannot be clicked on, but they are useful if you need to link a page for a bot exclusively; do enable this option for Domino generated sites, which produce empty links leading to pages containing junk
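
The NSF rewrite and the Insens comparison described above can be pictured with a minimal sketch. It assumes a literal reading of the pattern (everything after ?OpenDocument is dropped) and that Insens simply lowercases the URL before comparing. The function names are hypothetical and this is not Alkaline's actual code.

```python
import re

# Hypothetical pattern for "*.nsf*?OpenDocument*"; Alkaline's real rule may differ.
_NSF_OPEN = re.compile(r"(\.nsf.*?\?OpenDocument).*$")


def expand_domino_view(url: str) -> str:
    """Rewrite *.nsf*?OpenDocument* to *.nsf*?OpenDocument&ExpandView."""
    return _NSF_OPEN.sub(r"\1&ExpandView", url)


def duplicate_key(url: str, insens: bool) -> str:
    """Key used to spot duplicate URLs; lowercased when Insens is enabled (assumption)."""
    return url.lower() if insens else url
```

URLs that do not match the pattern pass through unchanged, so the rewrite is safe to apply to every link of a mixed site.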
30.05.1999
 corrected bug in regexp, mapping of $header...
 corrected bug with multiple configurations
 the log file can now be different for inline multiple configurations; searching the whole group will produce one entry in each logfile
 swap file is created with S_IRWXU permissions
 changed behaviour of <!--SET QUOTE-1-->, will enquote only variable values instead of all the text
 added a command line option -no404 to skip the cleanup step
 corrected bug with terminating <ALKALINE> tags, <ALKALINE SKIP> will now produce expected behaviour
 corrected URL resolver bug with links in the same page, such as to #top
 added LogPath setting to equiv.struct that defines a global path for log files generated by Alkaline server (not individual search groups log files!). Such log files are created only if the LogPath entry exists and are named asearch-Port.log where Port is the port on which Alkaline is running
27.05.1999
 added $time option to <!--SEARCH-GENERAL--regexp--> tag for the result templates that shows time of the search in seconds
 added intelligent cache system, subsequent searches are now much faster; the cache information is also visible from the management section (stats)
 corrected results mapping with regexp such as $day.$month
 added <!--SET QUOTE-1--> option to enquote search results; é becomes &eacute;
 added pre-parsing of all indexed files and removal of URLs returning HTTP 404 / Not Found
 fixed a bug with opt:and searches (appeared in 19.05.1999 version)
 added SkipMeta, SkipLinks and SkipText directives to asearch.cnf that respectively skip all meta tags, avoid following links (index individual pages only) and skip all simple text (everything except meta tags)
 added LogFile option to asearch.cnf that should point to a log filename; Alkaline will append the date, time, remote IP, the search group and the search string to this file each time a user performs a search operation
 added <alkaline url="..."> alkaline-specific tag that will add a link from the current page manually; useful for pages that generate JavaScript code, for example, menus that cannot be correctly interpreted by the parser
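
As an illustration of the <!--SET QUOTE-1--> behaviour (é becomes &eacute;, applied to variable values only), here is a minimal sketch. It assumes that every character with a named HTML entity is encoded and that template variables look like $name. The helper names are hypothetical, and dotted variables such as $date.french are not handled.

```python
import re
from html.entities import codepoint2name


def enquote(value: str) -> str:
    """Replace each character that has a named HTML entity with &name;."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in value
    )


def fill_template(template: str, values: dict) -> str:
    """Substitute $name variables, enquoting only the values, not the template text."""
    def substitute(match):
        name = match.group(1)
        return enquote(values[name]) if name in values else match.group(0)

    return re.sub(r"\$(\w+)", substitute, template)
```

Because only the substituted values pass through enquote, markup in the template itself is left untouched.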
20.05.1999
 added scope specifiers before: and after: to search document modified before or after a certain date
 added input fields before and after to add input boxes to the search page
19.05.1999
 added possibility to create input fields on the search template page in order to append options for case-sensitive search for example
 added virtual memory facility to store non-vital data in a temporary file; this file is automatically created at Alkaline startup and destroyed on cleanup, even with a Ctrl-C break - memory gain of 60MB has been observed on a 40'000 page index with 250'000 word forms, previously occupying 120MB of RAM
 ?manage+stat will show swap filename location, swap usage and amount of clusters in the swap file (amount of raw objects stored)
 corrected a series of relevance bugs and searches of words with trailing spaces, such as "hello "
 added HeaderLength option to asearch.cnf that defines the maximum length of the free text to store (shown with $header) for every page indexed
 Windows NT specific: Alkaline can now also be restarted online, as the UNIX versions can; CTRL-C will perform a clean exit, removing swap files and cleaning structures
14.05.1999
 added new storage facility for URL indexes, now compressed also in memory; the siteidx1.ndx and siteidx1.lnx files see their formats change (this version of Alkaline loads both formats and saves in the new one); all lists of numbers are now compressed when an interval is possible (1 2 3 4 becomes 1-4)
 improved quality of search results, pages with more relevant titles, keywords and descriptions get a greater weight and appear first
 corrected bug with proxies; a long timeout pause as Alkaline begins indexing a new search group has been removed
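
The interval compression mentioned above (1 2 3 4 becomes 1-4) can be sketched as follows. The on-disk layout of the .ndx and .lnx files is not described here, so the space-separated text form, the function names and the choice to compress runs of two are assumptions.

```python
def _token(start: int, end: int) -> str:
    return str(start) if start == end else f"{start}-{end}"


def compress(numbers) -> str:
    """Turn non-negative integers into a list with runs collapsed: 1 2 3 4 -> 1-4."""
    tokens = []
    run_start = run_end = None
    for n in sorted(set(numbers)):
        if run_end is not None and n == run_end + 1:
            run_end = n  # extend the current run
            continue
        if run_start is not None:
            tokens.append(_token(run_start, run_end))
        run_start = run_end = n  # start a new run
    if run_start is not None:
        tokens.append(_token(run_start, run_end))
    return " ".join(tokens)


def expand(text: str) -> list:
    """Inverse of compress: '1-4 7' -> [1, 2, 3, 4, 7]."""
    numbers = []
    for token in text.split():
        start, _, end = token.partition("-")
        numbers.extend(range(int(start), int(end or start) + 1))
    return numbers
```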
08.05.1999
 added UrlListFile, UrlExcludeFile and UrlIncludeFile directives to asearch.cnf to include or exclude a list of URLs or domains from an external file; this file is reloaded at each round trip, which allows adding URLs without restarting Alkaline; note that UrlExcludeFile does not remove documents, it sets restrictions on newly indexed URLs
 a redirected top URL will be reconsidered as its resulting filename, not the original one
For example indexing a site at http://www.vestris.com/alkaline was also indexing http://www.vestris.com/ even with Upper=N setting because "alkaline" was considered as a filename with no extension - Alkaline will now retrieve the document's redirected url (http://www.vestris.com/alkaline/) and correctly assume the search top as http://www.vestris.com/alkaline/.
Note that if a document is redirected to a different server, it will still be considered as remote and Remote=Y or an UrlInclude directive will be necessary to index that remote server.
 corrected bug in HTML parser with indexing nordic æ (&aelig;), equivalent to "ae" when searching and indexing without FreeCharset
 corrected bug in HTML parser with indexing nordic ø (&oslash;) and Ø (&Oslash;) characters
 corrected bug in HTML parser with value=""
 added <META NAME="ALKALINE" HTTP-EQUIV="SKIP SKIPTEXT SKIPLINKS SKIPMETA"> directives that respectively prevent pages from being indexed, the free text from being indexed, links from being followed and meta data from being indexed
 added <ALKALINE SKIP> ... </ALKALINE> special tag to skip portions of text in the indexed document
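
The redirected top URL entry in this section turns on how the search top is derived from a URL: the last path segment is treated as a filename unless the path ends with a slash. A minimal sketch of that reasoning, using a hypothetical search_top helper rather than Alkaline's real resolver:

```python
from urllib.parse import urlsplit, urlunsplit


def search_top(url: str) -> str:
    """Return the directory part of a URL, treating the last path segment as a filename."""
    parts = urlsplit(url)
    path = parts.path or "/"
    directory = path[: path.rfind("/") + 1]  # keep everything up to the last slash
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))
```

Applied to http://www.vestris.com/alkaline this gives the server root, which is why the whole site was indexed; applied to the redirected URL with the trailing slash it gives the alkaline directory.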
05.05.1999
 added extension to Filter option that can now specify MIME types, ex: FilterApplication/Zip; the MIME type is checked first, the filename extension second. (todo: accept downloads checking the MIME type)
 executed filters get more parameters such as MIME type
 corrected bug in HTML parser when resolving invalid directories (such as http://www.vestris.com merged with ../../test.html)
 fixed the certification page availability
26.04.1999
 added possibility to specify MaxSize value in asearch.cnf in MB and KB, for example MaxSize=100k; valid suffixes are KB, K, MB and M
 host:, path:, ext: and url: scope delimiters can have parameters separated by commas, such as path:alkaline/,software/xreplace
 added command line options series opt: to restrict searching:
  • opt:whole - search whole words only
  • opt:case - force case-sensitive search
  • opt:and - search all pages containing ALL terms only (default AND operator), note that it is still possible to specify an exclusion - operator manually
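
Parsing a MaxSize value with the suffixes listed in this section might look like the sketch below. It assumes that a bare number means bytes and that K and M are powers of 1024; neither is stated in these notes, and parse_max_size is a hypothetical name.

```python
import re

# Assumed multipliers: the notes list the suffixes but not their exact size.
_UNITS = {"": 1, "K": 1024, "KB": 1024, "M": 1024 ** 2, "MB": 1024 ** 2}


def parse_max_size(value: str) -> int:
    """Convert a MaxSize value such as '100k' or '2MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", value)
    if not match or match.group(2).upper() not in _UNITS:
        raise ValueError(f"invalid MaxSize value: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]
```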
22.04.1999
 modified the action of the ext: scope delimiter; a shown result must satisfy any of the url:, host: and/or path: scope delimiters (if present) AND the ext: delimiter (if present)
 added a workaround for HTML errors of more than one closing quote for options inside tags
 added UrlInclude option to asearch.cnf that can restrict Alkaline scope using Remote=Y option; for example, to search all .vestris.com and .infomaniak.ch domains (such as www.vestris.com, 3dfx.infomaniak.ch, warzone.infomaniak.ch, etc.) and considering that there's a link on some page from www.vestris.com to www.infomaniak.ch, use the following configuration file:
UrlList=http://www.vestris.com
UrlInclude=.infomaniak.ch,.vestris.com
Remote=Y
		      
This will also allow Alkaline to search all domains in .fr, etc. IPInclude and IPExclude options are on the way.
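
The combined scope rule from this section (any of url:, host: or path:, AND ext: when present) can be sketched as below. The matching rules follow the delimiter descriptions in these notes: host: matches the rightmost part of the hostname, path: and url: the leftmost part, ext: the filename extension. The function name and the exact string comparisons are assumptions.

```python
from urllib.parse import urlsplit


def matches_scope(url, hosts=(), paths=(), urls=(), exts=()):
    """True if url satisfies ANY host/path/url delimiter AND the ext delimiter."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")        # path: is given without the leading slash
    bare = url.split("://", 1)[-1]       # url: is given without http://
    location_tests = (
        [host.endswith(h) for h in hosts]
        + [path.startswith(p) for p in paths]
        + [bare.startswith(u) for u in urls]
    )
    location_ok = any(location_tests) if location_tests else True
    ext_ok = any(path.endswith("." + e) for e in exts) if exts else True
    return location_ok and ext_ok
```

With no delimiters at all the function returns True, matching the rule that an unscoped search returns every result.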
20.04.1999
 added another search scope delimiter:
  • ext: followed by the rightmost part of the filename without the leading dot, separated by commas, ex: ext:cpp,h

 added results sort parameters (to be added to the search string):
  • sort:size - sort by size
  • sort:date - sort by date
  • sort:isize - sort by size (ascending)
  • sort:idate - sort by date (ascending)
  • sort:url - sort by URL
  • sort:title - sort by TITLE
  • sort: with no parameter or with an invalid parameter will sort by relevance

 added sort. variables to <!--SEARCH-GENERAL --> that allow elements to be re-sorted after a search has been performed; for example, sort.size generates the link to the search page sorted by size.
 corrected bug in spider with servers using ports other than 80, including the retrieval of the robots.txt file at the port pointed by the top url
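
A minimal sketch of how the sort: parameters listed in this section could map onto an ordering. It assumes that size and date sort in descending order (since isize and idate are the ascending variants), that url and title sort ascending, and that results are dictionaries with the field names shown; all of these are illustrative assumptions.

```python
# parameter -> (result field, descending?)
SORT_KEYS = {
    "size": ("size", True),
    "date": ("date", True),
    "isize": ("size", False),
    "idate": ("date", False),
    "url": ("url", False),
    "title": ("title", False),
}


def sort_results(results, param=""):
    """Sort results by a sort: parameter; unknown or empty falls back to relevance."""
    field, descending = SORT_KEYS.get(param, ("relevance", True))
    return sorted(results, key=lambda r: r[field], reverse=descending)
```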
15.04.1999
 added Alkaline specific option to the MV4 scripting host: /?manage+stat shows the server statistics and lists configurations
[example at Vestris.com]
 added clear digest and verification of parameters at startup time; Alkaline will parse asearch.cnf, equiv.struct, admin.struct and access.struct and report any errors found.
02.04.1999
 corrected bug with CGI=Y, now fully working
 changed the way the HTML parser skips the script and object zones (to be tested more)
 URL resolver now aware of false links placed on HTML documents such as /economic/../economic and merges them correctly
 added search scope delimiters:
  • host: followed by the rightmost part of the hostname, ex: host:.vestris.com
  • path: followed by the leftmost part of a path without the leading slash, ex: path:software/xreplace
  • url: followed by the leftmost part of a full url without http://, ex: url:www.vestris.com/alkaline
It is possible to mix those entries and have multiple possibilities for each option. Only results matching the searched text and ANY of the scope delimiters will be returned. As usual, if no scope is specified Alkaline returns all results matching the searched text.
 added support for META REFRESH sections in HTML documents
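
Merging false links such as /economic/../economic amounts to resolving the link against its base URL and removing the dot segments. The sketch below uses Python's standard urljoin to show the expected result; it is not Alkaline's resolver, and the base URL in the usage lines is made up.

```python
from urllib.parse import urljoin


def resolve(base: str, link: str) -> str:
    """Resolve link against base, collapsing '.' and '..' path segments."""
    return urljoin(base, link)


print(resolve("http://www.vestris.com/news/", "/economic/../economic"))
print(resolve("http://www.vestris.com/", "../../test.html"))
```

Segments that would climb above the server root are dropped rather than kept in the resulting URL.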
23.03.1999
 added support for MAP sections in HTML documents
 added Auth Name=Password option to the asearch.cnf to support basic HTTP/1.0 authentication
 major improvements have been made in indexing speed and index file loading using a new version of the chained cells mechanism; the indexing slowdown is now minimal or nonexistent as the number of words grows
18.02.99
 added CGI=[Y/N] option to the asearch.cnf
29.01.99
 added <!--SET FREECHARSET-1--> to the template options, when used with FreeCharset=Y in the asearch.cnf
 added Redirect operation to the asearch.cnf to make URLs like http://www.vestris.com equivalent to http://vestris.com or to simulate complete server redirections.
 corrected severe bug in URL resolution that could lead to unindexed pages
26.01.99
 added FreeCharset=[Y/N] option to asearch.cnf to disable the accentuated character transformations (rebuild your indexes if you enable this option)
08.01.99
 grouping of multiple sites with individual options inside the same search group (allows options to be specified for separate sites or separate groups of sites inside a single asearch.cnf)
07.01.99
 added Maxsize option to asearch.cnf which defines the biggest file to retrieve
 added MD5 signature algorithm used to identify similar pages (ex: http://www.vestris.com/ and http://www.vestris.com/index.html), such pages are no longer indexed twice
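
Duplicate detection by MD5 signature can be sketched as follows. This version hashes the raw page bytes, so it only catches byte-identical pages; whether Alkaline hashes the raw document or the parsed text is not stated here, and the function names are hypothetical.

```python
import hashlib


def page_signature(content: bytes) -> str:
    """MD5 digest of the page content, used as its signature."""
    return hashlib.md5(content).hexdigest()


def index_unique(pages):
    """Return the URLs to index, skipping pages whose signature was already seen."""
    seen = set()
    kept = []
    for url, content in pages:
        signature = page_signature(content)
        if signature in seen:
            continue  # same content already indexed under another URL
        seen.add(signature)
        kept.append(url)
    return kept
```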
06.01.99
 binary files retrieval is supported
 added preprocessor filters for specified file extensions
 preprocessor filter: Adobe PDF, using Derek B. Noonburg's xpdf pdftotext; a more Alkaline oriented version producing PDF title, keywords and other meta data will be available
 TITLE is now also automatically a META data
23.12.98
 corrected a bug in the HTML parser with links having a target option
 added preemptive RAM allocator, resulting in faster loading of index (siteidx) files
 rewritten results mapping code to support reentrance, resulting in better multithread load balancing with high client demand
 corrected + signs replaced by spaces in links to next results pages
21.12.98 / 22.12.98
 meta search - try searching author:Daniel on the Vestris Inc. site search
 added NT scheduling policy options
 rearranged multithread synchronization policy for more stability
 added -v (--verbose), -l (--log), -THREAD_, -SleepFile and -SleepRoundtrip options
 corrected data mapping bug that led to Alkaline NT crash (was ignored by Unix servers)
18.12.98
 severe bug corrected under NT: socket closure failure
 changed WaitForSingleObject to EnterCriticalSection with NT API after reports of Alkaline crash
 rewritten and removed two critical sections resulting in slightly more stable code
 added options to show images depending on the age of file: $recent and $recent.count variables in the <!--SET MAP --> tag, <!--SET RECENT-COUNT--X--> and <!--SET RECENT--...--> to define when a file is considered new and what text to use for the $recent variable
 added <!--SET DATE--...--> to format the modified and creation date fields
17.12.98
 added encoding to HTML (" into &quot; for example) for the $search variable (quoting search string for the search form on the search results page works for all cases now)
 added search string parser to enable searching of strings without the mandatory separator space (ex: +"Vestris"-company will search for +"Vestris" and -company)
 added $date.french and $modif.french variables for template pages (dates in French) - changed format of English dates to something nicer too
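
Splitting a search string without the separator space, as in the +"Vestris"-company example above, can be approximated with one regular expression. This is a rough sketch with a hypothetical split_terms helper, not Alkaline's parser; note that it would also split hyphenated words such as e-mail.

```python
import re

# An optional +/- prefix followed by either a quoted phrase or a bare word.
_TERM = re.compile(r'[+-]?(?:"[^"]*"|[^\s"+-]+)')


def split_terms(query: str) -> list:
    """Split a search string into terms, keeping +/- prefixes and quotes."""
    return _TERM.findall(query)
```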
16.12.98
 corrected URL and parameter encoding using standard "%" instead of MV4 "'"
 NT version corrects false result and false result count reported by $total (wrong binary was updated on 15.12.98)
 corrected another bug in HTML parser that led to unindexed pages
 index.html option has no value by default (this led to a lot of confusion in 15.12.98)
 Alkaline tags such as <!--SEARCH-GENERAL --> are not copied on the resulting page - thus it is possible to copy a search string on the resulting page and place it into the
15.12.98
 corrected bug: 1 result reported by $total variable when no results available
 added support for HTTP 1.0/1.1 - Temporary Redirect / Error 307
 added support for HTTP 1.0/1.1 - Use Proxy / Error 305
 new asearch.cnf option: index.html= main page to be appended to urls with no filename, default is index.html, (none) means do not append
 new asearch.cnf option: SleepFile= delay in seconds to pause after a file has been indexed in lazy mode (when responding to search requests)
 new asearch.cnf option: SleepRound= delay in seconds to pause after a whole group has been processed in lazy mode (if you search multiple sites with multiple asearch.cnf, you might want to make a longer delay after the last asearch.cnf is processed)
 corrected link to the next page when template alias name does not correspond to the data alias
 corrected quote parser for tag values with "'" characters
14.12.98
 only first <TITLE> tag is used, next titles are ignored
 robots.txt comments on the same line (after #) are ignored
 corrected more bugs in URL resolver that led to unindexed pages
 refined 1/Undocumented errors with new error messages for socket errors
 added Retry=X to asearch.cnf that tells Alkaline to retry an HTML page retrieval X times when a communication error/timeout has occurred, default is 3
 corrected <!-- --> parsing with JavaScript code with recursive < and >
 corrected parser errors with <STYLE> and <OBJECT> tags
11.12.98
 added WSA compliant socket implementation for Windows NT version
 corrected &...; parser errors inside tags and metas
 added META indexing (search will be implemented next week)
10.12.98
 corrected invalid URLs with line breaks
 added full support for /robots.txt, including META ROBOTS NOINDEX and NOFOLLOW directives
 excluded STYLE HTML tag from indexing
08.12.98
 +word (must exist) and -word (must not exist) in search results - fully implemented boolean search and refined boolean search
 added better error report, rather than [timout/error], HTTP failure is now shown and briefly described
 corrected bug: invalid URL resolving when starting indexing at a certain web page and not a host address (http://www.vestris.com/alkaline/search.html instead of http://www.vestris.com/alkaline)
 redirection (HTTP 301 or 302) works correctly with proxies, also a redirection is explicitly shown in the verbose mode when doing a live reindexing
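
The +word (must exist) and -word (must not exist) rules from this section can be expressed as a simple filter. The treatment of plain words is an assumption: here a page matches if it satisfies all + and - terms and, when no + term is given, contains at least one plain word. Names are hypothetical.

```python
def page_matches(page_words, terms) -> bool:
    """Apply +word / -word rules to the set of words found on a page."""
    words = set(page_words)
    required = [t[1:] for t in terms if t.startswith("+")]
    excluded = [t[1:] for t in terms if t.startswith("-")]
    optional = [t for t in terms if t[:1] not in "+-"]
    if any(word in words for word in excluded):
        return False  # a -word is present
    if not all(word in words for word in required):
        return False  # a +word is missing
    # Assumption: without any +word, at least one plain word must be present.
    return bool(required) or any(word in words for word in optional)
```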
04.12.98
 modified page results sorting: test timings show an 80% speedup for this part of the code, which results in a noticeable overall speed gain
 added a Search Tips page
02.12.98
 released a PowerMAC version for MkLinux
 made substantial adjustments to the main algorithm
 corrected bugs in certification
 corrected duplicate pages in results
20.11.98
 searching words in quotes will search full words, case sensitive search is activated as usual when at least one character is capitalized
 undocumented function: Alkaline can gather email instead of indexing words, run asearch asearch_cnf_dir email
 added <!--SET PREV--...-->, a link to previous pages is shown in the results using this option
 added a major enhancement in the MV4 vector manipulation, Alkaline largely benefits from those improvements speeding up many internal operations, especially in results mapping
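
The quoting and capitalization rules from this section (quoted words match full words only; a capital letter switches on case-sensitive search) can be sketched as below. It assumes that an unquoted term matches as a substring, which these notes imply but do not state. The function name is hypothetical.

```python
import re


def term_matches(term: str, text: str) -> bool:
    """Match one search term against page text using the quote and case rules."""
    whole_word = len(term) >= 2 and term[0] == term[-1] == '"'
    word = term.strip('"')
    # Any capital letter in the term makes the search case-sensitive.
    flags = 0 if any(ch.isupper() for ch in word) else re.IGNORECASE
    pattern = re.escape(word)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    return re.search(pattern, text, flags) is not None
```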
16.11.98
 added Exts and ExtsAdd to the asearch.cnf command set - a list of valid extensions to index, default is htm,html,shtml,txt
 added $index variable to search.html template's <!--SET MAP--...--> option, shows an increasing number, first search result starting at "1"


- © Vestris Inc.
1994 - 2004 - Switzerland - All Rights Reserved
Last modified: Tue Feb 06 08:52:19 Pacific Standard Time 2001