|
Alkaline version 1.2 news and information.
|
|
|
Minor Updates - Alkaline 1.2
| 15.06.1999 |
| | removed authentication information from /?manage+stat |
| | <!--SET C-BEFORE--> and <!--SET C-AFTER--> are inserted outside
of the link generated by <!--SEARCH-NEXT--> |
| 14.06.1999 |
| | corrected bug: infinite loop on circular HTTP redirections |
| | added NSF option to asearch.cnf to improve
indexing of files generated by Lotus Notes (Domino); it will force
replacement of URLs such as *.nsf*?OpenDocument* by
*.nsf*?OpenDocument&ExpandView indexing all pages fully open
|
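The URL replacement described for the NSF option can be pictured roughly as follows. This is a minimal Python sketch, not Alkaline's own code: the function name `expand_domino_url` is hypothetical, and reading the pattern `*.nsf*?OpenDocument*` as "drop whatever follows ?OpenDocument" is an assumption.

```python
import re

# Assumed reading of *.nsf*?OpenDocument*: a URL containing ".nsf",
# later followed by "?OpenDocument" and an arbitrary tail.
_NSF_PATTERN = re.compile(r"^(.*\.nsf.*\?OpenDocument).*$")

def expand_domino_url(url: str) -> str:
    """Replace the tail after ?OpenDocument with &ExpandView (hypothetical helper)."""
    match = _NSF_PATTERN.match(url)
    if match is None:
        return url  # not a Domino document URL, leave it untouched
    return match.group(1) + "&ExpandView"
```

URLs that do not match the pattern are returned unchanged, so the helper can be applied to every link found during indexing.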
| | NSF option enables cleanup of multiple views for Domino
generated sites; this avoids duplicate pages with different URLs but
that are just a view on the same content |
| | Added Insens option to asearch.cnf that will
treat document duplicates and URLs case-insensitively (for Windows NT and Domino servers,
which make no difference between /Internet and /internet, for example)
|
| | Added NoEmptyLinks option that enables
skipping empty links, such as <A HREF="..."></A> - those
links cannot be clicked on, but they are useful if you need to link a page for a bot
exclusively; do enable this option for Domino generated sites, which produce empty
links leading to pages containing junk
|
| 30.05.1999 |
| | corrected bug in regexp, mapping of $header... |
| | corrected bug with multiple configurations |
| | the log file can now be different for inline multiple configurations;
searching the whole group will produce one entry in each logfile |
| | swap file is created with S_IRWXU permissions |
| | changed behaviour of <!--SET QUOTE-1-->, will enquote only
variable values instead of all the text |
| | added a command line option -no404 to skip the cleanup step
|
| | corrected bug with terminating <ALKALINE> tags,
<ALKALINE SKIP> will now produce the expected behaviour |
| | corrected URL resolver bug with links in the same page, such as
to #top |
| | added LogPath setting to equiv.struct that defines
a global path for log files generated by Alkaline server (not individual search
groups log files!). Such log files are created only if the LogPath entry exists
and are named asearch-Port.log where Port is the port on which Alkaline is running
|
| 27.05.1999 |
| | added $time option to <!--SEARCH-GENERAL--regexp--> tag
for the result templates that shows time of the search in seconds |
| | added an intelligent cache system; subsequent searches
are now much faster; the cache information is also visible from the management
section (stats) |
| | corrected results mapping with regexp such as $day.$month |
| | added <!--SET QUOTE-1--> option to enquote search results;
é becomes &eacute; |
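As an illustration of this kind of quoting, the sketch below turns a non-ASCII character such as é into its named HTML entity. It is written in Python with the standard `html.entities` table; the function name `enquote` and the choice to leave plain ASCII untouched are assumptions, not a description of what Alkaline does internally.

```python
from html.entities import codepoint2name

def enquote(value: str) -> str:
    """Replace non-ASCII characters that have a named HTML entity (hypothetical helper)."""
    out = []
    for ch in value:
        name = codepoint2name.get(ord(ch))
        if name is not None and ord(ch) > 127:
            out.append(f"&{name};")
        else:
            out.append(ch)  # ASCII, or no named entity: keep as-is
    return "".join(out)
```

For example, `enquote("café")` gives `caf&eacute;`.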
| | added pre-parsing of all indexed files and removal of URLs returning
HTTP 404 / Not Found |
| | fixed a bug with opt:and searches (appeared in 19.05.1999 version) |
| | added SkipMeta, SkipLinks and SkipText directives
to asearch.cnf that respectively skip all meta tags, avoid following links (index
individual pages only) and skip all plain text (everything except
meta tags)
|
| | added LogFile option to asearch.cnf that should point to a log filename;
Alkaline will append the date, time, remote IP, the search group and the search string to this
file each time a user performs a search operation
|
| | added <alkaline url="..."> alkaline-specific tag
that will add a link from the current page manually; useful for pages that generate JavaScript
code, for example, menus that cannot be correctly interpreted by the parser |
| 20.05.1999 |
| | added scope specifiers before: and after: to search for documents
modified before or after a certain date
|
| | added input fields before and after to add
input boxes to the search page |
| 19.05.1999 |
| | added possibility to create input fields on the search template page
in order to append options for case-sensitive search for example
|
| | added virtual memory facility to store non-vital data in a temporary
file; this file is automatically created at Alkaline startup and destroyed on cleanup, even
with a Ctrl-C break - memory gain of 60MB has been observed on a 40'000 page index
with 250'000 word forms, previously occupying 120MB of RAM
|
| | ?manage+stat will show swap filename location, swap usage and amount of
clusters in the swap file (amount of raw objects stored) |
| | corrected a series of relevance bugs and searches of words with trailing
spaces, such as "hello " |
| | added HeaderLength option to asearch.cnf that defines
the maximum length of the free text to store (shown with $header)
for every page indexed
|
| | Windows NT specific: Alkaline can now also be restarted online, as
the UNIX versions can; CTRL-C will perform a clean exit, removing swap files
and cleaning structures |
| 14.05.1999 |
| | added new storage facility for URL indexes, now compressed also
in memory; the siteidx1.ndx and siteidx1.lnx files see their formats change (this version of
Alkaline loads both formats and saves in the new one); all lists of numbers are now compressed
when an interval is possible (1 2 3 4 becomes 1-4) |
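The interval compression of number lists (1 2 3 4 becomes 1-4) can be sketched as below. This is an illustrative Python version only; the function name `compress_runs`, the space-separated output and the decision to compress runs of just two numbers are assumptions, since the actual index file format is not described here.

```python
def compress_runs(numbers):
    """Collapse runs of consecutive integers into start-end intervals."""
    parts = []
    start = prev = None
    for n in numbers:
        if prev is not None and n == prev + 1:
            prev = n  # still inside the current run
            continue
        if start is not None:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n  # begin a new run
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return " ".join(parts)
```

With this sketch, `compress_runs([1, 2, 3, 4, 7, 9, 10])` returns `"1-4 7 9-10"`.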
| | improved quality of search results, pages with more relevant titles, keywords
and descriptions get a greater weight and appear first |
| | corrected bug with proxies; a long timeout pause as Alkaline
begins indexing a new search group has been removed |
| 08.05.1999 |
| | added UrlListFile, UrlExcludeFile and UrlIncludeFile
directives to asearch.cnf to include or exclude a list of URLs or domains from
an external file; this file is reloaded at each round trip, so URLs can be added without
restarting Alkaline; note that UrlExcludeFile does not remove documents, it only restricts
newly indexed URLs
|
| | a redirected top URL will be reconsidered as its resulting
filename, not the original one. For example, indexing a site at http://www.vestris.com/alkaline
was also indexing http://www.vestris.com/ even with Upper=N setting because "alkaline"
was considered as a filename with no extension - Alkaline will now retrieve the document's
redirected url (http://www.vestris.com/alkaline/) and correctly assume the search top
as http://www.vestris.com/alkaline/.
Note that if a document is redirected to a different server, it will still be considered
as remote and Remote=Y or an UrlInclude directive will be necessary to index that remote server.
|
| | corrected bug in HTML parser with indexing Nordic æ (&aelig;),
equivalent to "ae" when searching and indexing without FreeCharset |
| | corrected bug in HTML parser with indexing Nordic ø (&oslash;)
and Ø (&Oslash;) characters |
| | corrected bug in HTML parser with value="" |
| | added <META NAME="ALKALINE" HTTP-EQUIV="SKIP SKIPTEXT SKIPLINKS SKIPMETA">
directives that respectively prevent a page from being indexed, its free text from being indexed,
its links from being followed and its meta data from being indexed
|
| | added <ALKALINE SKIP> ... </ALKALINE> special tag to skip
portions of text in the indexed document
|
| 05.05.1999 |
| | added extension to Filter option that
can now specify MIME types, ex: FilterApplication/Zip; the MIME type is checked first,
then the filename extension is checked.
(todo: accept downloads checking the MIME type) |
| | executed filters get more parameters such as MIME type |
| | corrected bug in HTML parser when resolving invalid directories
(such as http://www.vestris.com merged with ../../test.html) |
| | fixed the certification page availability |
| 26.04.1999 |
| |
added possibility to specify MaxSize value in asearch.cnf in MB and KB, for example
MaxSize=100k; valid extensions are KB,K,MB and M
|
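Parsing a MaxSize value with the listed suffixes could look like the following Python sketch. The function name `parse_max_size` is hypothetical, and two points are assumptions rather than documented behaviour: a value without a suffix is taken as bytes, and 1 KB is taken as 1024 bytes.

```python
def parse_max_size(value: str) -> int:
    """Convert a MaxSize string such as "100k" or "2MB" to a byte count (assumed units)."""
    text = value.strip().upper()
    # Suffixes listed in the changelog: KB, K, MB and M.
    for suffix, factor in (("KB", 1024), ("MB", 1024 ** 2), ("K", 1024), ("M", 1024 ** 2)):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # assumption: no suffix means bytes
```

Under these assumptions, `parse_max_size("100k")` returns `102400`.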
| | host:, path:, ext: and url: scope delimiters can have parameters separated
by commas, such as path:alkaline/,software/xreplace
|
| | added the opt: series of command line options to restrict searching:
- opt:whole - search whole words only
- opt:case - force case-sensitive search
- opt:and - search all pages containing ALL terms only (default AND operator), note that
it is still possible to specify an exclusion - operator manually
|
| 22.04.1999 |
| | modified the action of the ext: scope delimiter; a shown result must satisfy
any of the url:, host: and/or path: scope delimiters (if present) AND the ext: delimiter (if present)
|
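The combination rule for scope delimiters (any of url:, host: or path:, and additionally ext: when present) can be sketched in Python as below. The function `in_scope` and its parameters are hypothetical names for illustration, and the matching details (plain prefix and suffix comparison on the parsed URL) are an assumption about how the delimiters are evaluated.

```python
from urllib.parse import urlsplit

def in_scope(url, hosts=(), paths=(), urls=(), exts=()):
    """Return True if a result URL passes the scope delimiters (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")      # path: has no leading slash
    bare = url.split("://", 1)[-1]     # url: is given without http://

    location_checks = (
        [host.endswith(h) for h in hosts]       # host: rightmost part
        + [path.startswith(p) for p in paths]   # path: leftmost part
        + [bare.startswith(u) for u in urls]    # url: leftmost part
    )
    # Any one of the location delimiters is enough, if any were given.
    location_ok = any(location_checks) if location_checks else True
    # ext: must match as well, if given.
    ext_ok = any(path.endswith("." + e) for e in exts) if exts else True
    return location_ok and ext_ok
```

A query such as `host:.vestris.com ext:cpp,h` would then correspond to `in_scope(url, hosts=[".vestris.com"], exts=["cpp", "h"])`.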
| | added a workaround for HTML errors of more than one closing quote for options
inside tags |
| | added UrlInclude option to asearch.cnf that can restrict Alkaline scope using
Remote=Y option; for example, to search all .vestris.com and .infomaniak.ch domains
(such as www.vestris.com, 3dfx.infomaniak.ch,
warzone.infomaniak.ch, etc.) and considering that there's a link on some page from www.vestris.com
to www.infomaniak.ch, use the following configuration file:
UrlList=http://www.vestris.com
UrlInclude=.infomaniak.ch,.vestris.com
Remote=Y
This will also allow Alkaline to search all domains in .fr, etc. An IPInclude and an IPExclude
option is on the way.
|
| 20.04.1999 |
| | added another search scope delimiter:
- ext: followed by the rightmost part of the filename without the leading dot, separated by commas, ex: ext:cpp,h
|
| | added results sort parameters
(to be added to the search string):
- sort:size - sort by size
- sort:date - sort by date
- sort:isize - sort by size (ascending)
- sort:idate - sort by date (ascending)
- sort:url - sort by URL
- sort:title - sort by TITLE
- sort: with no parameter or with an invalid parameter will sort by relevance
|
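The sort parameters can be modelled with a small lookup table, as in this Python sketch. The result dictionaries and their field names (`size`, `date`, `url`, `title`, `relevance`) are hypothetical, and treating sort:size and sort:date as descending is an assumption drawn from isize and idate being described as ascending.

```python
# Each entry maps a sort: parameter to (key function, descending?).
SORT_KEYS = {
    "size":  (lambda r: r["size"], True),    # assumed descending
    "date":  (lambda r: r["date"], True),    # assumed descending
    "isize": (lambda r: r["size"], False),
    "idate": (lambda r: r["date"], False),
    "url":   (lambda r: r["url"], False),
    "title": (lambda r: r["title"], False),
}

def sort_results(results, sort_param=""):
    """Sort results; a missing or invalid parameter falls back to relevance."""
    key, descending = SORT_KEYS.get(sort_param, (lambda r: r["relevance"], True))
    return sorted(results, key=key, reverse=descending)
```

Falling back through `dict.get` keeps the "invalid parameter sorts by relevance" rule in one place.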
| | added sort. variables to <!--SEARCH-GENERAL --> -
allowing elements to be re-sorted after a search has been performed; sort.size
generates the link to the search page sorted by size, etc.
|
| | corrected bug in spider with servers using ports other than 80,
including the retrieval of the robots.txt file at the port pointed by the top url
|
| 15.04.1999 |
| | added Alkaline specific option to the MV4 scripting host: /?manage+stat shows the
server statistics and lists configurations
[example at Vestris.com]
|
| | added clear digest and verification of parameters at startup time;
Alkaline will parse asearch.cnf, equiv.struct, admin.struct and access.struct and
report any errors found. |
| 02.04.1999 |
| | corrected bug with CGI=Y, now fully working |
| | changed the way the HTML parser skips the
script and object zones (to be tested more) |
| | URL resolver now aware of false links placed on HTML documents
such as /economic/../economic and merges them correctly |
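Merging false links such as /economic/../economic amounts to removing dot segments from the URL path. The Python sketch below shows one way to do it; `normalize_url` is a hypothetical name, and details such as ignoring ".." segments that would climb above the root and collapsing empty segments are assumptions, not a description of Alkaline's resolver.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Resolve "." and ".." segments in the path of an absolute URL."""
    parts = urlsplit(url)
    resolved = []
    for segment in parts.path.split("/"):
        if segment == "..":
            if resolved:
                resolved.pop()   # go up one directory, never above the root
        elif segment not in ("", "."):
            resolved.append(segment)
    path = "/" + "/".join(resolved)
    if parts.path.endswith("/") and resolved:
        path += "/"              # keep a trailing slash on directories
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this sketch, `normalize_url("http://www.vestris.com/economic/../economic")` returns `http://www.vestris.com/economic`.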
| |
added search scope delimiters:
- host: followed by the rightmost part of the hostname, ex: host:.vestris.com
- path: followed by the leftmost part of a path without the leading slash, ex: path:software/xreplace
- url: followed by the leftmost part of a full url without http://, ex: url:www.vestris.com/alkaline
It is possible to mix those entries and have multiple possibilities for each option. Only results matching the
searched text and ANY of the scope delimiters will be returned. As usual, if no scope is specified Alkaline
returns all results matching the searched text.
|
| | added support for META REFRESH sections in HTML documents |
| 23.03.1999 |
| | added support for MAP sections in HTML documents |
| | added Auth Name=Password option to the asearch.cnf to support basic
HTTP/1.0 authentication
|
| | major improvements have been made in indexing speed and index file loading
using a new version of the chained cells mechanism; indexing slowdown is now minimal or nonexistent
as the number of words grows
|
| 18.02.99 |
| | added CGI=[Y/N] option to the asearch.cnf
|
| 29.01.99 |
| | added <!--SET FREECHARSET-1--> to the template options,
when used with FreeCharset=Y in the asearch.cnf
|
| | added Redirect operation to the asearch.cnf to make URLs
like http://www.vestris.com equivalent to http://vestris.com or to simulate
complete server redirections.
|
| | corrected severe bug in URL resolution that could lead to unindexed
pages |
| 26.01.99 |
| | added FreeCharset=[Y/N] option to asearch.cnf to disable the accented
character transformations (rebuild your indexes if you enable this option) |
| 08.01.99 |
| | grouping of multiple sites with individual options inside the same search group
(allows options to be specified for separate sites or separate groups of sites inside a
single asearch.cnf) |
| 07.01.99 |
| | added Maxsize option to asearch.cnf which defines the biggest file to retrieve |
| | added MD5 signature algorithm used to identify similar pages (ex: http://www.vestris.com/
and http://www.vestris.com/index.html), such pages are no longer indexed twice |
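Detecting pages that are the same document under two URLs with an MD5 signature can be sketched as follows. This Python example uses the standard `hashlib` module; hashing the raw page body and the name `find_duplicates` are assumptions, since the exact input to the signature is not stated here.

```python
import hashlib

def find_duplicates(pages):
    """pages: iterable of (url, body_bytes). Returns {duplicate_url: first_url_seen}."""
    seen = {}         # MD5 digest -> first URL with that content
    duplicates = {}
    for url, body in pages:
        digest = hashlib.md5(body).hexdigest()
        if digest in seen:
            duplicates[url] = seen[digest]   # same content already indexed
        else:
            seen[digest] = url
    return duplicates
```

Only the first URL seen for each digest would be indexed; later URLs with the same digest are reported as duplicates of it.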
| 06.01.99 |
| | binary files retrieval is supported |
| | added preprocessor filters for specified file extensions |
| | preprocessor filter: Adobe PDF, using Derek B. Noonburg's xpdf pdftotext; a more Alkaline oriented version producing PDF title, keywords and other meta data will be available |
| | TITLE is now also automatically a META data |
| 23.12.98 |
| | corrected a bug in the HTML parser with links having a target option |
| | added preemptive RAM allocator, resulting in faster loading of index (siteidx) files |
| | rewritten results mapping code to support reentrance, resulting in better multithread load balancing with high client demand |
| | corrected + signs replaced by spaces in links to next results pages |
| 21.12.98 / 22.12.98 |
| | meta search - try searching author:Daniel on the Vestris Inc. site search |
| | added NT scheduling policy options |
| | rearranged multithread synchronization policy for more stability |
| | added -v (--verbose), -l (--log), -THREAD_, -SleepFile and -SleepRoundtrip options |
| | corrected data mapping bug that led to Alkaline NT crash (was ignored by Unix servers) |
| 18.12.98 |
| | severe bug corrected under NT: socket closure failure |
| | changed WaitForSingleObject to EnterCriticalSection with NT API after reports of Alkaline crash |
| | rewritten and removed two critical sections resulting in a slightly more stable code |
| | added options to show images depending on the age of file: $recent and $recent.count variables in the <!--SET MAP --> tag,
<!--SET RECENT-COUNT--X--> and <!--SET RECENT--...--> to define when a file is considered new and what text to use for the $recent variable |
| | added <!--SET DATE--...--> to format the modified and creation date fields |
| 17.12.98 |
| | added encoding to HTML (" into &quot; for example) for the $search variable (quoting search string for the search form on the search results page works for all cases now) |
| | added search string parser to enable searching of strings without the mandatory separator space (ex: +"Vestris"-company will search for +"Vestris" and -company) |
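Splitting a search string into terms without relying on spaces can be approximated with a regular expression, as in this Python sketch. It is an illustration only: `split_terms` is a hypothetical name, and the rule that every + or - starts a new term is an assumption (it would also split a hyphenated word such as e-mail).

```python
import re

# A term is an optional +/- sign followed by a quoted phrase or a bare word.
_TERM = re.compile(r'[+-]?"[^"]*"|[+-]?[^\s"+-]+')

def split_terms(query):
    """Return the list of terms found in a search string."""
    return _TERM.findall(query)
```

With this pattern, `split_terms('+"Vestris"-company')` returns `['+"Vestris"', '-company']`.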
| | added $date.french and $modif.french variables for template pages (dates in French) - changed format of English dates to something nicer too |
| 16.12.98 |
| | corrected URL and parameter encoding using standard "%" instead of MV4 "'" |
| | NT version corrects false result and false result count reported by $total (wrong binary was updated on 15.12.98) |
| | corrected another bug in HTML parser that led to unindexed pages |
| | index.html option has no value by default (this led to a lot of confusion in 15.12.98) |
| | Alkaline tags such as <!--SEARCH-GENERAL --> are not copied on the resulting
page - thus it is possible to copy a search string on the resulting page and place it into the
|
| 15.12.98 |
| | corrected bug: 1 result reported by $total variable when no results available |
| | added support for HTTP 1.0/1.1 - Temporary Redirect / Error 307 |
| | added support for HTTP 1.0/1.1 - Use Proxy / Error 305 |
| | new asearch.cnf option: index.html= main page to be appended to urls with
no filename, default is index.html, (none) means do not append |
| | new asearch.cnf option: SleepFile= delay in seconds to pause after a file has been
indexed in lazy mode (when responding to search requests) |
| | new asearch.cnf option: SleepRound= delay in seconds to pause after a whole
group has been processed in lazy mode (if you search multiple sites with multiple asearch.cnf, you might want
to make a longer delay after the last asearch.cnf is processed) |
| | corrected link to the next page when template alias name does not correspond
to the data alias |
| | corrected quote parser for tag values with "'" characters |
| 14.12.98 |
| | only first <TITLE> tag is used, next titles are ignored |
| | robots.txt comments on the same line (after #) are ignored |
| | corrected more bugs in URL resolver that led to unindexed pages |
| | refined 1/Undocumented errors with new error messages for socket errors |
| | added Retry=X to asearch.cnf that tells Alkaline to retry an HTML page retrieval X times
when a communication error/timeout has occurred, default is 3 |
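A retry loop of the kind Retry=X describes could be sketched like this in Python. The function `fetch_with_retry` and its `opener` parameter are hypothetical, and counting X as the total number of attempts (rather than extra attempts after the first) is an assumption.

```python
import urllib.request

def fetch_with_retry(url, retries=3, timeout=10.0, opener=urllib.request.urlopen):
    """Try to fetch a page up to `retries` times on communication errors."""
    attempts = max(1, retries)
    last_error = None
    for _ in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:   # covers URLError, socket errors and timeouts
            last_error = exc
    raise last_error
```

The `opener` argument exists only so the loop can be exercised without network access; by default it is the standard `urllib.request.urlopen`.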
| | corrected <!-- --> parsing with JavaScript code with recursive < and > |
| | corrected parser errors with <STYLE> and <OBJECT> tags |
| 11.12.98 |
| | added WSA compliant socket implementation for Windows NT version |
| | corrected &...; parser errors inside tags and metas |
| | added META indexing (search will be implemented next week) |
| 10.12.98 |
| | corrected invalid URLs with line breaks |
| | added full support for /robots.txt, including META ROBOTS NOINDEX and NOFOLLOW directives |
| | excluded STYLE HTML tag from indexing |
| 08.12.98 |
| | +word (must exist) and -word (must not exist) in search results -
fully implemented boolean search and refined boolean search |
| | added better error report, rather than [timout/error], HTTP failure
is now shown and briefly described |
| | corrected bug: invalid URL resolving when starting indexing at a certain
web page and not a host address (http://www.vestris.com/alkaline/search.html instead
of http://www.vestris.com/alkaline) |
| | redirection (HTTP 301 or 302) works correctly with proxies,
also a redirection is explicitly shown in the verbose mode when doing a live reindexing |
| 04.12.98 |
| | modified page results sorting: test timings give an 80% speedup
for this part of the code, which results in a noticeable overall speed gain |
| | added a Search Tips page |
| 02.12.98 |
| | released a PowerMAC version for MkLinux |
| | made substantial adjustments to the main algorithm |
| | corrected bugs in certification |
| | corrected duplicate pages in results |
| 20.11.98 |
| | searching words in quotes will search full words, case sensitive search is activated
as usual when at least one character is capitalized |
| | undocumented function: Alkaline can gather email instead of indexing
words, run asearch asearch_cnf_dir email |
| | added <!--SET PREV--...->, a link to previous pages is shown in the
results using this option |
| | added a major enhancement in the MV4 vector manipulation,
Alkaline largely benefits from those improvements speeding up many internal operations,
especially in results mapping |
| 16.11.98 |
| |
added Exts and ExtsAdd to the asearch.cnf command set - a list
of valid extensions to index, default is htm,html,shtml,txt
|
| |
added $index variable to search.html template's <!--SET MAP--...->
option, shows an increasing number, first search result starting at "1"
|
|
|