There are basically three types of search engines.
Search Scripts are generally written in Perl or C and search sites of at most a few thousand pages.
They do not include complex algorithms to optimize searching and indexing, and they become unusable once the site grows
beyond a few thousand pages. Such scripts are still a good fit for a home-made site, where a search box
is more of a gadget.
Technically, these scripts re-index modified pages once in a while (for example, once a day), which takes from a few seconds to a few minutes, and
they usually produce fixed search-result output with few layout options. Because these scripts run entirely as CGI programs, they are slower:
they never keep persistent data in memory, but store it on disk and reload or re-parse it each time the search engine is invoked.
Among such scripts you can find WebGlimpse (http://www.webglimpse.net/) or ht://Dig (http://www.htdig.org/).
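To make the cost of the script approach concrete, here is a minimal Python sketch, assuming a toy inverted index (word to list of page URLs) stored as a JSON file. The names `build_index` and `cgi_search` and the file layout are hypothetical and are not taken from WebGlimpse or ht://Dig. `build_index` stands for the periodic re-indexing run, and `cgi_search` models a single CGI invocation, which must reload and re-parse the whole index from disk before it can answer.

```python
import json
import os
import re
import tempfile


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, index_path):
    """Periodic batch job (e.g. once a day): build an inverted index
    word -> sorted list of page URLs and save it to disk as JSON."""
    index = {}
    for url, text in pages.items():
        for word in set(tokenize(text)):
            index.setdefault(word, []).append(url)
    for urls in index.values():
        urls.sort()
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def cgi_search(query, index_path):
    """One CGI invocation: nothing survives between requests, so the
    whole index is loaded and parsed from disk before every search."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    words = tokenize(query)
    if not words:
        return []
    # AND semantics: a page must contain every query word.
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


if __name__ == "__main__":
    pages = {
        "/index.html": "Welcome to my home page",
        "/cats.html": "Photos of my cats",
        "/dogs.html": "Photos of the neighbour's dogs",
    }
    path = os.path.join(tempfile.mkdtemp(), "index.json")
    build_index(pages, path)
    print(cgi_search("photos cats", path))  # ['/cats.html']
```

The `json.load` call is paid on every request and grows with the index, which is one reason this design stops being practical as a site grows.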
Search Servers, implemented as ISAPI or WAI applications and sometimes combined with CGI scripts, overcome this drawback of indexes being constantly re-read
from the hard disk. Because they run as permanent servers that answer multiple search requests simultaneously, search engines in this category
require more hardware power and more memory, and are aimed at larger sites that really need a good search engine.
Alkaline is designed as a persistent search server.
Technically, search servers maintain indexes in RAM or use some internal swap mechanism.
They have complex algorithms for searching and indexing, usually jealously kept secret by their designers.
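By contrast with the script approach, a persistent server builds its index once and keeps it in RAM. The Python sketch below shows that idea with a plain in-memory inverted index and a thread pool answering several queries at once. It is a hypothetical illustration (the class `SearchServer` and its methods are invented here); it is not Alkaline's "cellular expansion", whose internals this text does not describe.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchServer:
    """Hypothetical persistent search server: the inverted index is built
    once and then stays in RAM for the lifetime of the process."""

    def __init__(self, pages):
        self.index = {}
        for url, text in pages.items():
            for word in set(tokenize(text)):
                self.index.setdefault(word, set()).add(url)

    def search(self, query):
        """Answer one request from memory, with no disk I/O per query."""
        words = tokenize(query)
        if not words:
            return []
        postings = [self.index.get(w, set()) for w in words]
        # Start from the smallest posting set to keep intersections cheap.
        postings.sort(key=len)
        result = set(postings[0])  # copy, so the shared index is never modified
        for p in postings[1:]:
            result &= p
        return sorted(result)

    def serve(self, queries, workers=4):
        """Answer several requests simultaneously from the shared index.
        Workers only read the index, so no locking is needed here."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.search, queries))


if __name__ == "__main__":
    server = SearchServer({"/cats.html": "Photos of my cats",
                           "/dogs.html": "Photos of dogs"})
    print(server.serve(["photos", "cats"]))
    # [['/cats.html', '/dogs.html'], ['/cats.html']]
```

The trade-off is the one described above: memory use grows with the index, but each query avoids re-reading it from disk.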
Alkaline uses the concept of "cellular expansion", which gives quite interesting performance
and opens doors for future research.
Cells are fast and hold up well as the data grows.
Of course, it is no mystery that a big server with a lot of hardware power will search
faster and will be able to index a larger site. Existing Alkaline-powered sites maintain an index of 500,000 pages with
about 450,000 word forms and run on industry-average Pentium III or Sun Ultra servers. Such a configuration can handle
two to three search requests per second.
Among such servers you can of course find Alkaline, but also
Infoseek Ultraseek (http://www.ultraseek.com/) or Thunderstone Webinator (http://www.thunderstone.com/webinator/).
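As a rough reading of the figures quoted above for existing Alkaline installations, a sustained rate of two to three requests per second corresponds to the daily volumes below. The per-request times assume, for simplicity, that requests are served strictly one after another; with concurrent handling, individual response times can differ.

```latex
\[
2\ \tfrac{\text{requests}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} = 172\,800\ \tfrac{\text{requests}}{\text{day}},
\qquad
3\ \tfrac{\text{requests}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} = 259\,200\ \tfrac{\text{requests}}{\text{day}}
\]
\[
t_{\text{request}} \approx \frac{1}{2\ \text{s}^{-1}} = 500\ \text{ms}
\quad\text{to}\quad
t_{\text{request}} \approx \frac{1}{3\ \text{s}^{-1}} \approx 333\ \text{ms}
\]
```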
Finally, Distributed Servers target searching and indexing of the whole web. This is the fiercest long-term battle
among search engines, as large companies compete for the best technology and the most relevant search results. We plan a parallel
implementation of Alkaline for clusters on a platform-independent TCP/IP network and for the IBM SP2. We have already run numerous
tests over a PVM network. With our distributed architecture, we want Alkaline to index 5 to 10 million pages while running fast on a cluster of 32 Pentium II PCs.
Unlike AltaVista, we do not plan to set Alkaline's search limits according to price: we will distribute it as one single product
at the same price, no matter what you search. By choosing Alkaline, you also choose a team that works for the future.
Technically, distributed search servers perform both indexing and searching in parallel. The more hardware power you have,
the faster indexing and searching are, although this depends on the overhead of the network load. All major search engines use
distributed architectures and can handle hundreds of requests per second.
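Parallel indexing and searching can be illustrated with a document-partitioned index: each node owns a disjoint slice of the pages, indexes it independently, and answers its share of every query, after which the partial results are merged ("scatter-gather"). The Python sketch below is a hypothetical, generic version of that technique, not Alkaline's planned design. Threads stand in for cluster nodes, and the `Cluster` class, node count, and crc32-based page assignment are assumptions. In a real cluster each scatter and gather step is a network round trip (over TCP/IP or a layer such as PVM), which is where the network overhead mentioned above comes in.

```python
import re
import zlib
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_shard(pages):
    """Indexing work done by one node on its own subset of pages."""
    index = {}
    for url, text in pages.items():
        for word in set(tokenize(text)):
            index.setdefault(word, set()).add(url)
    return index


def search_shard(index, words):
    """Search done by one node: AND query over its local shard only."""
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy of the local postings
    for w in words[1:]:
        result &= index.get(w, set())
    return result


class Cluster:
    """Hypothetical document-partitioned cluster: each 'node' (a thread
    here) owns a disjoint slice of the pages."""

    def __init__(self, pages, nodes=4):
        shards = [dict() for _ in range(nodes)]
        for url, text in pages.items():
            # Stable hash so a page is always assigned to the same node.
            shards[zlib.crc32(url.encode()) % nodes][url] = text
        self.pool = ThreadPoolExecutor(max_workers=nodes)
        # Parallel indexing: every node indexes its own shard concurrently.
        self.indexes = list(self.pool.map(build_shard, shards))

    def search(self, query):
        words = tokenize(query)
        # Scatter the query to every node, then gather and merge the results.
        # A page lives on exactly one node, so a union of the partial
        # AND results gives the correct global answer.
        partials = self.pool.map(lambda idx: search_shard(idx, words), self.indexes)
        merged = set()
        for part in partials:
            merged |= part
        return sorted(merged)

    def close(self):
        self.pool.shutdown()


if __name__ == "__main__":
    cluster = Cluster({f"/page{i}.html": f"page number {i}" for i in range(8)}, nodes=3)
    print(cluster.search("page number 5"))  # ['/page5.html']
    cluster.close()
```

Partitioning by document keeps each node's work independent, so adding nodes spreads both indexing and query load; the price is that every query must visit every node.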