To optimize search performance, we have treated memory usage as less important than speed. We are also aware that more data means more memory consumed: large indexes take more memory.
Statistics taken from a live index running with a build of 10-Aug-2000 under Windows 2000:
Word forms: 291'043
Indexed urls: 114'062
Not using swap:
Memory Usage: 110'752 Kb
VM Size: 109'520 Kb
Using swap (--enableswap):
Memory Usage: 124'256 Kb
VM Size: 72'104 Kb
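To put these figures in perspective, the short sketch below works out a few ratios from them. It only repeats the numbers listed above. The variable names are illustrative, and a single measurement like this should not be read as a general rule for other indexes.

```python
# Figures copied from the statistics above (sizes in Kb).
indexed_urls = 114_062
word_forms = 291_043
no_swap = {"memory_usage": 110_752, "vm_size": 109_520}
with_swap = {"memory_usage": 124_256, "vm_size": 72_104}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Average memory cost per indexed url and per word form, without swap.
kb_per_url = no_swap["memory_usage"] / indexed_urls
kb_per_word_form = no_swap["memory_usage"] / word_forms

# Effect of --enableswap on the two reported counters.
vm_change = percent_change(no_swap["vm_size"], with_swap["vm_size"])
mem_change = percent_change(no_swap["memory_usage"], with_swap["memory_usage"])

print(f"Memory per indexed url (no swap): {kb_per_url:.2f} Kb")
print(f"Memory per word form (no swap): {kb_per_word_form:.2f} Kb")
print(f"VM Size change with swap: {vm_change:+.1f}%")
print(f"Memory Usage change with swap: {mem_change:+.1f}%")
```

For this index, that is a little under 1 Kb per indexed url. With swap enabled, VM Size drops by about a third, while the reported Memory Usage rises by roughly 12%.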
For comparison, Altavista is powered by a cluster of DEC servers with a few terabytes of RAM.
If Alkaline uses a lot of memory, this is normal. To reduce memory usage, consider defining exclusion dictionaries with the ExcludeWords configuration directive. A good exclusion dictionary will dramatically reduce memory usage, make searches more efficient, and improve the quality of results.
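The following is a minimal sketch of why an exclusion dictionary shrinks an index. It is not Alkaline's implementation and does not show the ExcludeWords syntax. The function names, the sample documents, and the excluded words are all hypothetical, chosen only to illustrate the idea on a toy in-memory inverted index.

```python
from collections import defaultdict


def build_index(documents, exclude_words=frozenset()):
    """Map each word form to the set of document ids that contain it.

    Words found in exclude_words are skipped and never stored.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            if word in exclude_words:
                continue
            index[word].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (word form, document) entries held in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


# Hypothetical sample data, for illustration only.
documents = {
    "a.html": "the index of the site",
    "b.html": "the search engine and the index",
}
exclude = frozenset({"the", "of", "and"})

full = build_index(documents)
trimmed = build_index(documents, exclude)

print("without exclusions:", len(full), "word forms,", posting_count(full), "entries")
print("with exclusions:   ", len(trimmed), "word forms,", posting_count(trimmed), "entries")
```

Very common words appear in almost every document, so each one adds an entry per document while contributing little to distinguishing one result from another. Leaving them out removes the largest entries in the index, which is where the memory savings come from.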