Software Documentation
Alkaline: a UNIX/NT Search Engine
Alkaline 1.9 Users Guide
Vestris Inc., Switzerland
Copyright © 1994-2002 by Vestris Inc., Switzerland
Table of Contents
1. Alkaline Concepts
What is Alkaline?
Search Engines Overview
Alkaline Features
Indexing
Searching
Online Administration
History of Alkaline
History of the Name
People Behind Alkaline
Copyright Notices
2. Alkaline Server Installation
Concepts and Requirements
Windows NT/2000
UNIX Operating Systems
Hardware Requirements
Installation on UNIX
Download Alkaline
Install and Test Binary
Run the Demo
Create a Simple Configuration
Index a Site
Search Your Index
Stopping Alkaline
Installation on Windows NT/2000/XP
Download Alkaline
Install and Test Binary
Run the Demo
Create a Simple Configuration
Index a Site
Search Your Index
Stopping Alkaline
Troubleshooting
Segmentation Fault (core dumped).
./asearch.1.9.OS: can't load library 'xxx.ld.so'.
asearch.exe has provoked an unexpected error in ???.dll
Internal Server Error or binary code in browser
Browsing to http://server:port/ does not work
Upgrading Alkaline
Finding the Current Software Version
Before You Begin
Upgrading to Version 1.7
Upgrading from Version 1.4 and Above
Upgrading from Earlier Releases
3. Alkaline Server Configuration
asearch.cnf Configuration File
Server Paths, Aliases and Templates
The global.cnf file, Server Passwords
4. Searching with Alkaline
Simple Search
Boolean Search
Meta Data Search
Numeric Data Search
Returning all pages
Restricting Search Scope
Restricting Date Scope
Forcing Search Options
Hints and Techniques
5. Running Alkaline
Running Alkaline as a Daemon
Reindexing a Database
Gather Email Addresses
Remove Urls from a Database
Merging Databases
Exclude Words from an Existing Database
Testing Regular Expressions
Querying Available Settings
Parsing Html Documents
Command Line Options
6. Customizing Search Results
Creating Search Templates
Simple Tag
Extended Tag
Option
Creating Search Input Forms
Prerequisites
A Simple Search Form
Writing Expressions
Introduction
Expression Variables
Expression Commands
Examples
Server Side Includes
What is SSI?
Alkaline SSI Support
Valid SSI Tags
WAP/WML Templates and Wireless Support
WML Search Form
WML Search Template
7. Advanced Alkaline Features
Indexing Other Document Formats
Introduction
Document Filters
Object Filters
Writing Filters
Available Filters
Adobe Pdf (pdf2text and pdf2html)
Microsoft Word (vwHtml)
Microsoft Rich Text Format (rtf2html)
LaTeX / TeX (LaTeX2HTML)
Word Perfect, AmiPro, Wang WPS (Plus), etc.
Shockwave Flash
Extensible Markup Language (Xml)
MPEG Layer 3 Music (Mp3)
Other Sources
Alkaline Robots, HTML and Meta Tags
Alkaline Robot Support
Alkaline Specific Meta Tags
Alkaline Specific Html Tags
Online Administration and Statistics
Accessing the Online Administration
Server Parameters
Server Operations
Running Alkaline as a Windows NT/2000 Service
Installation
Options
Troubleshooting
Alkaline Virtual Memory and Swap
Indexing Guidelines
CGI-powered and Dynamic Sites
Indexing in Background
Lotus Notes Domino
Mirroring Sites
Indexing Local File System
Working With Us
Reporting Problems
Built-in Tracing
Providing Vestris Inc. with Server Access
8. Alkaline Tools
Introduction
NdxScan - most frequent words dump
Introduction
Usage
Example
UrtList - indexed/found urls dump
Introduction
Usage
Example
LnxDump - cross-reference url dump
Introduction
Usage
Example (Forward)
Example (Reverse)
MrtgStats - mrtg-compatible statistics
Introduction
Installation
Perl Modules - open API
Introduction
Alkaline::Server
AlkalineStop.pl - stop the running daemon
Introduction
Usage
AlkalineReloadIndex.pl - reload an index
Introduction
Usage
Notes
AlkalineAddUrl.pl - add an url to an existing index
Introduction
Usage
Notes
AlkalineDeleteUrl.pl - remove an url from an existing index
Introduction
Usage
Notes
AlkalineGetStats.pl - get statistics in xml format
Introduction
Usage
Notes
AlkalineRefreshTemplate.pl - refresh a search results template
Introduction
Usage
Notes
AlkalineRestart.pl - restart the server
Introduction
Usage
Notes
AlkalineUnlock.pl - submit a certificate
Introduction
Usage
Notes
9. Alkaline Licensing and Terms of Use
Introduction
Server License Agreement
Terms of Use
General
Copyright Notices
End User License Agreement
Evaluation License Agreement
Refunds Policy
Limitations of Use
Limitations of Liability
Privacy
Arbitration
Disclaimer
Definition of Non-Commercial Companies
It was organized solely for non-profit purposes.
Operated solely for non-profit purposes.
No personal benefit for any member.
Software and Service Pricing
Reseller, Source-Code and OEM Licensing
Paying for a License
Online Purchase
Email Purchase
Fax Purchase
Postal Purchase
Purchase Orders
Questions and Issues
Software Certification and Registration
Certification Mechanism
I. The asearch.cnf Configuration Reference
UrlList - specify the root URLs to index
UrlExclude - exclude urls from indexing
UrlListFile - add a list of urls from a file to UrlList
UrlExcludeFile - add an url from a file to UrlExclude
Remote - index or exclude urls from a remote site
UrlInclude - define a global url include scope
UrlIncludeFile - define a global url include scope list
UrlIndex - follow links and index
UrlIndexFile - follow links and index
UrlSkip - follow links, but do not index
UrlSkipFile - follow links, but do not index
Depth - define the relative url depth of indexing
SiteDepth - define the maximum paths depth of urls to follow
RemoteDepth - define the maximum remote depth of urls to follow
MaxFiles - the maximum number of files to index
MaxLinks - the maximum number of links to follow
Upper - follow or ignore parent server paths
Reindex - index in background
Exts - valid extensions for files to spider
ExtsAdd / AddExts - add extensions to the Exts directive
Robots - respect server robots directives
SkipText - do not index plain text
SkipMeta - do not index meta tags
SkipLinks - do not follow links
EmptyLinks - process and queue invisible links
Index.html - default for urls without a filename
HeaderLength - maximum $header length
FreeCharset - disable character decoding
Redirect - define equivalent or redirected urls
UrlReplace - indexed urls text replacements
Replace - absolute url string replacement
Cgi - schedule cgi urls for indexing
Nsf - Lotus Notes Domino support
Insens - case-insensitive url parsing and MD5
LowerCase - convert to lowercase
UpperCase - convert to uppercase
ExcludeWords - exclude a list of words from indexing
IndexWords - include only a list of words from a dictionary
IncludePages - include pages containing words
IncludePagesAll - define the behavior of IncludePages
ExcludePages - stop words
ExcludePagesAll - define the behavior of ExcludePages
Expire - treat all documents as out of date
NewOnly - index only new documents
SkipParseLinks - index links in skip sections
MetaDescription - store meta descriptions
TextDescription - store plain text descriptions
Md5 - enable the MD5 document matching
Cookie - set a global cookie to be sent to servers
Cookies - retrieve, store and send cookies
RequestHeader - define an http request header
Filter - define a document type filter
Object - define an embedded objects filter
ObjectDocument - define the default param for Object
MaxSize - maximum document size
Auth - supply credentials for authentication
Proxy - define an HTTP proxy to use for document retrieval
Timeout - network timeout period
DnsTimeout - dns lookup timeout period
Retry - retry count for a timed-out connection
SleepFile - lazy mode delay between files
SleepRoundtrip - lazy mode delay between roundtrips
LogFile - write a log file
ExactSize - exact search word length
WriteIndex - database write interval
RegExp - enable regular expressions
CustomMetas - define custom meta tags for search results
ReplaceLocal - local url string replacement
SearchPartialLeft - constrain search to left/right wildcards
ParseContent - define html content-types
ParseMetas - define parseable html meta content
Weight - ranking parameters
WeakWords - frequent weak words
MaxWordSize - maximum size of indexed words
SearchCacheLife - cache record life span
II. Known global.cnf Variables
LogPath - location of the log files
Proxy - define an HTTP proxy
CacheTemplates - cache search templates
UsFormats - us-formatted dates
Realm - basic auth realm
ErrorFooter - error footer string
KeepAlive - allow keep-alive clients
Nagle - disable nagle algorithm
Ssi - enable server-side includes
RampupSearchThreads - rampup search thread pool threads
MaxSearchThreads - maximum search thread pool threads
MaxSearchQueueSize - maximum search thread pool queue
MaxSearchThreadIdle - maximum search thread idle time
MaxIndexThreads - maximum index threads
RampupIndexThreads - rampup index threads
III. Known global.cnf Passwords
Pass Root - general purpose password
Pass Alkaline-Restart - restart the server password
Pass Alkaline-Add - add urls password
Pass Alkaline-Manage - access the online management
AdminPath - location of the administrative path
DocumentPath - plain document paths
ForwardAlnHeaders - forward headers for aln templates
Redirect - default redirect
Ping - enable ping thread
PingInterval - ping thread request interval
PingRestart - failed ping restart count
PingUrl - url to ping periodically
IV. Search Templates Tags and Options
SEARCH-RESULTS - search results
SEARCH-GENERAL - search operation variables
SEARCH-NEXT - links to more results
SET MAP - format each search result
SET NEXT-DIVISION - number of page links
SET NEXT - link to the next page
SET PREV - link to the previous page
SET NEXT-INHERIT - pass form values to search pages
SET SEARCH-NORESULTS - output when no results found
SET SEARCH-BASE-HREF - base url of links to search results
SET SEARCH-BASE-ABS - absolute base of search results links
SET C-BEFORE - insert before the current page
SET C-AFTER - insert after the current page
SET N-BEFORE - insert before links to results pages
SET N-AFTER - insert after links to results pages
SET DATE - format date fields
SET RECENT-COUNT - consider a document recent
SET EXPIRED-COUNT - consider a document expired
SET RECENT - recent value
SET EXPIRED - expired value
SET QUANT - amount of results per page
SET FREECHARSET - do not quote output values
SET QUOTE - force quoting of search results
SET HIGHLIGHT-OPEN - left highlight
SET HIGHLIGHT-CLOSE - right highlight
#tag name=value - server side includes
V. Database Formats
siteidx1.ndx - unique words index (obsolete)
siteidx2.ndx - unique words index
siteidx1.url - index of uniform resource locators (obsolete)
siteidx1.urt - uniform resource locator tree
siteidx1.inf - digest information
siteidx1.lnx - id cross-index