Description
Globally enable regular expressions rather than partial matches for options that support it.
Currently these include UrlReplace, UrlExclude, UrlInclude, UrlSkip, ExcludeWords, IndexWords, IncludePages and ExcludePages.
The contents of UrlExcludeFile, UrlIncludeFile and UrlSkipFile, which populate UrlExclude, UrlInclude and UrlSkip respectively, are also subject to this directive.
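The difference between a partial match and a regular-expression match can be sketched in Python. This is only an illustration: Python's `re` module stands in for Alkaline's matching engine, which may differ in details (for example, whether a pattern must match the whole string), and the helper names `partial_match` and `regex_match` are hypothetical.

```python
import re


def partial_match(pattern: str, url: str) -> bool:
    """Plain substring test: the pattern is taken literally."""
    return pattern in url


def regex_match(pattern: str, url: str) -> bool:
    """Regular-expression test: control characters are interpreted.

    Assumption: the pattern may match anywhere in the string.
    """
    return re.search(pattern, url) is not None


url = "http://www.foo.com/bar/index.html"

# The literal text "foo.com/bar" occurs in the URL.
literal_hit = partial_match("foo.com/bar", url)

# As literal text, the pattern below does not occur in the URL ...
literal_miss = partial_match(r"foo\.com/(bar|baz)/", url)

# ... but interpreted as a regular expression, it matches.
regex_hit = regex_match(r"foo\.com/(bar|baz)/", url)

print(literal_hit, literal_miss, regex_hit)
```

With a partial match, characters such as `(`, `|` and `.` have no special meaning, so a pattern written as a regular expression will usually fail to match unless regular expressions are enabled.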
Regular expressions can contain the following special control characters:
Table 1. RegExp Control Characters
| ^ | Beginning of the string. The expression ^A will match an A only at the beginning of the string. |
| ^ | A caret (^) immediately following a left bracket ([) has a different meaning: it excludes the remaining characters within the brackets from matching the target string. The expression [^0-9] indicates that the target character should not be a digit. |
| $ | The dollar sign ($) matches the end of the string. The expression abc$ will match the sub-string abc only if it is at the end of the string. |
| | | The alternation character (|) allows the expression on either side of it to match the target string. The expression a|b will match a as well as b. |
| . | The dot (.) will match any character. |
| * | The asterisk (*) indicates that the character to the left of the asterisk in the expression should match 0 or more times. |
| + | The plus (+) is similar to the asterisk, but there must be at least one match of the character to the left of the + sign in the expression. |
| ? | The question mark (?) matches the character to its left 0 or 1 times. |
| () | Parentheses affect the order of pattern evaluation. |
| [ ] | Brackets ([ and ]) enclosing a set of characters indicate that any of the enclosed characters may match the target character. |
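The control characters in Table 1 can be tried out with a short Python sketch. It uses Python's `re` module as a stand-in, which is an assumption: Alkaline's regular expression engine is not Python's, although the basic operators listed above behave the same way in both. The `cases` list is illustrative and not taken from the Alkaline documentation.

```python
import re

# (pattern, text, expected) triples, one or more per row of Table 1.
cases = [
    (r"^A", "Abc", True),        # ^ anchors at the beginning of the string
    (r"^A", "bAc", False),
    (r"[^0-9]", "7", False),     # [^...] excludes the listed characters
    (r"[^0-9]", "x", True),
    (r"abc$", "xyzabc", True),   # $ anchors at the end of the string
    (r"abc$", "abcxyz", False),
    (r"a|b", "b", True),         # alternation
    (r"a.c", "axc", True),       # dot matches any character
    (r"^ab*c$", "ac", True),     # * allows zero occurrences
    (r"^ab+c$", "ac", False),    # + requires at least one occurrence
    (r"^ab?c$", "abbc", False),  # ? allows at most one occurrence
    (r"^(ab)+$", "abab", True),  # parentheses group "ab" before + applies
    (r"^[abc]$", "b", True),     # brackets match any enclosed character
]

for pattern, text, expected in cases:
    matched = re.search(pattern, text) is not None
    assert matched == expected, (pattern, text)
    print(f"{pattern!r:12} {text!r:10} -> {matched}")
```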
Parentheses, besides affecting the evaluation order of the regular expression, also serve as tagged expressions, which act like temporary memory. This memory can then be used to replace the source expression with a replace expression. The replace expression can contain an & character, which represents the sub-string that was matched. So, if the sub-string that matched the regular expression is abcd, then a replace expression of xyz&xyz will change it to xyzabcdxyz. The replace expression can also be written as xyz\0xyz, where \0 is a tagged expression representing the entire sub-string that was matched. Similarly, other tagged expressions are represented by \1, \2, etc. Note that although tagged expression 0 is always defined, tagged expressions 1, 2, etc. are only defined if the regular expression used in the search has enough sets of parentheses. Here are a few examples:
Table 2. RegExp Examples
| String | Search | Replace | Result |
| Mr. | (Mr)(\.) | \1s\2 | Mrs. |
| abc | (a)b(c) | &-\1-\2 | abc-a-c |
| bcd | (a|b)c*d | &-\1 | bcd-b |
| abcde | (.*)c(.*) | &-\1-\2 | abcde-ab-de |
| cde | (ab|cd)e | &-\1 | cde-cd |
| foo bar STOP: lkasdfkjakjlf | ([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*) | \1\2 | foo bar STOP: |
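The rows of Table 2 can be reproduced with a minimal Python sketch. Several things here are assumptions rather than Alkaline behavior: Python's `re.sub` stands in for Alkaline's replacement engine, only the first match is replaced, and the helpers `to_python_template` and `apply_replace` are hypothetical names. Because Python writes the whole match as `\g<0>` rather than `&` or `\0`, the sketch translates those two forms first; it does not handle a literal, escaped `&`.

```python
import re


def to_python_template(replace_expr: str) -> str:
    r"""Translate '&' and '\0' (the whole match) into Python's '\g<0>'."""
    return replace_expr.replace("&", r"\g<0>").replace(r"\0", r"\g<0>")


def apply_replace(text: str, search: str, replace_expr: str) -> str:
    """Replace the first match of `search` in `text` (assumed behavior)."""
    return re.sub(search, to_python_template(replace_expr), text, count=1)


# (string, search, replace, expected result), as listed in Table 2.
table = [
    ("Mr.", r"(Mr)(\.)", r"\1s\2", "Mrs."),
    ("abc", r"(a)b(c)", r"&-\1-\2", "abc-a-c"),
    ("bcd", r"(a|b)c*d", r"&-\1", "bcd-b"),
    ("abcde", r"(.*)c(.*)", r"&-\1-\2", "abcde-ab-de"),
    ("cde", r"(ab|cd)e", r"&-\1", "cde-cd"),
    (
        "foo bar STOP: lkasdfkjakjlf",
        r"([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*)",
        r"\1\2",
        "foo bar STOP:",
    ),
]

for string, search, replace_expr, expected in table:
    result = apply_replace(string, search, replace_expr)
    assert result == expected, (string, search, replace_expr, result)
    print(f"{string!r} -> {result!r}")
```

In the last row, the first group initially consumes `foo bar STOP` and then backtracks to `foo bar ` so that `(STOP:)` can match; the third group captures the trailing text, which the replace expression `\1\2` discards.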
Alkaline has command line parameters, such as rxmatch and rxrepl, to test regular expressions. For more information, please refer to the Testing Regular Expressions section.
Example of Global RegExp Option
Exclude the entire /bar section of http://www.foo.com, and exclude both words Foo and foo.
Also, replace www with ns in all URLs.
RegExp=Y
UrlExclude=http://www.foo.com/bar/.*
ExcludeWords=foo/words.regexp
UrlReplace (.*)(www)(.*)=\1ns\3
|
The words.regexp file contains: