How do I know when my siteidx.* files are corrupt?

The .inf and .lnx files must have the same number of lines. Run wc -l siteidx* to check: the .inf and .lnx counts should match, the .urt file will have a different count, and the .ndx file (the word index) is usually the largest. It is a good idea to run this check before stopping Alkaline with Ctrl-C or kill: if the counts do not match, Alkaline may still be writing an index file.

Alkaline checks the validity of the indexes itself and will not load corrupt ones. A typical set of index files looks like this:
server:~/alkaline/foo> wc -l siteidx1.*
   10084 siteidx1.inf
   10084 siteidx1.lnx
   84703 siteidx1.ndx
   16604 siteidx1.urt
  121475 total

server:~/alkaline/foo> ls -la siteidx1.*
-rw-r--r-- 1 u gr 1023306 Aug 2 16:19 siteidx1.inf
-rw-r--r-- 1 u gr   61128 Aug 2 16:19 siteidx1.lnx
-rw-r--r-- 1 u gr 2874967 Aug 2 16:19 siteidx1.ndx
-rw-r--r-- 1 u gr  329763 Aug 2 16:19 siteidx1.urt
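
The line-count comparison described above can be scripted. The following is a minimal sketch, not part of Alkaline: the function name check_siteidx and its return codes are made up for illustration, and it assumes the index files sit in the current directory or are given with a path prefix. Matching counts do not prove the indexes are valid; a mismatch only suggests that a file is corrupt or still being written.

```shell
#!/bin/sh
# check_siteidx PREFIX   (hypothetical helper, not shipped with Alkaline)
# Compares the line counts of PREFIX.inf and PREFIX.lnx.
# Returns 0 if they match, 1 if they differ, 2 if a file is missing.
check_siteidx() {
    prefix=$1

    for ext in inf lnx; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext"
            return 2
        fi
    done

    # $(( ... )) normalizes the padded output that some wc versions print.
    inf_lines=$(( $(wc -l < "$prefix.inf") ))
    lnx_lines=$(( $(wc -l < "$prefix.lnx") ))

    if [ "$inf_lines" -eq "$lnx_lines" ]; then
        echo "ok: $prefix.inf and $prefix.lnx both have $inf_lines lines"
        return 0
    fi

    echo "mismatch: $prefix.inf has $inf_lines lines, $prefix.lnx has $lnx_lines lines"
    return 1
}
```

After sourcing the function, check_siteidx siteidx1 run in the index directory reports whether the two counts agree. A wrapper script could use the return code to decide whether it looks safe to send the kill.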

The right way to verify whether the indexes are corrupt is to let Alkaline load them. There is currently no way to recover corrupt indexes. Please report bugs and problems with corrupted indexes.