
11. System installation

The SAM system runs on a variety of Unix workstations. We have checked installation on DEC DECstation and Alpha, HP 715, IBM RS6000, SGI Onyx Reality Engine, Sun Sparc, and Intel Pentium (running the Linux operating system) workstations, as well as on the UCSC Kestrel parallel processor.

The distribution includes an INSTALL file that discusses installation procedures.

The gnuplot, gunzip, and uncompress programs should be in the user's path, and other programs should be available as required by SAM-T99. See Section 4.9.

  
11.1 Runtime statistics

At the end of each run of buildmodel, a line of statistics is printed out, such as the line

-218.36  -217.00  -217.68   0.96  22 0 149
mentioned in Section 3. These numbers are quite useful for quick comparison of results when, for example, running the program many times using a shell script. The numbers are: minimum NLL-NULL score, maximum score, average score, sample deviation of scores, number of re-estimates, number of surgeries, and the length of the final model. In the above case, the scores are for the training set: if a test set were specified (Section 7.3), the minimum, maximum, average, and sample deviation for the test set would be reported after the model length, followed by the ratio of the average test set score to the average training set score (ideally, this value should be close to unity -- larger values may indicate overfitting of the model to the training set).
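
When many runs are driven from a script, the statistics line can be split into named fields for comparison. The following is a minimal sketch in Python, not part of SAM: the names ScoreSummary, RunStats, and parse_stats_line are hypothetical, and the parser assumes the fields are separated by whitespace and appear in exactly the order described above (7 fields for a training-only run, 12 when a test set is given).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreSummary:
    """Minimum, maximum, average, and sample deviation of NLL-NULL scores."""
    minimum: float
    maximum: float
    average: float
    deviation: float


@dataclass
class RunStats:
    train: ScoreSummary
    reestimates: int
    surgeries: int
    model_length: int
    test: Optional[ScoreSummary] = None        # present only with a test set
    test_train_ratio: Optional[float] = None   # average test / average train


def parse_stats_line(line: str) -> RunStats:
    """Parse a buildmodel statistics line (assumed whitespace-separated)."""
    fields = line.split()
    if len(fields) not in (7, 12):
        raise ValueError(f"expected 7 or 12 fields, got {len(fields)}")
    train = ScoreSummary(*(float(f) for f in fields[:4]))
    reestimates, surgeries, model_length = (int(f) for f in fields[4:7])
    stats = RunStats(train, reestimates, surgeries, model_length)
    if len(fields) == 12:
        # Test-set summary follows the model length, then the ratio.
        stats.test = ScoreSummary(*(float(f) for f in fields[7:11]))
        stats.test_train_ratio = float(fields[11])
    return stats


stats = parse_stats_line("-218.36  -217.00  -217.68   0.96  22 0 149")
print(stats.train.average, stats.model_length)  # prints: -217.68 149
```

Keeping the training and test summaries in separate objects makes it easy to sort a batch of runs by average score or to flag runs whose test-to-training ratio drifts away from unity.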

11.2 Reducing runtime

Training a model can be a time-consuming process. Each re-estimation cycles through all sequences in the training set, performing a dynamic programming algorithm whose number of operations is proportional to the product of the total number of characters in the training set and the length of the model. Because a run may require a large number of re-estimations, some runs take overnight.
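
The proportionality above can be turned into a rough, unitless estimate for comparing candidate training sets. This is only an illustrative sketch: the function name relative_cost is hypothetical, the constant of proportionality is unknown and machine-dependent, and the model length is treated as fixed even though surgery can change it during a run.

```python
from typing import Iterable


def relative_cost(sequence_lengths: Iterable[int],
                  model_length: int,
                  reestimates: int) -> int:
    """Unitless work estimate: total characters x model length x re-estimations."""
    total_chars = sum(sequence_lengths)
    return total_chars * model_length * reestimates


# Hypothetical training sets: 100 versus 50 sequences of 150 characters each.
full = relative_cost([150] * 100, model_length=149, reestimates=22)
half = relative_cost([150] * 50, model_length=149, reestimates=22)
print(full, half)  # prints: 49170000 24585000
```

Only the ratio between two estimates is meaningful here; in this example, halving the training set halves the estimated work.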

Shorter execution times (and possibly worse models or alignments) can be had in several ways: a hard limit can be placed on the number of reestimates, or the stopcriterion can be increased, though both of these can decrease model quality. Similarly, the number of surgeries can be reduced. One of the most effective ways to reduce runtime is simply to reduce the number of sequences in the training set. A small, well-chosen training set, in which close homologs have been eliminated, can produce better models than a larger, random training set.

If a run seems to be taking too long, it is possible to tell SAM to save the next model as a prelude to killing the program. The two UNIX signals, SIGUSR1 and SIGUSR2, can be used to toggle the print_surg_models and print_all_models variables. In the first case, models are printed after each surgery procedure, and in the second, after each re-estimation cycle.
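
As an illustration, the signals can be sent from a monitoring script as well as from the shell. The sketch below uses only the Python standard library; the function name toggle_model_printing is hypothetical, and the pairing of SIGUSR1 with print_surg_models and SIGUSR2 with print_all_models is an assumption read from the order in which the text lists them, so it should be checked against the installed version.

```python
import os
import signal

# Assumed pairing, taken from the order given in the text above.
SIGNAL_FOR_VARIABLE = {
    "print_surg_models": signal.SIGUSR1,  # models printed after each surgery
    "print_all_models": signal.SIGUSR2,   # models printed after each re-estimation
}


def toggle_model_printing(pid: int, variable: str) -> None:
    """Send the signal that toggles the given variable in process `pid`."""
    if variable not in SIGNAL_FOR_VARIABLE:
        raise ValueError(f"unknown variable: {variable!r}")
    os.kill(pid, SIGNAL_FOR_VARIABLE[variable])


# Example (12345 stands in for the process ID of a running buildmodel job):
# toggle_model_printing(12345, "print_all_models")
```

Under the same assumption, the shell equivalent is kill -USR2 followed by the process ID. Once the next model has been written, the program can be killed without losing the work done so far.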

11.3 Future Features

There are many future features we would like to include in SAM. The following list will also point out some of the things you currently cannot do using the system. The items are of varying difficulty.

  
11.4 Prior versions

11.4.1 Version 2.2.1

11.4.2 Version 2.2

July, 1998.

11.4.3 Version 2.1.2

June, 1998.

11.4.4 Version 2.1.1

April, 1998.

11.4.5 Version 2.1

February, 1998.

11.4.6 Version 2.0

November, 1997.

11.4.7 Version 1.4

August, 1996.

11.4.8 Version 1.3

May, 1996.

11.4.9 Version 1.2

March, 1996.

11.4.10 Version 1.1

November, 1995.

11.4.11 Version 1.0

January, 1995.


SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group