
   
12. Parameter descriptions

This section describes, in alphabetical order, all the parameters that can be specified in an init file. Where appropriate, the type of the parameter and its default value are listed. The program uses the default values automatically if the user does not specify an alternative setting. The dump_parameters option can be used to verify the default values. See Section 6.

The drawmodel and prettyalign programs do not use parameter files.

The SAM-T99 parameters are discussed elsewhere. See Section 4.

The parameter reading routines will accept variations in capitalization and the presence or absence of underscores.
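As an illustration of the rule above, the following Python sketch folds case and drops underscores before comparing names. It is not SAM's actual reading routine; the function name normalize_parameter_name is hypothetical.

```python
def normalize_parameter_name(name: str) -> str:
    """Fold case and drop underscores so spelling variants compare equal.

    Hypothetical helper: SAM's own parameter reader is not shown in this
    section, so this only mirrors the documented behavior.
    """
    return name.replace("_", "").lower()


# All of these spellings would refer to the same parameter.
variants = ["Insert_method_train", "insert_method_train", "INSERTMETHODTRAIN"]
normalized = {normalize_parameter_name(v) for v in variants}
```

Under this sketch, `normalized` contains a single entry, so the three spellings are interchangeable.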

SAM supports reading compressed input files. If any of the file name arguments to the options end in a .gz or .Z extension, SAM will read the files using the appropriate decompression program. If an input file that does not have a .gz or .Z extension is not found, SAM will try to read from a compressed file with one of these extensions.
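The fallback can be pictured with a small Python sketch. It only resolves which file would be opened; the order in which the two extensions are tried is an assumption, and resolve_input_file is a hypothetical name, not part of SAM.

```python
import os
from typing import Optional

COMPRESSED_EXTENSIONS = (".gz", ".Z")


def resolve_input_file(path: str) -> Optional[str]:
    """Return the file that would be read, or None if nothing is found.

    Assumption: .gz is tried before .Z; the manual does not state the order.
    """
    if os.path.exists(path):
        return path
    if path.endswith(COMPRESSED_EXTENSIONS):
        # Already a compressed name, so there is no further fallback.
        return None
    for ext in COMPRESSED_EXTENSIONS:
        candidate = path + ext
        if os.path.exists(candidate):
            return candidate
    return None
```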

a2mdots (0 or 1) (1):
By default (1), align2model will place dots in the sequence alignment to fill space needed for other sequences' insertions. If set to 0, these dots are not printed. See Section 10.1.

adjust_score (0, 1 or 2) (2):
If set, scores are adjusted appropriately according to the SW method, and model and sequence length, so that final scores are somewhat independent of sequence length and model length. Currently, this only applies to fully local scoring, in which case, the log of the sum of the model and sequence lengths is added to each score. This parameter is used by hmmscore. See Section 10.2.3.
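A minimal sketch of the fully local adjustment described above, assuming the natural logarithm (the manual does not state the base) and a hypothetical function name:

```python
import math


def adjusted_local_score(raw_score: float, model_length: int,
                         sequence_length: int) -> float:
    """Add log(model length + sequence length) to a fully local score.

    Assumption: natural log; the base used by hmmscore is not given here.
    """
    return raw_score + math.log(model_length + sequence_length)


# Example: a raw score of -20.0 for a 100-node model and a 300-residue sequence.
example = adjusted_local_score(-20.0, 100, 300)
```

Longer models and sequences receive a larger correction, which is what makes the final scores less dependent on length.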

alignfile (string) ():
A file containing an alignment of sequences for use with modelfromalign, uniqueseq, and sortseq, or as an initial model for buildmodel. See Section 10.5 and Section 10.2.

align_fim (0 or 1) (0):
Add FIMs to the ends of a model generated by modelfromalign or an alignfile in buildmodel. See Section 10.5.

alignment_weights (string) ():
A file containing sequence weights for alignments used to form initial models with buildmodel or models with modelfromalign. See Section 9.4.

alignshort (integer) (-1):
When less than 0 (default), multiple domain search produces an alignment file that copies the entire sequence for each copy of the domain occurring within the sequence. When 0, only the region matching the model is printed. When greater than zero, that many characters to the left and the right of the domain are also printed to the file. In both cases, sequence IDs in the new file can be used to locate where hmmscore found copies of the model. See Section 10.2.4.
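The three alignshort regimes can be sketched as a slicing rule. The coordinates (0-based start, exclusive end) and the function name domain_region are assumptions made for illustration only.

```python
def domain_region(sequence: str, start: int, end: int, alignshort: int) -> str:
    """Return the part of `sequence` written for one domain occurrence.

    start/end delimit the region matching the model (0-based, end exclusive);
    this coordinate convention is hypothetical, not SAM's file format.
    """
    if alignshort < 0:
        return sequence                      # whole sequence is copied
    left = max(0, start - alignshort)        # clamp flank at sequence start
    right = min(len(sequence), end + alignshort)  # clamp flank at sequence end
    return sequence[left:right]
```

For the sequence ABCDEFGHIJ with a domain covering DEF, a setting of 0 keeps DEF and a setting of 2 keeps BCDEFGH.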

alphabet (string) (protein):
This system supports 3 alphabets: DNA, RNA or protein. The protein alphabet is the default, and does not need to be specified. The abbreviation a may be used in place of alphabet. If unset, the first train, test, or db file is checked to see if the alphabet can be determined from the data. See Section 7.1.

alphabet_def (string) ():
The alphabet_def variable can be used to define an alphabet of 2 to 25 letters plus a (required) all-matching wildcard character. In the quoted string argument, both an alphabet name and the list of characters, with the wildcard last, must be specified. See Section 7.1.1.

anneal_length (float) (0.8):
Indicates the speed with which noise should be decreased to zero. If greater than 1, decrease linearly over anneal_length re-estimates. If less than one, decrease exponentially. See Section 9.1.

anneal_noise (float) (5):
Amount of noise to add to the model (decreased linearly or exponentially according to anneal_length). See Section 9.1.
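The two annealing regimes might look like the following sketch. The linear branch follows the description directly; the exponential branch assumes the noise is multiplied by anneal_length at each re-estimate, which this section does not confirm. See Section 9.1 for the actual schedule.

```python
def noise_schedule(anneal_noise: float = 5.0, anneal_length: float = 0.8,
                   n_reestimates: int = 10) -> list:
    """Noise level per re-estimate under the assumptions stated above."""
    levels = []
    for k in range(n_reestimates):
        if anneal_length > 1:
            # Linear decrease, reaching zero after anneal_length re-estimates.
            level = max(0.0, anneal_noise * (1 - k / anneal_length))
        elif anneal_length < 1:
            # Assumed form of the exponential decrease.
            level = anneal_noise * anneal_length ** k
        else:
            raise ValueError("behavior at exactly 1 is not specified here")
        levels.append(level)
    return levels
```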

auto_fim (0 or 1) (1):
Causes hmmscore and align2model to automatically add FIMs to the model before scoring when simple or complex null model subtraction is used or fully local scoring (SW is 2) is used. Also, in alignments, the FIM-induced delete state is automatically removed whenever auto_fim is set, regardless of whether or not FIMs were originally present in the model. See Section 10.2.

aweight_bits (float) (0.5):
Target bits per column to save in determining alignment sequence weighting. See Section 9.4.3.

aweight_exponent (float) (0.5):
Exponent in determining alignment sequence weighting. See Section 9.4.3.

aweight_method (0, 1, 2, or 3) (1):
Internal weighting method for initial alignment provided to buildmodel or modelfromalign. 0 (none), 1 (karplus), 2 (henikoff), or 3 (flat). See Section 9.4.3.

binary_output (0 or 1) (0):
Tells model-generating programs to write models in text format if set to 0 or a binary format if set to 1. Default is text or 0. See Section 8.4.4.

constraints (string) ():
Specify a constraints definition file to be read. This option may be specified multiple times. See Section 9.6.

constraints_out (string) ():
Specify the name for a constraints definition file to create. See Section 9.6.

constraints_from_align (0 or 1) (0):
If a true value is specified, constraints will be created for all aligned positions when a model is created from an alignment. See Section 9.6.

cutinsert (float) (0.5):
If this fraction of sequences use an insert state, surgery will replace it with one or more match states. See Section 9.2.

cutmatch (float) (0.5):
When fewer than this fraction of sequences use a match state, surgery will delete the state. See Section 9.2.
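The cutinsert and cutmatch thresholds can be read as a per-state decision. The sketch below is illustrative only: the function name is hypothetical, and treating the cutinsert boundary as "at least this fraction" is an assumption.

```python
def surgery_action(state_kind: str, fraction_using: float,
                   cutmatch: float = 0.5, cutinsert: float = 0.5) -> str:
    """Decide what surgery does to one state (sketch, not SAM's code).

    fraction_using is the fraction of sequences that use the state.
    """
    if state_kind == "match" and fraction_using < cutmatch:
        return "delete"               # under-used match state is removed
    if state_kind == "insert" and fraction_using >= cutinsert:
        return "replace_with_match"   # heavily used insert becomes match states
    return "keep"
```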

db (string) ():
A file containing sequences that are to be scored against a model in hmmscore or aligned to a model in align2model. Multiple instances of the db variable add to the list of database files, rather than replacing the previous db file name. See Section 10.2.

dbsize (integer) (0):
When greater than 0, this value is used in the calculation of E-values rather than the number of sequences that are read in to hmmscore. This is useful for correctly calculating E-values when multiple scoring runs are performed, and to avoid having to perform a complete reading of the database twice, once to calculate the number of sequences, and a second time to score the sequences. See Section 10.2.

del_jump_conf (float) (1.0):
Confidence in the regularizer for transitions leaving a delete state. The regularizer's transition values are multiplied by this number. See Section 8.1.

distfile (string) ():
File with already-calculated sequence distances for use with the makehist, makeroc, makeroc2, and sortseq programs. See Section 10.2, Section 10.7, and Section 10.8.4.

distfile2 (string) ():
A second file with already-calculated sequence distances for use with makehist, makeroc, or makeroc2. See Section 10.7.

dpstyle (0, 1, 2, 3, or 4) (0):
Flavor of internal dynamic programming. 0 indicates forward-backward (EM) sum-of-all-paths, 1 indicates Viterbi single best path, 2 indicates EM with the posteriors saved in a .pdoc file, 3 indicates EM outputting the frequency counts of each sequence in its own .freq file, and 4 indicates most probable alignment (posterior-decoded alignment). If dpstyle is set to a value other than 1 or 4 for alignment, it is changed to 1 (Viterbi). The hmmscore program always uses Viterbi for selected alignments. See Section 9.5, Section 10.2, and Section 10.1.

dump_parameters (0, 1, or 2) (0):
Normally, only modified parameters are printed to the output file. If this is set to 1, all parameters are printed. If 2, and specified alone on the command line, buildmodel and align2model will dump parameters and exit. Because in this case an alphabet is not specified and a regularizer not created, a setting of 2 will not reveal the default regularizer. See Section 8.4.

Emax (float) (0.001):
When a selection variable includes 4 in its binary representation, Emax is used to determine what sequences are selected. Also, when select_score/seq=4, sequences with an E-value better than Emax are selected. See Section 10.2.

family_base_file (string) ():
If non-null, and sequence_weights and family_specific are specified, initial models are read in from the files whose names are created by appending .i.mod, where i is an integer corresponding to the family number. For example, if there are three families and the base name is test, the family models will be read in from test.0.mod, test.1.mod, and test.2.mod. The first model in the file (of any type, including MODEL, REGULARIZER, NULLMODEL, and FREQUENCIES) is used. An error will result if the models are of different lengths. See Section 9.4.
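The naming rule in the example above can be written as a one-line Python sketch (the function name family_model_files is hypothetical):

```python
def family_model_files(base_name: str, n_families: int) -> list:
    """Build the per-family model file names: base.i.mod for i = 0..n-1."""
    return [f"{base_name}.{i}.mod" for i in range(n_families)]


names = family_model_files("test", 3)
# names is ["test.0.mod", "test.1.mod", "test.2.mod"]
```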

FIM_method_train (0, 1, 2, 3, 5, 6) (-1):
During the model building process, one may employ an initial model that contains FIMs. The table probabilities can readily be changed to reflect different distributions. Negative values only cause changes to the tables when models are created by the program, rather than being read in. The default setting of -1 uses the letter frequencies in the training set when generating new models. See Section 8.6.

FIM_method_score (0, 1, 2, 3, 5, 6) (-6):
Similar to FIM_method_train, except that the insert probabilities in the FIMs are changed before sequences are scored against the model. Negative values only cause changes to the tables when models are created by the program, rather than being read in. The default method of -6 uses the geometric average of match state probabilities. See Section 10.2.1.

fimstrength (float) (1.0):
A factor by which to multiply the FIM letter emission probabilities. If set to 2.0, for example, each letter will have twice the probability of being generated as in the normalized insert state. This can be used to encourage the use of FIMs. The value is also applied to simple null models. When set to a value less than 0, the absolute value of fimstrength is applied to all insert states, FIM or otherwise. See Section 8.5.

fimtrans (float) (0.0):
When 0.0, the FIM's insert-to-insert probability is 1.0. When greater than 0.0, a factor by which to multiply the model's geometric average match-to-match probability to produce the FIM's insert-to-insert probability. When less than 0.0, the FIM is adjusted according to the absolute value of fimtrans, and the non-FIM insert-to-insert probability is set to p - (1-f)p^2, where p is the regularized and normalized frequency count for the transition and f is the FIM insert-to-insert transition probability. See Section 8.5.

fracinsert (float) (1.0):
When an insert state is being replaced, surgery will replace it with the average number of characters generated by the insert state multiplied by this number. See Section 9.2.
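A sketch of the resulting state count, combining fracinsert with the maxinserts cap described elsewhere in this section. Rounding to the nearest integer is an assumption; the manual does not say how fractional counts are handled.

```python
def replacement_match_states(avg_insert_chars: float, fracinsert: float = 1.0,
                             maxinserts: int = 100) -> int:
    """Number of match states that replace an insert state (sketch).

    Assumption: the product is rounded to the nearest integer.
    """
    return min(maxinserts, round(avg_insert_chars * fracinsert))
```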

FREQUENCIES () ():
A model structure that has frequency counts rather than probabilities. Output by buildmodel if the print_frequencies parameter is set to 1. The drawmodel program is the only program that can use frequencies as input. See Section 8.4.

histbins (integer) (10):
Number of bins used by the makehist program. See Section 10.7.1.

id (string) ():
A sequence identifier, used to restrict align2model or hmmscore to only considering specific sequences. Multiple occurrences of the id parameter are added to the list of sequence identifiers, rather than replacing the value of id.

initial_noise (float) (-1.0):
When greater than zero, amount of noise to add for the first iteration. See Section 9.1.

ins_jump_conf (float) (1.0):
Confidence in the regularizer for transitions leaving an insert state. The regularizer's transition values are multiplied by this number. See Section 8.1.

insconf (float) (10000):
Confidence in the regularizer for character probabilities in an insert state. The high default means that the regularizer will overpower the actual counts determined by aligning sequences to the model. The regularizer's character insert values are multiplied by this number. See Section 8.1.

insert (string) ():
Insert another parameter file. The single character i may be used in place of insert. See Section 6.

insert_file_dna (string) ():
Insert another parameter file if the current alphabet has been set to DNA. This is particularly useful for alphabet-specific regularizers. See Section 6.

insert_file_protein (string) ():
Insert another parameter file if the current alphabet has been set to protein. This is particularly useful for alphabet-specific regularizers. See Section 6.

insert_file_rna (string) ():
Insert another parameter file if the current alphabet has been set to RNA. This is particularly useful for alphabet-specific regularizers. See Section 6.

Insert_method_train (0, 1, 2, 3, 5) (-1):
Similar to FIM_method_train except that the insert probabilities are changed in the nodes that are not FIMs. Negative values only cause changes to the tables when models are created by the program, rather than being read in. The default method -1 uses the letter frequencies in the training set when generating models. If the model or regularizer includes a GENERIC node, then its match and insert tables are also filled in with these values. See Section 8.6.

Insert_method_score (0, 1, 2, 3, 5, 6) (0):
Similar to FIM_method_score except that the insert probabilities are changed in the nodes that are not FIMs. Negative values only cause changes to the tables when models are created by the program, rather than being read in. The default method 0 is to not change the insert tables during scoring. See Section 10.2.1.

internal_weight (0, 1, 2) (1):
Use internal maximum discrimination sequence weighting. Automatically turned off if not explicitly set and external weights are used. See Section 9.4.4.

jump_in_prob (float) (1.0):
The probability cost of jumping into the center of the model when the SW option is set. See Section 10.2.3.

jump_out_prob (float) (1.0):
The probability cost of jumping out of the center of the model when the SW option is set. See Section 10.2.3.

kestrel_fallback (0 or 1) (1):
Enables or disables fallback into sequential mode if a Kestrel board is not available after the specified number of retries or if features not implemented on Kestrel are requested.

kestrel_min_model_len (integer) (0):
Specifies the minimum model length to use with the Kestrel implementation of hmmscore EM scoring. Models smaller than this value will be scored using the sequential algorithm. This is useful as small models may be slower on Kestrel. See Section 10.2.

kestrel_num_pe (integer) (0):
If greater than zero, enables use of the Kestrel simulator with the specified number of simulated processing elements. This option is useful when debugging Kestrel components of SAM or examining the results of small models and databases when a Kestrel board is not available. See Section 10.2.

kestrel_remote_db_dir (string) ():
Specifies the remote directory containing the sequence databases in Kestrel format. This should be in Windows-NT syntax, for example \\merlin\data. See Section 10.2.

kestrel_retry_cnt (integer) (0):
Specifies the number of times to retry if a Kestrel board is not available. See Section 10.2.

kestrel_retry_time (integer) (0):
Specifies the number of seconds to wait between retries when a Kestrel board is not available. See Section 10.2.

mainline_cutoff (float) (0.5):
Changing this value will set both cutmatch and cutinsert to the new value. See Section 9.2.

many_files (0 or 1) (0):
When zero, all the output of buildmodel is sent to the .mod file. When set, the probability model, frequency model, and run statistics are printed to different files. See Section 5.

match_jump_conf (float) (1.0):
Confidence in the regularizer for transitions leaving a match state. The regularizer's transition values are multiplied by this number. See Section 8.1.

matchconf (float) (1.0):
Confidence in the regularizer for character probabilities in a match state. The regularizer's character match values are multiplied by this number. This variable is ignored if a prior library is used. See Section 8.1.

maxinserts (integer) (100):
In buildmodel, the maximum number of states inserted after any node by the surgery. See Section 9.2.

maxmem (integer) (0):
Maximum size of dynamic programming array to use for training and alignment. See the CABIOS papers by Grice, Hughey, and Speck and by Tarnas and Hughey for more information on the algorithm used. Depending on system configuration, performance may increase with higher values. If set to zero (the default), SAM will always use the smallest possible amount of space.

maxmodlen (integer) (0):
When starting with multiple, randomly generated models, the longest model to use. If set to 0 (the default), the value is calculated as 10% above the average sequence length when needed. See Section 8.4.1.

mdNLLnull (float) (-10.0):
Criterion by which subsequences are judged to be matches to a single motif (model) during a multiple domain alignment if there is a 1 in the bit pattern of select_md. All occurrences for which NLL-NULL is better than the specified value are considered matches. See Section 10.2.4.

mdNLLcomplex (float) (-10.0):
Criterion by which subsequences are judged to be matches to a single motif (model) during a multiple domain alignment if there is a 2 in the bit pattern of select_md. All occurrences for which NLL-NULL complex or reverse null model score is better than the specified value are considered matches. See Section 10.2.4.

mdEmax (float) (0.01):
Criterion by which subsequences are judged to be matches to a single motif (model) during a multiple domain alignment if there is a 4 in the bit pattern of select_md. All occurrences for which reverse sequence null model e-value is better than the specified value are considered matches. See Section 10.2.4.

minmodlen (integer) (0):
When starting with multiple, randomly generated models, the shortest model to use. If set to 0 (the default), the value is calculated as 10% below the average sequence length when needed. See Section 8.4.1.

MODEL () ():
Specify an initial model. See Section 8.4.

model_abort_length (integer) (10000):
In buildmodel, if the initial model length is greater than this number, an error message is printed and the program is aborted. This is to avoid giant models that will never complete training because of their memory or execution time requirements.

model_file (string) ():
If non-null, this file is read for an initial model. The first model in the file (of any type, including MODEL, REGULARIZER, NULLMODEL, and FREQUENCIES) is used. This will override any models present in inserted files. See Section 5.

modellength (integer) (-1):
When greater than 0, sets the model length to a specific value in buildmodel (overridden if a model or regularizer without a GENERIC node is present). If equal to 0 and maxmodlen is less than 1, all model lengths are set to the average length of the training sequences. If less than 0, model length(s) are set to a random value between minmodlen and maxmodlen according to randseed. These two bounds will default to 90% and 110% of average sequence length if maxmodlen is less than 1. See Section 8.4.1.
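The interaction between modellength, minmodlen, and maxmodlen can be sketched as follows. This is a reading of the description above, not SAM's code: the function name, the rounding, and the handling of modellength equal to 0 when maxmodlen is at least 1 (treated here as a random draw) are assumptions, and the override by an existing model or regularizer is not modeled.

```python
import random


def choose_model_length(modellength: int, avg_seq_len: float,
                        minmodlen: int = 0, maxmodlen: int = 0,
                        rng: random.Random = None) -> int:
    """Pick an initial model length under the assumptions stated above."""
    rng = rng or random.Random()
    if modellength > 0:
        return modellength                    # explicit length wins
    if modellength == 0 and maxmodlen < 1:
        return round(avg_seq_len)             # average training length
    # Bounds default to 90% and 110% of the average sequence length.
    low = minmodlen if minmodlen > 0 else round(0.9 * avg_seq_len)
    high = maxmodlen if maxmodlen >= 1 else round(1.1 * avg_seq_len)
    return rng.randint(min(low, high), high)  # inclusive random draw
```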

Motifcutoff (float) (0.5):
In multiple motif search, fragments which are smaller than this fraction of the model length are not considered for further processing. Further, processing stops if a fragment of length less than the square of Motifcutoff is the best match (this is needed when using SW scoring with weak thresholds). See Section 10.2.4.

NLLnull (float) (-10.0):
If a selection variable is odd, this value is checked against a sequence's simple null model score. See Section 10.2.

NLLcomplex (float) (-10.0):
If a selection variable includes 2 in its binary representation, this value is checked against a sequence's complex, user, or reverse sequence null model score. See Section 10.2.

NLLfile (string) ():
Alias for distfile.

NLLfile2 (string) ():
Alias for distfile2.

Nmodels (integer) (3):
Multiple initial models can be trained simultaneously, with the best one being used for surgery and further training. See Section 8.4.1.

NscoreSeq (integer) (100000):
Maximum number of sequences to be read by the hmmscore or align2model program.

Nseq (integer) (10000):
Maximum number of sequences to be read from any of the up to four sequence files or database files in buildmodel. See Section 7.3.

nsurgery (integer) (3):
Maximum number of surgeries to perform. Each surgery will result in a full EM cycle until stopcriterion or reestimates is reached.

Ntrain (integer) (0):
Number of sequences to train on. If zero, all sequences that were read from the files train and train2 (up to a limit of Nseq per file) form the training set. If Ntrain is greater than the number of sequences read in from the files train, train2, test, and test2, all sequences are used for training. If Ntrain is less than the total number of sequences read in from the four files, all the sequences are randomly partitioned (using trainseed) into a training set of Ntrain sequences and a test set of the remaining sequences (i.e., whether a sequence occurred in a training file or a test file is ignored). See Section 7.3.
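The three Ntrain cases can be sketched in Python. The function name is hypothetical, and Python's random module stands in for SAM's own generator, so a given trainseed will not reproduce SAM's actual partition. Treating Ntrain equal to the total as "all sequences train" is also an assumption.

```python
import random


def split_train_test(train_seqs, test_seqs, ntrain: int, trainseed: int):
    """Return (training set, test set) following the Ntrain description."""
    if ntrain == 0:
        # Sequences stay in the sets implied by the files they came from.
        return list(train_seqs), list(test_seqs)
    pooled = list(train_seqs) + list(test_seqs)
    if ntrain >= len(pooled):
        return pooled, []                 # everything is used for training
    rng = random.Random(trainseed)        # seeded for reproducibility
    rng.shuffle(pooled)                   # file of origin is ignored
    return pooled[:ntrain], pooled[ntrain:]
```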

nucleotide_prior (string) ():
The prior library to use if RNA or DNA sequences are being modeled and prior_library has not been set. See Section 8.1.

NULLMODEL () ():
Identifies a user defined null model in a model file. The parameter subtract_null must be set to 3 to use this null model. See Section 10.2.

nullmodel_file (string) ():
If non-null, this file is read for a complex null model. The first model in the file (of any type, including MODEL, REGULARIZER, NULLMODEL, and FREQUENCIES) is used. This will override any null models present in inserted files. To use this null model, subtract_null must be set to 3. See Section 5.

percent_id (float) (1.0):
For alignments passed to uniqueseq, specifies fraction identity to use for deleting sequences. See Section 10.8.5.

plotcolumn (integer) (3):
Column of score file to use in calculating plots. Length (0), simple null model (1), complex or reverse null model (2), or Evalue (3). See Section 10.7.

plotleft (float) (0.0):
Lowest X axis value on a graph generated by gnuplot. The X axis is calculated internally if plotleft=plotright. Used in conjunction with makehist, makeroc and makeroc2. See Section 10.7.

plotline (float) (0.0):
Creates a vertical line at this value in a graph generated by gnuplot if plotline is nonzero. Used in conjunction with makehist, makeroc and makeroc2. See Section 10.7.

plotmax (float) (0):
Highest Y axis value on a graph generated by gnuplot. The Y axis is calculated internally if plotmax=plotmin. Used in conjunction with makehist, makeroc and makeroc2. See Section 10.7.

plotmin (float) (0):
Lowest Y axis value on a graph generated by gnuplot. The Y axis is calculated internally if plotmax=plotmin. Used in conjunction with makehist, makeroc and makeroc2. See Section 10.7.

plotnegate (integer) (0):
Negates the scores on a graph generated by gnuplot if set to 1. Used in conjunction with makehist, makeroc and makeroc2. See Section 10.7.

plotps (integer) (1):
Creates a postscript file runname.ps if set to 1. When set to 0, only a .plt file is generated. A square plot postscript file is generated for a setting of 2. For options 1 and 2, the .data and .plt files used to create the postscript file are deleted. When set to 3, the postscript file is generated and the .data and .plt files are retained. Used in conjunction with makehist, makeroc and makeroc2. See Section 10.7.

plotright (float) (0.0):
Highest X axis value on a graph generated by gnuplot. The X axis is calculated internally if plotleft=plotright. Used in conjunction with makehist, makeroc and makeroc2. See Section 10.7.

print_all_models (0 or 1) (0):
When set, models are printed after each iteration of the forward-backward procedure. Models are printed to files of the form runname.a.mrrr.mod, where `mrrr' is the concatenation of the number of the model (or 1 if only one model is being estimated at a time) and the re-estimate number. This variable can be toggled at runtime by sending a SIGUSR2 signal to the program, providing a means to look at intermediate results while the program is running or checkpointing a program run.

print_all_weights (0 or 1) (0):
When set, a weight output file is generated after each iteration of the forward-backward procedure. Weights are printed to files of the form runname1.weightoutput, where `1' is the number of the iteration.

print_frequencies (0 or 1) (0):
If this option is set, the frequency counts for each state will be printed as well as the model.

print_surg_models (0 or 1) (0):
When set, models are printed after each surgery (surgery occurs after a sequence of EM re-estimates). Models are printed to files of the form runname.s.rr.mod, where `rrr' is the re-estimation index for the run. When surgery is used, a single winning model is automatically selected after the first EM re-estimation loop if multiple initial models are used. This variable can be toggled at runtime by sending a SIGUSR1 signal to the program.

prior_library (string) ():
When set, use Dirichlet mixture priors to regularize the models. Transition costs and insert states are still regularized by the default (or specified) regularizer, but match states are regularized with Dirichlet mixtures. The matchconf variable is ignored if a prior library is used, in favor of the prior_weight variable. If prior_library is not set and protein_prior or nucleotide_prior is set, the indicated prior library is used. See Section 8.1.

prior_weight (float) (1.0):
Weight of the prior library, if it is used. See Section 8.1.

protein_prior (string) (recode4.20comp):
The prior library to use if proteins are being modeled and prior_library has not been set. See Section 8.1.

randseed (integer) (-1):
Random seed for noise generation and for selection of initial model lengths if modellength is less than one. The default value causes the process's pid to be used, which will then be printed to the output file to enable replication of results.

rdb (0 or 1) (0):
Create the score file in RDB format with the extension .dist-rdb rather than the standard .dist format. See Section 10.2.

randomize (integer) (50):
Determines how noise is added to the model. See Section 9.1.

read_smooth (0 or 1) (0):
Tells hmmscore whether or not to read a smooth curve from smooth_file, or its default (runname.smooth). See Section 10.2.

reestimates (integer) (40):
Maximum number of re-estimates to perform after a surgery. Generally, this should be set higher than the number of iterations that have noise. See Section 9.

reglength (integer) (-1):
Similar to modellength, sets the length of the regularizer. Usually not needed. See Section 8.4.1.

REGULARIZER () ():
Specify an initial regularizer. See Section 8.4.

regularizer_file (string) ():
If non-null, this file is read for a single-component regularizer. The first model in the file (of any type, including MODEL, REGULARIZER, NULLMODEL, and FREQUENCIES) is used. This will override any regularizers present in inserted files. See Section 5.

rerun (integer) (-1):
The program optimizes Nmodels models until the first `surgery', and then continues with the best one. Sometimes it is interesting to see how the second best would have done. If the second best is number 4 (starting from 0!), setting this parameter to 4 would optimize that model. Models can also be accessed using print_all_models.

retrain_noise_scale (float) (0.1):
If an initial model or alignment is passed to buildmodel, initial_noise (or anneal_noise if initial_noise is unspecified) is scaled by this multiplier, which must be between 0.0 and 1.0. See Section 9.1.

segments (integer) (1):
Number of segments hmmscore should logically split database into. Segmentation is based on number of sequences. See Section 10.2.5.

segment_number (integer) (1):
Segment number among segments. See Section 10.2.5.

segment_size (integer) (100):
Number of sequences read in at a time and given to one of the segments. See Section 10.2.5.
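One way to picture how segments, segment_number, and segment_size fit together is a round-robin assignment of blocks of sequences. The round-robin order is an assumption; this section does not specify how blocks are dealt out, and the function name is hypothetical. See Section 10.2.5 for the actual behavior.

```python
def in_my_segment(seq_index: int, segments: int = 1, segment_number: int = 1,
                  segment_size: int = 100) -> bool:
    """True if the sequence at 0-based seq_index falls in this run's segment.

    Assumption: blocks of segment_size sequences are assigned to segments
    1..segments in rotation.
    """
    block = seq_index // segment_size
    return block % segments == segment_number - 1
```

With segments set to 2 and the default segment_size, sequences 0 to 99 would go to segment 1, sequences 100 to 199 to segment 2, sequences 200 to 299 to segment 1 again, and so on.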

select_align (integer) (0):
Tells hmmscore what selection criteria should be used for placing aligned sequences into the file runname.a2m. If 0, no sequences are selected; if 1, sequences are selected according to their simple null model scores and NLLnull; if 2, sequences are selected according to their complex, user, or reverse sequence null model score and NLLcomplex; if 4, sequences are selected according to their E-values and Emax; if 8, all sequences are selected. Selection criteria can be combined: 3 requires sequences to score better than NLLnull with the simple null model and NLLcomplex with the complex null model. Negative numbers indicate that sequences that do not pass the corresponding positive test should be selected. See Section 10.2.
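The selection variables behave like bit masks. The sketch below decodes one under three assumptions that this section does not confirm: "better" means numerically lower for both NLL-NULL scores and E-values, combined bits must all pass, and the 8 bit selects everything regardless of the other bits. The function name is hypothetical.

```python
def passes_selection(select: int, simple_score: float, complex_score: float,
                     evalue: float, nll_null: float = -10.0,
                     nll_complex: float = -10.0, emax: float = 0.001) -> bool:
    """Decide whether a scored sequence is selected (illustrative only)."""
    criteria = abs(select)
    if criteria == 0:
        return False                          # 0 selects nothing
    if criteria & 8:
        passed = True                         # 8: all sequences
    else:
        tests = []
        if criteria & 1:
            tests.append(simple_score < nll_null)
        if criteria & 2:
            tests.append(complex_score < nll_complex)
        if criteria & 4:
            tests.append(evalue < emax)
        passed = all(tests)
    # A negative setting selects the sequences that fail the positive test.
    return passed if select > 0 else not passed
```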

select_mdalign (integer) (0):
Tells hmmscore what selection criteria should be used for performing a multiple domain check on a scored sequence. Sequences that pass the select_mdalign criteria are analyzed, and are recorded if they pass the select_md criteria during the multi-domain Viterbi alignment pass. See Section 10.2.

select_md (integer) (1):
Tells hmmscore what selection criteria should be used for treating a multiple domain alignment as found, in which case the alignment is written to runname.mult with scores in runname.mstat. Functions as with select_align with the variables mdNLLnull, mdNLLcomplex, and mdEmax. Only sequences that pass the selection criteria (which are always based on Viterbi scores) are recorded in the files. The default is to require passing the simple null model test. See Section 10.2.4.

select_score (integer) (8):
Tells hmmscore what selection criteria should be used for listing sequence scores in the file runname.dist. Functions as with select_align. See Section 10.2.

select_seq (integer) (0):
Tells hmmscore what selection criteria should be used for placing sequences in the file runname.sel. Functions as with select_align. See Section 10.2.

sequence_models (float) (0.0):
Build initial models from randomly-selected sequences in the training set when greater than zero. Value indicates the weight the sequence should have when combined with the regularizer. See Section 8.3.

sequence_warning (integer) (0):
Primarily for debugging. Set to -1 to print out all sequences in which a `wrong' letter was found, or to -2 to print out all sequences.

sequence_weights (string) ():
File to read for sequence weights. See Section 9.4.

simple_threshold (integer) (0):
Complex, user, and reverse sequence scores will not be calculated by hmmscore unless the simple null model score is less than this number. Set to 10000 to require all scores to be calculated. See Section 10.2.1.

sort (integer) (4):
Indicates whether or not sequence scores should be sorted by hmmscore. With a value of 1, sequences are sorted by column 1 (simple null model score). With a value of 2, sequences are sorted by column 2 (other null model selections; see subtract_null). With a value of 4, sequences are sorted by E-value if available or by column 1. When negative, scores are sorted in reverse order, worst first. When 0, scores are not sorted. Sort also indicates whether or not uniqueseq should sort sequence IDs and sequences to check for uniqueness. See Section 10.2 and Section 10.8.5.
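The sort settings for hmmscore can be sketched as a choice of sort key. The record layout (dictionaries with simple, other, and evalue keys) is hypothetical, and "best first" is assumed to mean numerically lowest first.

```python
def sort_scores(rows: list, sort: int = 4) -> list:
    """Order score records according to the sort parameter (sketch)."""
    if sort == 0:
        return list(rows)                     # leave input order untouched
    mode = abs(sort)
    if mode == 1:
        key = lambda r: r["simple"]           # column 1
    elif mode == 2:
        key = lambda r: r["other"]            # column 2
    elif mode == 4:
        # E-value if every record has one, otherwise fall back to column 1.
        have_evalues = all(r.get("evalue") is not None for r in rows)
        key = (lambda r: r["evalue"]) if have_evalues else (lambda r: r["simple"])
    else:
        raise ValueError("unsupported sort setting in this sketch")
    return sorted(rows, key=key, reverse=sort < 0)  # negative: worst first
```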

stopcriterion (float) (0.1):
The re-estimation loop will stop whenever the improvement in the NLL score is less than this number (provided noise is less than 10% of its original value for that iteration), or when the maximum number of reestimates is reached. See Section 9.
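The stopping rule can be sketched as a predicate evaluated after each re-estimate. The argument names are hypothetical, improvement is taken to be the decrease in NLL, and a run with no noise at all is assumed to satisfy the noise condition.

```python
def should_stop(prev_nll: float, curr_nll: float, iteration: int,
                current_noise: float, original_noise: float,
                stopcriterion: float = 0.1, reestimates: int = 40) -> bool:
    """True when the re-estimation loop would end (illustrative only)."""
    if iteration >= reestimates:
        return True                           # hard limit on re-estimates
    # Noise must have fallen below 10% of its original value.
    noise_is_low = original_noise == 0 or current_noise < 0.1 * original_noise
    improvement = prev_nll - curr_nll         # lower NLL is better
    return noise_is_low and improvement < stopcriterion
```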

subtract_null (integer) (1):
In hmmscore and other programs, decides the type of null model to be used. In score files, this will be the second score column (the first is always the simple null model). When set to 0, raw scores are reported in the second column. Setting to 1 provides simple null model scores; to 2, complex null model scores; to 3, user's input null model; and to 4, the reverse sequence null model.

surgery_noise_scale (float) (0.1):
After the first surgery, anneal_noise is scaled by this multiplier, which must be between 0.0 and 1.0. See Section 9.1.

SW (integer) (0):
When set to 1, hmmscore uses submodel to sequence (semilocal) scoring. When set to 2, hmmscore uses submodel to subsequence (local) scoring. When set to 3, hmmscore uses model to subsequence (domain) scoring. Can also be used with align2model and buildmodel. Similar to the Smith and Waterman method. See Section 10.2.3.

test (string) ():
A file to read test sequences from. See Section 7.3.

test2 (string) ():
A second file to read test sequences from. See Section 7.3.

trainseed (integer) (-1):
Random seed for partitioning the sequences into the test set and the training set. The default value causes the process's pid to be used, which will then be printed to the output file to enable replication of results. See Section 7.3.

train (string) ():
A file to read training sequences from. See Section 7.3.

train2 (string) ():
A second file to read training sequences from. See Section 7.3.

train_reset_inserts (0, 1, 2, 3, or 6) (6):
At the end of buildmodel training, all insert and FIM character tables are set according to this variable, which takes on the same meanings as FIM_method_train. The default setting is to set all insert and FIM tables to the normalized geometric average of the match state costs. See Section 8.6.

trans_priors (string) ():
The name of the structure-specific transition prior library to use when structural information for transition probability estimation is to be used for HMM estimation. See Section 8.1.2.

transweight (float) (1.0):
A multiplier that affects the influence of the pseudocounts generated by the structure-specific transition priors. See Section 8.1.2.

template (string) ():
For use with the structure-specific transition prior library. A three-column file (amino acid sequence, secondary structure, accessibility) that is used during HMM estimation to assign a structural environment to each model node. See Section 8.1.2.

use_kestrel (0 or 1) (0):
If 1, use the Kestrel implementation of the hmmscore scoring algorithm. See Section 10.2.

weight_final (float) (1.0):
The final (steady-state) multiplier of sequence weights. The default (1.0) means that, if no sequence weight file is used, each sequence is weighted as being one sequence. If a weight file is used, all values in that file are multiplied by this value. See Section 9.4 and Section 9.1.

weight_length (float) (0):
An annealing schedule for the sequence weight multiplier. If greater than 1.0, the weight multiplier is increased from zero linearly over weight_length re-estimates. If less than one, increase exponentially. See Section 9.4 and Section 9.1.


SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group