Model Specification


Specify a physical model

NUPACK 4 analysis and design jobs are run based on a physical model created using the Model class:

from nupack import Model

model1 = Model(material='rna', ensemble='stacking', celsius=37,
    sodium=1.0, magnesium=0.0)

Any unspecified properties take on their default values (which happen to be the ones specified for model1 above).
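The way defaults fill in unspecified properties can be pictured with a short sketch. The table below only transcribes the defaults stated in this section, and resolve_model_options is a hypothetical helper, not part of the NUPACK API. It covers just the five keywords used for model1, so, for instance, kelvin is not handled.

```python
from typing import Any

# Defaults transcribed from the text above (illustrative; not read from NUPACK).
MODEL_DEFAULTS: dict[str, Any] = {
    "material": "rna",
    "ensemble": "stacking",
    "celsius": 37,
    "sodium": 1.0,
    "magnesium": 0.0,
}


def resolve_model_options(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge user-specified options over the defaults."""
    unknown = set(overrides) - set(MODEL_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown model option(s): {sorted(unknown)}")
    options = dict(MODEL_DEFAULTS)  # copy so the shared defaults are never mutated
    options.update(overrides)
    return options


# Only material and celsius are given; ensemble, sodium, and magnesium keep defaults.
print(resolve_model_options(material="dna", celsius=23))
```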


Model options

The valid options for each property are described below.

Material

NUPACK 4 algorithms use the following temperature-dependent RNA and DNA free energy parameter sets specified by the keyword material (default: material='rna'):

  • rna06 (shorthand: rna): Based on [Mathews99] and [Lu06] with additional parameters [Xia98,Zuker03] including coaxial stacking [Mathews99,Turner10] and dangle stacking [Serra95,Zuker03,Turner10] in 1 M $\mathrm{Na}^+$.

  • rna95: Based on [Serra95] with additional parameters [Zuker03] including coaxial stacking [Mathews99,Turner10] and dangle stacking [Serra95,Zuker03,Turner10] in 1 M $\mathrm{Na}^+$.

  • dna04 (shorthand: dna): Based on [Santalucia98] and [Santalucia04] with additional parameters [Zuker03] including coaxial stacking [Peyret00] and dangle stacking [Bommarito00,Zuker03] in user-specified concentrations of $\mathrm{Na}^+$ and $\mathrm{Mg}^{++}$ [Santalucia98,Peyret00,Santalucia04].

  • custom-parameters: Custom parameters provided in a JSON file (e.g., custom-parameters.json) using the same format as the provided parameter files. Provide $\Delta G_{37}(\mathrm{loop})$ and $\Delta H(\mathrm{loop})$ values to allow calculations at different temperatures, or only $\Delta G(\mathrm{loop})$ values to allow calculations at a single temperature. Either place the JSON file in the same directory as the default parameter files (and specify material='custom-parameters') or specify the full path to the file (material='path/to/my/custom-parameters.json').
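Supplying both $\Delta G_{37}$ and $\Delta H$ is what makes extrapolation to other temperatures possible. A common way to do this, assumed here rather than taken from NUPACK's source, treats $\Delta H$ and $\Delta S$ as temperature independent, so that $\Delta S = (\Delta H - \Delta G_{37})/T_{37}$ and $\Delta G(T) = \Delta H - T\,\Delta S$. The function name free_energy_at and the example values are hypothetical.

```python
T37 = 310.15  # 37 °C in kelvin


def free_energy_at(dG37: float, dH: float, kelvin: float) -> float:
    """Extrapolate a free energy (kcal/mol) from 37 °C to another temperature.

    Assumes dH and dS do not depend on temperature, so that
    dS = (dH - dG37) / T37 and dG(T) = dH - T * dS.
    """
    dS = (dH - dG37) / T37  # entropy change, kcal/(mol K)
    return dH - kelvin * dS


# Hypothetical loop parameters: dG37 = -2.0 kcal/mol, dH = -10.0 kcal/mol, at 50 °C.
print(free_energy_at(-2.0, -10.0, 50 + 273.15))
```

With only $\Delta G$ values available there is no $\Delta S$ to work with, which is why such a parameter file supports a single temperature.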

Free energies are expressed in kcal/mol. Base pairs are either Watson-Crick pairs (G$\cdot$C and A$\cdot$U for RNA; G$\cdot$C and A$\cdot$T for DNA) or wobble pairs (G$\cdot$U for RNA). Note that for DNA, G and T form a mismatch, not a wobble pair [Santalucia04].

DNA/RNA hybrids are not allowed.
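The base-pairing rules above amount to a small lookup table. This sketch is keyed by the shorthand material names 'rna' and 'dna' only; can_pair is a hypothetical helper for illustration, not a NUPACK function.

```python
# Allowed base pairs per material, as described above (sketch, not NUPACK code).
WATSON_CRICK = {
    "rna": {frozenset("GC"), frozenset("AU")},
    "dna": {frozenset("GC"), frozenset("AT")},
}
WOBBLE = {
    "rna": {frozenset("GU")},
    "dna": set(),  # G and T form a mismatch for DNA, not a wobble pair
}


def can_pair(a: str, b: str, material: str) -> bool:
    """Return True if bases a and b may pair under the given material."""
    pair = frozenset((a.upper(), b.upper()))  # unordered, so G-C == C-G
    return pair in WATSON_CRICK[material] or pair in WOBBLE[material]


print(can_pair("G", "U", "rna"), can_pair("G", "T", "dna"))
```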


Stacking

NUPACK 4 algorithms perform calculations on the following complex ensembles specified by the keyword ensemble (default: ensemble='stacking'):

  • stacking: Complex ensemble with coaxial and dangle stacking (ensemble $\overline{\Gamma}^{\shortparallel}(\phi)$).

  • dangle-stacking: Complex ensemble with dangle stacking but no coaxial stacking.

  • coaxial-stacking: Complex ensemble with coaxial stacking but no dangle stacking.

  • nostacking: Complex ensemble without coaxial or dangle stacking (ensemble $\overline{\Gamma}(\phi)$).
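The four ensemble keywords differ only in which stacking contributions they include. The mapping below records that, reading dangle-stacking and coaxial-stacking as each including only the named type of stacking; ENSEMBLE_STACKING and StackingFlags are hypothetical names used for illustration.

```python
from typing import NamedTuple


class StackingFlags(NamedTuple):
    coaxial: bool  # coaxial stacking between adjacent helices included?
    dangle: bool   # dangle stacking of unpaired bases included?


# Hypothetical summary of the ensemble keywords listed above.
ENSEMBLE_STACKING = {
    "stacking": StackingFlags(coaxial=True, dangle=True),
    "dangle-stacking": StackingFlags(coaxial=False, dangle=True),
    "coaxial-stacking": StackingFlags(coaxial=True, dangle=False),
    "nostacking": StackingFlags(coaxial=False, dangle=False),
}

for name, flags in ENSEMBLE_STACKING.items():
    print(name, flags)
```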


Temperature

  • celsius: Temperature is specified in $^\circ$C using the keyword celsius (default: celsius=37).
  • kelvin: Alternatively, the temperature can be specified in K using the keyword kelvin.
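The two temperature keywords are related by the usual conversion $T_\mathrm{K} = T_{^\circ\mathrm{C}} + 273.15$. The hypothetical helper below resolves either keyword to kelvin; rejecting a call that gives both is an assumption of this sketch, not documented NUPACK behavior.

```python
from typing import Optional


def to_kelvin(celsius: Optional[float] = None, kelvin: Optional[float] = None) -> float:
    """Hypothetical helper: resolve a celsius or kelvin setting to kelvin."""
    if celsius is not None and kelvin is not None:
        raise ValueError("specify celsius or kelvin, not both")  # assumption of this sketch
    if kelvin is not None:
        return kelvin
    if celsius is None:
        celsius = 37.0  # documented default: celsius=37
    return celsius + 273.15


print(to_kelvin(), to_kelvin(celsius=23), to_kelvin(kelvin=300.0))
```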

Salt

The default salt conditions for the RNA and DNA parameter sets are $[\mathrm{Na}^+] = 1\,\mathrm{M}$; for RNA, these are the only salt conditions available. Salt corrections are available for the DNA parameters to permit calculations at user-specified sodium, potassium, ammonium, and magnesium ion concentrations.

  • sodium: Based on [Santalucia98,Santalucia04], the sum of the concentrations of the (monovalent) sodium, potassium, and ammonium ions, $[\mathrm{Na}^+] + [\mathrm{K}^+] + [\mathrm{NH}_4^+]$, is specified in molar units (default: 1.0, range: [0.05, 1.1]) using the keyword sodium.

  • magnesium: Based on [Peyret00,Koehler05], the concentration of (divalent) magnesium ions, $[\mathrm{Mg}^{++}]$, is specified in molar units (default: 0.0, range: [0.0, 0.2]) using the keyword magnesium.
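The salt constraints above amount to a range check. The sketch below, using the hypothetical name salt_problems, returns a list of violations (empty when the conditions are acceptable). It assumes that any material name starting with 'rna' uses the RNA parameters and that every other material uses the DNA ranges; how NUPACK itself reacts to out-of-range values is not covered here.

```python
def salt_problems(material: str, sodium: float = 1.0, magnesium: float = 0.0) -> list[str]:
    """Hypothetical check of salt settings against the ranges documented above."""
    problems = []
    if material.lower().startswith("rna"):
        # RNA parameter sets are only available at [Na+] = 1 M without Mg++.
        if sodium != 1.0:
            problems.append(f"RNA requires sodium = 1.0 M, got {sodium}")
        if magnesium != 0.0:
            problems.append(f"RNA requires magnesium = 0.0 M, got {magnesium}")
        return problems
    # DNA: check against the documented ranges (in molar).
    if not 0.05 <= sodium <= 1.1:
        problems.append(f"sodium = {sodium} M is outside [0.05, 1.1]")
    if not 0.0 <= magnesium <= 0.2:
        problems.append(f"magnesium = {magnesium} M is outside [0.0, 0.2]")
    return problems


print(salt_problems("dna", sodium=0.5, magnesium=0.01))  # the conditions of model2 below
print(salt_problems("rna", sodium=0.5))
```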


Examples

  • Define a model for DNA calculations at 23 $^\circ$C in $[\mathrm{Na}^+] = 0.5$ M and $[\mathrm{Mg}^{++}] = 0.01$ M:

model2 = Model(material='dna', celsius=23, sodium=0.5, magnesium=0.01)
Note that ensemble is unspecified, so it defaults to ensemble='stacking'.

  • Define a model using custom parameters at 45 $^\circ$C without coaxial and dangle stacking:
model3 = Model(material='path/to/my/custom-parameters.json',
    ensemble='nostacking', celsius=45)

Historical options

For backwards compatibility with NUPACK 3, the following historical complex ensembles without coaxial stacking and with approximate dangle stacking are supported:

  • none-nupack3: No dangle stacking and no coaxial stacking (the dangles none option in NUPACK 3).

  • some-nupack3: Some dangle stacking and no coaxial stacking (the dangles some option in NUPACK 3). A dangle energy is incorporated for each unpaired base flanking a duplex (a base flanking two duplexes contributes only the minimum of the two possible dangle energies).

  • all-nupack3: All dangle stacking and no coaxial stacking (the dangles all option in NUPACK 3). A dangle energy is incorporated for each unpaired base flanking a duplex (a base flanking two duplexes contributes both possible dangle energies).
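The historical ensembles differ in how they treat an unpaired base that flanks two duplexes. The hypothetical function below takes the two candidate dangle energies for such a base (looking them up in a parameter set is not shown) and applies the rule described for each ensemble.

```python
def flanking_base_dangle(energy_a: float, energy_b: float, mode: str) -> float:
    """Dangle contribution (kcal/mol) of one unpaired base flanking two duplexes.

    energy_a and energy_b are the two possible dangle energies, one per
    flanking duplex (hypothetical inputs for illustration).
    """
    if mode == "none-nupack3":
        return 0.0                      # no dangle stacking at all
    if mode == "some-nupack3":
        return min(energy_a, energy_b)  # only the minimum (most favorable) counts
    if mode == "all-nupack3":
        return energy_a + energy_b      # both dangle energies contribute
    raise ValueError(f"unknown historical ensemble: {mode}")


for mode in ("none-nupack3", "some-nupack3", "all-nupack3"):
    print(mode, flanking_base_dangle(-0.3, -0.1, mode))
```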

For these historical ensembles, base pairs are either Watson-Crick pairs (G$\cdot$C and A$\cdot$U for RNA; G$\cdot$C and A$\cdot$T for DNA) or wobble pairs (G$\cdot$U for RNA; G$\cdot$T for DNA). Note that in these ensembles G$\cdot$T is classified as a DNA wobble pair, not as a mismatch. The historical ensembles prohibit a wobble pair (G$\cdot$U or G$\cdot$T) as a terminal base pair in an exterior loop or a multiloop. As a result, evaluating the free energy of a sequence $\phi$ and secondary structure $s$ that places a wobble pair as a terminal base pair in an exterior loop or multiloop returns $\overline{\Delta G}(\phi,s) = \Delta G(\phi,s) = \infty$. These historical ensembles can be used for calculations in combination with the following historical DNA and RNA parameter sets:

  • rna95-nupack3: Same as rna95 except that terminal mismatch free energies in exterior loops and multiloops are replaced by two dangle stacking free energies.

  • dna04-nupack3: Same as dna04 except that G$\cdot$T is treated as a wobble pair (analogous to a G$\cdot$U RNA wobble pair) instead of classifying G and T as a mismatch. Terminal mismatch free energies in exterior loops and multiloops are replaced by two dangle stacking free energies, but this is the same treatment as in dna04, because terminal mismatch parameters are not publicly available for DNA [Santalucia04].

  • rna99-nupack3: Parameters from [Mathews99] with terminal mismatch free energies in exterior loops and multiloops replaced by two dangle stacking free energies. Parameters are provided only for 37 $^\circ$C.
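The terminal-wobble rule for the historical ensembles can be expressed as a guard on a loop's free energy. In this sketch, terminal_pairs lists the terminal base pairs of an exterior loop or multiloop, and historical_loop_energy is a hypothetical name; it returns infinity whenever any of those pairs is G$\cdot$U or G$\cdot$T.

```python
import math

# In the historical ensembles, G·T counts as a DNA wobble pair alongside RNA G·U.
HISTORICAL_WOBBLES = {frozenset("GU"), frozenset("GT")}


def historical_loop_energy(terminal_pairs: list[tuple[str, str]], dG: float) -> float:
    """Return dG, or +inf if any terminal pair of the exterior loop or multiloop is a wobble."""
    for a, b in terminal_pairs:
        if frozenset((a.upper(), b.upper())) in HISTORICAL_WOBBLES:
            return math.inf  # prohibited terminal wobble pair
    return dG


print(historical_loop_energy([("G", "C"), ("A", "U")], -1.2))
print(historical_loop_energy([("G", "C"), ("G", "U")], -1.2))
```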

Compute loop free energy

The loop_energy method of a Model object calculates the loop free energy in kcal/mol. The loop sequence is specified with the keyword loop and the loop structure with the keyword structure. For example:

from nupack import Model

my_model = Model(material='rna', ensemble='stacking')

# Calculate the free energy of an unstructured strand
dGloop2 = my_model.loop_energy(loop='AAUU', structure='....')
print(dGloop2)
# --> 0.0

# Calculate the free energy of a hairpin loop
dGloop3 = my_model.loop_energy(loop='AACCCUU', structure='(.....)')
print(dGloop3)
# --> 5.15

# Calculate the free energy of an exterior loop
dGloop4 = my_model.loop_energy(loop='AA+UU', structure='((+))')
print(dGloop4)
# --> -0.9

# Calculate the free energy of a multiloop
dGloop5 = my_model.loop_energy(loop='AAU+ACU+AGU', structure='(.(+).(+).)')
print(dGloop5)
# --> 9.355
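In the loop and structure keywords, '+' separates the snippets of the loop, and parentheses mark the base pairs that border it. The hypothetical helper below (not NUPACK's own parser) checks that the two strings have matching snippet lengths and recovers the base pairs as nucleotide indices, with the '+' separators skipped.

```python
def loop_pairs(loop: str, structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j) for a loop given in dot-parens-plus notation."""
    seq_parts, struct_parts = loop.split("+"), structure.split("+")
    if [len(p) for p in seq_parts] != [len(p) for p in struct_parts]:
        raise ValueError("loop and structure must have matching snippet lengths")
    dots = "".join(struct_parts)  # drop the '+' separators
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, char in enumerate(dots):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # close the most recent open pair
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if stack:
        raise ValueError("unmatched '('")
    return sorted(pairs)


seq = "AAU+ACU+AGU".replace("+", "")
for i, j in loop_pairs("AAU+ACU+AGU", "(.(+).(+).)"):
    print(i, j, seq[i] + seq[j])
```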

Compute stacking state free energies

The stack_energies method of a Model object calculates the stacking state free energies for the subensemble of stacking states in a single loop. The loop sequence is specified with the keyword loop and the loop structure with the keyword structure. The method returns a dictionary mapping each stacking state to its free energy in kcal/mol.

For a loop defined as a list of N snippets, a stacking state is specified as a string composed of one letter per snippet. For each snippet, the returned letter is:

  • 's' if the snippet contains only 2 nucleotides, each base-paired to a nucleotide in an adjacent snippet, with the two base pairs coaxially stacked on each other
  • 'b' if both the 5'-most and 3'-most unpaired nucleotides are dangle stacking on their adjacent base pairs
  • '5' if only the 5'-most unpaired nucleotide is dangle stacking on its adjacent base pair
  • '3' if only the 3'-most unpaired nucleotide is dangle stacking on its adjacent base pair
  • 'n' if none of the above apply (i.e., the snippet has no dangle at either its 5' or 3' end and does not consist of only 2 nucleotides participating in a coaxial stack)
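Because a stacking state has one letter per snippet, it can be decoded snippet by snippet. The sketch below pairs each snippet of a loop with the meaning of its letter; STATE_LETTERS and describe_stacking_state are hypothetical names, and the descriptions paraphrase the list above.

```python
# Meaning of each stacking-state letter, paraphrasing the list above.
STATE_LETTERS = {
    "s": "coaxial stack between the two pairs of a 2-nucleotide snippet",
    "b": "both the 5'-most and 3'-most unpaired nucleotides dangle",
    "5": "only the 5'-most unpaired nucleotide dangles",
    "3": "only the 3'-most unpaired nucleotide dangles",
    "n": "no dangle or coaxial stack",
}


def describe_stacking_state(state: str, loop: str) -> list[str]:
    """Pair each '+'-separated snippet of a loop with the meaning of its letter."""
    snippets = loop.split("+")
    if len(state) != len(snippets):
        raise ValueError(f"expected {len(snippets)} letters, got {len(state)}")
    return [f"{snip}: {STATE_LETTERS[letter]}" for snip, letter in zip(snippets, state)]


for line in describe_stacking_state("snn", "AA+U+U"):
    print(line)
```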

For example, the following figures illustrate snippet annotations for coaxial and dangle stacking states in representative multiloops and exterior loops:

[Figure: Stacking notation]

For a specified multiloop or exterior loop sequence and structure, the stack_energies method returns a dictionary mapping each stacking state string to its stacking state free energy (kcal/mol):

# Calculate the dangle stacking state free energies for an exterior loop
my_model.stack_energies(loop='CA+UC', structure='.(+).')
# --> {'35': -0.15, '3n': 0.15, 'n5': 0.35, 'nn': 0.45}

# Calculate the coaxial stacking state free energies for an exterior loop
my_model.stack_energies(loop='AA+U+U', structure='((+)+)')
# --> {'nnn': 0.9, 'snn': 0.0}

# Calculate the coaxial stacking state free energies for a multiloop
my_model.stack_energies(loop='AU+AU+AU', structure='((+)(+))')
# --> {'nnn': 11.9725, 'nns': 10.8725, 'nsn': 10.8725, 'snn': 10.8725}

For loops that are not multiloops or exterior loops, the loop free energy is returned with a string indicating that there is no stacking state. For example, for a hairpin loop:

my_model.stack_energies(loop='AAAAU', structure='(...)')
# --> {'n': 5.85}
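The per-state energies returned by stack_energies can be combined into a single loop free energy. The sketch below assumes, as in a standard partition-function treatment, that stacking states combine as a Boltzmann-weighted sum, $\Delta G = -RT \ln \sum_i e^{-\Delta G_i/RT}$; it has not been checked against loop_energy output, and combine_stacking_states is a hypothetical name. The only constants used are the gas constant $R \approx 0.0019872$ kcal/(mol K) and a default of 37 $^\circ$C.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol K)


def combine_stacking_states(energies: dict[str, float], kelvin: float = 310.15) -> float:
    """Boltzmann-sum stacking-state free energies (kcal/mol) into one loop free energy."""
    rt = R_KCAL * kelvin
    # Shift by the lowest energy before exponentiating, for numerical stability.
    g_min = min(energies.values())
    total = sum(math.exp(-(g - g_min) / rt) for g in energies.values())
    return g_min - rt * math.log(total)


print(combine_stacking_states({'nnn': 0.9, 'snn': 0.0}))
print(combine_stacking_states({'n': 5.85}))
```

For a loop with a single state, such as the hairpin above, the result is just that state's energy; with several states it always lies below the lowest one.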

References

[Mathews99] Mathews D.H., Sabina J., Zuker M., Turner D.H.: Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J. Mol. Biol. 288 (1999)

[Lu06] Lu Z.J., Turner D.H., Mathews D.H.: A Set of Nearest Neighbor Parameters for Predicting the Enthalpy Change of RNA Secondary Structure Formation. Nucleic Acids Res. 34 (2006)

[Xia98] Xia T., SantaLucia J., Burkard M., Kierzek R., Schroeder S., Jiao X., Cox C., Turner D.: Thermodynamic Parameters for an Expanded Nearest-Neighbor Model for Formation of RNA Duplexes with Watson-Crick Base Pairs. Biochemistry 37 (1998)

[Zuker03] Zuker M.: Mfold Web Server for Nucleic Acid Folding and Hybridization Prediction. Nucleic Acids Res. 31 (2003)

[Turner10] Turner D.H., Mathews D.H.: NNDB: The Nearest Neighbor Parameter Database for Predicting Stability of Nucleic Acid Secondary Structure. Nucleic Acids Res. 38 (2010)

[Serra95] Serra M.J., Turner D.H.: Predicting Thermodynamic Properties of RNA. Methods Enzymol. 259 (1995)

[Santalucia98] SantaLucia J.: A Unified View of Polymer, Dumbbell, and Oligonucleotide DNA Nearest-Neighbor Thermodynamics. Proc. Natl. Acad. Sci. USA 95 (1998)

[Santalucia04] SantaLucia J., Hicks D.: The Thermodynamics of DNA Structural Motifs. Annu. Rev. Biophys. Biomol. Struct. 33 (2004)

[Peyret00] Peyret N.: Prediction of Nucleic Acid Hybridization: Parameters and Algorithms (2000)

[Bommarito00] Bommarito S., Peyret N., SantaLucia J.: Thermodynamic Parameters for DNA Sequences with Dangling Ends. Nucleic Acids Res. 28 (2000)

[Koehler05] Koehler R.T., Peyret N.: Thermodynamic Properties of DNA Sequences: Characteristic Values for the Human Genome. Bioinformatics 21 (2005)