Utilities Jobs

Utilities commands analyze or design a single complex ensemble. For each command, the strand ordering of the complex is specified using keyword strands and the physical model is specified using keyword model. For commands that require a structure (e.g., calculation of the equilibrium structure probability using prob), the structure is specified using the keyword structure. By default, NUPACK utilities jobs run in parallel.

To initialize a model for the following examples, run the following code:

my_model = Model(material='RNA')

Compute partition function

pfunc calculates the partition function of the complex as well as the free energy of the complex:

partition_function = pfunc(strands=['CCC', 'GGG'], model=my_model)
print(partition_function)
# --> (Decimal('4525.512868'), -5.187871791642832)

Compute structure free energy

structure_energy calculates the structure free energy for the specified secondary structure:

dGstruc = structure_energy(strands=['AAAA', 'UUUU'], structure='((((+))))',
    model=my_model)
print(dGstruc)
# --> -0.18135141907945873

Compute equilibrium structure probability

structure_probability calculates the equilibrium structure probability of a specified secondary structure contained in the complex ensemble:

probability = structure_probability(strands=['CCC', 'GGG'], structure='(((+)))',
    model=my_model)
print(probability)
# --> 0.7152766753194949

Compute Boltzmann-sampled structures

sample calculates a set of Boltzmann-sampled structures from the complex ensemble. The number of structures is specified using the keyword num_sample:

sampled_structures = sample(strands=['CCC', 'GGG'], num_sample=3, model=my_model)
print(sampled_structures)
# --> [Structure('(((+)))'), Structure('(((+)))'), Structure('(((+)))')]

Compute equilibrium base-pairing probabilities

pairs calculates the matrix of equilibrium base-pairing probabilities:

probability_matrix = pairs(strands=['CCC', 'GGG'], model=my_model)
print(probability_matrix)
# -->
# [[0.1002 0.0000 0.0000 0.0007 0.1474 0.7518]
#  [0.0000 0.0037 0.0000 0.1474 0.8307 0.0182]
#  [0.0000 0.0000 0.1904 0.7910 0.0185 0.0001]
#  [0.0007 0.1474 0.7910 0.0609 0.0000 0.0000]
#  [0.1474 0.8307 0.0185 0.0000 0.0035 0.0000]
#  [0.7518 0.0182 0.0001 0.0000 0.0000 0.2299]]

(Convert the result to a numpy array via probability_matrix.to_array().)

Compute MFE proxy structure(s)

mfe calculates MFE proxy structure. The algorithm returns the MFE proxy secondary structure, the free energy of the MFE stacking state, and the free energy of the MFE proxy secondary structure:

mfe_structures = mfe(strands=['CCC', 'GGG'], model=my_model)
print('Free energy of MFE proxy structure: %.2f kcal/mol' % mfe_structures[0].energy)
print('MFE proxy structure in dot-parens-plus notation: %s' % mfe_structures[0].structure)
print('MFE proxy structure as structure matrix:\n%s' % mfe_structures[0].structure.matrix())

Output:

Free energy of MFE proxy structure: -4.98 kcal/mol
MFE proxy structure in dot-parens-plus notation: (((+)))
MFE proxy structure as structure matrix:
[[0 0 0 0 0 1]
 [0 0 0 0 1 0]
 [0 0 0 1 0 0]
 [0 0 1 0 0 0]
 [0 1 0 0 0 0]
 [1 0 0 0 0 0]]

If there is more than one MFE stacking state, the algorithm returns a list of the corresponding MFE proxy secondary structures, each with the free energy of the MFE proxy secondary structure and the (same) free energy of the MFE stacking state.


Compute suboptimal proxy structures

subopt calculates the set of suboptimal proxy structures with a stacking state within a specified free energy gap of the MFE stacking state. The (non-negative) free energy gap is specified with keyword energy_gap in kcal/mol. The algorithm returns a list of suboptimal proxy secondary strutures, each with the free energy of the suboptimal proxy secondary structure and with the free energy of its lowest-energy stacking state that falls within the energy gap:

subopt_structures = subopt(strands=['CCC', 'GGG'], energy_gap=1.5, model=my_model)
print(subopt_structures)
# --> [StructureEnergy(Structure('(((+)))'), energy=-4.981351375579834, stack_energy=-4.981351375579834),
#      StructureEnergy(Structure('((.+)).'), energy=-4.000725746154785, stack_energy=-3.781351089477539)]

Compute complex ensemble size

ensemble_size calculates the complex ensemble size in terms of either number of secondary structures or number of stacking states. Specify a physical model with nostacking to obtain the number of secondary structures:

num_struc = ensemble_size(strands=['CCC', 'GGG'],
    model=Model(material='RNA', ensemble='nostacking'))
print(num_struc)
# --> 18

Specify a physical model with stacking to obtain the number of stacking states:

num_stack = ensemble_size(strands=['CCC', 'GGG'],
    model=Model(material='RNA', ensemble='stacking'))
print(num_stack)
# --> 90

Design a sequence

des performs complex design to generate a sequence intended to adopt a target secondary structure at equilibrium within the ensemble of the complex. The strand ordering of the complex can be specified using IUPAC degenerate nucleotide codes to incorporate any sequence constraints (the strand ordering can be omitted if there are no sequence constraints). The target structure is specified using keyword structure:

# design a sequence without sequence constraints
designed_sequence1 = des(structure='(((+)))', model=my_model)
print(designed_sequence1)
# --> ['GGC', 'GCC']

# alternative specification to design a sequence without sequence constraints
designed_sequence1 = des(strands=['NNN','NNN'], structure='(((+)))', model=my_model)
print(designed_sequence1)
# --> ['GGC', 'GCC']

# design a sequence with sequence constraints
designed_sequence2 = des(strands=['HHH','BBW'], structure='(((+)))', model=my_model)
print(designed_sequence2)
# --> ['ACC', 'GGU']

Compute complex ensemble defect

defect evaluates the normalized complex ensemble defect with respect to the structure specified using keyword structure:

ensemble_defect = defect(strands=['CCC', 'GGG'], structure='(((+)))', model=my_model)
print(ensemble_defect)
# --> 0.20883411169052118

Represent a structure

A secondary structure can be defined using any of three notations (keyword Structure):

s1 = Structure('((((((((((((+..........))))))))))))') # dot-parens-plus notation
s2 = Structure('(12+.10)12') # run-length-encoded dot-parens-plus notation
s3 = Structure('D12 (+ U10)') # DU+ notation

Any object or command that accepts a structure as an argument (e.g, TargetComplex in Design or structure_probability in Utilities) will accept either a structure defined in one of the above three notations, or a previously defined Structure object:

dGstruc = structure_energy(strands=['AAAA', 'TTTT'], structure='((((+))))',
    model=my_model)

my_struc = Structure('((((+))))')
dGstruc = structure_energy(strands=['AAAA', 'TTTT'], structure=my_struc,
    model=my_model)

Structure supports the following methods to assist with structure representation:

  • pairlist(): A pair list contains a list S of zero-based indices such that S_i = j if bases i and j are paired, and S_i = i if base i is unpaired.
  • matrix(): A structure matrix of the structure.
  • nicks(): A list of zero-based indices of each base 3' of a nick between strands (one entry per strand)
  • dotparensplus(): Representation of the structure in dot-parens-plus notation.
  • rle_dotparensplus(): Representation of the structure in run-length-encoded dot-parens-plus notation.

For example:

s4 = Structure('(((+))).')

print(s4.pairlist())          # --> [5 4 3 2 1 0 6]

print(s4.matrix())            # --> [[0 0 0 0 0 1 0]
                              #      [0 0 0 0 1 0 0]
                              #      [0 0 0 1 0 0 0]
                              #      [0 0 1 0 0 0 0]
                              #      [0 1 0 0 0 0 0]
                              #      [1 0 0 0 0 0 0]
                              #      [0 0 0 0 0 0 1]]
print(s4.nicks())             # --> [3 7]
print(s4.dotparensplus())     # --> (((+))).
print(s4.rle_dotparensplus()) # --> (3+)3.

Compute sequence distance

seq_distance calculates the sequence distance for two sequences that have the same number of nucleotides:

seq_distance('ACGUUUU+ACCC','ACGUUUU+AGGG') # --> 3

seq_distance('G5', 'G3C2') # --> 2

Compute structure distance

struc_distance calculates the structure distance for two structures that have the same number of nucleotides:

struc_distance('((((((((+..........))))))))', '(((((((.+...........)))))))') # --> 2

struc_distance('.15', '(5.5)5') # --> 10

Note

Note that sequence distance and structure distance are defined independent of whether the nick locations match between two sequences or two structures. However, seq_distance and struc_distance will return a warning if the nick locations do not match.