Polymerase Chain Reaction (PCR) is a widely used technology in molecular biology for DNA amplification. To generate multiple copies of a DNA molecule, a pair of primers (two synthesized DNA sequences, each typically 15-30 bases long) is annealed to the boundaries of the targeted DNA molecule. The newly replicated DNA fragment then elongates from one primer toward the other.
Although primers hybridize to their respective complements within DNA sequences, a primer pair designed for targeted DNA sequences can also anneal to non-targeted DNA fragments that share sub-sequences with the targeted DNA molecules. For PCR, primer pairs that offer high specificity and a high coverage rate for the targeted fragments among all the amplified copies are therefore preferred.
To provide primer pairs with high selectivity, several computational algorithms have been proposed. Most state-of-the-art algorithms rely on signature primers, that is, short DNA fragments common to the targeted DNA molecules. However, these algorithms do not account for the fact that, when primer pairs designed from signature primers are used in PCR, targeted DNA fragments that lack the signature primers will not be amplified, which limits their coverage. Predicting primers' binding affinities is also crucial in primer design because annealing between targeted DNA fragments and primers with low binding affinity breaks down during the thermal cycles of PCR. As a result, targeted fragments expected to be reproduced by the primer pairs go missing during DNA amplification.
It is important to note that the nucleotides of a primer do not contribute equally to its binding affinity. Specifically, the nucleotides near the 3' end of the primer influence binding affinity more than those near the 5' end. Existing algorithms typically oversimplify their predictions and, as a result, either ignore primers with high binding affinity or include primers with low binding affinity.
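The positional effect described above can be illustrated with a minimal sketch. It assumes that the primer and the candidate binding site are given as equal-length strings in the same 5'-to-3' orientation, and it uses a hypothetical linear weight ramp so that mismatches near the 3' end cost more. The function name and the weighting scheme are illustrative assumptions, not the scoring actually used by any particular tool.

```python
def weighted_mismatch_score(primer, site, weights=None):
    """Sum position weights over mismatching positions.

    Both strings are read 5'->3' and must have the same length.
    A lower score suggests a better expected match.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    n = len(primer)
    if weights is None:
        # Hypothetical linear ramp: index 0 (5' end) gets 1/n,
        # index n-1 (3' end) gets 1.0.
        weights = [(i + 1) / n for i in range(n)]
    return sum(w for p, s, w in zip(primer, site, weights) if p != s)


# A single mismatch at the 5' end is penalized less than one at the 3' end.
five_prime = weighted_mismatch_score("ACGTACGT", "TCGTACGT")
three_prime = weighted_mismatch_score("ACGTACGT", "ACGTACGA")
```

Passing a custom `weights` list lets a caller express a different positional bias, for example one that only penalizes the last few bases.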
To address the limitations of current algorithms, we created PRISE2, a robust computational tool for sequence-selective PCR primer design. The tool considers all subsequences of potential primer pairs to increase the coverage rate of the targeted fragments. It also provides a flexible mechanism for formulating positional bias when estimating primers' binding affinity. Because the execution time of locating binding sites for all potential primers is proportional to the number of subsequences, the tool clusters subsequences by their sequence prefixes to reduce the search space and accelerate the search. PRISE2 provides a user-friendly interface and offers full functionality for primer-design tasks. It was implemented in C++ with the Qt framework for efficiency and cross-platform support.
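The idea of clustering subsequences by prefix can be sketched as follows. This is a simplified illustration under stated assumptions: it handles exact matches only, the prefix length `k` is an arbitrary choice, and the function names are hypothetical rather than taken from PRISE2. Each position of the target sequence is compared only against the candidates whose prefix matches the k-mer at that position, instead of against every candidate.

```python
from collections import defaultdict


def cluster_by_prefix(candidates, k):
    """Group candidate subsequences by their first k bases."""
    clusters = defaultdict(list)
    for cand in candidates:
        if len(cand) < k:
            raise ValueError("candidate shorter than prefix length")
        clusters[cand[:k]].append(cand)
    return clusters


def find_exact_sites(sequence, candidates, k=4):
    """Return {candidate: [start positions]} for exact matches in sequence."""
    clusters = cluster_by_prefix(candidates, k)
    hits = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        # Only candidates sharing this k-mer prefix can match here.
        for cand in clusters.get(sequence[pos:pos + k], ()):
            if sequence.startswith(cand, pos):
                hits[cand].append(pos)
    return dict(hits)


sites = find_exact_sites(
    "ACGTACGTTACGA", ["ACGTAC", "ACGTT", "TTACGA", "GGGGGG"], k=4
)
```

A search that tolerates mismatches would need a looser prefix test, which this sketch does not attempt.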
In applications where a collection of similar sequences needs to be amplified using PCR, degenerate primers can improve the efficiency and accuracy of amplification, since they can hybridize to multiple distinct DNA fragments. Conceptually, a degenerate primer allows multiple bases at some positions. In reality, it is a mixture of regular primers that differ at certain bases. The degeneracy of a degenerate primer is the number of regular primers in the mixture. Higher degeneracy allows a primer to amplify more targeted sequences simultaneously, but it also lowers specificity for the targeted sequences, which adversely affects the quality and quantity of amplification. It is therefore essential to find a good balance between high coverage and low degeneracy, a balance that a tool like PRISE2 helps achieve.
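To make the definition of degeneracy concrete, the following sketch computes it as the product of the number of bases allowed at each position and expands a degenerate primer into its mixture of regular primers. It assumes the degenerate primer is written with the standard IUPAC nucleotide codes; the function names are illustrative.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of regular primers represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def expand(primer):
    """List every regular primer in the mixture."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer))]


d = degeneracy("ACRYN")   # 1 * 1 * 2 * 2 * 4
mixture = expand("AR")    # one primer per allowed base at the R position
```

Because degeneracy grows multiplicatively with each ambiguous position, even a few `N` positions quickly produce a large mixture.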
For degenerate primer design, we proposed a new heuristic algorithm, RRD2P, that computes degenerate primer pairs with near-optimal coverage of the targets under a specified degeneracy threshold. RRD2P runs in polynomial time and produces primer pairs with good coverage on three biological data sets, comparing favorably with a similar tool called HYDEN. The fundamental idea behind RRD2P is to represent the computation of optimal primers as an integer linear program, solve its fractional relaxation, and then apply randomized rounding to obtain an integral solution.
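The rounding step can be illustrated generically. The sketch below is not RRD2P itself: it assumes that some linear programming solver has already produced, for each primer position, a fractional value in [0, 1] for each base, and it shows only how randomized rounding might turn those values into an integral choice of bases under a degeneracy threshold. The data layout, the fallback rule, and the retry loop are all assumptions made for illustration.

```python
import random
from math import prod


def round_once(fractional, rng):
    """Include each base with probability equal to its fractional value."""
    primer = []
    for weights in fractional:
        chosen = {b for b, p in weights.items() if rng.random() < p}
        if not chosen:
            # Assumed fallback: keep the base with the largest value
            # so that every position allows at least one base.
            chosen = {max(weights, key=weights.get)}
        primer.append(chosen)
    return primer


def randomized_rounding(fractional, max_degeneracy, tries=100, seed=0):
    """Return a rounded primer within the threshold, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(tries):
        primer = round_once(fractional, rng)
        if prod(len(bases) for bases in primer) <= max_degeneracy:
            return primer
    return None


# Hypothetical relaxation output for a two-position primer.
lp_solution = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 1.0, "T": 0.0},
]
rounded = randomized_rounding(lp_solution, max_degeneracy=2)
```

Each returned position is a set of allowed bases, so the product of the set sizes is the degeneracy of the rounded primer.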