Abstract
In recent decades, whole genome sequencing (WGS) technology has provided researchers with vast amounts of human sequence data, enabling them to advance their understanding of diseases and to pursue potential new treatments. However, in clinical practice, where only a limited number of disease-related genes, rather than the patient’s whole genome, require sequencing, WGS has not been frequently used because of its high cost, long processing time, and limited ability to process multiple samples at a time. Furthermore, most existing DNA enrichment methods are based on PCR amplification, which works primarily with short DNA sequences and runs into complications with sequences longer than a few kilobases (kb). To address this limitation, researchers have been working to develop non-PCR-based methods that allow target sequences to be enriched from patients’ DNA samples regardless of length. With this goal in mind, we have developed a target gene enrichment method that utilizes the newly discovered CRISPR/Cas12a system and its unique property of producing 5’-overhang ends after cleavage at sequence-specific sites. When the 5’-overhang ends are extended through the incorporation of α-phosphorothioate deoxyribonucleotide analogues (dNTPαS), the modified DNA becomes resistant to exonuclease degradation, whereas the rest of the DNA remains susceptible. Subsequent exonuclease treatment then strongly enriches the target genes by removing non-target DNA without the need for PCR amplification. Using this method, we demonstrated enrichment of the human ABL1 gene, which is approximately 175 kb in length, indicating that this type of enrichment is not limited by the length of the target DNA sequence.
We believe that this novel enrichment method, when used in conjunction with modern sequencing technologies, may become a useful tool in clinical settings by enabling physicians to promptly diagnose and treat diseases.
Keywords: whole genome sequencing (WGS), target gene enrichment, CRISPR/Cas12a, α-phosphorothioate deoxyribonucleotide (dNTPαS), exonuclease, ABL proto-oncogene 1 (ABL1)
Introduction
In recent decades, advances in DNA sequencing technologies have significantly improved scientists’ ability to sequence human genomes efficiently, providing abundant sequence data for understanding how many diseases form and develop [1]. One such technology, whole genome sequencing (WGS), has been used extensively in research settings; however, it is less commonly used in clinical settings because of its high cost, long processing time, and limited ability to process multiple samples at a time. In clinical settings, typically far less than 1% of a patient’s genome needs to be sequenced for disease diagnosis or treatment purposes, so it is more time- and cost-effective to sequence only the genes related to the specified disease rather than the whole genome [2,3]. In recent years, in response to this issue, several methods have been developed to enrich selected regions of the human genome before subjecting them to DNA sequencing.
However, most of these existing target enrichment methods are based on PCR amplification and hybridization capture techniques [4,5,6], which have significant limitations when used to enrich long, continuous DNA sequences. PCR-based methods can effectively amplify DNA fragments that are a few hundred base pairs long; however, most target DNA sequences of clinical interest range from a few kilobase pairs (kb) to more than 100 kb. Applying these PCR-based methods would thus be too costly and complex, since hundreds or even thousands of small, overlapping fragments would need to be independently amplified to cover the whole target region. Furthermore, following amplification, many of the enrichment methods require the simultaneous capture of the fragments onto a sequencing platform coated with small capture probes individually designed for each fragment. The large number of PCR-amplified fragments makes it challenging to complete the process efficiently, which adds to the complexity of these enrichment methods and makes them unsuitable for broad application in clinical settings.
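The "hundreds or even thousands" estimate above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original study: the 300 bp amplicon length and 50 bp overlap are hypothetical design values, and `amplicons_needed` is an illustrative helper name.

```python
import math


def amplicons_needed(target_bp: int, amplicon_bp: int, overlap_bp: int) -> int:
    """Minimum number of overlapping amplicons needed to tile a target end to end."""
    if amplicon_bp <= overlap_bp:
        raise ValueError("amplicon must be longer than the overlap")
    if target_bp <= amplicon_bp:
        return 1
    # Each additional amplicon contributes this many new bases of coverage.
    step = amplicon_bp - overlap_bp
    return math.ceil((target_bp - amplicon_bp) / step) + 1


# Hypothetical tiling design for a ~175 kb target such as ABL1.
print(amplicons_needed(175_000, amplicon_bp=300, overlap_bp=50))
```

Under these assumed values a 175 kb target would require 700 separately designed amplicons, which illustrates why tiling long targets by PCR scales poorly.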
The key to overcoming the limitations of the current enrichment methods is finding a way to isolate and enrich target genes in large segments, preferably in one whole piece, without the need for PCR amplification. A successful enrichment method should be able to cleave and release target DNA sequences from the human genome and to remove any unwanted DNA. With this in mind, we have developed a novel DNA enrichment method utilizing the newly discovered CRISPR/Cas12a (Cpf1) system and its unique property of producing 5’-overhang ends after cleavage at selected sites [7]. Unlike the CRISPR/Cas9 system, which introduces a blunt DNA break at the cleavage site and thus leaves the cleaved DNA unable to accept additional nucleotides at the terminus, the Cas12a enzyme introduces a staggered double-stranded DNA break with a 4- or 5-nucleotide 5’ overhang that is accessible for nucleotide addition through DNA polymerization reactions [7]. By filling in the 5’-overhang ends with chemically modified dNTPαS deoxyribonucleotides through DNA polymerization reactions [8,9,10], the target DNA becomes resistant to exonuclease digestion, whereas the rest of the DNA remains susceptible. Subsequent exonuclease treatment then strongly enriches the target DNA by removing most of the unprotected DNA. We believe that this non-PCR-based enrichment method, when used together with modern sequencing technologies such as nanopore sequencing and other new DNA sequencing platforms, has the potential to be useful in clinical settings where sequencing long, continuous DNA segments is essential for the timely diagnosis and treatment of diseases.
Methods
Reagents
CRISPR/Cas12a (Cpf1) protein, restriction enzymes, PCR reagents, and Taq DNA polymerase were provided by AdvancedSeq (Pleasanton, CA). E. coli DNA polymerases and exonuclease III were purchased from New England Biolabs (Ipswich, MA), and α-phosphorothioate deoxyribonucleotides (dNTPαS) were purchased from TriLink BioTechnologies (San Diego, CA). Oligonucleotide primers, qPCR probes, and crRNAs were synthesized by Integrated DNA Technologies (IDT); their sequences are listed in Table 1 and Table 2. Human genomic DNA isolated from cultured HEK293 cells was provided by Dr. Ping Cao of BridGene Inc. (San Jose, CA).
Table 1. PCR primers and probes
PCR primer and probe | Sequence (5’ to 3’) |
ABL1 forward primer | AGGTACTGGTCCCTTCCTTT |
ABL1 reverse primer | CTATGCACACGCCACTTAGAA |
ABL1 probe (Cy5) | ACACGTATCTAOTCTTGGTATGCATCTT |
GAPDH forward primer | CCTCTTAATGGGGAGGTGGCC |
GAPDH reverse primer | TAAAAGCAGCCCTGGTGACCAG |
GAPDH probe (FAM) | CCTCCCCTCCTCATGCCTTCTTGC |
Table 2. Cas12a crRNAs
Cas12a crRNA | Sequence (5’ to 3’) |
pGEM-crRNA1 | UAAUUUCUACUCUUGUAGAUCAGUCGGGAAACCUGUCGUGCCAG |
pGEM-crRNA2 | UAAUUUCUACUCUUGUAGAUGUAUCUGCGCUCUGCUGAAGC |
ABL-5’-crRNA | UAAUUUCUACUCUUGUAGAUCGGAGAGCAAAGCAGAGAAGC |
ABL-3’-crRNA | UAAUUUCUACUCUUGUAGAUAAUCCAAAUCUGUCCUCUGUA |
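All four crRNA sequences listed above begin with the same 20 nucleotides, followed by a target-specific portion. The sketch below splits each crRNA into these two parts and writes the target-specific portion in DNA letters. It assumes, without confirmation from the source, that the shared prefix is the Cas12a direct repeat and the remainder is the spacer; `split_crrna` and `spacer_as_dna` are illustrative names.

```python
# 20-nt prefix shared by every crRNA in the table (assumed to be the direct repeat).
REPEAT = "UAAUUUCUACUCUUGUAGAU"

CRRNAS = {
    "pGEM-crRNA1": "UAAUUUCUACUCUUGUAGAUCAGUCGGGAAACCUGUCGUGCCAG",
    "pGEM-crRNA2": "UAAUUUCUACUCUUGUAGAUGUAUCUGCGCUCUGCUGAAGC",
    "ABL-5'-crRNA": "UAAUUUCUACUCUUGUAGAUCGGAGAGCAAAGCAGAGAAGC",
    "ABL-3'-crRNA": "UAAUUUCUACUCUUGUAGAUAAUCCAAAUCUGUCCUCUGUA",
}


def split_crrna(crrna: str) -> tuple[str, str]:
    """Return (shared prefix, target-specific spacer) for one crRNA."""
    if not crrna.startswith(REPEAT):
        raise ValueError("crRNA does not start with the expected shared prefix")
    return REPEAT, crrna[len(REPEAT):]


def spacer_as_dna(spacer_rna: str) -> str:
    """Write an RNA spacer in DNA letters (U -> T) for comparison with a target."""
    return spacer_rna.replace("U", "T")


for name, seq in CRRNAS.items():
    _, spacer = split_crrna(seq)
    print(f"{name}: spacer {len(spacer)} nt, DNA form {spacer_as_dna(spacer)}")
```

Separating the two parts makes it easier to see that only the spacer needs to be redesigned when a new cleavage site is chosen.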
DNA digestion with CRISPR/Cas12a
CRISPR/Cas12a cleavage reactions were carried out following the manufacturer’s instructions. To cleave pGEM-3Zf plasmid DNA, CRISPR/Cas12a protein, pGEM-crRNA1, and pGEM-crRNA2 (final concentration of 30 nM each) were mixed in a total reaction volume of 30-100 µl and incubated at room temperature for 15 minutes to assemble the catalytic Cas12a/crRNA complexes. pGEM-3Zf plasmid was then added to the reaction mixture (final concentration of 3 nM), and the reaction was incubated for 30 minutes at 37°C. The final reaction products were analyzed on a 1% agarose gel. To cleave the ABL1 gene, genomic DNA from HEK293 cells was incubated with Cas12a/ABL-5’-crRNA and Cas12a/ABL-3’-crRNA complexes for 30 minutes at 37°C. The reaction mixtures were subjected to additional treatments as described below.
Incorporation of dNTPαS deoxyribonucleotides into 5’-overhang DNA ends
After pGEM-3Zf plasmid or HEK293T DNA was cleaved with Cas12a/crRNA complexes, a mixture of dNTPαS (final concentration of 25 µM for each substrate) and 10 Units of Taq DNA polymerase were added to the cleavage reaction mixture without changing the buffer conditions. After 30 minutes of incubation at 72°C, the reaction products were purified with the ZYMO DNA Clean Kit (Zymo Research, Irvine, CA) following the manufacturer’s instructions. Purified DNA products were eluted with TE buffer to a final concentration of 20 ng/µl.
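For readers setting up a similar reaction, the molar concentrations above can be converted into a plasmid mass. This is a rough sketch only: it assumes an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value from this study) and uses the 3.2 kb plasmid size quoted in the Results; `dsdna_mass_ng` is an illustrative helper.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; approximate, assumed value


def dsdna_mass_ng(length_bp: int, conc_nM: float, volume_ul: float) -> float:
    """Mass of dsDNA (ng) needed to reach a molar concentration in a given volume."""
    moles = conc_nM * 1e-9 * volume_ul * 1e-6  # mol = (mol/L) * L
    grams = moles * length_bp * AVG_BP_MASS
    return grams * 1e9  # g -> ng


# 3 nM of a 3.2 kb plasmid at both ends of the 30-100 µl volume range.
for volume in (30, 100):
    print(f"{volume} µl: {dsdna_mass_ng(3200, 3, volume):.0f} ng plasmid")

# Molar excess of each Cas12a/crRNA complex (30 nM) over plasmid (3 nM).
print("crRNA : plasmid molar ratio =", 30 / 3)
```

Under these assumptions, a 30 µl reaction corresponds to roughly 190 ng of plasmid, with each crRNA present at a tenfold molar excess over the template.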
Digestion with exonuclease
After the incorporation of dNTPαS substrates, the purified DNA products were treated with exonuclease III following the manufacturer’s instructions. Each reaction was carried out in a total volume of 20 µl containing ~0.1 µg of purified DNA and 50 Units of exonuclease III. The reaction mixtures were incubated for 30-60 minutes at 37°C, followed by incubation at 70°C for an additional 15 minutes to inactivate any remaining exonuclease activity. The final products were then analyzed on a 1% agarose gel for plasmid DNA samples or by real-time quantitative PCR assays for genomic DNA samples.
Real-time quantitative PCR assay (qPCR)
After genomic DNA from cultured HEK293T cells was treated with CRISPR/Cas12a and ABL1-crRNAs, the reaction mixtures were incubated with Taq DNA polymerase and dNTPαS nucleotides to protect the 5’-overhang DNA ends, followed by exonuclease treatment as described above. After exonuclease treatment, TaqMan real-time qPCR assays were performed following the manufacturer’s instructions. Each qPCR reaction contained 10 µM dNTP substrates, 0.1 µM gene-specific primers and probe, and 40 ng of template DNA under standard buffer conditions in a total volume of 10 µl. The PCR amplification program consisted of an initial incubation at 94°C for 5 minutes, followed by 40 amplification cycles of 94°C for 10 seconds and 60°C for 40 seconds. The probes for detecting the housekeeping gene GAPDH and the human ABL1 gene were labeled with the fluorescent dyes FAM and Cy5, respectively.
Results
Overview of our target gene enrichment method using CRISPR/Cas12a and exonuclease
Figure 1 shows the overall scheme of our non-PCR-based method for enriching target genes from human DNA. First, we utilize CRISPR/Cas12a protein and its unique property of producing 5’-overhang ends at specific cleavage sites to release the target genes from the rest of the DNA (Figure 1, Step 1) [7]. After cleavage, the resulting 5’-overhang ends are extended through the incorporation of dNTPαS deoxyribonucleotides, a class of chemically modified deoxyribonucleotides containing a sulfur atom (Figure 1, Step 2). These dNTPαS deoxyribonucleotides act as a “protective handle” on the target DNA, shielding it from exonuclease degradation while the rest of the unmodified DNA remains susceptible [8,9,10]. When such DNA samples are subjected to exonuclease treatment, we expect a strong enrichment of the target genes as the non-target DNA is digested away (Figure 1, Step 3).
Incorporation of dNTPαS deoxyribonucleotides into the 5’-overhang ends protects DNA from exonuclease digestion
The dNTPαS deoxyribonucleotides incorporated into the DNA terminal ends contain a sulfur atom instead of an oxygen atom at the α-phosphorus position (Figure 2A). They are known to block cleavage by restriction enzymes or exonucleases when incorporated into DNA sequences [8,9,10]. We first tested the effect of incorporating dNTPαS nucleotides using pGEM-3Zf plasmid, which was cut with EcoRI restriction enzyme to produce a single linear DNA fragment (3.2 kb) with 5’-overhang ends (Figure 2B). We then used Taq DNA polymerase to incorporate either natural dNTP substrates (Figure 2C, Lanes 1 & 2) or dNTPαS substrates (Figure 2C, Lanes 3 & 4) into the terminal positions of the DNA fragment, which was then subjected to exonuclease treatment. As shown in Figure 2C, the plasmid DNA fragment filled in with dNTPαS substrates (Lane 4) was resistant to exonuclease digestion, whereas the one filled in with natural dNTP substrates (Lane 2) remained susceptible, confirming that dNTPαS can act as a “protective handle” to block exonuclease degradation when incorporated into 5’-overhang ends.
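The fill-in step can be made concrete with a small sketch. When a polymerase extends the recessed 3’ end opposite a 5’ overhang, the bases it adds are the reverse complement of the overhang. The code below is an illustration only (`fill_in_bases` is a hypothetical helper); the EcoRI overhang 5’-AATT-3’ is standard, while the second overhang is an arbitrary example and not a sequence from this study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fill_in_bases(overhang_5p: str) -> str:
    """Bases added to the recessed 3' end, written 5'->3'.

    The overhang serves as the template, so the added bases are its
    reverse complement. In the protocol described here, these are the
    positions that would be occupied by phosphorothioate analogues.
    """
    return overhang_5p.upper().translate(COMPLEMENT)[::-1]


# EcoRI leaves a 4-nt 5' overhang, AATT, which is its own reverse complement.
print(fill_in_bases("AATT"))

# Arbitrary 4-nt overhang, used only to show a non-palindromic case.
print(fill_in_bases("AGGC"))
```

Because the added bases depend on the overhang sequence, supplying all four analogues, as in the Methods, covers any overhang a given cleavage site produces.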
Using CRISPR/Cas12a and crRNAs, rather than EcoRI restriction enzyme, to specifically cleave plasmid DNA
After verifying the protective effect of dNTPαS incorporation into 5’-overhang ends, we next wanted to confirm the cleavage ability of CRISPR/Cas12a with specifically designed crRNAs on a small plasmid DNA template, as cleaved fragments can easily be visualized on an agarose gel. Unlike restriction enzymes, which cleave DNA only at their fixed recognition sequences, the CRISPR/Cas12a system can be programmed to cleave at different sequences by designing crRNAs that guide Cas12a protein to pre-selected sites. In our experiment, we designed two crRNAs, pGEM-crRNA1 and pGEM-crRNA2 (sequences in Table 2), to guide Cas12a protein to cleave pGEM-3Zf plasmid at two sites approximately 700 bp apart. If cleavage is successful, the plasmid should be cut into two fragments, one 0.7 kb long and the other 2.5 kb long (Figure 3A). Comparing the pattern of the cut plasmid (Figure 3B, Lane 2) with that of the uncut plasmid (Figure 3B, Lane 1) shows that the plasmid was successfully cut into these specific fragments, indicating that correctly designed crRNAs can guide Cas12a cleavage at selected sequences.
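The expected fragment sizes follow directly from the geometry of cutting a circular molecule. As a minimal sketch, the function below computes fragment lengths from cut positions on a circular plasmid; the coordinates 500 and 1200 are hypothetical placeholders chosen to be 700 bp apart, the 3200 bp length is the rounded 3.2 kb figure from the text, and `circular_fragments` is an illustrative name.

```python
def circular_fragments(plasmid_bp: int, cut_sites: list[int]) -> list[int]:
    """Linear fragment lengths produced by cutting a circular plasmid."""
    sites = sorted({s % plasmid_bp for s in cut_sites})
    if not sites:
        return []  # an uncut plasmid stays circular, so there are no linear fragments
    fragments = []
    for i, start in enumerate(sites):
        end = sites[(i + 1) % len(sites)]
        # Distance to the next cut going around the circle; with a single
        # cut the result is one full-length linear molecule.
        fragments.append((end - start) % plasmid_bp or plasmid_bp)
    return sorted(fragments)


# Two cuts 700 bp apart on a ~3.2 kb plasmid (positions are placeholders).
print(circular_fragments(3200, [500, 1200]))

# A single cut, as with EcoRI, linearizes the plasmid.
print(circular_fragments(3200, [500]))
```

With two cuts the sketch reproduces the 0.7 kb and 2.5 kb fragments expected on the gel, and with one cut it gives the single 3.2 kb linear fragment.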
Validation of enrichment effect through exonuclease digestion on Cas12a-cleaved pGEM-3Zf plasmid
Having confirmed that dNTPαS nucleotides can serve as a “protective handle” against exonuclease digestion and having established a plasmid-based Cas12a cleavage assay, we proceeded to test whether exonuclease digestion after Cas12a cleavage produces the proposed enrichment. In this experiment, pGEM-3Zf plasmid was cleaved with Cas12a/pGEM-crRNA complexes to produce two fragments of 0.7 kb and 2.5 kb (Figure 4B, Lane 1), which served as target DNA for enrichment. dNTPαS nucleotides were then incorporated into the 5’-overhang ends of the fragments using Taq DNA polymerase, followed by treatment with exonuclease III and analysis on an agarose gel. To demonstrate the specificity of exonuclease III digestion, purified phage λ DNA (0.5 µg per reaction) was added to the reaction mixture before exonuclease treatment to serve as a control. As shown in Figure 4B, the incorporation of dNTPαS deoxyribonucleotides into the 5’-overhang ends protected the DNA fragments from exonuclease degradation, whereas the unprotected phage DNA was completely removed (Lane 3 vs. Lane 2). This result confirms that our enrichment strategy works effectively with plasmid DNA.
Enrichment of ABL proto-oncogene 1 (ABL1) from human genomic DNA
To demonstrate that our Cas12a and exonuclease-based enrichment method works on DNA sequences from the human genome, we tested it on the human ABL1 gene, which is approximately 175 kb in length and consists of numerous introns and exons (Figure 5). ABL1 is a proto-oncogene that encodes a protein tyrosine kinase involved in various cellular processes, including cell division, adhesion, differentiation, and response to stress [11]. It has also been found fused to several translocation partner genes, particularly the breakpoint cluster region gene (BCR). The resulting BCR-ABL fusion proteins are associated with many forms of leukemia, including chronic myeloid leukemia (CML), so ABL1 serves as an important biomarker for these diseases.
In this experiment, we used two specifically designed crRNAs, ABL-5’-crRNA and ABL-3’-crRNA, to cut and release the ABL1 gene from human genomic DNA isolated from cultured HEK293T cells. Following the cleavage reaction, the DNA mixtures were treated with Taq DNA polymerase to incorporate dNTPαS nucleotides into the 5’-overhang ends of the target gene, and the products were then treated with exonuclease III for 60 minutes at 37°C. Quantitative PCR assays that detect ABL1 along with an endogenous housekeeping gene, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), were performed with the treated DNA before and after exonuclease digestion. Figure 6 shows the fluorescence signals for ABL1 (purple) and GAPDH (blue) as a function of amplification cycle, with and without exonuclease treatment. Whereas the ABL1 fluorescence signal continued to rise with the number of amplification cycles both before and after exonuclease treatment, the GAPDH signal was drastically reduced after exonuclease treatment, indicating selective enrichment of the ABL1 gene over the non-target GAPDH gene. This result demonstrates that our enrichment method utilizing CRISPR/Cas12a and exonuclease degradation can effectively enrich target genes from the human genome regardless of length and without the need for PCR amplification.
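One common way to turn qPCR curves like these into a single enrichment figure is the comparative Ct calculation, in which the shift in threshold cycle of the non-target gene is compared with that of the target gene. The sketch below illustrates that calculation only; it is not the analysis performed in this study, it assumes perfect doubling per cycle unless another efficiency is given, and the Ct values in the example are made-up placeholders, since no Ct values are reported here.

```python
def fold_enrichment(
    ct_target_before: float,
    ct_target_after: float,
    ct_ref_before: float,
    ct_ref_after: float,
    efficiency: float = 2.0,  # assumed amplification factor per cycle
) -> float:
    """Enrichment of target over reference caused by the treatment.

    A later Ct means less template. If the reference gene shifts to a later
    Ct by more cycles than the target does, the target has been enriched.
    """
    shift_target = ct_target_after - ct_target_before
    shift_ref = ct_ref_after - ct_ref_before
    return efficiency ** (shift_ref - shift_target)


# Placeholder Ct values for illustration only (not measured data):
# the target barely shifts while the reference shifts by 8 cycles.
example = fold_enrichment(
    ct_target_before=25.0, ct_target_after=25.5,
    ct_ref_before=24.0, ct_ref_after=32.0,
)
print(f"Estimated fold enrichment: {example:.0f}x")
```

A calculation of this kind would let the qualitative difference between the ABL1 and GAPDH curves be reported as a fold change, provided the Ct values and primer efficiencies are measured.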
Discussion
Many diseases, including several types of cancer, are associated with variants in certain genes of the human genome. Therefore, to diagnose and treat these diseases efficiently, it is essential to have tools for identifying such mutations in patients’ genomes early on. WGS has become a valuable tool for sequencing entire human genomes, especially in research settings [2,3]. In contrast, WGS has not been used extensively in clinical settings because of its high cost, long processing time, and limited throughput. Given the large size of the human genome, WGS is mainly useful for identifying genetic variants that occur at relatively high frequencies; however, because it obtains only a limited number of reads for any individual DNA region, it has difficulty identifying variants that are rare or present in only a small number of cells in the body. In such situations, sequencing technologies that focus specifically on disease-related genes in a patient’s genome, rather than the entire genome, help identify rare variants that may contribute to the development of diseases, allowing for early diagnosis and maximizing the chance of treatment success.
In clinical settings, direct sequencing of selected genes from the human genome often requires reliable, low-cost tools to enrich target genes before subjecting them to DNA sequencing. Currently, many enrichment technologies rely on the PCR amplification of many short overlapping DNA fragments, typically a few hundred base pairs each, to cover a single target gene [1,4,5,6,12]. These methods, however, face significant limitations when used to enrich long, continuous DNA sequences, and they cannot maintain certain native features of a patient’s DNA, such as epigenetic modifications, which may play an important role in the development and progression of diseases. In response to these shortcomings, we have developed a new target gene enrichment technique that utilizes the newly discovered CRISPR/Cas12a system and its unique property of introducing 5’-overhang ends at cleavage sites [7]. By filling in these 5’-overhang ends with dNTPαS deoxyribonucleotides, we demonstrated that we could make the target DNA resistant to exonuclease degradation and therefore strongly enrich the target DNA once it is subjected to exonuclease treatment. Additionally, using this method, we successfully enriched the human ABL1 gene, a proto-oncogene approximately 175 kb long that encodes a protein tyrosine kinase involved in various cellular functions. Our target enrichment method involves a minimal number of steps, does not require extensive optimization, and, most importantly, does not require PCR amplification. It can therefore be used to enrich both short and long DNA sequences, and it has the potential to enrich multiple genes simultaneously in a single reaction.
We believe that this new enrichment tool, when used along with modern DNA sequencing technologies including nanopore sequencing and other new sequencing platforms, will prove to be useful in clinical settings to support the diagnosis, monitoring, and treatment of diseases.
As a critical component of the target enrichment method, CRISPR/Cas12a requires the presence of a T-rich protospacer adjacent motif (PAM) to activate its DNA cleavage activity. A recent publication by Cao et al. showed that the PAM dependence of Cas12a can be eliminated through the use of asymmetric RPA (Asy-RPA), which converts double-stranded DNA to single-stranded DNA that in turn activates the trans-cleavage activity of Cas12a [13]. It is plausible that incorporating this PAM-independent feature of Cas12a into the target enrichment method would broaden the scope of genes available for enrichment to essentially all human sequences, regardless of whether a specific PAM for Cas12a is present.
Due to time and resource limitations for this student research project, we were not able to thoroughly optimize the sample preparation and enrichment conditions, to explore PAM-free cleavage by Cas12a through the use of Asy-RPA [14], or to expand testing to additional human genes besides ABL1. In the future, we plan to further optimize the reaction conditions of the target enrichment method and to improve its robustness and throughput so that multiple genes from multiple human DNA samples can be processed and analyzed in parallel.
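The PAM requirement constrains where crRNAs can be placed, which a simple sequence scan can illustrate. The sketch below searches one strand of a DNA sequence for candidate sites. It assumes the PAM is TTTV (V = A, C, or G) located immediately 5’ of the protospacer and a 21-nt spacer; the text specifies only a "T-rich" PAM, so both values are assumptions, and `find_cas12a_sites` is a hypothetical helper. The example input is a toy sequence built around the ABL-5’-crRNA spacer with made-up flanking bases.

```python
import re


def find_cas12a_sites(dna: str, spacer_len: int = 21) -> list[tuple[int, str, str]]:
    """Find (position, PAM, protospacer) candidates on the given strand.

    Assumes a TTTV PAM directly 5' of the protospacer. A lookahead is used
    so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(rf"(?=(TTT[ACG])([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


# Toy sequence: made-up flanks and PAM around the ABL-5' spacer in DNA letters.
toy = "GG" + "TTTC" + "CGGAGAGCAAAGCAGAGAAGC" + "AA"
for position, pam, protospacer in find_cas12a_sites(toy):
    print(position, pam, protospacer)
```

A scan like this would need to be run on both strands of a real target region, and regions without a suitable PAM near the desired boundary are the cases where a PAM-independent approach would matter.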
References
1. L. Y. Ballester, R. Luthra, R. Kanagal-Shamanna, R. R. Singh, Advances in Clinical Next-generation Sequencing: Target Enrichment and Sequencing Technologies. Expert Rev Mol Diagn. 16(3), 357–372 (2016).
2. L. Y. Ballester, R. Luthra, R. Kanagal-Shamanna, R. R. Singh, Advances in Clinical Next-generation Sequencing: Target Enrichment and Sequencing Technologies. Expert Rev Mol Diagn. 16(3), 357–372 (2016).
3. J. Shendure, H. Ji, Next-generation DNA Sequencing. Nat Biotechnol. 26(10), 1135–1145 (2008).
4. J. Dapprich, D. Ferriola, K. Mackiewicz, P. M. Clark, E. Rappaport, M. D’Arcy, A. Sasson, X. Gai, J. Schug, K. H. Kaestner, D. Monos, The Next Generation of Target Capture Technologies-Large DNA Fragment Enrichment and Sequencing Determines Regional Genomic Variation of High Complexity. BMC Genomics. 17, 486 (2016).
5. S. E. Eckert, J. Z. Chan, D. Houniet, J. Breuer, G. Speight, Enrichment by Hybridisation of Long DNA Fragments for Nanopore Sequencing. Microb Genom. 2(9), e000087 (2016).
6. A. Gnirke, A. Melnikov, J. Maguire, P. Rogov, E. M. LeProust, W. Brockman, T. Fennell, G. Giannoukos, S. Fisher, C. Russ, S. Gabriel, D. B. Jaffe, E. S. Lander, C. Nusbaum, Solution Hybrid Selection with Ultra-long Oligonucleotides for Massively Parallel Targeted Sequencing. Nat Biotechnol. 27(2), 182–189 (2009).
7. B. Zetsche, J. S. Gootenberg, O. O. Abudayyeh, I. M. Slaymaker, K. S. Makarova, P. Essletzbichler, S. E. Volz, J. Joung, J. van der Oost, A. Regev, E. V. Koonin, F. Zhang, Cpf1 is a Single RNA-guided Endonuclease of a Class 2 CRISPR-Cas System. Cell. 163, 759–771 (2015).
8. Z. Yang, A. M. Sismour, S. A. Benner, Nucleoside Alpha-thiotriphosphates, Polymerases and Exonuclease III Analysis of Oligonucleotides Containing Phosphorothioate Linkages. Nucleic Acids Res. 35(9), 3118–3127 (2007).
9. T. T. Nikiforov, R. B. Rendle, M. L. Kotewicz, R. H. Rogers, The Use of Phosphorothioate Primers and Exonuclease Hydrolysis for the Preparation of Single-stranded PCR Products and Their Detection by Solid-phase Hybridization. PCR Methods Appl. 3, 285–291 (1994).
10. F. Eckstein, Nucleoside Phosphorothioates. Annu Rev Biochem. 54, 367–402 (1985).
11. J. Colicelli, ABL Tyrosine Kinases: Evolution of Function, Regulation, and Specificity. Sci Signal. 3(139), re6 (2010).
12. G. Garcia-Garcia, D. Baux, V. Faugere, M. Moclyn, M. Koenig, M. Claustres, A. Roux, Assessment of the Latest NGS Enrichment Capture Methods in Clinical Context. Sci Rep. 6, 20948 (2016).
13. G. Cao, N. Yang, Y. Xiong, M. Shi, L. Wang, F. Nie, D. Huo, C. Hou, Completely Free from PAM Limitations: Asymmetric RPA with CRISPR/Cas12a for Nucleic Acid Assays. ACS Sens. 8(12), 4655 (2023).
14. G. Cao, N. Yang, Y. Xiong, M. Shi, L. Wang, F. Nie, D. Huo, C. Hou, Completely Free from PAM Limitations: Asymmetric RPA with CRISPR/Cas12a for Nucleic Acid Assays. ACS Sens. 8(12), 4655 (2023).