Abstract
As science constantly evolves, scientists must quickly adapt to the latest technologies and techniques. In genetic modification, however, scientists often choose an editing technique based on popularity rather than on their specific needs. A poor choice wastes time and money and can do more harm than good. We therefore aimed to address this problem by creating a standardized means of choosing a genetic editing technique in both research and clinical settings. The model takes both profession and research priorities into account and recommends the genetic editing technique best suited to the user. In doing so, it helps minimize incorrect applications of genetic editing techniques, as scientists are better informed about which tool to use. In the future, this model can be expanded to include more preferences and a finer gradation of professions, and can be applied to other scientific fields as well.
Keywords: Genome editing, CRISPR, ZFNs, TALENs, editing technique characteristic variables, mathematical modeling
Introduction
Over centuries of scientific discovery, genome editing has emerged as a novel method for manipulating genes in laboratory and clinical settings. By allowing genes to be modified, these emerging technologies serve a wide range of applications including, but not limited to, agricultural sustainability1, disease treatment and prevention2, engineering of biomaterials3, and optimization of biofuel production. Even though genome editing is a versatile tool that can drive societal advancement, there is ongoing hesitation and debate among scientists and medical professionals due to concerns about the fidelity of the techniques and the ethics of their use. These technologies lack the high-precision components needed to minimize off-target effects, a requirement that is crucial for scientific research and even more so for clinical trials.
In the early 1950s, a non-hereditary phenomenon that resulted in variation in bacterial viruses was observed4. The cell would be able to return to its original state and could spontaneously manifest. These ideas led to the discovery of restriction-modification systems, which consist of two main components: restriction enzymes and a modification enzyme. Purified from bacteria, restriction enzymes attack the double helix of DNA at precise sequences to cleave both strands of the DNA molecule5. The modification enzyme then tags the DNA at those recognition sequences to prevent further cutting by restriction enzymes. In this way, the modification enzymes protect the cell’s DNA from its own restriction enzymes6.
In the early 1970s, DNA sequencing methods arose, making it easier for scientists to understand the genetic information in specific DNA segments. In this way, nucleotide sequences could be easily discerned, and some organisms could even have their entire genomes sequenced. In a gene editing context, being able to sequence genomes enabled scientists to target the specific sites at which restriction enzymes cut to improve accuracy and reduce unwanted off-target cutting.
Targeted gene editing first emerged in the late 1900s through zinc fingers7. Zinc fingers are proteins that regulate a multitude of cellular processes, including DNA recognition (which later proved useful for gene editing), transcriptional activity, and protein folding, and are primarily found in settings that aim to slow down tumor progression. Zinc Finger Nucleases (ZFNs) are made up of a zinc finger domain, with moieties that bind to precise DNA sequences, and a bacterial nuclease that cuts the DNA in proximity. While ZFNs were integral to the evolution of genetic editing techniques, they are typically quite restrictive and less advanced than other techniques. This is because each zinc finger can recognize only three base pairs at a time, which diminishes specificity and therefore increases off-target effects. Despite this, zinc finger proteins provided a vital stepping-stone to other genome editing techniques.
Due to the limitations of zinc fingers, scientists have since developed new technologies that address the drawbacks of previous methods. More recently, in 2011, transcription activator-like effector nucleases (TALENs) were discovered and two years later, clustered regularly interspaced short palindromic repeats (CRISPR) were named8. Both genome editing techniques, which have gained popularity and constitute the majority of the genome editing field, use double-stranded breaks (DSBs) in eukaryotes to perform their function. Specifically, CRISPR relies on the DNA nuclease Cas9, which is directed to specific genetic sequences at precise locations of the genome called protospacer-adjacent motif (PAM) sites by an RNA handle called a guide RNA (gRNA). The gRNA binds to the target DNA sequence adjacent to the PAM site to maximize its accuracy. TALENs are composed of two main components: transcription activator-like effectors and nucleases. The former are the proteins that attach to the DNA, and the latter are the enzymes that cut the DNA. The transcription activator-like effectors are custom-designed to bind to specific DNA sequences. While these techniques are understood and used broadly at both the academic and medical level, several concerns preclude them from becoming standard practice, including their efficiency and target specificity.
The importance of genome editing is far-reaching; these techniques must be extremely precise to correct genetic mutations without compromising the rest of the genome. Therefore, it is vital that they cut precisely, edit efficiently, and prevent repeat cutting at the same sites later on. Their applications are plentiful, ranging from weeding out damaged genes in the human genome and removing genetic diseases to modifying food crops to maximize yield. For these genetic editing techniques to perform at optimal levels, target specificity, cutting efficiency, versatility, ability to enter various cells, ethics, and cost must be optimized. Yet the field currently lacks direction on how to choose one editing technique over another for a specific use.
We hypothesize that prioritizing gene editing technique characteristics will lead to more effective selection and application in diverse professional contexts. In creating a model to guide researchers’ choice of technique, an objective metric is established to choose the best option tailored to each researcher’s needs. By doing so, researchers and clinicians can be better educated on which specific editing technique they should use. This study aims to develop a model that prioritizes gene editing techniques based on specific criteria, improving decision-making in both research and clinical fields and thereby optimizing research and clinic results.
Methods
Our primary goal was to find the optimal genome-editing technique based on the core needs and limitations of an individual research lab or clinical setting. Therefore, we conducted a high-level review of the literature to compile a list of variables to consider when working with one of these editing techniques. We evaluated the editing techniques using the following characteristics: target specificity, cutting efficiency, versatility, ability to enter cells, ethics, and cost. We then conducted a deeper review of genetic editing techniques using PubMed, covering three types of editing techniques: CRISPR, TALENs, and ZFNs. Finally, we looked holistically at each paper to gauge what its authors considered when working with each technique.
Variable | Nature Communications Returns Per Search Query |
Target Specificity | 81,638 |
Cutting Efficiency | 10,434 |
Versatility | 4,295 |
Ability to Enter Cell | 3,554 |
Cost | 2,163 |
Ethics | 5,293 |
Table 1 shows an example of such searches, recording the number of results returned when each variable is searched. The list of variables, from most to least important, is then drawn from this table. However, some variables are reprioritized because the literature over- or under-represents them. For instance, ethics returns more results than several other variables because ethical questions are easier to discern and quantify, making them an ideal research topic, whereas versatility is harder to measure. Ethics therefore returns more search results than versatility, even though these counts do not accurately reflect research and clinical needs. Accordingly, some variables are rearranged based on their misrepresentation in the literature and their usability in cells, after factoring in real-world impact, practicality, and scientific consensus, so that the final prioritization reflects each variable’s actual importance and relevance.
Initial research returned the following key variables: repair efficiency, target specificity of the enzyme, cutting efficiency, versatility among organisms, ability to deliver to human cells, accessibility, ethics, and cost. However, after revising these variables, we decided to cut the list down to target specificity, cutting efficiency, versatility among organisms, ability to enter human cells, ethics, and cost. We reasoned that repair efficiency is something the cell must accomplish on its own and is often not strongly affected by the genome-editing technique used. Additionally, versatility among organisms is less of an issue because the end goal for both clinical and research environments is to deliver compounds into humans or, at the very least, into in vivo models.
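The initial, count-based ordering described above can be sketched in a few lines of Python. This is only an illustration: the counts are copied from Table 1, while the names `search_hits` and `rank_by_hits` are hypothetical and do not come from the study's spreadsheet. The manual reprioritization step (for example, moving ethics below versatility) is a judgment call and is deliberately not modeled here.

```python
# Result counts copied from Table 1.
search_hits = {
    "Target Specificity": 81638,
    "Cutting Efficiency": 10434,
    "Versatility": 4295,
    "Ability to Enter Cell": 3554,
    "Cost": 2163,
    "Ethics": 5293,
}


def rank_by_hits(hits):
    """Return the variable names sorted from most to fewest search results."""
    return sorted(hits, key=hits.get, reverse=True)


# Purely count-based order, before any manual reprioritization.
initial_order = rank_by_hits(search_hits)
for position, name in enumerate(initial_order, start=1):
    print(position, name, search_hits[name])
```

In this raw ordering ethics lands above versatility, which is the kind of literature-driven skew that the reprioritization step is meant to correct.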
In order to corroborate the results from the query search mentioned above, we conducted a survey to discern what matters most to both academic and medical professionals who are versed in the genetic editing field. We then sent this survey through email to a multitude of researchers across the country who would have a background in genome editing techniques. After receiving enough responses, we consolidated that data to examine these rankings and compare them to the ranking we previously had, making adjustments to the list accordingly.
To create our model, we based the weighting of the editing techniques on their prevalence in the literature and on our survey. We used Microsoft Excel to organize our information, creating two sheets in the spreadsheet: the background calculations sheet and the dashboard. The background calculations sheet houses all the weightages and calculations, which run behind the scenes and are hidden from the user. The dashboard is the user interface, where users select their profession and primary project goal in order to calculate the optimal genetic editing technique. When users choose their preferences on the dashboard, those preferences are linked to the background sheet, where they feed calculations that produce optimization scores based on the primary variable chosen by the user. Figure 1 summarizes our methods, portraying the study holistically and in a stepwise manner.
Results
The information required to create the model included details about the editing techniques and their characteristics, such as target specificity and cost. To begin, a greater understanding of each editing technique was necessary, so PubMed was used to gauge the characteristics of each technique. In PubMed, the following search phrases were used: “[variable name] pros and cons in modern research” and “[variable name]”. The results of these searches were then narrowed down according to the pertinence of the articles to the study at hand; for instance, target efficiency is also a common term in drug discovery.
We found that ZFNs, being the oldest technique6, are the least used in research settings due to the advent of newer and more efficient technologies like CRISPR9. Each variable was then investigated further, which allowed us to gauge which characteristics of the editing techniques were integral to their function. Academic settings require multiple trials to produce a viable result and therefore need to minimize cost. Consequently, they typically lean towards CRISPR or ZFNs because of their relative cost-effectiveness. However, CRISPR and ZFNs are known to produce more off-target editing than TALENs. Therefore, since lives outweigh costs in a clinical setting, a more precise genome editing technique must be used there: TALENs10.
We also used survey results to corroborate this data.
The survey first separated the respondents into academic and medical professionals. In the academic category, the survey received around 40 respondents. As depicted in Figure 2, the editing characteristic that most academic researchers chose is target specificity, which matches the result of the PubMed query search. Figure 3 shows, in order, what academic researchers prefer to optimize. Though some of these rankings are inconsistent with the PubMed query search, we ultimately chose the survey as our basis for ordering the editing characteristics, since it more accurately depicts what actual researchers want.
In the medical category, the survey received around 10 respondents. As depicted in Figure 4, the editing characteristic that most medical professionals chose is target specificity. As depicted in Figure 5, the survey shows what medical professionals prefer to optimize, in order. Therefore, for the order of the editing characteristics, we decided on the following: target specificity, cutting efficiency, ability to enter a cell, ethics, cost, and versatility.
To organize this information and create coherent relations between them, we looked towards making a mathematical model that would calculate ideal editing conditions for specific settings.
The dashboard serves as the primary interface where the user enters their information and receives the final result: the optimal editing technique for their use. The dashboard requires two inputs from the user: their profession and the primary variable (target specificity, cutting efficiency, versatility, ability to enter cells, ethics, or cost) that they wish their editing technique to satisfy. With this information, calculations are performed on the background sheet. The calculations weigh each variable based on the user’s preference and their profession. The weighting system works as a gradient, giving the user’s preference the highest value and weighting each subsequent characteristic progressively lower. The sequence of the editing characteristics is based on the metrics previously examined in the literature, as well as on scientific validity and relevance.
Weighted values are critical to the validity of the model, as they determine the optimized outputs by encoding which variables matter more to the choice of an editing method. Differentiating between professions allows for a more precise decision on editing technique, based on the specific necessities of each field. For the profession, users can identify as a medical user, an academic user, or an in-between option where these two professions overlap. Though we understand that these three options do not capture the complexities of all scientific professions, we found them to be a good baseline that could be expanded in later studies. We found that medical and academic professionals tend to prioritize different variables because they have different goals and operational constraints. For instance, medical professionals aim to optimize accuracy and minimize off-target effects since they deal with people’s lives, while academics would also like to optimize accuracy but need to minimize cost as well. Based on the profession chosen, the model selects one of the techniques and displays it on the dashboard. Each profession has a separate technique assigned to it, corresponding to its previously assigned metrics, and this is shown alongside the editing technique selected from the user’s primary variable preference.
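The gradient weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the spreadsheet's actual formulas: the default order is the survey-based order reported in the Results, the descending weightages (30, 20, 15, 10, 10, 10) are read from the weightage table, and the function name `assign_weights` is hypothetical.

```python
# Survey-based default order of editing characteristics (from the Results).
DEFAULT_ORDER = [
    "Target Specificity",
    "Cutting Efficiency",
    "Ability to Enter Cell",
    "Ethics",
    "Cost",
    "Versatility",
]

# Descending weightages as listed in the weightage table.
WEIGHTAGES = [30, 20, 15, 10, 10, 10]


def assign_weights(primary, order=DEFAULT_ORDER, weightages=WEIGHTAGES):
    """Give the user's primary variable the top weightage.

    The remaining variables keep their default relative order and receive
    the remaining weightages in sequence (an assumed reading of the gradient).
    """
    if primary not in order:
        raise ValueError(f"Unknown variable: {primary}")
    reordered = [primary] + [name for name in order if name != primary]
    return dict(zip(reordered, weightages))


# Example: a user who selects cost as their primary variable.
cost_weights = assign_weights("Cost")
for name, weight in cost_weights.items():
    print(f"{name}: {weight}")
```

Moving only the chosen variable to the front keeps the literature- and survey-based order intact for everything else, which is one simple way to realize a gradient that still respects the established ranking.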
We divided the interface of the model into two main sheets: the background calculations and the dashboard. The background calculations sheet is where all weightages and manipulation of variables is applied. On the other hand, the dashboard, an example of which is shown in Figure 6, is where the user would choose their primary variable preference, as well as their profession and it would also be where the model would display the appropriate gene editing techniques based on the user’s choices.
No. | Academic | No. | Between | No. | Medical |
2 | Target Specificity | 2 | Target Specificity | 2 | Target Specificity |
3 | Cutting Efficiency | 3 | Cutting Efficiency | 3 | Cutting Efficiency |
4 | Ability to Enter Cell | 4 | Ability to Enter Cell | 4 | Ability to Enter Cell |
5 | Cost | 5 | Versatility | 5 | Ethics |
1 | Versatility | 1 | Cost | 1 | Cost |
6 | Ethics | 6 | Ethics | 6 | Versatility |
Weightage Order | Weightage | Variable | CRISPR | TALEN | ZFN | Weighted Score | |
1 | 30 | Target Specificity | 2 | 1 | 3 | 20 | 40 |
2 | 20 | Cutting Efficiency | 3 | 2 | 1 | 15 | 15 |
3 | 15 | Versatility | 3 | 2 | 1 | 10 | 10 |
4 | 10 | Ability to Enter Cell | 3 | 2 | 1 | 10 | 10 |
5 | 10 | Ethics | 3 | 2 | 1 | 10 | 10 |
6 | 10 | Cost | 1 | 2 | 3 | 30 | 90 |
Separate from the profession, an editing technique is picked based on the variable chosen (Table 2, Table 3). After finalizing the variables, we began to test how much each variable should be weighed when selecting a quality editing technique. However, a challenge we faced in ordering these variables was the lack of reliable metrics and numerical values to define each editing technique in the existing literature; for instance, whether labs prefer to optimize target specificity more or less than clinical settings do. Without numbers to quantify exactly how prevalent and important each technique is in clinical and academic fields, it was much harder to discern how each variable should be weighed against another. Thus, we based the weightages on a holistic view of each variable and on the frequency with which it has been mentioned in past literature10,11,12,13,14,15. To obtain better metrics, we also conducted a survey to determine how much academic researchers prioritize each editing characteristic and weighted each characteristic based on this information.
The model also accounts for trade-offs that are common in real-world scientific applications by recognizing which characteristics must be given lower priority than the chosen one. For instance, optimizing target specificity means sacrificing cost, which results in the selection of TALENs. To reduce subjectivity, the model was then redesigned multiple times, testing different weightings of the variables against literature mentions and survey rankings of the editing techniques, until a weighting order was determined. Target specificity is highly valued in the literature as well as in our survey, as many laboratories screen for this variable and other related quantitative measurements. Additionally, many current studies urge a shift towards minimizing off-target effects in editing techniques as a primary focus16. This is reflected in our model, where target specificity has the highest weightage in the background calculations. Cutting efficiency, versatility, and the ability to enter a cell follow close behind, as they heavily influence the overall efficiency of the technique17. Finally, ethics and cost, though extremely important, do not directly affect the efficiency of the technique and are weighed the least, as also reflected in our survey. Though they significantly impact the feasibility and acceptability of genome editing techniques, they do not directly determine how effective a technique is, which is the most important factor when debating its merits. Without proper efficacy, ethics and cost are rendered negligible. Additionally, these weightages were ordered into rankings for each profession. The three profession categories we used are academic labs, medical labs, and labs that identify as in between the two.
The model also considers how important each variable is for each profession.
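The specificity-versus-cost trade-off described above can be illustrated with a small weighted-sum sketch. Everything numeric in the score table below is a hypothetical placeholder on a 1 to 3 scale (higher is better), chosen only to mirror the qualitative statements in the text that TALENs are the most specific and that CRISPR and ZFNs are more cost-effective; these are not the values used in the spreadsheet. The function names are likewise illustrative, and the weightages follow the gradient idea (30 for the primary variable, a lower value for the other).

```python
# HYPOTHETICAL per-technique scores (1-3, higher is better); placeholders only.
HYPOTHETICAL_SCORES = {
    "Target Specificity": {"CRISPR": 2, "TALEN": 3, "ZFN": 1},
    "Cost": {"CRISPR": 3, "TALEN": 1, "ZFN": 2},
}


def weighted_scores(weights, scores):
    """Sum weight * score over all weighted variables for each technique."""
    techniques = next(iter(scores.values())).keys()
    return {
        technique: sum(weights[var] * scores[var][technique] for var in weights)
        for technique in techniques
    }


def recommend(weights, scores):
    """Return the technique with the highest weighted total."""
    totals = weighted_scores(weights, scores)
    return max(totals, key=totals.get)


# Target specificity as the primary variable; cost sits low in the gradient.
specificity_first = {"Target Specificity": 30, "Cost": 10}
# Cost as the primary variable; target specificity takes the next weightage.
cost_first = {"Cost": 30, "Target Specificity": 20}

print(weighted_scores(specificity_first, HYPOTHETICAL_SCORES))
print(recommend(specificity_first, HYPOTHETICAL_SCORES))
print(recommend(cost_first, HYPOTHETICAL_SCORES))
```

With these placeholder scores, prioritizing target specificity favors TALEN while prioritizing cost favors CRISPR, which is the trade-off behavior the text describes. Different underlying scores could of course shift the outcome.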
Discussion
Editing Technique | Frequency of Output |
CRISPR | 66.67% |
TALEN | 16.67% |
ZFN | 16.67% |
While building our model, we found that, based on the metrics we used, CRISPR is the most likely final output for the gene editing technique and is chosen 66.67% of the time. This is consistent with usage in current laboratories, where CRISPR is the most-used genetic editing technique (Table 3), which supports the validity of the model. The percentages were calculated by counting the number of times each genome editing technique was displayed across all combinations of editing technique characteristics and dividing that count by the total number of combinations.
One issue we faced was a low number of survey respondents due to a lack of adequate resources. Our sample size was therefore limited, and future surveys should aim for a larger one.
Another issue we faced while creating the model was the unequal weighting of profession and primary preference, which caused the model’s results to depend solely on the primary preference. This stemmed from the way the two inputs were combined into a single model, which had no literature precedent to guide it. Without such precedent, we addressed the issue by first obtaining the metrics for primary preference alone and then basing the importance order of the variables on the primary preference. For instance, if target specificity is chosen as the primary variable, all other weightage values are reduced so that the chosen variable has the highest weightage. These weightage values were then added together so that each possible result number corresponds to a certain editing technique.
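The frequency calculation described above amounts to a simple count-and-divide, sketched below. The function name `output_frequencies` is hypothetical, and the example list is an assumption: the reported 16.67% figures equal one in six, so six outputs split 4/1/1 are used here only because they reproduce the reported percentages, not because the source lists which combination produced which technique.

```python
from collections import Counter


def output_frequencies(outputs):
    """Return each technique's share of the outputs as a percentage."""
    if not outputs:
        raise ValueError("At least one model output is required")
    counts = Counter(outputs)
    total = len(outputs)
    return {
        technique: round(100 * count / total, 2)
        for technique, count in counts.items()
    }


# ASSUMED outputs: six combinations split 4/1/1, consistent with the table.
example_outputs = ["CRISPR"] * 4 + ["TALEN", "ZFN"]
frequencies = output_frequencies(example_outputs)
for technique, percent in frequencies.items():
    print(f"{technique}: {percent}%")
```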
We then fed this value into a second matrix, modeled on the original variable-based one, that incorporated the profession chosen by the user. This matrix worked in a similar way to the first, but took the value from the first matrix and added to or subtracted from it based on the profession. The final numerical value was then translated into an editing technique, producing a result that takes the profession’s unique requirements into account. The model’s results were then compared with the expected results from the literature. However, we found that this approach made the model attuned solely to the profession, rather than weighting both inputs equally: the results changed only when the profession changed, and the chosen editing characteristic carried no weight. To rectify this, we changed the design of the model. Instead of trying to incorporate both deciding factors into one calculation, we separated them and had the model decide on a gene editing technique for each factor independently. Currently, the model chooses one gene editing technique based on primary preference and another based on the setting in which the technique will be used, leaving the user to make a final decision between the two techniques shown. Though this solves the aforementioned problems, it creates a new issue in that it presents the user with two editing techniques instead of one. Future studies could therefore improve this model by finding a way to give equal weightage to both profession and user choice and display a single editing technique.
Another flaw in the model is its emphasis on user preference, which renders it too volatile. Because the model relies on subjective user data, its results reflect biased preferences. In the future, with more literature precedent, this reliance on subjective user data could be reduced to make the model as objective as possible.
In the future, this model can be expanded by creating a dependent relationship between variable choice and profession, which would require more information on how a user’s choice does or does not correlate with their profession. Moreover, the model’s use can extend beyond genetic editing techniques to other techniques in both research and medicine. For instance, the effectiveness of different biological assays in various situations could be assessed using a similar model with parallel logic, giving scientists a tool to optimize their choice of assay. The logic of this model sets a precedent for further models that can optimize a broader range of scientific choices.
Acknowledgements
I would like to acknowledge Lumiere for giving me a platform on which I can perform all this research. I would also like to greatly thank Dr. Matthew Hurlock for being a wonderful mentor and guiding me through the entire research process. I would also like to acknowledge Ayush Dhall for helping me through the publication submission process and formatting of this paper.
References
- P. Sharma, S. P. Singh, H. M. N. Iqbal, R. Parra-Salvidar, S. Varjani, Y. W. Tong, Genetic Modifications Associated with Sustainability Aspects for Sustainable Developments. Bioengineered, 13, 9508–9520. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9161841/ (2022).
- L. F. M. Rocha, L. A. M. Braga, F. B. Mota, Gene Editing for Treatment and Prevention of Human Diseases: A Global Survey of Gene Editing-Related Researchers. Human Gene Therapy, 31, 852–862. https://pubmed.ncbi.nlm.nih.gov/32718240/ (2020).
- A. Abdeen, B. D. Cosgrove, C. A. Gersbach, K. Saha, Integrating Biomaterials and Genome Editing Approaches to Advance Biomedical Science. Annual Review of Biomedical Engineering, 23, 493–516. https://pubmed.ncbi.nlm.nih.gov/33909475/ (2021).
- B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Isolating, Cloning, and Sequencing DNA. Garland Science. https://www.ncbi.nlm.nih.gov/books/NBK26837/ (2002).
- B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Isolating, Cloning, and Sequencing DNA. Garland Science. https://www.ncbi.nlm.nih.gov/books/NBK26837/ (2002).
- H. Chial, Restriction Enzymes. Scitable by Nature Education. https://www.nature.com/scitable/spotlight/restriction-enzymes-18458113/
- A. Klug, The Discovery of Zinc Fingers and Their Development for Practical Applications in Gene Regulation and Genome Manipulation. Quarterly Reviews of Biophysics, 43, 1–21. https://pubmed.ncbi.nlm.nih.gov/20192761/ (2010).
- A. A. Nemudryi, K. R. Valetdinova, S. P. Medvedev, S. M. Zakian, TALEN and CRISPR/Cas Genome Editing Systems: Tools of Discovery. Acta Naturae, 6(3), 19–40. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207558/ (2014).
- N. Bhokisham, E. Laudermilch, L. L. Traeger, T. D. Bonilla, M. Ruiz-Estevez, J. R. Becker, CRISPR-Cas System: The Current and Emerging Translational Landscape. Cells, 12(8). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10136740/ (2023).
- Z. Cui, Z. Cui, R. Tian, Z. Huang, Z. Jin, L. Li, J. Liu, Z. Huang, H. Xie, D. Liu, H. Mo, R. Zhou, B. Lang, B. Meng, H. Weng, Z. Hu, FrCas9 Is a CRISPR/Cas9 System with High Editing Efficiency and Fidelity. Nature Communications, 13, 1425. https://www.nature.com/articles/s41467-022-29089-8 (2022).
- Q. Chen, G. Chuai, H. Zhang, J. Tang, L. Duan, H. Guan, W. Li, W. Li, J. Wen, E. Zuo, Q. Zhang, Q. Liu, Genome-Wide CRISPR off-Target Prediction and Optimization Using RNA-DNA Interaction Fingerprints. Nature Communications, 14, 7521. https://www.nature.com/articles/s41467-023-42695-4 (2023).
- P. Liu, J. Foiret, Y. Situ, N. Zhang, A. J. Kare, B. Wu, M. N. Raie, K. W. Ferrara, L. S. Qi, Sonogenetic Control of Multiplexed Genome Regulation and Base Editing. Nature Communications, 14, 6575. https://www.nature.com/articles/s41467-023-42249-8 (2023).
- M. Gautam, A. Jozic, G. L. Su, M. Herrera-Barrera, A. Curtis, S. Arrizabalaga, W. Tschetter, R. C. Ryals, G. Sahay, Lipid Nanoparticles with PEG-Variant Surface Modifications Mediate Genome Editing in the Mouse Retina. Nature Communications, 14, 6468. https://www.nature.com/articles/s41467-023-42189-3 (2023).
- D. Archard, P. Dabrock, J. Delfraissy, Human-Genome Editing: Ethics Councils Call to Governments Worldwide. Nature Publishing Group UK. https://www.nature.com/articles/d41586-020-00614-3 (2020).
- H. Tsai, H. Kao, M. Kuo, C. Lin, C. Chang, Y. Chen, H. Chen, P. Kwok, A. L. Yu, J. Yu, Whole Genomic Analysis Reveals Atypical Non-Homologous off-Target Large Structural Variants Induced by CRISPR-Cas9-Mediated Genome Editing. Nature Communications, 14, 5183. https://www.nature.com/articles/s41467-023-40901-x (2023).
- J. Tycko, V. E. Myer, P. D. Hsu, Methods for Optimizing CRISPR-Cas9 Genome Editing Specificity. Molecular Cell, 63(3), 355–70. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4976696/ (2016).
- R. D. Mittal, Gene Editing in Clinical Practice: Where Are We? Indian Journal of Clinical Biochemistry: IJCB, 34, 19–25. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6346614/ (2019).