Shedding Light on Genomic Dark Matter: An Introduction to Long Noncoding RNA
While the central dogma of biology describes RNA’s role as a messenger to relay DNA code into protein production, not all RNA is directly involved in protein synthesis. Within the past few decades, largely due to the availability of whole-genome sequences and emerging RNA sequencing technologies, research has expanded to focus on RNA that does not code for proteins, or noncoding RNA. While scientists have known about noncoding RNAs (ncRNAs), such as ribosomal RNA and transfer RNA, for quite some time, a new type of noncoding RNA, called long noncoding RNA (lncRNA), has recently been found to play a variety of regulatory roles in gene expression. For example, misregulation of certain lncRNAs is associated with diseases, including cancer. Currently, researchers are utilizing computational methods, human genetics, cell culture, and animal models to discover and categorize lncRNAs, examine expression patterns in various tissues, and determine the mechanisms by which lncRNAs regulate gene expression.
What is RNA?
Biological information flows from DNA, through RNA, to the proteins that make up cells. This flow is considered the central dogma of biology, but, like all general rules, has a few key exceptions. DNA is the molecule that stores genetic information, which determines an organism’s traits, from organ function and development to physical appearance. In order for this genetic information to be expressed as distinct traits, the DNA code must be reflected in the proteins (structural, catalytic, and signal processing molecules) synthesized by cells. During this process, DNA is first transcribed into RNA, specifically messenger RNA (mRNA). These molecules then carry the genetic code to the cell’s ribosomes, where a process called translation (assembly of amino acids according to the genetic code) occurs to create proteins1.
However, some of the RNA found in cells does not directly code for proteins2. An abundant class of noncoding RNA is involved in protein synthesis. These RNAs include ribosomal RNAs (rRNA) and transfer RNAs (tRNA). rRNA makes up the cell’s ribosomes, the cellular machines that carry out translation1. tRNAs bind to both the mRNA molecule and an amino acid, matching up the amino acid with its corresponding three-nucleotide mRNA sequence in order to assemble the protein in the proper order1.
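The two steps described above, transcription and codon-by-codon translation, can be sketched in a few lines of Python. This is a minimal illustration rather than a model of the cellular machinery: the function names and the example sequence are hypothetical, the input is assumed to be the DNA coding strand, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order for each
# codon position. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence matching a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA.

    Each three-nucleotide codon is looked up in the table, which plays the
    role that tRNAs play in the cell: pairing a codon with its amino acid.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence (not taken from any real gene).
example_dna = "ATGGCTTTTGGCTAA"
example_mrna = transcribe(example_dna)
example_protein = translate(example_mrna)  # one-letter amino acid codes
```

Here the codons AUG, GCU, UUU, and GGC are read in order and the stop codon UAA ends translation, so `example_protein` holds the four-residue peptide written as `MAFG` in one-letter code.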
Other noncoding RNAs serve several roles in regulating gene expression, from DNA methylation to post-transcriptional RNA splicing. These noncoding RNAs include small nucleolar RNAs (snoRNAs), microRNAs (miRNAs), short interfering RNAs (siRNAs), small nuclear RNAs (snRNAs), and long noncoding RNAs (lncRNAs), among others3. Here, we focus on long noncoding RNAs, a new and important area of noncoding RNA biology. LncRNAs are particularly interesting because they resemble mRNAs, but are not translated. A few classic examples, including Xist and H19, have been studied for over two decades and are implicated in processes such as chromosome silencing and imprinting. Recently, thousands of novel lncRNAs have been identified4. The enormous amount of additional information encoded in this new class of genes may elucidate gene regulatory dynamics and disease progression.
What are lncRNAs?
LncRNAs are defined as RNA molecules longer than 200 nucleotides that do not have protein-coding potential256. These molecules possess many of the same properties as mRNA, including a 5’ cap and a poly-A tail (which are added to the RNA molecule after transcription), yet they lack mRNA’s unique ability to encode protein sequence information6. With the advent of new sequencing technologies and the availability of genome-wide sequences, the importance of lncRNAs has come to light in the past few decades.
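The working definition above (longer than 200 nucleotides, no protein-coding potential) can be turned into a rough computational filter. The sketch below is an illustration only. It assumes that "no coding potential" can be approximated by the absence of a long open reading frame (ORF), and the 100-codon cutoff is an assumed heuristic, not a value given in this article. The function names are hypothetical, and real annotation pipelines use more evidence than ORF length alone.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_orf_codons(rna: str) -> int:
    """Length in codons of the longest AUG-to-stop ORF in the three forward frames.

    The count includes the start codon and excludes the stop codon. ORFs that
    run off the end of the transcript without a stop codon are ignored.
    """
    rna = rna.upper().replace("T", "U")
    best = 0
    for frame in range(3):
        in_orf = False
        current = 0
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if not in_orf:
                if codon == "AUG":
                    in_orf = True
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                in_orf = False
            else:
                current += 1
    return best


def looks_like_lncrna(rna: str, min_length: int = 200, max_orf_codons: int = 100) -> bool:
    """Apply the two criteria: long transcript, no long ORF (assumed cutoff)."""
    return len(rna) > min_length and longest_orf_codons(rna) < max_orf_codons


# Hypothetical sequences for illustration.
coding_like = "AUG" + "GCU" * 150 + "UAA"   # 456 nt with a 151-codon ORF
noncoding_like = "GCU" * 100                # 300 nt with no start codon
too_short = "AUGGCUUAA"                     # 9 nt

results = [looks_like_lncrna(s) for s in (coding_like, noncoding_like, too_short)]
# results == [False, True, False]
```

Only the second sequence passes both criteria: the first contains a long ORF and the third is too short.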
At the end of the 20th century, the idea arose that what was previously known as genetic “junk” does, in fact, produce RNA transcripts that resemble mRNAs but lack the potential to encode proteins. New methods have allowed researchers to examine the “transcriptome” (or set of transcribed regions of the genome) and compare it to known protein-coding genes. Researchers have extracted RNA from cells and, using microarrays, probed for regions along the entire genome in order to determine which sections of DNA are transcribed into RNA6. RNA sequencing serves a similar purpose, obtaining the nucleotide sequences of the RNA found in a cell and computationally mapping them back to the genome6. Another method of discovering lncRNAs involves studying chromatin markers that are indicative of active transcription6. These methods have revealed that much of cells’ RNA can be traced to loci that do not contain protein-coding genes. Studies have estimated that the human genome codes for over 8,000 lncRNAs, compared to about 23,000 mRNAs (3, 4), and lncRNAs have been discovered in multiple organisms in addition to humans, including mice, zebrafish, worms, and even fruit flies7.
LncRNAs are classified based on their location relative to protein-coding genes. For example, intronic lncRNAs are found within introns (non-coding sections) of protein-coding genes56. Antisense lncRNAs overlap with protein-coding gene sequences56. Long intergenic noncoding RNAs (lincRNAs) are found entirely within intergenic space, or the space between two protein-coding genes56. Of the various types of lncRNAs, lincRNAs are the most widely studied because they do not overlap with protein-coding genes, which allows for less convoluted interpretation of experimental results56.
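The positional categories above amount to an interval-overlap test against annotated protein-coding genes. The following sketch shows one way this could be expressed. The `Gene` and `Transcript` classes, the coordinates, and the category labels are all hypothetical. Two assumptions go beyond the text: an antisense lncRNA is taken to overlap an exon on the opposite strand, and a transcript that overlaps an exon on the same strand is labeled `sense_overlapping`. Coordinates are half-open intervals (start included, end excluded).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """A protein-coding gene described by its exons (half-open intervals)."""
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]

    @property
    def start(self) -> int:
        return min(s for s, _ in self.exons)

    @property
    def end(self) -> int:
        return max(e for _, e in self.exons)


@dataclass(frozen=True)
class Transcript:
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def classify(t: Transcript, genes: List[Gene]) -> str:
    """Assign a positional category relative to protein-coding genes."""
    hits = [g for g in genes
            if g.chrom == t.chrom and overlaps(t.start, t.end, g.start, g.end)]
    if not hits:
        return "intergenic"  # a lincRNA candidate
    exon_hits = [g for g in hits
                 if any(overlaps(t.start, t.end, s, e) for s, e in g.exons)]
    if not exon_hits:
        return "intronic"  # inside a gene span but touching no exon
    if any(g.strand != t.strand for g in exon_hits):
        return "antisense"  # assumed: exon overlap on the opposite strand
    return "sense_overlapping"


# Hypothetical annotation: one two-exon gene on the plus strand of chr1.
genes = [Gene("chr1", "+", ((100, 200), (400, 500)))]

labels = [
    classify(Transcript("chr1", 250, 350, "+"), genes),  # intronic
    classify(Transcript("chr1", 150, 450, "-"), genes),  # antisense
    classify(Transcript("chr1", 600, 900, "+"), genes),  # intergenic
]
```

The order of the checks is a design choice made for this sketch, so a transcript that fits more than one category receives only the first matching label.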
LncRNAs’ ability to bind to DNA, RNA, and proteins suggests several initial mechanisms for lncRNA function, such as molecular scaffolds, guides, or decoys68. Specifically, a molecular scaffold brings together multiple components. In this mechanism, the lncRNA interacts with two or more important protein partners, assembling a functional complex. As guides, lncRNA molecules bring protein partners or entire complexes to genomic targets. Finally, lncRNAs act as decoys by associating with DNA-binding proteins and preventing them from finding their target DNA sequences.
The classical lncRNA Xist acts as both a scaffold and guide, and is known to completely coat the entire X chromosome, inactivating it and generating the Barr body present in every female mammalian cell910. This silencing ensures that the expression of genes located on the X chromosome in female cells equals that of male cells, which have only a single X chromosome. This method of gene dosage compensation does not occur in mutant cells lacking the Xist gene, indicating that Xist is necessary for X inactivation10. Xist acts as a molecular scaffold, harboring binding sites for multiple proteins conferring transcriptional repressive activity8. Xist also acts as a guide, bringing these repressive proteins to the target X chromosome.
A more recently discovered lncRNA, GAS5, functions as a decoy and is active in starvation conditions. It complexes with the glucocorticoid receptor (a regulator of cell development and metabolism), preventing the receptor from binding to its DNA targets and activating their transcription, thus slowing cellular metabolism1112.
The general examples described here represent only the tip of the lncRNA mechanism iceberg. Additional mechanisms of lncRNA function are already known and, given the large number of recently discovered lncRNAs, scientists are likely to find many more. Although many of the specific ways in which lncRNAs function remain unsolved, this new class of molecule is associated with a wide range of processes, from embryonic stem cell pluripotency to organ development, neurodegeneration, and even cancer6.
LncRNAs in Disease
Genetics offers the strongest argument for disease association. Mendelian inheritance describes how predisposition to disease can literally be “in our genes”. Even diseases that arise without this pattern of inheritance can be traced to mutations in protein coding genes, but the precise mutation is often unknown. More recently, these genetic analyses have pointed to the noncoding genome as harboring mutations that give rise to disease risk. In fact, over 200 genomic loci associated with human traits and disease risk reside in regions that do not code for any proteins but do code for lncRNAs4. Thus, lncRNA regions may be implicated in many human diseases.
A powerful example of combining noncoding human genetics and lncRNAs arises from a lethal childhood disorder, called alveolar capillary dysplasia (ACD)13. In humans, a mutation in a noncoding region upstream from an important lung development transcription factor, FOXF1a, leads to this lethal lung defect13. ACD can be caused by point mutations either in FOXF1a itself or in the upstream noncoding region. Importantly, the noncoding region mutated in ACD houses a lncRNA with a positional equivalent in mice. Deletion of this mouse lncRNA, termed Fendrr, is embryonic lethal and appears to affect murine lung physiology14. The search for the region responsible for this lung disorder highlights how human genetics has causally implicated noncoding regions in disease. The observation that lncRNAs are vital for healthy lung development in humans and mice points to the physiological relevance of lncRNAs.
In addition to developmental disorders, lncRNAs have recently been associated with various neurological diseases2515. This connection is supported by the fact that both neurological disorders and lncRNAs are highly tissue-specific within the brain15. In the inherited neurological disorders fragile X syndrome (FXS) and fragile X tremor ataxia syndrome (FXTAS), patients carry a mutation in a noncoding regulatory region of the mRNA for the protein-coding gene FMR115. However, the wide range of phenotypes among FXS and FXTAS patients suggests that these conditions are not caused entirely by the FMR1 mutation but that misregulation of other genes may play a role as well. The lncRNAs FMR4 and ASFMR1 are downregulated in FXS and upregulated in FXTAS15. This observation may explain the phenotypic differences among patients, but the mechanisms by which these lncRNAs function are still unknown. Expression of lncRNAs is also altered in Alzheimer’s disease, a gradually worsening form of dementia that is believed to be caused by amyloid plaques in the brain15. The lncRNAs BACE1-AS and BC200 (a primate-specific lncRNA involved in the translation of certain proteins) are upregulated in patients with Alzheimer’s disease15. It is not yet known whether the dysregulation of these lncRNAs is sufficient to cause the disease or by what mechanism they function.
Another important area of study involves the link between lncRNA and cancer, including breast, colorectal, brain, and skin cancers16. In fact, many cancer patients do not have mutations in known protein-coding tumor suppressor genes or oncogenes, but rather in noncoding genomic regions, some of which have been linked to poor prognosis and a high grade of malignancy15. Genes involved in cancer can be classified as either oncogenes or tumor suppressors11. Oncogenes are genes that have the potential to cause cancer when mutated or aberrantly upregulated, while tumor suppressor genes protect cells against harmful mutations or initiate apoptosis (programmed cell death) when these mutations have occurred11. We see protein-coding examples of both oncogenes and tumor suppressors: RAS is a well-known oncogene, and p53 is a well-known tumor suppressor. Similarly, we see long noncoding examples of both oncogenes and tumor suppressors.
Oncogenic lncRNAs include, but are not limited to, HOTAIR and ANRIL11. These two lncRNAs show elevated expression in cancer types and are known to interact with chromatin modifiers, suggesting that they alter the cell’s epigenetic state, or the proteins that attach to DNA to control transcription or gene silencing. High levels of HOTAIR are seen in breast cancer, liver cancer, and colorectal cancer15. HOTAIR is implicated in metastasis, or the process by which cancer cells become highly mobile and invade other tissues. In the case of breast cancer, high levels of HOTAIR have been correlated with poor prognosis and low survival rates1115. Another onco-lncRNA, ANRIL, is highly expressed in breast cancer, acute lymphoid leukemia, nasopharyngeal carcinoma, glioma, basal cell carcinoma, and plexiform neurofibromas15. ANRIL represses several known tumor suppressor genes and prevents them from properly regulating the cell cycle, thus contributing to cancer progression15. Conversely, the lncRNA GAS5, described above, is an example of a tumor-suppressor lncRNA. One of the hallmarks of a tumor suppressor is downregulation in cancer, because the function of that gene is necessary to prevent the disease. Studies have demonstrated that GAS5 is dramatically downregulated in breast cancer1112. This lncRNA reduces cell metabolism, and consequently, its downregulation may contribute to uncontrolled cell division1117.
LncRNAs also have the potential to improve human health by providing new diagnostic methods. Because of their differential expression in disease states, lncRNAs may offer non-invasive diagnoses. This possibility is promising because lncRNAs are stable in body fluids6. For example, Prensner et al. identified the lncRNA PCAT-1, which is expressed in human prostate cancers and causes altered gene repression. Furthermore, the level of PCAT-1 in human urine can be used to identify patients with poor prognosis6. Finally, if key lncRNAs are found to be causally implicated in certain cancers, therapies may be designed to target them.
Here, we discuss three primary methods of lncRNA research. First, lncRNAs are discovered and preliminarily categorized computationally. These lncRNAs can then be experimentally manipulated in either cell culture experiments or animal models.
Computational methods are used to identify lncRNAs and give them a preliminary functional annotation. Analysis of RNA sequencing data can identify new lncRNA transcripts and create reference catalogs. The ENCODE (Encyclopedia of DNA Elements) consortium is working to map lncRNAs, as is the GENCODE (Encyclopedia of Genes and Gene Variants) project. Anyone can access and investigate the lncRNAs identified in these mapping efforts with tools such as the UCSC Genome Browser (Figure 3). In an example of a large-scale computational study, Cabili et al. compiled RNA-seq data from 24 different cell types. They used these data to generate a catalog of over 8,000 lincRNAs, which included more than 4,500 novel transcripts4. They then categorized and annotated the lincRNAs based on a range of properties, including structure and evolutionary changes. Among the results, two observations were especially notable: lincRNA expression is more tissue-specific than protein-coding gene expression, and many lincRNAs are co-expressed with neighboring genes4.
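To make the idea of tissue-specific expression concrete, the sketch below computes a simple specificity score for one gene across several tissues. It uses the tau index, a general-purpose measure chosen here for simplicity. It is a stand-in and should not be read as the score used by Cabili et al. The function name and all expression values are hypothetical.

```python
from typing import Sequence


def tau_index(expression: Sequence[float]) -> float:
    """Tissue-specificity score between 0 and 1.

    0 means the gene is expressed equally in every tissue. 1 means it is
    expressed in exactly one tissue. Input values are non-negative expression
    levels, one per tissue, all in the same units.
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        return 0.0  # not expressed anywhere, so no tissue preference
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


# Hypothetical expression levels across four tissues.
housekeeping_like = [5.0, 5.0, 5.0, 5.0]   # same level everywhere
tissue_restricted = [0.0, 0.0, 10.0, 0.0]  # detected in a single tissue

scores = (tau_index(housekeeping_like), tau_index(tissue_restricted))
# scores == (0.0, 1.0)
```

Under this measure, the observation reported above would correspond to lincRNAs having higher scores, on average, than protein-coding genes measured in the same tissues.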
Cell culture represents another method of studying lncRNA function. Researchers can observe the effects of targeting and removing a specific lncRNA from the cell using RNA interference (RNAi). This approach introduces double-stranded RNA, called short interfering RNA (siRNA), which unwinds and binds to the target lncRNA. An RNase enzyme then associates with and cuts the target lncRNA, inactivating it18. Thus, researchers can observe the phenotypic effects of lncRNA removal. In addition, total RNA can be purified from these cells, allowing researchers to examine the effects of lncRNA expression modulation, including changes in the expression of other specific genes in the cell, or at the level of the whole transcriptome.
Knockout animal models are a final frontier in current lncRNA research and are vital for determining the physiological relevance of this class of molecule. Knockout animals lack a certain gene, and therefore are useful for observing the phenotypic effects of the missing gene. Knockout mice were first created by Dr. Mario R. Capecchi in the 1980s by removing a gene from the genome19. Since then, knockout mouse strains have been created for a tremendous number of protein-coding genes as well as for a select few classic lncRNAs, including Xist and H1920. More recently, however, knockout mice have been used to study a large cohort of novel lncRNAs. In 2013, Sauvageau et al. conducted a study in which eighteen strains of knockout mice were created by removing lncRNA loci. They observed a range of phenotypes when specific lncRNAs were knocked out, from perinatal lethality to neurological defects14.
Often, when knockout models are used to study genes, the targeted gene is removed and replaced by a LacZ reporter gene21. In tissues and at developmental time points in which the gene is normally expressed, the LacZ gene is transcribed and translated into an enzyme called beta-galactosidase, which cleaves X-gal to produce a blue byproduct. Consequently, when tissues of a knockout animal are placed in X-gal, the areas in which the knocked-out lncRNA locus is active turn blue. This provides information about when and where the gene is expressed, leading to inferences about the processes in which the gene may be involved. Sauvageau et al. represents the first study in which lncRNA knockout mice were made using this reporter gene technology, and also the largest cohort of lncRNA knockout mice currently available. Genes can be knocked out in other organisms as well, including zebrafish, worms, and fruit flies. One problem associated with knockout organisms is that lncRNAs are not well conserved across species, so while some “positional equivalents” have been discovered, it is difficult to draw functional parallels to human biology22.
Because the field of lncRNA is so new, the goal is to discover and characterize these molecules and to eventually associate them with specific gene expression pathways and developmental processes.
Many challenges need to be overcome before gaining a deeper understanding of lncRNAs and their function. For example, efforts are underway to achieve complete annotation of lncRNAs and to group them into families based on their structure, function, and expression patterns. These classification efforts can be achieved through computational methods, cell culture studies, and use of knockout models. Another emerging field of study is the examination of the mechanisms by which lncRNAs operate, including how they target specific genomic regions, the roles they play in protein interactions, and which lncRNAs act in cis (on nearby genes) versus in trans (on faraway genes, often on different chromosomes)6. In addition, the observation that lncRNAs are poorly conserved across species raises questions about how and why they evolve so rapidly and whether they are functional molecules or transcriptional byproducts6. Ultimately, the goal is to understand the roles of lncRNAs in tissue and organism development, homeostasis, and pathogenesis.
LncRNAs represent a group of RNA genes that may offer new insight into the world of molecular biology and gene regulation. In other words, lncRNAs may provide important information about when and how genes are turned on or off. The discoveries outlined here, along with ongoing research in laboratories around the world, can help us understand how the body develops and what happens when developmental processes malfunction, as in cancer. The hope is that continued study of lncRNAs will solve many mysteries of gene regulation and lead to new methods of detecting and treating disease.
I would like to thank Drs. Paul Khavari and John Rinn for their mentorship and the opportunity to participate in their research. I would also like to thank Abbie Groff for her inspiration and guidance in the preparation of this manuscript.
- Lodish, H., Berk, A., Zipursky, S. L., Matsudaira, P., Baltimore, D., & Darnell, J. (2000). The Three Roles of RNA in Protein Synthesis. In Molecular cell biology (Vol. 4). Retrieved August 4, 2014, from http://www.ncbi.nlm.nih.gov/books/NBK21603/
- Groff, A. (2014, January 21). Long noncoding RNAs: A new class of RNA and why you should care. Retrieved from http://sitn.hms.harvard.edu/flash/2014/long-noncoding-rnas/
- Sana, J., Faltejskova, P., Svoboda, M., & Slaby, O. (2012). Novel classes of non-coding RNAs and cancer. Journal of Translational Medicine, 10(103). doi: 10.1186/1479-5876-10-103
- Cabili, M. N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., & Rinn, J. L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes & Development, 25(18), 1915-1927. doi: 10.1101/gad.17446611
- LncRNA: Why and how to study it? (2013). Retrieved from http://www.arraystar.com/news/news_main.asp?id=230
- Rinn, J. L., & Chang, H. Y. (2012). Genome regulation by long noncoding RNAs. Annual Review of Biochemistry, 81(1), 145-166. doi: 10.1146/annurev-biochem-051410-092902
- Park, C., Yu, N., Choi, I., Kim, W., & Lee, S. (2014). LncRNAtor: a comprehensive resource for functional investigation of long noncoding RNAs. Bioinformatics. doi: 10.1093/bioinformatics/btu325
- Wang, K., & Chang, H. (2011). Molecular mechanisms of long noncoding RNAs. Molecular Cell, 43(6), 904-914. doi: 10.1016/j.molcel.2011.08.018
- Brown, C. J., Hendrich, B. D., Rupert, J. L., Lafrenière, R. G., Xing, Y., Lawrence, J., & Willard, H. F. (1992). The human XIST gene: Analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell, 71, 527-542.
- Marahrens, Y., Panning, B., Dausman, J., Strauss, W., & Jaenisch, R. (1997). Xist-deficient mice are defective in dosage compensation but not spermatogenesis. Genes & Development, 11(2), 156-166. doi: 10.1101/gad.11.2.156
- Huarte, M., & Rinn, J. L. (2010). Large non-coding RNAs: missing links in cancer? Human Molecular Genetics, 19(R2), 152-161. doi: 10.1093/hmg/ddq353
- Kino, T., Hurt, D. E., Ichijo, T., Nader, N., & Chrousos, G. P. (2010). Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Science Signaling, 3(107), ra8. doi: 10.1126/scisignal.2000568
- Szafranski, P., Dharmadhikari, A. V., Brosens, E., Gurha, P., Kołodziejska, K. E., Zhishuo, O., Dittwald, P., Majewski, T., Mohan, K. N., Chen, B., Person, R. E., Tibboel, D., de Klein, A., Pinner, J., Chopra, M., Malcolm, G., Peters, G., Arbuckle, S., Guian, S. F., Hustead, V. A., Jessurun, J., Hirsch, R., Witte, D. P., Maystadt, I., Sebire, N., Fisher, R., Langston, C., Sen, P., & Stankiewicz, P. (2013). Small noncoding differentially methylated copy-number variants, including lncRNA genes, cause a lethal lung developmental disorder. Genome Research, 23(1), 23-33. doi: 10.1101/gr.141887.112
- Sauvageau, M., Goff, L. A., Lodato, S., Bonev, B., Groff, A. F., Gerhardinger, C., Sanchez-Gomez, D. B., Hacisuleyman, E., Li, E., Spence, M., Liapis, S. C., Mallard, W., Morse, M., Swerdel, M. R., D’Ecclessis, M. F., Moore, J. C., Lai, V., Gong, G., Yancopoulos, G. D., Frendewey, D., Kellis, M., Hart, R. P., Valenzuela, D. M., Arlotta, P., & Rinn, J. L. (2013). Multiple knockout mouse models reveal lincRNAs are required for life and brain development. ELife, 2. doi: 10.7554/eLife.01749
- Niland, C. N., Merry, C. R., & Khalil, A. M. (2012). Emerging roles for long non-coding RNAs in cancer and neurological disorders. Frontiers in Genetics, 3. doi: 10.3389/fgene.2012.00025
- Wapinski, O., & Chang, H. Y. (2011). Long noncoding RNAs and human disease. Trends in Cell Biology, 21(6), 354-361. doi: 10.1016/j.tcb.2011.04.001
- Kino, T., Hurt, D. E., Ichijo, T., Nader, N., & Chrousos, G. P. (2010). Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Science Signaling, 3(107), ra8. doi: 10.1126/scisignal.2000568
- RNA interference (RNAi). (n.d.). Retrieved from http://www.ncbi.nlm.nih.gov/genome/probe/doc/TechRnai.shtml
- Transgenic mice. (n.d.). Retrieved from http://learn.genetics.utah.edu/content/science/transgenic/
- White, J. K., Gerdin, A., Karp, N. A., Ryder, E., Buljan, M., Bussell, J. N., Salisbury, J., Clare, S., Ingham, N. J., Podrini, C., Houghton, R., Estabel, J., Bottomley, J. R., Melvin, D. G., Sunter, D., Adams, N. C., The Sanger Institute Mouse Genetics Project, Tannahill, D., Logan, D. W., MacArthur, D. G., Flint, J., Mahajan, V. B., Tsang, S. H., Smyth, I., Watt, F. M., Skarnes, W. C., Dougan, G., Adams, D. J., Ramirez-Solis, R., Bradley, A., & Steel, K. P. (2013). Genome-wide Generation and Systematic Phenotyping of Knockout Mice Reveals New Roles for Many Genes. Cell, 154, 452-464. doi: 10.1016/j.cell.2013.06.022
- Li, L., & Chang, H. Y. (2014). Physiological roles of long noncoding RNAs: insight from knockout mice. Trends in Cell Biology. doi: 10.1016/j.tcb.2014.06.003
- Baker, M. (2011). Long noncoding RNAs: The search for function. Nature Methods, 8(5), 379-383.