Abstract
PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha), a component of the PI3K signaling pathway, is frequently mutated in human breast cancer. The relationship between PIK3CA and breast cancer was evaluated using cBioPortal, UALCAN, STRING, COSMIC, the UCSC Genome Browser, and DepMap. The Cancer Genome Atlas (TCGA) Breast Invasive Carcinoma Firehose Legacy dataset was the main source of data analyzed in this research. This multi-omics study aimed to determine whether PIK3CA can be used as a biomarker for identifying and treating breast cancer. The hypothesis is that PIK3CA can serve as a breast cancer biomarker because its mutated expression is associated with breast cancer malignancy. The most common PIK3CA alterations were missense mutations and amplifications. Two mutation hotspots overlap with post-translational modification sites, and PIK3CA shows a positive association with triple-negative breast cancer (TNBC), a subtype commonly associated with aggressive disease and poor outcomes. Gene expression of mutated PIK3CA, survival analysis, proteomics analysis, and CRISPR/Cas9-based gene dependency analysis all suggest that high mutated PIK3CA expression is associated with breast cancer progression and poor outcomes. In contrast, the UALCAN data show that high non-mutated PIK3CA expression is associated with good outcomes. These findings suggest that PIK3CA has a clear relationship with breast cancer, making it a valuable biomarker for breast cancer diagnosis and future treatments.
Keywords: PIK3CA, breast cancer, data-mining, cancer bioinformatics
Introduction
Cancer is one of the deadliest diseases in the world, affecting millions of people. Breast cancer is the most common cancer in women, with 2.3 million women diagnosed and 685,000 deaths globally in 2020 [1]. It is most likely to occur in women aged 50 years or older. Prominent risk factors include family history, gene mutations, smoking, lack of exercise, alcohol consumption, and infections [2]. Although cancer treatment and diagnosis have improved considerably, breast cancer is becoming more prevalent throughout the world each year [3].
The molecular subtype of a breast cancer greatly affects its progression and the survival of the patients diagnosed with it. The most widely accepted classification is based on the expression of three receptors: the estrogen receptor (ER), the progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) [4]. Accordingly, the four major subtypes of breast cancer are luminal A, luminal B, HER2-positive, and triple-negative breast cancer (TNBC). Luminal A tumors express ER and/or PR, lack HER2, and show low expression of the cell proliferation marker Ki-67. Because they are mostly low grade and slow growing, these tumors have the best prognosis and the lowest incidence of relapse. Luminal B tumors are ER positive, can be PR negative, and show high expression of Ki-67. As a result of the high Ki-67 expression, the luminal B subtype has a much worse prognosis and a higher grade than luminal A. The HER2-positive subtype lacks ER and PR but is characterized by high HER2 expression. It is much more aggressive than the luminal A and B subtypes because it grows extremely fast. TNBC is ER negative, PR negative, and HER2 negative. It is commonly associated with aggressive behavior, early relapse, and high proliferation. TNBC can be further classified into subgroups such as basal-like (BL1 and BL2), mesenchymal (MES), luminal androgen receptor (LAR), and immunomodulatory (IM).
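The receptor rules in the paragraph above can be written as a small decision function. This is only a minimal sketch: the function name `classify_subtype`, the boolean inputs, and the "unclassified" fallback are illustrative assumptions, and clinical classification uses quantitative thresholds that are not modeled here.

```python
from typing import Literal

Subtype = Literal["luminal A", "luminal B", "HER2-positive", "TNBC", "unclassified"]


def classify_subtype(er: bool, pr: bool, her2: bool, ki67_high: bool) -> Subtype:
    """Map receptor status to the four subtypes as described in the text.

    Hypothetical helper: inputs are simplified to booleans, and any
    combination the text does not describe is returned as "unclassified".
    """
    if not er and not pr and not her2:
        return "TNBC"            # ER-, PR-, HER2-
    if not er and not pr and her2:
        return "HER2-positive"   # ER-, PR-, high HER2
    if (er or pr) and not her2 and not ki67_high:
        return "luminal A"       # ER and/or PR present, HER2 absent, low Ki-67
    if er and ki67_high:
        return "luminal B"       # ER+, PR may be negative, high Ki-67
    return "unclassified"


if __name__ == "__main__":
    print(classify_subtype(er=True, pr=True, her2=False, ki67_high=False))
    print(classify_subtype(er=False, pr=False, her2=False, ki67_high=True))
```

Returning "unclassified" rather than guessing keeps the sketch faithful to the description, which does not cover every receptor combination.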
PIK3CA mutations are of particular interest among oncogene mutations because phosphoinositide kinases (PIKs) are enzymes that activate numerous signaling pathways directly related to breast cancer. PIKs are categorized into three major groups according to the position they phosphorylate on the inositol ring: phosphoinositide 3-kinases (PI3Ks), phosphoinositide 4-kinases (PIP4Ks), and phosphoinositide 5-kinases (PIP5Ks). PI3Ks are heterodimeric enzymes made of regulatory and catalytic subunits. They are activated by a growth factor-bound receptor tyrosine kinase (RTK) and then phosphorylate other signaling molecules, which results in downstream conduction of chemical signals. PI3K signaling plays pivotal roles in tumor development and progression and in cellular processes including cell proliferation and cell survival.
The research question is how expression of the gene PIK3CA affects breast cancer outcomes and whether PIK3CA could be used as a breast cancer biomarker. Answering it contributes to breast cancer research because a reliable biomarker offers a faster and easier way to determine prognosis and to target treatment toward specific patients, making diagnoses more accurate. Current research on PIK3CA has gaps because many papers focus on a single characteristic of the gene rather than comparing several characteristics with one another. Findings of this kind could support advances such as personalized gene-targeted therapies for breast cancer in the coming years. The scope of this study encompasses survival rate, copy number alterations, gene dependency, and protein-protein interactions, but it leaves out other types of alterations. The general approach was to use freely available online tools to answer the question.
Results
PIK3CA Mutation Analysis and Copy Number Alterations
PIK3CA is located on chromosome 3 between base pairs 176,000,001 and 179,300,000. It has a length of 91,737 bp (base pairs) (Figure 1A). Copy number alterations and mutations were analyzed with the OncoPrint tool in cBioPortal for Cancer Genomics. PIK3CA is altered in 32.5% of the breast cancer samples, and the alteration frequency is higher in Breast Invasive Carcinoma (NOS) and in Breast Invasive Lobular Carcinoma (Figure 1B). The most common alteration in PIK3CA is a missense mutation (putative driver), found in 97.77% of PIK3CA-altered samples [6,7]. The second most common alteration type is amplification, a copy number alteration. In total, there are 344 driver mutations: 340 missense mutations and 4 inframe mutations (Figure 1C). In addition, there are 11 mutations of unknown significance: 7 missense, 2 truncating, 1 inframe, and 1 splice mutation. The somatic mutation frequency is displayed as 28.2%. The PIK3CA protein consists of 1068 amino acids. A domain annotated as PI3Ka spans amino acids 520 to 703 and contains a recurrent hotspot with probable oncogenic effects. The missense mutation at this hotspot, E545K, replaces glutamic acid with lysine at position 545 (Figure 1D). Though there is not much functional data for the PIK3CA E545K mutation, it has still been recognized as a statistically significant hotspot. Another hotspot lies near the PI3_PI4_kinase domain, with 144 of 1084 profiled patients mutated, suggesting that PIK3CA mutation is a frequent event in breast cancer.
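The mutation counts reported above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in the text; the variable names are illustrative.

```python
# Counts quoted in the text (cBioPortal, TCGA Firehose Legacy dataset).
driver = {"missense": 340, "inframe": 4}
unknown_significance = {"missense": 7, "truncating": 2, "inframe": 1, "splice": 1}

driver_total = sum(driver.values())                 # reported in the text as 344
unknown_total = sum(unknown_significance.values())  # reported in the text as 11

# Share of profiled patients carrying the second hotspot mutation.
hotspot_mutated, profiled = 144, 1084
hotspot_fraction = hotspot_mutated / profiled

print(f"driver mutations: {driver_total}")
print(f"mutations of unknown significance: {unknown_total}")
print(f"hotspot frequency: {hotspot_fraction:.1%}")
```

The two category totals agree with the figures stated in the text, and the second hotspot corresponds to roughly 13% of profiled patients.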
Figure 2. Survival curves (Kaplan-Meier plots) display the overall survival (OS) probability and the disease-free (DF) probability versus time (120 months). Panels A-B are taken from the cBioPortal TCGA Firehose Legacy dataset. (A) PIK3CA alteration status is not statistically significant for overall survival; however, differences are visible between months 65 and 120. (B) For disease-free survival, PIK3CA alteration status is closer to significance than for OS, but it is still not significant. (C) Kaplan-Meier Plotter gene chip data for breast cancer. High expression of PIK3CA is associated with a lower survival rate than low expression of PIK3CA.
Survival Analysis Using PIK3CA Expression Levels and Alterations
High PIK3CA mutated expression predicts a lower probability of OS than low PIK3CA mutated expression (log-rank P = 2.1e-06) (Figure 2C). This survival curve was obtained from the Kaplan-Meier Plotter, which ran the PIK3CA mutated expression level versus OS analysis among 4929 breast cancer patients across all subtypes. The hazard ratio (HR) is 1.28 (95% confidence interval 1.16-1.42); because the HR is above 1 and the interval excludes 1, high expression carries a higher risk of death. In contrast, there is no statistically significant difference in OS between the PIK3CA altered and unaltered groups in the cBioPortal TCGA Firehose Legacy dataset for breast invasive carcinoma (log-rank test P-value: 0.557) (Figure 2A). PIK3CA is associated more with the TNBC subtype than with the others, which could explain why these results were not statistically significant. Although minor differences are visible on the graph between months 65 and 120 and the HR is above 1 (HR: 1.112), the 95% confidence interval (0.785-1.576) includes 1, so alteration status cannot be considered a significant predictor of overall survival. This survival curve is based on 1108 samples. For the disease-free curve, PIK3CA alteration is closer to significance than for OS, but the difference is still not statistically significant (log-rank test P-value: 0.253). The corresponding HR is 1.271 (95% confidence interval: 0.858-1.885), which again includes 1 (Figure 2B) [8].
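The link between a hazard ratio, its confidence interval, and statistical significance can be illustrated with a short calculation. The sketch below is an assumption-laden approximation: it back-computes a Wald z-score and two-sided P-value from each reported HR and 95% interval, which is not the log-rank test the tools report, so the P-values are expected to be close to, but not identical with, the published ones. The function names are hypothetical.

```python
import math


def ci_excludes_one(lower: float, upper: float) -> bool:
    """A 95% CI for an HR that excludes 1 indicates significance at about 0.05."""
    return lower > 1.0 or upper < 1.0


def wald_p_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.96):
    """Approximate z-score and two-sided P-value from an HR and its 95% CI.

    Assumes the interval is symmetric on the log scale (Wald interval).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# (label, HR, CI lower, CI upper) as quoted in the text
results = [
    ("cBioPortal OS (Fig. 2A)", 1.112, 0.785, 1.576),
    ("cBioPortal DF (Fig. 2B)", 1.271, 0.858, 1.885),
    ("KM Plotter OS (Fig. 2C)", 1.28, 1.16, 1.42),
]

for label, hr, lo, hi in results:
    z, p = wald_p_from_ci(hr, lo, hi)
    print(f"{label}: HR={hr}, CI excludes 1: {ci_excludes_one(lo, hi)}, approx. P={p:.2g}")
```

Only the Kaplan-Meier Plotter interval excludes 1, which matches the pattern of significance reported in the text.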
Gene Dependency Analysis Using the CRISPR/Cas9 Method
DepMap is an online database that uses CRISPR/Cas9 screening data to analyze how strongly cancer cell lines depend on individual genes. CRISPR/Cas9 is a genome editing technology that makes it possible to correct errors in the genome and turn genes on and off cheaply and with ease [9]. The DepMap data show how the viability of different cell lines changes when PIK3CA is knocked out using the CRISPR/Cas9 method. The further below 0 a cell line's score is, the more dependent it is on PIK3CA. When the data are divided by cancer subtype, the ER-Positive/HER2-Positive and ER-Negative/HER2-Positive subtypes contain more strongly dependent cell lines (further from 0) than ER-Positive/HER2-Negative and ER-Negative/HER2-Negative (Figure 3A). The MDAMB361 cell line in the ER-Positive/HER2-Positive category is the furthest from zero (-3.403186), indicating that it has the strongest dependency of all the cell lines and does not survive the loss of PIK3CA. This shows that MDAMB361 depends on PIK3CA to survive. The UACC893 cell line is the second furthest from 0 (-2.360704). The majority of the points are in the range -1 to -2, emphasizing that PIK3CA is essential across cancer subtypes and cell lines. Next, the gene dependency of PIK3CA was analyzed by lineage, separating the data points by type of cancer (Figure 3B). Breast cancer has the 3 furthest cell lines: MDAMB361, UACC893, and OCUBM. Esophagus/Stomach cancer follows closely, with HGC27 and KYAE1 also far from 0 (-2.141143 and -1.697129). The gene dependency is then displayed for the breast cancer lineage only (Figure 3C). Other breast cancer cell lines that are far from 0 (more than 1 unit away) are AU565, CAL51, HCC202, and HCC1954. The diagram also suggests that the majority of breast cancer cell lines depend on the gene PIK3CA.
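The ranking described above can be sketched by sorting the reported scores and flagging the strongly dependent cell lines. The scores are the ones quoted in the text; the cutoff of -1 is an assumption adopted here as a convenient threshold, and the variable names are illustrative.

```python
# Gene effect scores for PIK3CA knockout, as quoted in the text.
gene_effect = {
    "MDAMB361": -3.403186,  # breast
    "UACC893": -2.360704,   # breast
    "HGC27": -2.141143,     # esophagus/stomach
    "KYAE1": -1.697129,     # esophagus/stomach
}

# Assumed cutoff: scores below -1 are treated as strong dependency.
DEPENDENCY_THRESHOLD = -1.0

# Most negative score first, i.e. most dependent cell line first.
ranked = sorted(gene_effect.items(), key=lambda item: item[1])
dependent = [name for name, score in ranked if score < DEPENDENCY_THRESHOLD]

for name, score in ranked:
    print(f"{name}: {score:.3f}")
print("strongly dependent:", dependent)
```

Sorting in ascending order puts the most negative score first, so the first entry is the cell line that relies most on PIK3CA.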
Protein-Protein Interactions (PPIs) of PIK3CA
Protein-protein interactions of PIK3CA are displayed using the online tool STRING. Each protein is represented by a circle, or node. Known (curated databases and experiments), predicted (gene fusions, gene neighborhood, and gene co-occurrence), and other (protein homology, gene co-expression, and text mining) interactions are represented as lines of different colors between nodes (Figure 4A). The scores listed next to the predicted functional partners are calculated from text mining, lab experiments, co-expression data, and knowledge from previous databases (Figure 4B). As the score of an interaction gets closer to 1, the confidence that the interaction is valid increases. Based on the scores, the functional partners PIK3R3, PIK3R2, PIK3R5, EGFR, and IRS1 all have equal predicted interaction strength with PIK3CA, and that strength is greater than for PIK3CD, PIK3CB, and AKT1. The connection between PIK3CA and PIK3CB is part of the PI3K/AKT/mTOR pathway, an intracellular signaling pathway that is important in regulating the cell cycle, including apoptosis, proliferation, and angiogenesis [10,11]. In addition, alterations in PIK3CA and PIK3R1 display a mutually exclusive pattern that is associated with oncogenesis and hyperactivity of the PI3K pathway.
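To show why several evidence channels push a score toward 1, the sketch below combines per-channel scores under an independence assumption (a "noisy-OR"). This is a simplified illustration and not a reproduction of STRING's scoring: the source does not describe how STRING combines channels, the published procedure also includes a prior correction that is omitted here, and the channel values in the example are hypothetical.

```python
from math import prod
from typing import Sequence


def combine_scores(channel_scores: Sequence[float]) -> float:
    """Combine evidence scores in [0, 1], assuming the channels are independent.

    The combined score is the probability that at least one channel is right:
    1 - product(1 - s_i).
    """
    if any(not 0.0 <= s <= 1.0 for s in channel_scores):
        raise ValueError("each channel score must lie between 0 and 1")
    return 1.0 - prod(1.0 - s for s in channel_scores)


if __name__ == "__main__":
    # Hypothetical channel scores (experiments, databases, text mining).
    example = [0.90, 0.80, 0.95]
    print(f"combined score: {combine_scores(example):.3f}")
```

Under this scheme, adding a supporting channel can only raise the combined score, which is why partners backed by several evidence types sit close to 1.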
Expression of PIK3CA in Breast Cancer
The UALCAN-TCGA breast invasive carcinoma dataset was used to plot the non-mutated expression levels of PIK3CA by sample type, cancer subtype, menopause status, TP53 mutation status, age, gender, cancer stage, and race (Figures 5A-G). For sample types, PIK3CA non-mutated expression levels in transcripts per million were compared between normal tissue and primary tumor tissue (Figure 5A). The difference between the two is statistically significant (P = 1.62E-12): expression of non-mutated PIK3CA is much greater in normal tissue than in primary tumor tissue. This suggests that lower non-mutated PIK3CA expression is associated with breast cancer tissue. The comparison of major subtypes showed no significant difference for Luminal-vs-HER2 Positive or HER2 Positive-vs-TNBC [12]. However, there is a significant difference for Normal-vs-Luminal, Normal-vs-HER2 Positive, Normal-vs-TNBC, and Luminal-vs-TNBC (P<0.05). HER2 Positive has the lowest PIK3CA non-mutated expression levels, followed closely by triple-negative breast cancer (TNBC). PIK3CA non-mutated expression was also analyzed across menopause status. The only significant difference is in the Perimenopause-vs-Pre-menopause comparison (P = 4.953900E-02), with Pre-menopause showing a greater median expression level than Perimenopause. Next, PIK3CA non-mutated expression was analyzed by patient age. There is no significant difference between the age groups themselves; however, the comparisons Normal-vs-Age(41-60Yrs), Normal-vs-Age(61-80Yrs), and Normal-vs-Age(81-100Yrs) are significant (P<0.05) (Figure 5D). For race, the only statistically significant differences are between normal tissue and each group, not between the groups themselves (Figure 5F). PIK3CA non-mutated expression was also analyzed by cancer stage and progression (Figure 5E). Stage 4 has the lowest expression level, and all of the stages differ significantly from normal tissue (P<0.05) but not from each other.
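A two-group comparison of expression values, such as normal versus tumor tissue, can be sketched with Welch's t-test. Several things here are assumptions: the source does not state which test UALCAN applies, the TPM values below are hypothetical and not taken from the dataset, and the function name `welch_t` is illustrative. The sketch computes the test statistic and degrees of freedom only; converting them to a P-value requires a t-distribution, which is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical TPM values, for illustration only.
    normal_tpm = [12.1, 10.8, 11.5, 13.0, 12.4]
    tumor_tpm = [8.9, 9.6, 7.8, 10.2, 8.4]
    t_stat, dof = welch_t(normal_tpm, tumor_tpm)
    print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Welch's variant is chosen for the sketch because it does not assume equal variances or equal group sizes, which suits comparisons where normal samples are far fewer than tumor samples.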
Discussion
PIK3CA provides instructions for making the p110 alpha protein and belongs to the PI3K family. Unlike other phosphoinositide kinases (PIKs), PIK3CA has structural and statistical differences that give it an important role in cell fate decisions through the PI3K/AKT/mTOR pathway. Findings from Figure 4 suggest that the PPI network of PIK3CA, along with the PI3K/AKT/mTOR pathway, should be considered in order to better understand the role of PIK3CA in breast cancer.
Survival curve data for breast cancer with regard to PIK3CA, including overall survival, disease-free survival, and OS based on high/low expression, were analyzed in Figures 2A-2C. In the cBioPortal data, PIK3CA alteration was not statistically significant for disease-free survival (P = 0.253) or for overall survival (HR: 1.112, P = 0.557); because both P-values are higher than 0.05, these differences cannot be reported as significant. In the Kaplan-Meier Plotter data, high expression of PIK3CA was associated with a lower probability of overall survival than low expression of PIK3CA, with HR: 1.28 (95% confidence interval 1.16-1.42) (log-rank P = 2.1e-06) (Figure 2C).
The DepMap data (Figures 3A, 3B, and 3C) show that many different cell lines are affected by loss of the gene PIK3CA. The cell lines are varied, spanning different subtypes and cancer types. Breast cancer has the most negatively affected cell lines of all cancer types. These data suggest that a variety of cell lines, specifically in breast cancer, depend on PIK3CA for survival [13].
The PPI between PIK3CA and PIK3R1 is an extremely strong connection, with a score of 0.998 (Figures 4A and 4B). It is significant because alterations in the two genes display a mutually exclusive pattern that is associated with oncogenesis and hyperactivity of the PI3K pathway. The PI3K/AKT/mTOR pathway is important in regulating the cell cycle, whose disruption is an underlying factor of cancer. According to these data, the connection between the PIK3CA gene and the PIK3R1 gene plays an important role in cancer and is associated with worse cancer outcomes.
PIK3CA's non-mutated expression level, in transcripts per million, differs significantly between tumor and normal samples (Figure 5A). In addition, when non-mutated PIK3CA expression is grouped by subtype, HER2-positive and TNBC show the lowest expression levels of all the subtypes (Figure 5B). TNBC is commonly known as the most malignant subtype of breast cancer, which aligns with the data: high mutated expression of PIK3CA was associated with bad outcomes, while high non-mutated expression of PIK3CA was associated with better breast cancer outcomes. When grouped by cancer stage, PIK3CA expression gradually decreases as the stage progresses, with stage 4 having the lowest expression (Figure 5E). This suggests that non-mutated PIK3CA expression might be negatively associated with cancer progression as a whole.
This study advances breast cancer research by showing that mutated PIK3CA is associated with a greater likelihood of breast cancer and with worse outcomes. Medical specialists could therefore test for PIK3CA mutations to complement lengthier processes for assessing breast cancer in patients. Collecting data from online resources is a strength because these resources draw on a variety of data sets. However, because some of the results are from years ago, there is a chance that the data are somewhat outdated. The research hypothesis was supported by the finding that high PIK3CA mutated expression is associated with a lower overall survival rate, which links it to breast cancer progression and makes it a valuable biomarker for breast cancer. One opportunity for future research is to investigate how PIK3CA mutations could be prevented from arising in the first place. Because this study was conducted with available online resources, a limitation is that the data are not always specific and some results appear counterintuitive. This research highlights the importance of continuing to fund and support breast cancer research because the fight is not even close to over.
Methods
The purpose of this project was to determine the role the oncogene PIK3CA plays in the malignancy and progression of breast cancer as a whole. To do so, data mining was carried out with freely available online tools [14]. The UCSC Genome Browser was used first to determine the chromosomal location of the gene. cBioPortal's TCGA Firehose Legacy dataset was used for hotspot identification, mutation analysis, alteration frequency, and survival curves. The Kaplan-Meier Plotter was used in addition to cBioPortal to assess the significance of changes in the survival curves of breast cancer cases involving PIK3CA. DepMap was used to analyze the dependency of cell lines on PIK3CA using CRISPR/Cas9 data. STRING was used for protein-protein interaction analysis and to find which other genes in the PI3K family work with PIK3CA to affect breast cancer. Finally, the UALCAN database with the TCGA breast invasive carcinoma dataset was used to determine the expression of PIK3CA in different forms of breast cancer (BRCA). Together, these databases provided both general and detailed information about PIK3CA from which its relationship with breast cancer was assessed. The measurement tools provided by the databases were the ones used for this research; most of them present results as graphs. The steps were to choose a reliable research tool, query it for the specific gene and cancer, and compare the results with other cancer types and genes. This process was repeated for each of the bioinformatics tools.
Acknowledgements
I would like to thank Dr. Begum-Akman from Cambridge University and Dr. Taner Tuncer from Cambridge University for giving me some guidance while conducting this project.
References
1. Harbeck N, Penault-Llorca F, Cortes J, et al. Breast cancer. Nat Rev Dis Primer. 2019;5(1):66. doi:10.1038/s41572-019-0111-2
2. Sun YS, Zhao Z, Yang ZN, et al. Risk Factors and Preventions of Breast Cancer. Int J Biol Sci. 2017;13(11):1387-1397. doi:10.7150/ijbs.21635
3. Wu S, Zhu W, Thompson P, Hannun YA. Evaluating intrinsic and non-intrinsic cancer risk factors. Nat Commun. 2018;9:3490. doi:10.1038/s41467-018-05467-z
4. Orrantia-Borunda E, Anchondo-Nuñez P, Acuña-Aguilar LE, Gómez-Valles FO, Ramírez-Valdespino CA. Subtypes of Breast Cancer. In: Mayrovitz HN, ed. Breast Cancer. Exon Publications; 2022. Accessed January 22, 2024. http://www.ncbi.nlm.nih.gov/books/NBK583808/
5. Models that matter. Mol Cell. 2023;83(3):315-316. doi:10.1016/j.molcel.2023.01.001
6. Reinhardt K, Stückrath K, Hartung C, et al. PIK3CA-mutations in breast cancer. Breast Cancer Res Treat. 2022;196(3):483-493. doi:10.1007/s10549-022-06637-
7. Pereira B, Chin SF, Rueda OM, et al. The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat Commun. 2016;7:11479. doi:10.1038/ncomms11479
8. Yang J, Griffin A, Qiang Z, Ren J. Organelle-targeted therapies: a comprehensive review on system design for enabling precision oncology. Signal Transduct Target Ther. 2022;7(1):1-27. doi:10.1038/s41392-022-01243-0
9. Redman M, King A, Watson C, King D. What is CRISPR/Cas9? Arch Dis Child Educ Pract Ed. 2016;101(4):213-215. doi:10.1136/archdischild-2016-310459
10. Riquelme I, Tapia O, Espinoza JA, et al. The Gene Expression Status of the PI3K/AKT/mTOR Pathway in Gastric Cancer Tissues and Cell Lines. Pathol Oncol Res POR. 2016;22(4):797-805. doi:10.1007/s12253-016-0066-5
11. Chen L, Yang L, Yao L, et al. Characterization of PIK3CA and PIK3R1 somatic mutations in Chinese breast cancer patients. Nat Commun. 2018;9(1):1357. doi:10.1038/s41467-018-03867-9
12. Lu XQ, Zhang JQ, Zhang SX, et al. Identification of novel hub genes associated with gastric cancer using integrated bioinformatics analysis. BMC Cancer. 2021;21:697. doi:10.1186/s12885-021-08358-7
13. Stanislawska I, Liwinska W, Lyp M, Stojek Z, Zabost E. Recent Advances in Degradable Hybrids of Biomolecules and NGs for Targeted Delivery. Molecules. 2019;24(10):1873. doi:10.3390/molecules24101873
14. Mrozek D, Stępień K, Grzesik P, Małysiak-Mrozek B. A Large-Scale and Serverless Computational Approach for Improving Quality of NGS Data Supporting Big Multi-Omics Data Analyses. Front Genet. 2021;12:699280. doi:10.3389/fgene.2021.699280