Bridging Herbal Medicines and Modern Pharmaceuticals Using Machine Learning: A Case Study of Lonicerae flos for Acne

July 31, 2025


Abstract

Traditional Chinese Medicine (TCM) has long relied on herbal remedies, but connecting these traditional practices to modern pharmaceutical science remains a challenge. In this study, we propose a machine learning approach to bridge that gap by identifying potentially effective compounds in traditional herbs, using acne treatment as a case study. We focused on Lonicerae flos (LF), a TCM herb traditionally used for detoxification and treating skin conditions. A Random Forest classifier was trained on publicly available molecular descriptors from PubChem and RDKit, with known anti-acne compounds as positive samples. To address severe class imbalance, we applied synthetic oversampling of the minority class (ADASYN). Despite a limited dataset, the model achieved a best cross-validation ROC-AUC of 0.8029. Feature analysis showed that molecular properties such as lipophilicity, polarity, and specific functional groups were strongly associated with anti-acne activity. Several LF compounds with high model scores were further supported by literature evidence of dermatological relevance. Our findings demonstrate the feasibility of using machine learning to screen traditional herbal compounds and highlight a scalable framework for integrating traditional medicine with modern drug discovery.

1 Introduction

Traditional medicines were in use worldwide long before the establishment of the science-based modern medicine system. Examples include Traditional Chinese Medicine (TCM), Kampo medicine, Ayurveda, and Unani medicine. A core part of these systems is the use of herbal medicines. The potency and side effects of traditional herbal medicines have been tested and observed through hundreds, if not thousands, of years of consumption. Though generally considered less potent than modern pharmaceuticals, they are still widely used because of their cost, availability, specific efficacy, and cultural significance. In a guideline published in 2000, the World Health Organization (WHO) reported a “tremendous expansion in the use of traditional medicine worldwide” during the previous decade¹. During the COVID-19 pandemic, China’s health organizations recommended the use of TCM prescriptions to help patients recover from COVID-19².

There is, however, a foundational gap between traditional medicine and modern science. Traditional medicine is rooted in holistic and philosophical principles that emphasize the balance of body, mind, and environment. TCM, for instance, is based on concepts like Qi (vital energy), Yin and Yang (the balance of opposing forces), and the Five Elements (Wood, Fire, Earth, Metal, and Water), which are believed to influence health and disease. In contrast, modern medicine is grounded in evidence-based scientific methods, focusing on the molecular and cellular understanding of diseases, with treatment often targeting specific biological mechanisms.

Bridging this gap is crucial for the continued development of traditional medicine. Though it may take a long time to place philosophical concepts like Qi within the modern scientific framework, identifying active compounds in herbal medicines and understanding their therapeutic efficacy are good starting points. With the development of technologies like High-Performance Liquid Chromatography (HPLC) and Mass Spectrometry (MS), it has become increasingly efficient to identify the component compounds within a herb. Many herbs frequently used in traditional medicine, from cranberry to ginger, have been investigated at the molecular level, placing them under the microscope of modern chemistry and biology³,⁴. Such understanding can help build connections between concepts in traditional medicine, such as ‘heat’ and ‘poison,’ and those in modern pharmaceuticals, such as ‘hormone,’ ‘virus,’ and ‘infection,’ ultimately providing a scientific interpretation of theories and observations from traditional medicine.

Bridging this gap benefits the modern pharmaceutical system as well. Drug discovery is a complex and costly process that involves several stages. Starting the compound screening stage with substances found in herbs used in traditional medicine has been considered a practical strategy for increasing the success rate in in vitro, in vivo, and clinical trials. A notable example of how traditional medicine can inspire scientific breakthroughs is the work of Nobel Prize laureate Youyou Tu, who turned to thousands of years of Chinese medical texts to find a traditional cure for malaria. Following a hint from these texts, her team focused on sweet wormwood (Artemisia annua), from which they ultimately extracted a compound, artemisinin, that would later save millions of lives⁵.

Establishing the connection between a herbal medicine and modern pharmaceuticals, however, is challenging, even after the constituents of the herb have been identified. A single herb can contain hundreds of active substances. Which of these substances is responsible for the herb’s efficacy as observed in traditional practice? While it is possible to subject each substance from the herb to the full drug discovery process to confirm or refute its efficacy, this approach is prohibitively expensive, impractical, and inefficient.

In this article, we propose using machine learning techniques to identify the substances in an herb that are most likely contributing to its efficacy. The basic approach involves collecting a list of compounds known to be effective treatments for disease and training a classification model to capture the characteristics of these compounds, in contrast to those that are ineffective. We then use this model to predict the likelihood of new candidate compounds being effective based on their characteristics.

We demonstrate this idea through a case study involving Lonicerae flos (LF) and its close relative Lonicerae japonicae flos (LJF). Both come from species of honeysuckle native to East Asia and are widely used in TCM prescriptions. According to TCM practice, their flowers have the general functions of “clearing heat” and “detoxification” (T. Wang, M. Chen, W. Zhang, T. Wang, Systematic review of Lonicerae japonicae flos: A significant food and traditional Chinese medicine, Frontiers in Pharmacology 13, 1013992 (2022). DOI: 10.3389/fphar.2022.1013992). In particular, they are recorded as effective in resolving carbuncles, swelling, and various skin diseases (S. Li, Ben Cao Gang Mu (Compendium of Materia Medica), 1596). In this case study, we used molecular information to train classification models that can identify compounds effective against acne, one of the most common skin conditions among adolescents. According to the American Academy of Dermatology, acne affects approximately 85% of teenagers⁶. We then applied these classification models to the hundreds of compounds identified in LF and LJF.

The biggest practical challenge of this research, in terms of machine learning, is the limited availability of data. While high-quality data on chemical compounds is accessible through open resources such as the PubChem database (PubChem. PubChem Database. url: https://pubchem.ncbi.nlm.nih.gov/) and the RDKit toolkit (RDKit. RDKit: Open-Source Cheminformatics Software. url: https://www.rdkit.org/), other data essential for this type of research are scarce. For example, the substances identified in Lonicerae flos, one of the few TCM herbs that have been thoroughly analyzed, had to be manually curated from papers published in the public domain. More importantly, there is only a small number of officially recorded medicines for acne, and the author does not have access to proprietary data from pharmaceutical companies regarding medicines that were explored but never released to the market.

This data limitation constrained the types of machine learning models that could be employed in this study. Although techniques such as multi-fold cross-validation and ADASYN were adopted as best-effort strategies to mitigate these limitations, the author does not claim that the models presented in this article represent the state of the art. Potential avenues for improving the models are discussed as part of the future work. The data and code used in this study are publicly available on GitHub.

2 Related Works

2.1 Substances in Lonicerae flos

There is a wide range of technologies available for identifying the chemical substances in an herb. In chromatographic techniques, such as High-Performance Liquid Chromatography (HPLC) and Gas Chromatography (GC), the substances in herbs are mixed with a liquid or gas, and compounds are separated and identified based on their movement through a medium. This movement depends on their interactions with the medium, influenced by factors such as molecular size, shape, polarity, and absorption. Spectroscopic techniques, such as Mass Spectrometry (MS) and Infrared Spectroscopy (IR), analyze chemical substances by examining their interactions with various forms of electromagnetic radiation.

Many analyses have been conducted to identify the substances in Lonicerae flos and Lonicerae japonicae flos⁷,⁸. One survey article covering LJF alone lists 212 components isolated from Lonicerae japonicae flos, including 27 flavonoids, 40 organic acids, 83 iridoids, 17 triterpenoids, and 45 other compounds. Since LF and LJF are often not distinguished in TCM texts, we treat them as a combined entity in this article and refer to both as Lonicerae flos⁸.

2.2 Machine Learning and Its Applications

Machine learning (ML) is a field of artificial intelligence (AI) that enables computers to learn from data and make decisions or predictions. Through the analysis of large datasets, ML algorithms can uncover patterns and provide insights that help address complex problems, all without needing explicit instructions for each task. Machine learning has been applied to many medical domains, e.g., drug discovery, medical record tracking and analysis⁹, and patient management for personalized treatment. In drug discovery, ML models are used to predict the efficacy of new compounds, identify potential drug candidates, and optimize the drug development process by analyzing vast datasets of chemical structures and biological interactions. By doing so, it significantly reduces the time and cost involved in bringing new drugs to market¹⁰.

The work presented in this article aims to bridge traditional medicines with modern pharmaceuticals using machine learning. Machine learning models are often regarded as black boxes, where the emphasis is placed more on the input (e.g., the characteristics of a compound) and the output (e.g., the efficacy and toxicity of the compound) rather than on how the model makes the complex mapping between them. This approach complements traditional medicine, allowing us to focus on its efficacy without needing to fully grasp its holistic and philosophical theories, which have been challenging to interpret through modern science. Furthermore, although not explored in this work, we believe machine learning could also serve as a valuable tool for analyzing the prescriptions and texts of traditional medicines, many of which were written hundreds or even thousands of years ago, before modern herb classification systems were established. This may help uncover the most essential elements of traditional medicine.

3 Compound Screening with Machine Learning Models

3.1 Data Sources and Tools

All data in this work are collected from public databases, open-source toolkits, and published papers.

The names of the compounds identified in Lonicerae japonicae flos and Lonicerae flos are manually collected from papers in the public domain¹¹. The names of the compounds used for certain classes of treatments in modern pharmaceuticals are collected from the ATC database¹² and through keyword searches (e.g., searching “acne”) on PubChem.

Information about compounds is collected through PubChem¹³ and RDKit⁵. PubChem is the world’s largest collection of freely accessible chemical information. It allows searching for chemicals by name, molecular formula, structure, and other identifiers, and it provides information such as chemical and physical properties, biological activities, safety and toxicity information, patents, and literature citations. In this work, we use PubChem to search for compounds by name and obtain their SMILES (Simplified Molecular Input Line Entry System) strings and InChI (International Chemical Identifier) identifiers, which are later passed to RDKit to represent each compound. RDKit is an open-source software toolkit for cheminformatics, the application of computational methods and software tools to problems in chemistry. Among the many functions that RDKit provides, this work mainly takes advantage of its “molecular descriptors,” quantitative values that describe the properties of molecules, which are used as additional features for training our models. All chemical structure images presented in this article are also generated using RDKit.

3.2 Acne: Pathology and Treatments

Acne starts with the overproduction of sebum (oil) by the sebaceous glands, often triggered by hormonal changes, particularly in androgens. Hair follicles become clogged by the accumulation of dead skin cells and excess sebum. This creates an environment conducive to the growth of Cutibacterium acnes (formerly Propionibacterium acnes, or P. acnes), a bacterium that normally resides on the skin. The immune system responds to bacterial overgrowth and clogged follicles by initiating an inflammatory response, which results in redness, swelling, and pus formation in the affected areas. Depending on the stage, acne lesions range from non-inflammatory lesions (e.g., blackheads and whiteheads) to small inflamed bumps (papules), pus-filled lesions (pustules), and cysts that can lead to scarring¹⁴.

Several classes of medications are used to treat acne, depending on the severity and type of acne. These medications are either topical or oral and target different steps in the pathogenesis of acne—regulating hormones, exfoliating the skin, unclogging pores, killing bacteria, and helping to reduce inflammation. In the ATC (Anatomical Therapeutic Chemical) classification system, a system developed by WHO for drug classification, they are listed under the Level-2 category “D10 – Anti-acne preparations”¹⁵. In the Nov 2023 release of the ATC database, 36 Level-5 classes (chemical substances) are listed under this category, e.g., Tretinoin, Clindamycin, and Doxycycline, among a total of 5,372 Level-5 classes included in the database¹⁶.

Source	Feature	Description
PubChem	atom_stereo_count	The number of stereocenters present in a molecule. It influences the drug’s biological activity, efficacy, and safety.
PubChem	volume_3d	The 3D volume of a molecule, which helps in understanding how it fits into biological targets, such as enzymes or receptors.
PubChem	xlogp	An indicator of how well the molecule dissolves in fats versus water.
RDKit	fr_sulfide (fr_sulfonamd, fr_sulfone, …)	Number of thioethers, sulfonamides, sulfone groups, etc.
RDKit	VSA_EState3	Sum of the surface areas of atoms with a specific electronic environment within a molecule, which is useful in predicting molecular interactions.

Table 1: Example Features from PubChem and RDKit

3.3 Training Data and Approaches for Models

In the context of machine learning, compound screening is a classification task. Compounds known to be effective for the treatment of acne (e.g., compounds classified under the Level-2 category “D10 – Anti-acne preparations” in the ATC database) were used as positive training samples. Other compounds (e.g., compounds from the ATC database or random compounds from the PubChem database) were used as negative training samples. We collected 230 features from PubChem and RDKit. Due to space constraints, we list only some example features in Table 1 to illustrate their scope and nature. Not all compounds could be found in the PubChem database by their medicine names. After data cleaning, our training dataset contains 27 positive samples and 4,343 negative samples.
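The labeling rule described above can be illustrated with a minimal sketch. It assumes the standard ATC code layout (1, 3, 4, 5, and 7 characters for Levels 1 to 5); the function names `atc_levels` and `label_by_atc` and the four example records are hypothetical and are not taken from the study's code or dataset.

```python
def atc_levels(code):
    """Split a 7-character Level-5 ATC code into its five hierarchical prefixes."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character Level-5 ATC code, got {code!r}")
    # Level boundaries: 1, 3, 4, 5 and 7 characters.
    return [code[:1], code[:3], code[:4], code[:5], code]


def label_by_atc(records, positive_prefix="D10"):
    """Map each compound name to 1 if any of its codes falls under the Level-2 prefix, else 0."""
    labels = {}
    for name, code in records:
        is_positive = atc_levels(code)[1] == positive_prefix
        # A substance may appear under several codes; one D10 entry is enough.
        labels[name] = max(labels.get(name, 0), int(is_positive))
    return labels


# Illustrative (name, Level-5 code) records only; not the study's dataset.
records = [
    ("tretinoin", "D10AD01"),
    ("clindamycin", "D10AF01"),
    ("clindamycin", "J01FF01"),
    ("metformin", "A10BA02"),
]
labels = label_by_atc(records)
print(labels)
```

A substance listed under several therapeutic classes is treated as positive as soon as one of its codes falls under D10; whether the study handled such duplicates this way is an assumption.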

The author experimented with various commonly used machine learning algorithms and ultimately selected Random Forest¹⁷ because of the nature of the data. Tree-based models are generally considered more expressive than linear models and are capable of handling categorical features when they are present. Random Forest, which aggregates the outputs of multiple decision trees, reduces the risk of overfitting to a single tree and provides robust classification performance. Another model in the tree-based family, Boosted Trees, tends to be more prone to overfitting than a standard Random Forest. Neural Networks¹⁸ were not chosen because, given the dataset size used in this study, they do not offer a performance advantage over tree-based models and lack the interpretability that tree models naturally provide.

There are two major challenges in training a classification model with this set of data. First, the training data is highly imbalanced — there are only 27 positive samples versus 4,343 negative samples. This makes it difficult to use the limited positive samples for both training and evaluation. Second, the dataset has 230 features, a relatively large number compared to the size of the training data. Not all features will have strong predictive power, and too many features with a relatively small amount of data can make the model prone to overfitting. We address these challenges with the following training process:

Step 1: Exploring the best training parameters. In this step, we used the following techniques to handle the imbalanced dataset.

Multiple-fold cross-validation. The training set is split into 5 parts. We train the model 5 times, each time using a different set of 4 parts for training and the remaining part for validation. The overall validation performance is the average of the 5 validation scores.
Synthetic oversampling. We used a technique called ADASYN (Adaptive Synthetic Sampling) (Haibo He et al. “ADASYN: Adaptive synthetic sampling approach for imbalanced learning”. In: 2008 IEEE International Joint Conference on Neural Networks)¹⁹,²⁰ to increase the ratio of positive samples to 25% by generating synthetic samples from the dataset.
Class weighting. We instructed the training process to assign a higher weight to positive samples so that the overall weight of positive and negative samples is balanced.
Parameter search. We adjusted the training parameters and repeated the above steps, picking the parameters that produced the best ROC-AUC²¹.

Step 2: Training the random forest model with the full dataset

Step 3: Trimming features. In this step, we examine the feature importance reported by the training process and remove features that have very low importance, e.g., smaller than 0.001.

Step 4: Repeating the previous steps with the trimmed feature set.
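The train-then-trim loop in the steps above can be sketched as follows. This is only an outline under stated assumptions: `trim_features` and `toy_fit_and_rank` are hypothetical names, and the toy ranking function stands in for training a Random Forest and reading its feature importances, so no real model is fitted here. The 0.001 threshold is the example value given in Step 3.

```python
def trim_features(features, fit_and_rank, threshold=0.001, max_rounds=20):
    """Refit and drop low-importance features until nothing more is removed.

    fit_and_rank(features) must return a dict mapping each feature name
    to its importance for a model trained on exactly those features.
    """
    features = list(features)
    for _ in range(max_rounds):
        importance = fit_and_rank(features)
        kept = [f for f in features if importance[f] >= threshold]
        # Stop when the feature set is stable (or when trimming would empty it).
        if not kept or len(kept) == len(features):
            break
        features = kept
    return features


# Hypothetical stand-in for "train a Random Forest and read its importances".
# Importances are normalized to sum to 1 over the features currently in use.
RAW_SIGNAL = {"SMR_VSA3": 50.0, "fr_NH0": 30.0, "noise_a": 0.05, "noise_b": 0.01}


def toy_fit_and_rank(features):
    total = sum(RAW_SIGNAL[f] for f in features)
    return {f: RAW_SIGNAL[f] / total for f in features}


selected = trim_features(RAW_SIGNAL, toy_fit_and_rank)
print(selected)
```

Passing the training routine in as a function keeps the loop independent of any particular library; in practice the callable would wrap the cross-validated training described in Step 1 and Step 2.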

In this design, we use multiple-fold cross-validation, a method where the data is split into several smaller subsets, or ‘folds,’ to explore the best training settings by evaluating the model’s performance across different subsets of data. This method allows us to assess the model’s generalizability, reducing the risk of overfitting by ensuring the model is tested on multiple partitions of the data. During the split of folds, we use stratified splitting²²,²³ to maintain the original distribution of classes, ensuring the positive samples are evenly distributed across all five folds. By doing so, we ensure that each fold is a good representation of the entire dataset, preventing any bias toward one class and providing a more robust evaluation of the model’s capabilities.
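To make the effect of stratified splitting concrete, the following minimal sketch distributes the 27 positive and 4,343 negative samples over five folds using only the Python standard library. The function name `stratified_folds` and the round-robin assignment are illustrative assumptions; the study's own implementation may differ.

```python
import random


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so each class is spread evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class out to the folds in turn.
        for position, i in enumerate(idx):
            folds[position % n_splits].append(i)
    return folds


labels = [1] * 27 + [0] * 4343
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold, [len(fold) for fold in folds])
```

With 27 positives and five folds, every validation fold contains only five or six positive samples, which is why the averaged score across folds is more informative than any single fold.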

Another important aspect of the design is the use of ADASYN to increase the ratio of positive samples. ADASYN generates synthetic data points from existing data points, creating more new samples in regions where the data is more difficult to classify. This allows the model to better distinguish between the classes and improves performance on the underrepresented positive samples. With this approach, we increase the proportion of positive samples from 27/(4343 + 27) ≈ 0.618% to 25%.
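The idea behind ADASYN can be sketched in a few lines. The code below is a simplified illustration, not the implementation used in the study: it uses brute-force Euclidean neighbour search, assumes at least two positive samples, and the names `synthetic_count`, `nearest`, and `adasyn_sketch` as well as the toy 2D points are hypothetical. The 25% target is the value given in Step 1 of the training process.

```python
import math
import random


def synthetic_count(n_pos, n_neg, target_fraction):
    """Synthetic positives needed so that positives reach target_fraction of all samples."""
    return max(0, round(target_fraction * n_neg / (1 - target_fraction) - n_pos))


def nearest(point, candidates, k, skip=None):
    """Indices of the k candidates closest to point, optionally skipping one index."""
    order = sorted(
        (j for j in range(len(candidates)) if j != skip),
        key=lambda j: math.dist(point, candidates[j]),
    )
    return order[:k]


def adasyn_sketch(pos, neg, n_new, k=5, seed=0):
    """Generate roughly n_new synthetic positives, favouring hard-to-classify regions."""
    rng = random.Random(seed)
    everything = pos + neg  # indices >= len(pos) are negatives
    # Hardness of a positive: share of negatives among its k nearest neighbours.
    hardness = []
    for i, p in enumerate(pos):
        nbrs = nearest(p, everything, k, skip=i)
        hardness.append(sum(1 for j in nbrs if j >= len(pos)) / k)
    total = sum(hardness)
    # Fall back to a uniform split if no positive has negative neighbours.
    weights = [h / total for h in hardness] if total > 0 else [1 / len(pos)] * len(pos)

    synthetic = []
    for i, p in enumerate(pos):
        quota = round(weights[i] * n_new)  # rounding may shift the total slightly
        mates = nearest(p, pos, k, skip=i)  # neighbours within the positive class
        for _ in range(quota):
            q = pos[rng.choice(mates)]
            lam = rng.random()
            # New point lies on the segment between p and a positive neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(p, q)])
    return synthetic


needed = synthetic_count(27, 4343, 0.25)

# Toy 2D points for illustration only.
pos = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [0.4, 0.5]]
neg = [[0.0, 0.0], [0.1, 0.3], [0.3, 0.1], [0.5, 0.4],
       [0.2, 0.6], [0.6, 0.2], [0.35, 0.45], [0.45, 0.55]]
synthetic = adasyn_sketch(pos, neg, n_new=12, k=3)
print(needed, len(synthetic))
```

For the dataset sizes reported here, reaching 25% positives requires about 1,421 synthetic samples on top of the 27 real ones, so the large majority of positive training points are interpolated rather than observed.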

Lastly, since a dataset with 25% positive samples is still imbalanced (though less severely), we assign class weights to each class to balance the overall weight. In machine learning practice, this approach often helps the model to learn faster and better by penalizing misclassifications of the minority class more heavily, encouraging the model to pay more attention to underrepresented samples. This leads to improved performance and reduces bias toward the majority class, ultimately yielding more balanced predictions.
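One common way to set such class weights is the inverse-frequency heuristic, weight = n_samples / (n_classes × n_samples_in_class). The sketch below applies it to an illustrative post-oversampling class count; the source does not state which weighting formula was used, and the count of 1,448 positives is an assumption derived from the 25% target, so both should be read as assumptions.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class)."""
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Illustrative counts: 4,343 negatives and about 25% positives after oversampling.
counts = {0: 4343, 1: 1448}
weights = balanced_class_weights(counts)
print(weights)
```

Under this heuristic each positive sample counts roughly three times as much as a negative one, so the summed weight of each class is equal.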

Due to the imbalanced dataset, we use ROC-AUC instead of Accuracy as the model performance metric. ROC-AUC measures a classification model’s performance across the whole spectrum of threshold values and is thus a more robust metric. The best validation ROC-AUC²¹ from the above training process is 0.8029, with a final model containing 109 features after iterations of feature selection.
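The reason for preferring ROC-AUC can be seen in a small worked example. The sketch below computes ROC-AUC from its pairwise definition (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half) and compares it with accuracy for a trivial model that scores every compound as negative. The function names are illustrative and the scores are made up for demonstration.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return hits / len(labels)


# A model that always predicts "negative" on the 27 / 4,343 dataset.
labels = [1] * 27 + [0] * 4343
constant_scores = [0.0] * len(labels)
trivial_accuracy = accuracy(labels, constant_scores)
trivial_auc = roc_auc(labels, constant_scores)

# A small ranked example: 5 of the 6 positive-negative pairs are ordered correctly.
small_auc = roc_auc([1, 0, 1, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.1])
print(trivial_accuracy, trivial_auc, small_auc)
```

The always-negative model reaches an accuracy above 99% while its ROC-AUC is only 0.5, the value of random ranking, which is why accuracy is uninformative at this level of imbalance.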

Figure 1: (a) Distribution of Feature Importance (b) Distribution of Prediction Scores

Figure 1a shows the distribution of feature importance values. The 15 features with the highest importance are listed in Table 2, along with explanations of their meanings and why they matter for acne treatment. Overall, it is reasonable for the model to select these as top-ranking features. Characteristics such as lipophilicity (e.g., SMR_VSA), molecular flexibility (e.g., rotatable bonds), and polarity (e.g., heteroatoms) influence skin penetration and local activity. Structural elements like aromatic heterocycles, tertiary amines, and nitrogen atoms are commonly found in compounds that target bacterial protein synthesis or disrupt membranes. Retinoid-related scaffolds also share electronic and aromatic properties that modulate cell turnover and inflammation. These features are important because they influence how well a compound penetrates the skin, how it interacts with acne-causing bacteria, how it modulates inflammation or sebum production, and how drug-like and effective the compound is overall.

Feature	Importance	Meaning and Why It Matters for Acne Treatment
SMR_VSA3	0.1017	Surface area weighted by molar refractivity (lipophilicity proxy). Determines how well the drug penetrates sebaceous glands and interacts with lipid-rich skin layers. Important for topical delivery.
fr_NH0	0.0606	Number of tertiary amines. Tertiary amines are common in antibiotics and anti-inflammatory drugs, aiding in membrane disruption or receptor targeting.
NumAromaticHeterocycles	0.0405	Number of aromatic heterocyclic rings. These structures are found in many antimicrobials and retinoids. They enhance binding affinity, stability, and bioactivity.
undefined_atom_stereo_count	0.0403	Number of stereocenters (chiral centers) with undefined configuration. Chirality can affect drug-target fit, influencing potency and metabolism. Critical in receptor-based treatments like retinoids.
fr_ether	0.0338	Number of ether groups. Improves solubility and membrane permeability. Key for topical application and skin absorption.
NumAliphaticHeterocycles	0.0338	Number of non-aromatic rings containing heteroatoms (e.g., N, O, S). Common in pharmacophores of antimicrobial and anti-inflammatory drugs.
qed	0.0285	Quantitative Estimate of Drug-Likeness. A metric used to evaluate the “drug-likeness” of a molecule based on several physicochemical properties.
fr_Ar_N	0.0257	Number of aromatic nitrogen atoms. Nitrogen-rich aromatic systems are typical in antibiotics and anti-inflammatory agents, aiding enzyme or receptor binding.
NumRotatableBonds	0.0247	Number of rotatable single bonds. Influences flexibility, which affects bioavailability, skin penetration, and target binding.
SMR_VSA10	0.0240	Similar to SMR_VSA3 but with a different range of SMR.
BCUT2D_MWHI	0.0203	Atomic mass distribution (high mass index). Describes atomic weight distribution, influencing binding characteristics and membrane interaction.
EState_VSA5	0.0166	Quantifies the contribution to the overall molecular surface area from atoms that fall within a certain EState index range. Captures the electronic environment and molecular topology, key for binding affinity and enzyme inhibition.
EState_VSA7	0.0141	Similar to EState_VSA5 but with a different EState index range.
NumHeteroatoms	0.0139	Number of non-carbon atoms (N, O, S, etc.). Heteroatoms are essential for hydrogen bonding, target specificity, and water solubility, all of which are vital in acne treatment.

Table 2: Most Important Features

3.5 Screening Compounds in Lonicerae japonicae for Acne Treatment

We use the above classification model to screen the 238 compounds that have been identified from Lonicerae japonicae flos and Lonicerae flos. Figure 1b shows the distribution of the prediction scores for the 238 compounds, with each prediction score being a value in the range [0, 1], indicating the probability that the compound will have efficacy against acne. The names, prediction scores, and molecule structures of the top 9 compounds with the highest prediction scores (all in the range [0.4, 0.6]) are listed in Figure 2.

Figure 2: Compounds with highest probability to have efficacy against acne

To fully evaluate the model’s performance, it would be ideal to advance some of the top-ranked compounds into later stages of the drug discovery pipeline and experimentally verify their efficacy. However, experimental validation is resource-intensive and not feasible for the authors to pursue within this study. As an alternative, we consulted the existing literature to assess whether any known associations exist between the identified compounds and acne or related dermatological conditions.

4-Hydroxycinnamic acid, also known as p-Coumaric acid, has not been directly linked to acne treatment, but it has been investigated for use in skincare due to its antioxidant and anti-inflammatory properties²⁴. Similarly, Dihydrocaffeic acid (systematic name: 3-(3,4-dihydroxyphenyl)propionic acid) has not been specifically studied for acne, yet multiple in vitro and in vivo studies have reported its protective effects against oxidative stress and inflammation²⁵. Caffeic acid has demonstrated antimicrobial and antioxidant activity, as well as the ability to enhance collagen production—suggesting its potential for use in treating skin-related disorders²⁶. Abscisic acid has been proposed as a treatment for several skin conditions; notably, a patent has been granted for its use in managing diseases associated with microbial biofilms or dysbiosis of the skin microbiome, including psoriasis, acne, and atopic dermatitis²⁷. Lastly, p-Hydroxybenzaldehyde has shown potential as a therapeutic agent in promoting acute wound healing¹².

These findings suggest that the model is not selecting compounds at random; rather, it is identifying candidates with properties that are biologically relevant to acne pathophysiology or broader dermatological applications.

4 Future Works

One of the main limitations of this study is the nature of the dataset, which contains relatively few positive training samples. This is due to our reliance on publicly available data related specifically to acne treatments. Future work should explore the integration of larger and more diverse datasets, including those from private sources, if accessible, to train multi-label classification models. This would likely lead to more robust model performance and more meaningful biological interpretations. Future work should also take advantage of advances in Deep Neural Networks (DNNs), which are typically more effective than tree-based models when applied to large and complex datasets. The true potential of DNNs lies in their ability to bridge Traditional Chinese Medicine (TCM) and modern pharmaceutical knowledge, particularly as richer chemical, textual, and structural data become increasingly available. For example, large language models (LLMs) could be used to analyze centuries of TCM literature and medical illustrations, helping to extract consistent core insights while filtering out anecdotal or non-generalizable information. Likewise, graph neural networks (GNNs) could incorporate both the structural relationships between chemical compounds and the hierarchical taxonomy of herbs, enabling more biologically informed and interpretable models. Together, these deep learning approaches have the potential to unify traditional and modern medical knowledge, significantly advancing data-driven drug discovery.

5 Conclusion

This study demonstrates the potential of using machine learning to bridge the gap between traditional herbal medicine and modern pharmaceuticals. Focusing on Lonicerae flos (LF), a herb commonly used in Traditional Chinese Medicine (TCM), we developed a compound screening model to identify molecules potentially effective against acne, a prevalent skin condition, particularly among adolescents.

Using a Random Forest classifier trained on molecular descriptors from PubChem and RDKit, the model achieved promising predictive performance despite data limitations. Feature importance analysis revealed that properties such as lipophilicity, molecular flexibility, and the presence of key functional groups (e.g., aromatic heterocycles and nitrogen atoms) played critical roles in predicting compound efficacy. These features are biologically plausible and relevant to the pathophysiology of acne, supporting the interpretability of the model.

One of the major challenges encountered was the scarcity of positive training samples, due to limited publicly available data on approved acne treatments. To address this, we applied techniques such as ADASYN for synthetic sample generation to mitigate data imbalance. While these methods improved model robustness, future work would benefit significantly from access to larger and more diverse datasets, including proprietary pharmaceutical data.

Although experimental validation of predicted compounds lies beyond the scope of this study, literature review suggests that several top-ranked LF compounds possess pharmacological properties relevant to dermatological health, such as anti-inflammatory, antioxidant, and antimicrobial activities. This supports the model’s ability to identify biologically meaningful candidates.

Finally, this work provides a scalable and generalizable framework that can be extended to other herbs and disease domains. Future efforts may explore advanced machine learning methods such as graph neural networks and natural language models to mine centuries of TCM texts for deeper integration of traditional knowledge into modern science. Such advances hold the potential to transform traditional herbal medicine into a scientifically grounded, data-driven contributor to modern drug discovery.

Code Availability

The code used in this study is publicly available on GitHub at the following repository: https://github.com/richardfanpks/drug_discovery.

References

World Health Organization. “WHO Guidelines on Traditional Medicine”. In: (2000).
Hui Wang et al. “Advances in the application of traditional Chinese medicine during the COVID-19 recovery period: A review”. In: Frontiers in Pharmacology 14 (2023). PMCID: PMC10994423, p. 10994423. doi: 10.3389/fphar.2023.10994423. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10994423/
Iris F. F. Benzie and Sissi Wachtel-Galor, eds. Herbal Medicine: Biomolecular and Clinical Aspects. 2nd. Available from: https://www.ncbi.nlm.nih.gov/books/NBK92773/. CRC Press/Taylor & Francis, 2011. url: https://pubmed.ncbi.nlm.nih.gov/22593937/
World Health Organization, General guidelines for methodologies on research and evaluation of traditional medicine, https://iris.who.int/bitstream/handle/10665/66783/WHO_EDM_TRM_2000.1.pdf?sequence=1 (2000).
RDKit. RDKit: Open-Source Cheminformatics Software. url: https://www.rdkit.org/
American Academy of Dermatology, Skin conditions by the numbers, https://www.aad.org/media/stats-numbers (2022).
Y.-R. Tang, T. Zeng, S. Zafar, H.-W. Yuan, B. Li, C.-Y. Peng, S.-C. Wang, Y.-Q. Jian, Y. Qin, M. I. Choudhary, W. Wang, Lonicerae flos: A review of chemical constituents and biological activities, Digital Chinese Medicine 1, 173–188 (2021). DOI: 10.1016/S2589-3777(19)30022-9
L. Wang, Q. Jiang, J. Hu, Y. Zhang, J. Li, Research progress on chemical constituents of Lonicerae japonicae flos, BioMed Research International 2016, 8968940 (2016). DOI: 10.1155/2016/8968940
S. K. Zhou, H. Greenspan, C. Davatzikos, J. S. Duncan, B. van Ginneken, A. Madabhushi, J. L. Prince, D. Rueckert, R. M. Summers, A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises, Proceedings of the IEEE 109, 820–838 (2021). DOI: 10.1109/JPROC.2021.3054390
Suresh Dara et al. “Machine Learning in Drug Discovery: A Review”. In: Artificial Intelligence Review 55.3 (2022). Epub 2021 Aug 11, pp. 1947–1999. doi: 10.1007/s10462-021-10058-4
SK Zhou et al. “A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises”. In: Proceedings of the IEEE 109.5 (2021). Epub 2021 Feb 26, pp. 820–838. doi: 10.1109/JPROC.2021.3054390
BioPortal. Anatomical Therapeutic Chemical Classification. url: https://bioportal.bioontology.org/ontologies/ATC
PubChem. PubChem Database. url: https://pubchem.ncbi.nlm.nih.gov/
Mallikarjun Vasam, Satyanarayana Korutla, and Raghvendra Ashok Bohara. “Acne vulgaris: A review of the pathophysiology, treatment, and recent nanotechnology-based advances”. In: Biochemistry and Biophysics Reports 36 (2023). eCollection 2023 Dec, p. 101578. doi: 10.1016/j.bbrep.2023.101578. url: https://pubmed.ncbi.nlm.nih.gov/38076662
World Health Organization. Anatomical Therapeutic Chemical (ATC) Classification. url: https://www.who.int/tools/atc-ddd-toolkit/atc-classification
BioPortal. Anatomical Therapeutic Chemical Classification. url: https://bioportal.bioontology.org/ontologies/ATC
Leo Breiman. “Random Forests”. In: Machine Learning 45.1 (2001), pp. 5–32. doi: 10.1023/A:1010933404324. url: https://dl.acm.org/doi/10.1023/A%3A1010933404324
Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. Cambridge, MA, USA: MIT Press, 2012
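Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li. “ADASYN: Adaptive synthetic sampling approach for imbalanced learning”. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). 2008, pp. 1322–1328. doi: 10.1109/IJCNN.2008.4633969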
imbalanced-learn. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. url: https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.ADASYN.html
Receiver Operating Characteristic. url: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Iris F. F. Benzie and Sissi Wachtel-Galor, eds. Herbal Medicine: Biomolecular and Clinical Aspects. 2nd. Available from: https://www.ncbi.nlm.nih.gov/books/NBK92773/. CRC Press/Taylor & Francis, 2011. url: https://pubmed.ncbi.nlm.nih.gov/22593937/
Scikit-learn developers. sklearn.model_selection.StratifiedKFold. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html
Youngeun Seo et al. “p-Coumaric Acid as An Active Ingredient in Cosmetics: A Review Focusing on its Antimelanogenic Effects”. In: Antioxidants 8.8 (2019), p. 275. doi: 10.3390/antiox8080275. url: https://pubmed.ncbi.nlm.nih.gov/31382682/
Magdalena Działo et al. “Dihydrocaffeic Acid—Is It the Less Known but Equally Valuable Phenolic Acid?” In: Antioxidants 12.5 (2023), p. 974. doi: 10.3390/antiox12050974. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10216370/
Katarzyna Zduńska et al. “Caffeic Acid: A Review of its Potential Use for Medications and Cosmetics”. In: Analytical Methods 6 (2014), pp. 6450–6457. doi: 10.1039/C3AY41807C. url: https://pubs.rsc.org/en/content/getauthorversionpdf/c3ay41807c
World Intellectual Property Organization. Granted Patent WO2019197580A1: Abscisic Acid for the Treatment of Skin Diseases. https://patents.google.com/patent/WO2019197580A1/en
