Development of a Deep Learning Model to Accurately and Efficiently Recognize Sentinel Lymph Node Metastases for Histopathological Staging and Prognosis of Metastatic Breast Cancer

Abstract

Objective: Breast cancer (BC) is the most prevalent cancer and a global public health challenge. Current approaches for diagnosing lymph node metastases are time-consuming and increase the workload of pathologists. Deep learning (DL) algorithms, with their speed and accuracy, have great potential to assist pathologists in making the proper diagnosis. This study aimed to develop a deep learning convolutional neural network (CNN) algorithm that identifies metastatic areas on sentinel lymph node scans from patients with metastatic breast cancer (MBC) with a precision and accuracy of over 80%.

Method: The dataset consisted of 220,000 histopathology images of patients' sentinel lymph nodes from the open-access platform Kaggle and was split into two cohorts: training and test. Candidate models were built using different CNN configurations, and the performance of the finalized model was tested against 57,000 unlabeled images.

Results: I successfully developed and validated a deep learning algorithm to accurately and efficiently identify metastatic areas on digital pathology scans of lymph nodes. The model performed well on all the performance metrics evaluated, with an accuracy of 91%, an AUC of 96.6%, and, for the no-cancer and metastatic classes respectively, precision of 89% and 94%, recall of 96% and 83%, and F1-scores of 93% and 88%.

Conclusion: Early detection and timely therapeutic intervention with this improved protocol can help improve the 5-year survival rate of people living with MBC from 4% in 1990 to 40%. This model can also be applied to other cancers involving lymph node metastases, such as pancreatic, colorectal, and prostate cancer.

Keywords: Machine learning, Pathologists, Diagnosis, Digital Pathology, Therapeutic Interventions, Early Detection, Histopathological images.

Introduction

Breast cancer (BC) is the most prevalent cancer among women and is a global public health challenge [1]. BC incidence shows no sign of declining: in 2021, an estimated 281,550 new cases of invasive BC were expected to be diagnosed in women in the U.S., along with 2,650 new invasive cases in men, and an estimated 43,600 women were expected to die of the disease [2]. Cancer metastasis occurs when cancer cells spread through the lymphatic system. The sentinel lymph nodes (SLNs) are the first lymph nodes to which cancer cells spread from the primary tumor.

Recognition of lymph node metastases (LNMets) is essential for pathological staging, prognosis, and adoption of an appropriate treatment strategy in BC patients [3]. Previously, axillary lymph node dissection (ALND) of clinically positive nodes was the gold standard for determining staging and achieving localized control of the disease in patients with breast cancer. More recently, clinicians have moved away from this approach due to the associated comorbidities of lymphedema, restricted arm and shoulder movement, and extracapsular extension (spread of the tumor outside the lymph-node capsule). Current guidelines recommend SLN biopsy, guided by appropriate mapping techniques, to identify candidates, proceeding to ALND only if specific criteria are met.

In the operating room, surgeons use a gamma probe to locate the SLN with the help of a lymphatic map. The SLN is removed and checked for radioactivity outside the body before being sent to the pathologist for further examination. Pathologists then recut the lymph nodes and section them at multiple levels, using immunohistochemistry and molecular staining to visualize metastases that could not be detected on a single H&E section [4]. This approach allows pathologists to identify micro-metastases that previously went undetected.

Metastases detected by histological examination are classified as macro-metastases if their diameter is greater than or equal to 2 mm, micro-metastases if it is between 0.2 and 2 mm, and isolated tumor cells (ITCs) if the total diameter is less than 0.2 mm [5]. This diagnostic approach allows pathologists to determine the aggressiveness of the cancer at a cellular level, but it also substantially increases the workload and complexity of histopathologic cancer diagnosis. With cancer causing 10 million deaths in 2023 and patient awareness rising, regular screening is becoming more important, further increasing pathologists' workload. To manage this workload and prioritize malignant cases, computer-based screening of whole slides is becoming essential. Artificial intelligence-assisted diagnostic protocols have therefore become a necessity, as they focus equally on efficiency and accuracy [6]. Advances in digital imaging techniques for assessing pathology images with computer vision and machine learning methods could automate some of the tasks in the diagnostic pathology workflow. Such automation could provide fast and precise quantification, reduce observer variability, and increase objectivity [7].

Improved, deep-learning-assisted workflows that help pathologists conduct a detailed analysis of samples can help identify metastatic disease early, which matters because metastasis is the principal cause of death in cancer patients.

Results

To measure the classification performance of the trained and hyperparameter-tuned model, the whole dataset was divided into training and testing groups. After network training, the CNN model was converted to a fully convolutional network to allow fast application to a whole-slide image. Figure 1 shows examples of the lymph node section images obtained from Kaggle. Applying a fully convolutional network to a whole-slide image produces a likelihood map in which each pixel has a continuous likelihood between 0 and 1 of containing cancer. The likelihood map generation time was approximately 5 hours.

Figure 1: Histopathological Images of Sentinel Lymph Nodes Sections Used for Analysis (Images courtesy of the Kaggle publicly available Dataset)

A confusion matrix, also called an error matrix, was used to calculate precision and recall.

Figure 2: Shows a Heat Map for the CNN Model

Figure 2 shows the confusion matrix as a heatmap. Each row of the heatmap represents the instances in an actual class, while each column represents the instances in a predicted class.

The confusion matrix in Figure 2 shows that:

  • 25,184 images were predicted to have no cancer and were correct.
  • 14,798 images were predicted to have cancer and were correct.
  • 3,022 images were predicted to have no cancer and were incorrect.
  • 995 images were predicted to have cancer and were incorrect.

Figure 3: (A) Accuracy and (B) Loss for the Training and Testing of the Model

An epoch in machine learning is a single pass through the entire training dataset during training, and it is the fundamental unit of measurement of the training phase. The number of epochs is a hyperparameter that determines how many times the learning algorithm works through the entire training dataset. During each epoch, the model's weights are adjusted based on the errors computed in the forward and backward passes. Accuracy measures how well the model performs on the training or validation dataset. As training progresses through multiple epochs, accuracy typically increases, indicating improvement in the model's ability to make correct predictions. Loss, on the other hand, measures the model's prediction error during training: it quantifies how far the model's predictions are from the actual values. The goal during training was to minimize the loss. As the model learns from the data over multiple epochs, the loss should decrease, indicating that the model is becoming more accurate.
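
To make the epoch, accuracy, and loss terminology concrete, here is a minimal sketch in plain Python. It trains a one-feature logistic regression, not the CNN from this study, on a made-up toy dataset; the data, the learning rate, and the names `train`, `evaluate`, `xs`, and `ys` are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def evaluate(w, b, xs, ys):
    """Return (mean cross-entropy loss, accuracy) over the dataset."""
    loss, correct = 0.0, 0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        correct += int((p >= 0.5) == (y == 1))
    return loss / len(xs), correct / len(xs)

def train(xs, ys, epochs=10, lr=0.5):
    w, b = 0.0, 0.0
    history = [evaluate(w, b, xs, ys)]  # metrics before any training
    for _ in range(epochs):
        # One epoch = one full pass over the training data.
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. the logit
            w -= lr * error * x
            b -= lr * error
        history.append(evaluate(w, b, xs, ys))
    return history

# Hypothetical toy data: negative inputs are class 0, positive inputs class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
history = train(xs, ys)
for epoch, (loss, acc) in enumerate(history):
    print(f"epoch {epoch}: loss={loss:.3f} accuracy={acc:.2f}")
```

Plotting the two columns of `history` against the epoch index would give curves of the same kind as those in Figure 3, though for a far simpler model.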

Figure 3A shows that, for both training and testing, the accuracy kept increasing as the number of epochs increased. Figure 3B shows that the training and testing loss decreased overall with increasing epochs. Additionally, there were no signs of overfitting during training and testing, as the testing accuracy never started to decrease.

The classification performance was assessed using the Area Under the Curve (AUC) parameter and a Receiver Operating Characteristic (ROC) analysis to validate the results. The AUC-ROC curve serves as a performance metric for classification problems across diverse threshold settings. The ROC is a probability curve, and the AUC signifies the extent of separability, indicating the model's ability to differentiate between classes. A higher AUC implies better predictive capability in distinguishing between classes 0 and 1, corresponding to patients without and with the disease [8].

Figure 4: ROC analysis for breast cancer classification in the CNN Model

The ROC curve shown in Figure 4 plots the True Positive Rate (TPR) on the y-axis against the False Positive Rate (FPR) on the x-axis. The CNN model, optimized with hyperparameter tuning for breast cancer classification, achieved an AUC of 0.966, signifying outstanding discriminative ability between the two classes.
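
As an illustration of how an ROC curve and its AUC are obtained, the sketch below sweeps the decision threshold over a set of predicted scores and integrates the resulting curve with the trapezoidal rule. It is written in plain Python rather than with the libraries used in this study, and the four labels and scores are hypothetical values, not outputs of the CNN model.

```python
def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by descending score; each distinct score is one candidate threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical example: 1 = cancer, 0 = no cancer.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_curve_points(labels, scores)
print(points)
print(auc(points))
```

An AUC of 1.0 corresponds to scores that rank every positive above every negative, while 0.5 corresponds to a ranking no better than chance.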

Precision is a metric used in the evaluation of classification models, and it measures the ability of a model to correctly identify positive instances (relevant data points) out of the total instances it predicted as positive. The precision is calculated using the following formula:

Precision = True Positives / (True Positives + False Positives)

A higher precision indicates that the model is better at avoiding false positives, which are instances incorrectly classified as positive. Along with precision, other metrics such as recall, accuracy, and F1 score should also be considered to get a comprehensive understanding of a model's performance. The balance between precision and recall depends on the specific goals and requirements of the application [9].

Recall measures the ability of a model to correctly identify all the positive instances in the dataset. The recall is calculated using the following formula:

Recall = True Positives / (True Positives + False Negatives)

| Classes | Precision | Recall | F1 Score | Accuracy |
|---|---|---|---|---|
| 0 | 89% | 96% | 93% | |
| 1 | 94% | 83% | 88% | 91% |

Table 1: Shows the Precision, Recall, F1-score, and Accuracy of the Model

As shown in Table 1, the CNN model had an excellent precision of 89% and 94% for Class 0 and Class 1, respectively, and an excellent recall of 96% and 83% for Class 0 and Class 1, respectively. The F1 score, the harmonic mean of precision and recall, is a common metric for assessing the quality of classification models in machine learning. The CNN model had an excellent F1 score of 93% and 88% for Class 0 and Class 1, respectively. Accuracy is the ratio of the number of correct predictions to the total number of predictions made. The CNN model had an excellent accuracy of 91%.
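
The values in Table 1 can be reproduced from the four counts in the confusion matrix. The sketch below does this in plain Python; it assumes that the 995 misclassified images are class 0 images predicted as cancer (false positives), a reading that matches the 26,179 class 0 test images reported in Methods. The variable and function names are illustrative only.

```python
# Counts from the confusion matrix (Figure 2). Assumption: the 995 misclassified
# images are class 0 images predicted as cancer (false positives), which matches
# the 26,179 class 0 test images reported in Methods.
TN, FP, FN, TP = 25184, 995, 3022, 14798

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall

# Class 1 (cancer) treats cancer as the positive class.
p1, r1 = precision(TP, FP), recall(TP, FN)
# Class 0 (no cancer) swaps the roles: true negatives become the "hits".
p0, r0 = precision(TN, FN), recall(TN, FP)
accuracy = (TP + TN) / (TP + TN + FP + FN)

report = {
    "class 0": (p0, r0, f1(p0, r0)),
    "class 1": (p1, r1, f1(p1, r1)),
}
for name, (p, r, f) in report.items():
    print(f"{name}: precision={p:.0%} recall={r:.0%} f1={f:.0%}")
print(f"accuracy={accuracy:.0%}")
```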

Methods

A publicly available dataset from Kaggle consisting of 220,000 images was used for the analysis. Each image in the dataset was labeled as either 0 or 1, meaning no cancer or metastatic breast cancer (MBC), respectively. The data was split into two subsets, with 80% used for training and 20% for testing. For training, 104,717 images were in class 0 and 71,284 images were in class 1. For testing, 26,179 images were in class 0 and 17,820 images were in class 1. The CNN model training process and the model architecture followed the steps shown in Figures 5 and 6. Candidate models were built using different convolutional neural network (CNN) configurations, and different parameters were explored through trial and error. To optimize the model architecture for higher accuracy, the training loss and validation loss were examined over multiple runs of the model with the same number of epochs. Convolutional layers were added to the initial version of the model to minimize underfitting and help the model capture complex patterns in the image data.
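
The reported class counts are consistent with an 80/20 split applied to each class separately. The sketch below shows one way such a stratified split could be done in plain Python; the text does not state how the split was actually performed, so the function `stratified_split`, the seed, and the flooring of the test count are assumptions. The per-class totals (130,896 and 89,104) are obtained by adding the training and testing counts given above.

```python
import random
from collections import Counter

def stratified_split(labels, test_percent=20, seed=0):
    """Return (train_idx, test_idx) with each class split at the same ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Integer arithmetic floors the test count for each class.
        n_test = len(indices) * test_percent // 100
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Stand-in labels with the same class totals as the dataset.
labels = [0] * 130_896 + [1] * 89_104
train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

Splitting per class keeps the share of class 0 at roughly 59.5% in both subsets, so the test set reflects the same class balance the model was trained on.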

Figure 5: The CNN Model Training Process

The parameters varied during trial-and-error tuning to determine the best configuration included, among others:

(a) number of layers,

(b) number of filters per layer,

(c) number of epochs,

(d) batch size.

The code was written entirely in Python, using the machine learning library scikit-learn and the deep learning framework TensorFlow for neural network training. A test dataset was used to evaluate the model's performance. The performance of the trained model was assessed using the following classification metrics: confusion matrix, accuracy, precision, and recall. A positive label indicates that the center 32 × 32 px region of a patch contains at least one pixel of tumor tissue. Tumor tissue in the outer region of the patch does not influence the label. This outer region is provided to enable fully convolutional models that do not use zero-padding, ensuring consistent behavior when applied to a whole-slide image. The model architecture contained 606,898 trainable parameters. The Adam optimizer was used over 10 epochs with the default learning rate of 0.001.
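
The labeling rule can be illustrated with a short sketch. It assumes 96 × 96 px patches (the model input size) with the 32 × 32 px region exactly centered, and it represents the tumor annotation as a nested list of 0/1 values; that mask format and the function name `patch_label` are hypothetical choices made for this example, not the dataset's actual storage format.

```python
PATCH_SIZE = 96    # patch edge length in pixels, per the model input shape
CENTER_SIZE = 32   # edge length of the labeled center region

def patch_label(tumor_mask):
    """Return 1 if any tumor pixel lies in the center 32x32 region, else 0.

    tumor_mask is a 96x96 nested list of 0/1 values (hypothetical annotation
    format chosen for this example).
    """
    start = (PATCH_SIZE - CENTER_SIZE) // 2  # 32
    end = start + CENTER_SIZE                # 64 (exclusive)
    for row in tumor_mask[start:end]:
        if any(row[start:end]):
            return 1
    return 0

def empty_mask():
    return [[0] * PATCH_SIZE for _ in range(PATCH_SIZE)]

outer_only = empty_mask()
outer_only[5][5] = 1        # tumor pixel in the outer border only
center_hit = empty_mask()
center_hit[48][48] = 1      # tumor pixel inside the center region

print(patch_label(outer_only), patch_label(center_hit))
```

The first patch is labeled 0 even though it contains tumor tissue, because that tissue lies outside the center region.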

Figure 6: The CNN Model Architecture

| Layer Type | Input Shape | Activation Function | Output Shape |
|---|---|---|---|
| conv2d_1 | (96,96,3) | ReLU | (94,94,16) |
| conv2d_2 | (94,94,16) | ReLU | (92,92,16) |
| max_pooling2d_1 | (92,92,16) | | (46,46,16) |
| dropout = 0.2 | (46,46,16) | | (46,46,16) |
| conv2d_3 | (46,46,16) | ReLU | (44,44,32) |
| conv2d_4 | (44,44,32) | ReLU | (42,42,32) |
| max_pooling2d_2 | (42,42,32) | | (21,21,32) |
| dropout = 0.2 | (21,21,32) | | (21,21,32) |
| conv2d_5 | (21,21,32) | ReLU | (19,19,64) |
| conv2d_6 | (19,19,64) | ReLU | (17,17,64) |
| max_pooling2d_3 | (17,17,64) | | (8,8,64) |
| dropout = 0.2 | (8,8,64) | | (8,8,64) |
| Flatten | (8,8,64) | | (4096) |
| dense_1 | (4096) | ReLU | (128) |
| dropout = 0.2 | (128) | | (128) |
| dense_2 | (128) | ReLU | (64) |
| dropout = 0.2 | (64) | | (64) |
| dense_3 | (64) | ReLU | (32) |
| dense_4 | (32) | Softmax | (2) |

Table 2: Training Parameters for the Model
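
The layer shapes and the reported parameter count can be checked with a few lines of arithmetic. The sketch below assumes 3 × 3 convolution kernels with stride 1 and no padding, 2 × 2 max pooling, and a 3-channel input; these settings are not stated explicitly in the text, but they are consistent with the output shapes in the table and with the reported total of 606,898 trainable parameters.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(in_ch, out_ch, kernel=3):
    return kernel * kernel * in_ch * out_ch + out_ch  # weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

size, channels, total = 96, 3, 0
for filters in (16, 32, 64):          # three convolutional blocks
    for _ in range(2):                # two conv layers per block
        total += conv_params(channels, filters)
        size, channels = conv_out(size), filters
    size //= 2                        # 2x2 max pooling; dropout keeps the shape
    print("block output:", (size, size, channels))

units = size * size * channels        # flatten
print("flattened units:", units)
for n_out in (128, 64, 32, 2):        # dense layers ending in 2-way softmax
    total += dense_params(units, n_out)
    units = n_out

print("trainable parameters:", total)
```

Most of the parameters (524,416) sit in the first dense layer, which connects the 4,096 flattened features to 128 units.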

Discussion & Conclusion

In this study, I successfully developed and validated a deep learning algorithm to accurately and efficiently identify metastatic areas on digital pathology scans of lymph nodes, exceeding my hypothesis criteria of over 80% accuracy and precision. The model performed well on all the performance metrics evaluated. The final performance metric analyzed was the F1-score (93% for class 0 and 88% for class 1). The F1-score is the most informative metric of model performance here because there is more data for no cancer than for MBC, making the dataset imbalanced. The F1-score for Class 1 (88%) is lower than that for Class 0 (93%), highlighting the difference in performance between the two classes. Precision for Class 1 is high (94%), meaning that when the model predicts MBC, it is correct 94% of the time. Recall for Class 1 is relatively lower (83%), indicating that the model misses some cases of MBC. Future work will increase the weight of the MBC class during training to make the model more sensitive to MBC cases. An alternative approach would be to reduce the number of samples in class 0 to balance the dataset better.
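
As a sketch of what increasing the weight of the MBC class could look like, the snippet below computes class weights that are inversely proportional to class frequency, using the training counts from Methods. This "balanced" heuristic is one common choice and is an assumption here, not a scheme chosen in this study; the function name `balanced_class_weights` is illustrative.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }

# Training-set class counts reported in Methods.
train_counts = {0: 104_717, 1: 71_284}
weights = balanced_class_weights(train_counts)
for label, weight in weights.items():
    print(f"class {label}: weight {weight:.3f}")
```

With these weights, each class contributes equally to the total loss despite the different number of images. In TensorFlow's Keras API, a dictionary of this form can be passed to `Model.fit` through its `class_weight` argument.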

| Study | Technique Used | Accuracy |
|---|---|---|
| Bayramoglu et al. | Single-task CNN model | 83.1% |
| Bayramoglu et al. | Multi-task CNN model | 82.1% |
| Das et al. | Multiple instance learning variation of the CNN model | 89.5% |
| My Method | CNN model combining convolutional layers, pooling, dropout, dense layers, and activation functions | 91% |

Table 3: Shows the Accuracy of the CNN Model Used in this Study Compared to Other Models

Table 3 shows the comparative accuracy of different methods used to classify sentinel lymph nodes of breast cancer using histopathology images. Bayramoglu et al. [10] used two models, a single-task CNN and a multi-task CNN, to classify the images. A total of 7,909 images (2,480 benign and 5,429 malignant) were used for that study, and the dataset had a train-test split of 80% for training and 20% for testing. The classification accuracies of the single-task and multi-task CNN models were 83.1% and 82.1%, respectively. Das et al. [11] used a multiple-instance-learning variation of the CNN model, also using a total of 7,909 images, with a train-test split of 70% for training and 30% for testing. The classification accuracy of this model was 89.5%.

My method performed better than the Bayramoglu et al. models and the Das et al. model. The dataset used for my study had a much larger number of images, which tends to improve classification accuracy through improved model training. More data points allow the model to learn from different aspects of the images, improving its ability to make accurate predictions; additionally, the model is less likely to overfit. This, however, is only true when the larger dataset captures a wider variety of features, the images are of high quality, and the dataset is well annotated and balanced. This model is slightly biased towards predicting no cancer because class 0 represents 59% of the dataset. However, this bias reflects real-life data, as around 60–70% of sentinel lymph nodes do not contain any metastases. Classification accuracy for the MBC group can be improved by increasing the weight of the MBC group during training, and this approach will be tried in future work.

Determining sentinel lymph node metastases by traditional manual methods is tedious and requires manual inspection of several sections of the lymph node for micro-metastases and macro-metastases. The deep learning protocol proposed in this study can increase the efficiency and accuracy, and reduce the cost, of cancer diagnosis, allowing pathologists to conduct detailed analysis of BC samples and helping to identify metastatic disease early. DL workflows with better speed and accuracy will assist pathologists in determining LNMets and in adopting an appropriate treatment strategy for BC patients, resulting in better outcomes.

An estimated 42,780 deaths from breast cancer (42,250 women and 530 men) are expected to occur in the United States in 2024. Protocols like the one developed in this study will allow for early detection and timely therapeutic interventions that can improve the 5-year survival rate of people living with metastatic breast cancer. This model can also be applied to histopathological images of other cancers involving lymph node metastases, such as pancreatic, colorectal, and prostate cancer.

For future work, transfer learning approaches will be applied to this model to adapt to a new, smaller dataset containing histopathological images of breast cancer sentinel lymph nodes, and classification accuracy will be determined. Additionally, this model will be applied to the classification of sentinel lymph nodes for other types of cancers that involve sentinel lymph node metastasis. This transfer learning approach on a smaller dataset can reduce training time and improve performance.

Acknowledgement

I would like to thank Mr. Scott DeRuiter & Mr. Diego Iriarte Sainz for their guidance and support during this project.

Abbreviations

  • Deep Learning – DL
  • False Positive Rate – FPR
  • True Positive Rate – TPR
  • Area Under the Curve – AUC
  • Receiver Operating Characteristic – ROC
  • lymph node metastases – LNMets
  • Convolution Neural Networks – CNN
  • Breast cancer – BC

References

  1. Fischer, A. H., Jacobson, K. A., Rose, J. & Zeller, R. Hematoxylin and eosin staining of tissue and cell sections. Cold Spring Harb Protoc 3, (2008).
  2. Epstein, J. I., Allsbrook, W. C., Amin, M. B. & Egevad, L. L. The 2005 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma.
  3. Genestie, C. et al. Comparison of the prognostic value of Scarff-Bloom-Richardson and Nottingham histological grades in a series of 825 cases of breast cancer: major importance of the mitotic count as a component of both grading systems. Anticancer Res 18, 571–6 (1998).
  4. Weaver, D. L. Pathology evaluation of sentinel lymph nodes in breast cancer: protocol recommendations and rationale. Mod Pathol 23 Suppl 2, S26-32 (2010).
  5. Gurcan, M. N. et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2, 147–71 (2009).
  6. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–44 (2015).
  7. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. http://code.google.com/p/cuda-convnet/
  8. Szegedy, C. et al. Going Deeper with Convolutions. (2014).
  9. Cireşan, D. C., Giusti, A., Gambardella, L. M. & Schmidhuber, J. Mitosis detection in breast cancer histology images with deep neural networks. Med Image Comput Comput Assist Interv 16, 411–8 (2013).
  10. Bayramoglu, N., Kannala, J. & Heikkilä, J. Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification.
  11. Das, K., Conjeti, S., Roy, A. G., Chatterjee, J. & Sheet, D. Multiple instance learning of deep convolutional neural networks for breast histopathology whole slide classification. in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) 578–581 (IEEE, 2018). doi:10.1109/ISBI.2018.8363642
