By Zeya Wang, Nanqing Dong, Wei Dai, Sean D’Rosario, Eric P. Xing
Breast cancer is one of the leading causes of death by cancer for women. Early detection can give patients more treatment options. In order to detect signs of cancer, breast tissue from biopsies is stained to enhance the nuclei and cytoplasm for microscopic examination. Then, pathologists evaluate the extent of any abnormal structural variation to determine whether there are tumors.
Since the majority of biopsies find normal and benign results, most of the manual labelling of these microscopic images is redundant. Several existing machine learning approaches perform two-class (malignant, benign) and three-class (normal, in situ, invasive) classification through extraction of nuclei-related information. (Benign lesions lack the ability to invade neighbors, so they are non-malignant. In-situ and invasive carcinoma, however, can spread to other areas, and therefore are malignant. Invasive tissues, unlike in-situ, can reach the surrounding normal tissues beyond the mammary ductal-lobular system.)
At Petuum, we want to leverage advances in machine learning to help with the breast cancer screening process. Computer-aided diagnosis (CAD) approaches for automatic diagnoses improve efficiency by allowing pathologists to focus on more difficult diagnosis cases. Our data scientists have developed a deep learning-based CAD method for breast biopsy tissue classification, which helps reduce the workload of classifying histopathological images.
Deep learning-based CAD has been gaining popularity for analyzing histopathological images. However, few works have addressed the problem of accurately classifying images of breast biopsy tissue stained with hematoxylin and eosin into different histological grades. Our team decided to tackle this problem by exploring better neural network designs to improve classification performance. We designed a loss function that leverages the hierarchical structure of the histopathological classes, and we combined the network’s embedded feature maps with information from the input image to capture more of the global context. Together, these led us to a system that can automatically classify breast cancer histology images into four classes: normal tissue, benign lesion, in situ carcinoma, and invasive carcinoma.
The dataset used in this project was provided by Universidade do Porto, Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência (INESC TEC) and Instituto de Investigação e Inovação em Saúde (i3S) in TIF format, via the ICIAR 2018 BACH Challenge. The dataset consists of 400 high-resolution (2048×1536) H&E-stained breast histology microscopic images. These images are labeled with four classes (normal, benign, in situ, and invasive), and each class consists of 100 images. Prior to the analysis, we normalized all images to minimize the inconsistencies caused by the staining.
Our cancer-type classification framework consists of a data augmentation stage, a patch-wise classification stage, and an image-wise classification stage. After normalization, we rescale and crop each image to small patches that can be fed as input to the CNN for patch-wise classification (Figure 1).
During the training phase, the cropped patches are augmented, which acts as a form of regularization and makes the model more robust. A VGG-16 network with hierarchical loss and global image pooling is trained to put the patches into four classes. In the inference phase, we generate patches from each test image and combine the patch classification results, through patch probability fusion or dense evaluation methods, to classify the image.
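As a rough illustration of the patch probability fusion step, the sketch below averages the per-patch class probabilities and picks the class with the highest mean. The averaging rule, the function name fuse_patch_probabilities, and the example numbers are assumptions made for illustration; the post does not specify the exact fusion rule, and dense evaluation is not shown here.

```python
# Class order is an assumption for this sketch.
CLASSES = ["normal", "benign", "in_situ", "invasive"]

def fuse_patch_probabilities(patch_probs):
    """Average per-patch class probabilities and return (label, mean_probs).

    patch_probs is a list with one entry per patch; each entry is a list of
    four probabilities in the order given by CLASSES.
    """
    if not patch_probs:
        raise ValueError("need at least one patch prediction")
    n_patches = len(patch_probs)
    mean_probs = [
        sum(p[k] for p in patch_probs) / n_patches for k in range(len(CLASSES))
    ]
    best = max(range(len(CLASSES)), key=mean_probs.__getitem__)
    return CLASSES[best], mean_probs

# Made-up softmax outputs for three patches of one image.
example_patches = [
    [0.10, 0.20, 0.60, 0.10],
    [0.05, 0.15, 0.30, 0.50],
    [0.10, 0.10, 0.70, 0.10],
]
label, mean_probs = fuse_patch_probabilities(example_patches)
print(label, mean_probs)
```

Averaging lets a single confidently wrong patch be outvoted by its neighbors; other fusion rules, such as majority voting or taking the maximum, would plug into the same place.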
The original images are too large to be fed into the network, so we crop them to 224×224. However, cropping small patches from a 2048×1536 image at 200x magnification can break the overall structural organization of the image and leave out important tissue architecture information. Conventionally, images are resized while training a CNN model, but for microscopic images, resizing could decrease the magnification level. There is currently no consensus on the best magnification level, so we’ve chosen to isotropically resize the original images to a relatively small size, e.g., 1024×768 or 512×384. Each scaled image is then cropped to 224×224 patches with 50% overlap.
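To make the cropping scheme concrete, here is a minimal sketch that lists the top-left corners of 224×224 patches with 50% overlap (a stride of 112 pixels) on a rescaled image. The function name patch_origins is hypothetical, and the sketch simply drops any border strip that does not fit a full patch; how the original pipeline treats image borders is not stated in this post.

```python
def patch_origins(width, height, patch=224, overlap=0.5):
    """Return (x, y) top-left corners of overlapping square patches.

    The stride is patch * (1 - overlap); with the defaults it is 112 pixels.
    Border pixels that cannot fit a full patch are skipped (an assumption).
    """
    stride = int(patch * (1 - overlap))
    if stride <= 0:
        raise ValueError("overlap must be smaller than 1")
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    return [(x, y) for y in ys for x in xs]

for size in [(512, 384), (1024, 768)]:
    origins = patch_origins(*size)
    print(size, len(origins), origins[:3])
```

Under these assumptions, a 512×384 image yields a 3×2 grid of patches, while a 1024×768 image yields an 8×5 grid.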
We chose a VGG-16 network to classify the 224×224 histology image patches in order to explore the scale and organization features of nuclei and the scale features of the overall structure. Because these patches do not carry complicated high-level semantic information, a 16-layer structure suffices; for the sake of comparison, we’ve also used a VGG-19 network. To leverage contextual information from the cropped images, we added global context to the last convolutional layer of the VGG networks. Similar to ParseNet, the input images are passed to two independent branches: our VGG network and a global average pooling layer. The transformed output of the global pooling layer is unpooled to the same shape as the feature maps after the last convolutional layer of the VGG network and is then concatenated with those feature maps. The two sets of feature maps are then fused by another 1×1 convolutional layer and passed through three fully-connected (FC) layers for classification.
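The global context branch can be sketched with plain NumPy arrays. This is only an illustration of the pool, unpool, concatenate, and 1×1 fusion steps described above, not the trained model. The channel counts, the linear projection used as the "transform", and the name fuse_global_context are all assumptions, and random arrays stand in for the real VGG feature maps and learned weights.

```python
import numpy as np

def fuse_global_context(image, features, proj, fuse_w):
    """Fuse conv feature maps with a globally pooled image descriptor.

    image:    (3, Hi, Wi) input patch
    features: (C, H, W) feature maps from the last conv layer
    proj:     (Cg, 3) linear transform of the pooled vector (assumed form)
    fuse_w:   (Cout, C + Cg) weights of the 1x1 fusion convolution
    """
    pooled = image.mean(axis=(1, 2))              # global average pooling -> (3,)
    context = proj @ pooled                       # transformed descriptor -> (Cg,)
    _, h, w = features.shape
    # "Unpool": repeat the descriptor at every spatial location.
    context_map = np.broadcast_to(context[:, None, None], (context.shape[0], h, w))
    stacked = np.concatenate([features, context_map], axis=0)  # (C + Cg, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    fused = np.einsum("oc,chw->ohw", fuse_w, stacked)
    return fused, stacked

rng = np.random.default_rng(0)
image = rng.random((3, 224, 224))
features = rng.random((8, 7, 7))     # stand-in for VGG feature maps
proj = rng.standard_normal((4, 3))
fuse_w = rng.standard_normal((8, 12))
fused, stacked = fuse_global_context(image, features, proj, fuse_w)
print(fused.shape, stacked.shape)
```

Because the pooled descriptor is identical at every spatial position, the fusion layer can weigh local evidence against a summary of the whole patch.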
Because we can further group them into non-carcinoma and carcinoma, the classes have a tree organization (Figure 2), where normal and benign are leaves from the non-carcinoma node, and in situ and invasive are leaves from the carcinoma node.
Because of this structure, we chose to apply hierarchical loss instead of vanilla cross entropy loss. Hierarchical loss uses an ultrametric tree to calculate the amount of metric “winnings”: failing to distinguish between carcinoma and non-carcinoma is penalized more than failing to distinguish between normal and benign or between in situ and invasive. The amount of winnings is calculated as the weighted sum of the estimated probability scores of the nodes along the path from the first non-root node to the correct leaf. The probability score of each node is obtained by summing the scores of its child nodes. The weights are shown in Figure 2. Finally, to calculate the loss (or negative winnings), we apply the negative logarithm used in computing cross entropy loss (Figure 3).
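The loss described above can be sketched in a few lines for the two-level class tree. The node weights below (0.5 for the parent node and 0.5 for the leaf) are placeholders chosen for illustration, since the actual weights appear only in Figure 2; the function name hierarchical_loss and the small epsilon guard are also assumptions.

```python
import math

# Two-level class tree: each leaf belongs to one non-root parent node.
PARENT = {
    "normal": "non_carcinoma",
    "benign": "non_carcinoma",
    "in_situ": "carcinoma",
    "invasive": "carcinoma",
}
W_PARENT = 0.5  # placeholder weight, not the value from Figure 2
W_LEAF = 0.5    # placeholder weight, not the value from Figure 2

def hierarchical_loss(probs, true_class, eps=1e-12):
    """Negative log of the weighted 'winnings' along the path to the true leaf.

    probs maps each leaf class to its predicted probability.
    """
    parent = PARENT[true_class]
    # A parent's score is the sum of the scores of its children.
    parent_prob = sum(p for c, p in probs.items() if PARENT[c] == parent)
    winnings = W_PARENT * parent_prob + W_LEAF * probs[true_class]
    return -math.log(max(winnings, eps))

# True class is in situ. Confusing it with invasive (same parent) should cost
# less than confusing it with normal (different parent).
sibling_error = {"normal": 0.1, "benign": 0.1, "in_situ": 0.2, "invasive": 0.6}
cross_error = {"normal": 0.6, "benign": 0.1, "in_situ": 0.2, "invasive": 0.1}
print(hierarchical_loss(sibling_error, "in_situ"))
print(hierarchical_loss(cross_error, "in_situ"))
```

Both example predictions assign the same probability to the correct leaf, so vanilla cross entropy would score them identically; the hierarchical version separates them because the parent node still earns winnings when the mistake stays within the carcinoma branch.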
Our final choice of scaled size for the input images is 512×384 because it maintains most of the nuclei structural information from the original whole image while also keeping most of the information about tissue structural organization in the cropped patches. To support our heuristic choice of these model settings, we ran a series of ablation studies comparing our model to models with each of the following variations: one with a deeper VGG-19, one using vanilla cross entropy loss, one without global image pooling, and one that resizes the images to 768×512. Our experimental results (Table 1) demonstrate that our proposed framework performs better than these alternatives; the results are outlined in detail in our paper.
Magnification is an important factor for analyzing microscopic images for diagnosis. The most informative magnification level is still debatable, so we’ve included two possible scales in our work for comparison. In future work, we plan to study the influence of other scales on the model’s performance.
Our work is a novel design for automatic classification of breast cancer histopathological images that achieves high accuracy. Because it provides a systematic analysis of factors that can also affect the classification of histopathological images of other types of cancer, this work can be generalized and applied to the classification of cancers other than breast cancer.
As more and more CAD approaches for medical images are commercialized and turned into products, there is a stronger need for developing a more accurate CAD framework. With greater accuracy and availability, using histopathological images to aid in the diagnosis of cancer can become more prevalent in medical industries and, hopefully, enable more early diagnoses.
Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-319-93000-8_84