Image recognition techniques on digital images of colon and stomach biopsies
Pascale De Paepe
Department of Pathological Anatomy, AZ Sint-Jan Brugge – Oostende AV, Campus Brugge, Europe; Ixor cvba, Schuttersvest 75, 2800 Mechelen, Europe. E-mail: Pascale.DePaepe@azsintjan.be
Abstract
Machine learning (ML) techniques make it possible to recognise patterns in digital images and to classify these images based on their contents with high accuracy. Pattern recognition is one of the most important parameters used by pathologists in the analysis of biopsy material. We therefore focused our study on pattern recognition rather than on object (cell) detection and classification. The aim of our study was to create and train an algorithm that can pre-analyse digital images of different types of gastrointestinal mucosa. Digital images from gastric (51) and colon (92) mucosal biopsies were labelled normal or abnormal. Images of gastric biopsies were labelled abnormal when the following histological features were present: an increased number of inflammatory cells, interstitial oedema and differentiation abnormalities of the epithelial lining. Images of colon biopsies were labelled abnormal when distortion of the glands, villous structures, differentiation abnormalities of the epithelial lining or an increased number of inflammatory cells were found. All images showing no abnormalities were labelled normal. With these data sets we trained different machine learning algorithms to classify the digital images. The best-performing algorithm, a support vector machine (SVM) classifier, achieved an accuracy of 94% on colon images and 75% on gastric images. This pilot study shows that it is possible to train an algorithm on a limited data set so that it classifies with acceptable accuracy.
If the algorithm proves to be as successful on fully scanned biopsies, it will be helpful as a pre-analysis tool in the daily histopathological workload. 2017 was the breakthrough year of Whole Slide Imaging (WSI) devices. WSI devices can now quickly turn glass slides into high-resolution digital images. These digital images can be viewed, analysed and managed in ways that physical glass slides cannot.
The practice of acquiring, managing, sharing and interpreting pathology information in a digital environment, referred to as digital pathology, has received increased attention since the breakthrough of WSI. Digital pathology offers many new advantages. If analytic tools, machine learning and AI techniques become available and easily accessible to pre-analyse slides, more pathologists will be persuaded to embrace these new technologies, referred to as computational pathology.
The data set consists of digital images taken from parts of whole slide scans of gastric mucosa and colon mucosa biopsies (JPG format, average size 400 KB). All patient data were removed, making the images fully anonymous. The digital images were taken at different magnifications: 1x, 2x, 5x, 10x and 20x. Each digital image was labelled as gastric mucosa abnormal/normal or colon mucosa abnormal/normal. The series of gastric mucosa biopsies, data set A, included biopsies without abnormalities, labelled as normal, and biopsies with inflammatory lesions. Images of the latter were labelled as abnormal if the following features were present:
• Increased number of inflammatory cells
• Interstitial oedema
• Differentiation abnormalities of the epithelial lining
The series of colon mucosa biopsies, data set B, included biopsies without abnormalities, labelled as normal, and biopsies with inflammatory lesions, hyperplastic polyps, adenomatous and villous polyps and malignant lesions, labelled as abnormal. Images were labelled as abnormal if showing one or several of the following histological features:
• Presence of aberrant glands: distortion of the glands (dilatation, branching), presence of villous structures or both
• Differentiation abnormalities of the epithelial lining (decreased number or absence of goblet cells, cellular atypia)
• Increased number of inflammatory cells
Because of the limited number of images, we chose to use a pre-trained convolutional neural network[1] to extract features[2] from the images. We used the VGG16 model[3] with weights pre-trained on the ImageNet data set[4] and the model's default input size of 224x224x3 (image height, image width, colour channels). We used the RGB colour channels and did not convert the images to greyscale.
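The input step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it resizes an RGB tile to 224x224x3 with nearest-neighbour sampling (the abstract does not say which resampling was used) and then applies the channel-mean preprocessing that Keras's `vgg16.preprocess_input` performs (RGB to BGR, subtraction of the ImageNet channel means, no scaling). Whether the authors applied exactly this preprocessing is an assumption. The resulting batch is the kind of array a VGG16 feature extractor such as `tf.keras.applications.VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))` would receive; the helper names `resize_nearest`, `to_vgg16_input` and `to_batch` are hypothetical.

```python
import numpy as np

# Default VGG16 input shape named in the text: height, width, colour channels.
VGG16_INPUT_SHAPE = (224, 224, 3)
# ImageNet channel means in BGR order, as used by Keras "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array (hypothetical helper)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for every output row
    cols = np.arange(out_w) * w // out_w  # source column for every output column
    return img[rows[:, None], cols[None, :]]


def to_vgg16_input(tile_rgb: np.ndarray) -> np.ndarray:
    """Turn an RGB tile into a 224x224x3 float array for VGG16 (assumed preprocessing)."""
    if tile_rgb.ndim != 3 or tile_rgb.shape[2] != 3:
        raise ValueError("expected an RGB tile of shape (H, W, 3); no greyscale conversion")
    x = resize_nearest(tile_rgb, *VGG16_INPUT_SHAPE[:2]).astype(np.float32)
    # RGB -> BGR, then zero-centre each channel on the ImageNet mean.
    return x[..., ::-1] - IMAGENET_BGR_MEAN


def to_batch(tiles) -> np.ndarray:
    """Stack preprocessed tiles into an (N, 224, 224, 3) batch for a feature extractor."""
    return np.stack([to_vgg16_input(t) for t in tiles])
```

Keeping all three colour channels matters here because ImageNet-pretrained filters expect colour input; a greyscale tile would have to be replicated into three channels anyway.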
• The extracted features were fed to different classification models that sorted the images into two categories: abnormal or normal.
• Using all images at all magnifications did not lead to acceptable accuracy on the training set, nor did training the models on the low-magnification images only (1x, 2x, 5x). These images were therefore omitted from the data sets, and the remaining data sets comprised images at 10x and 20x magnification. As a result, data set A contained 20 abnormal and 31 normal gastric mucosa images, and data set B contained 73 abnormal and 19 normal colon mucosa images. To avoid the information loss caused by downsizing the images to the model's input size, each image was split into 6 tiles. For data set A (gastric mucosa) this resulted in 116 abnormal tiles and 185 normal tiles.
• In data set B, we checked each tile. Some tiles belonging to an image labelled abnormal did not show any characteristics of abnormal tissue and were relabelled as normal. Blank tiles, as well as tiles that were more than 90% blank, were removed from the data sets. For data set B (colon mucosa) this resulted in 407 abnormal tiles and 134 normal tiles. Both data sets were split into a training set (80%) and a test set (20%).
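The tiling, blank-tile filtering and 80/20 split described in the list above can be sketched as follows. Several details are assumptions, not taken from the abstract: the 6 tiles are cut as a 2 x 3 grid, a pixel counts as "blank" when all three channels are at least 220 (near-white slide background), and the split shuffles tiles with a fixed seed. The manual relabelling of tiles in data set B is a human review step and is not modelled. All function names are hypothetical.

```python
import numpy as np


def split_into_tiles(img, rows=2, cols=3):
    """Cut an (H, W, C) image into rows x cols equal tiles (6 by default; grid layout assumed).

    Remainder pixels at the right and bottom edges are dropped.
    """
    h, w = img.shape[:2]
    th, tw = h // rows, w // cols
    return [img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]


def blank_fraction(tile, threshold=220):
    """Fraction of pixels whose channels are all >= threshold (assumed definition of blank)."""
    return float(np.all(tile >= threshold, axis=-1).mean())


def tiles_from_images(images, labels, max_blank=0.9, threshold=220):
    """Tile every image, give each tile its image's label, drop tiles more than 90% blank."""
    kept_tiles, kept_labels = [], []
    for img, label in zip(images, labels):
        for tile in split_into_tiles(img):
            if blank_fraction(tile, threshold) <= max_blank:
                kept_tiles.append(tile)
                kept_labels.append(label)
    return kept_tiles, kept_labels


def split_train_test(n_items, test_fraction=0.2, seed=0):
    """Shuffle item indices and return (train_idx, test_idx) for an 80/20 split."""
    idx = np.random.default_rng(seed).permutation(n_items)
    n_test = int(round(n_items * test_fraction))
    return idx[n_test:], idx[:n_test]
```

Because `split_train_test` shuffles individual tiles, tiles cut from the same source image can land in both the training and the test set. The abstract does not say how this was handled; splitting by source image instead is a common way to keep the test set independent of the training set.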
To train the different models, each training set was enlarged 5 times with data augmentation: rotation, horizontal and vertical shift, zoom and flip. We chose these augmentations because the models have to recognise patterns that are translation-insensitive. We tried two ways of filling the resulting empty regions: 'constant' mode (background colour) and 'reflect' mode. 'Reflect' mode resulted in slightly better performance on the training set.
Because of the complexity of the patterns to be recognised, we also extracted features using the ResNet50 model, pre-trained on ImageNet. ResNet50 is a deeper network and yields higher-dimensional features than VGG16. We used data set B for the comparison. The ResNet50 features were fed to the SVM classifier, which resulted in a slight improvement: 91.7% accuracy on the training set and 94.9% on the test set.
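A NumPy sketch of the fivefold augmentation and of the two fill modes compared above follows. It is an illustration under assumptions, not the authors' implementation: "enlarged 5 times" is read as the original tiles plus four augmented variants of each, rotation is restricted to multiples of 90° (only 0° or 180° for non-square tiles) so no interpolation is needed, and zoom is left out. Keras's `ImageDataGenerator` exposes comparable settings (`rotation_range`, `width_shift_range`, `height_shift_range`, `zoom_range`, `horizontal_flip`, `vertical_flip`, `fill_mode`, `cval`), but the abstract does not state which tool was used. The names `shift`, `random_variant` and `augment` are hypothetical.

```python
import numpy as np


def shift(tile, dy, dx, fill_mode="reflect", cval=0):
    """Translate an (H, W, C) tile by dy rows and dx columns, filling the exposed border.

    fill_mode="constant" fills with cval (e.g. the background colour); any other
    value is passed to np.pad (e.g. "reflect" mirrors the tile at its edge).
    """
    p = max(abs(dy), abs(dx))
    widths = ((p, p), (p, p), (0, 0))
    if fill_mode == "constant":
        padded = np.pad(tile, widths, mode="constant", constant_values=cval)
    else:
        padded = np.pad(tile, widths, mode=fill_mode)
    h, w = tile.shape[:2]
    # Output pixel (i, j) takes input pixel (i - dy, j - dx), stored at (i - dy + p, j - dx + p).
    return padded[p - dy:p - dy + h, p - dx:p - dx + w]


def random_variant(tile, rng, max_shift=0.1, fill_mode="reflect", cval=0):
    """One random flip / rotation / shift of a tile (zoom omitted in this sketch)."""
    h, w = tile.shape[:2]
    out = tile
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip
    # Multiples of 90 degrees keep square tiles square; non-square tiles get 0 or 180 only.
    k = int(rng.integers(4)) if h == w else 2 * int(rng.integers(2))
    out = np.rot90(out, k, axes=(0, 1))
    sy, sx = int(max_shift * h), int(max_shift * w)
    dy = int(rng.integers(-sy, sy + 1))
    dx = int(rng.integers(-sx, sx + 1))
    return shift(out, dy, dx, fill_mode, cval)


def augment(tiles, labels, factor=5, seed=0, **variant_kwargs):
    """Enlarge a training set `factor` times: the originals plus factor - 1 variants of each."""
    rng = np.random.default_rng(seed)
    xs, ys = [np.asarray(tiles)], [np.asarray(labels)]
    for _ in range(factor - 1):
        xs.append(np.stack([random_variant(t, rng, **variant_kwargs) for t in tiles]))
        ys.append(np.asarray(labels))
    return np.concatenate(xs), np.concatenate(ys)
```

Only the training set is passed through `augment`; the test set stays untouched so that the reported test accuracy refers to real tiles. Since `fill_mode` is handed straight to `np.pad`, other NumPy modes such as "symmetric" (which repeats the edge pixel) or "edge" can be compared in the same way.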
Note: Joint Event on 33rd International Conference on Oncology Nursing and Cancer Care and 16th Asia Pacific Pathology Congress
September 17-18, 2018 Tokyo Japan