
Nuclear atypia grading in breast cancer histopathological images based on CNN feature extraction and LSTM classification

2022-01-12

Sanaz Karimi Jafarbigloo| Habibollah Danyali

1 Department of Electrical and Electronics Engineering, Shiraz University of Technology, Shiraz, Iran

2 Department of Electrical Engineering - Communication Systems, Shiraz University of Technology, Shiraz, Iran

Abstract: Early diagnosis of breast cancer, the most common disease among women around the world, increases the chance of treatment and is highly important. Nuclear atypia grading in histopathological images plays an important role in the final diagnosis and grading of breast cancer. Grading images by pathologists is a time-consuming and subjective task; therefore, a computer-aided system for nuclear atypia grading is very useful and necessary. In this study, two automatic systems for grading nuclear atypia in breast cancer histopathological images based on deep learning methods are proposed. A patch-based approach is introduced due to the large size of the histopathological images and the restriction of the training data. In the proposed system I, the most important patches in the image are detected first, and then a three-hidden-layer convolutional neural network (CNN) is designed and trained for feature extraction and to classify the patches individually. The proposed system II is based on a combination of the CNN for feature extraction and a two-layer long short-term memory (LSTM) network for classification. The LSTM network is utilised to consider all patches of an image simultaneously for image grading. The simulation results show the efficiency of the proposed systems for automatic nuclear atypia grading, outperforming the current related studies in the literature.

KEYWORDS: breast cancer, CNN, histopathological image, LSTM networks, nuclear atypia

1 | INTRODUCTION

Breast cancer is caused by an outgrowth of abnormal cells in the breast. This disease is the second leading cause of death among women around the world [1] and is a serious problem for most human societies. Early diagnosis of this disease increases the chance of efficient treatment. After performing mammography and ultrasound imaging, if abnormal tissues are found, a biopsy examination is required. In this test, a small sample of the breast tissue is placed under a microscope to acquire histopathological images for further investigation. Histopathology refers to the visual examination of tissue [2]. Histopathological images are available in digital format, which is analysed to detect the type and grade of cancer [3]. Various magnifications of histological images are available, such as 10X, 20X and 40X.

Traditional methods for grading breast cancer are performed manually by pathologists, who examine histological slides under a microscope in order to assign a grade to the texture. Two pathologists individually assign a grade to each texture and, if the grades are the same, that grade is considered final. If they differ, a third pathologist assigns a grade and the final grade is determined by majority voting. The pathologist examines a glass slide containing an H&E-stained tissue section. The area of the tissue that has to be examined is large [4]; therefore, the examination process is a tedious and time-consuming task [5].

Nowadays, automatic detection and grading systems extensively decrease the processing time and efficiently help pathologists. A special grading system called the Scarff-Bloom-Richardson system is used for breast cancer grading [6]. This system, also referred to as the Nottingham grading system, consists of three components: mitotic cell divisions, tubule formation and nuclear atypia (also referred to as nuclear pleomorphism). In mitotic cell divisions, the number of dividing cells is counted and accordingly a grade between 1 and 3 is assigned to the texture. In tubule formation, the tubular structures are evaluated; more regular structures indicate a lower grade of cancer. In nuclear atypia, the cell nuclei are assessed and, according to the deviation in size, shape and morphology of the cell nuclei from those of normal cells, a grade between 1 and 3 is given to the image [7]. A greater deviation of the cell nucleus reveals a higher grade of cancer [8]. Grading based on nuclear atypia depends on various parameters, including shape, variation in the size of nuclei and nucleoli, density of chromatin, thickness of the nuclear membrane, regularity of the nuclear contour and appearance of tumour cells compared to normal cells. Because of the many criteria that pathologists need to consider during grading based on nuclear atypia, the assessment is challenging and subject to inter-pathologist discrepancies. A salient example is assigning a grade to a nucleus with mixed features (for instance, a small but irregularly shaped cell, or vice versa). Although grading based on nuclear atypia depends on many parameters, nuclear atypia in standard textbooks of pathology has been defined as a degree of variability in size and shape (the most important parameters) and a histologic characteristic of countless malignant lesions [9].
Despite the importance of nuclear atypia for daily cytological diagnoses on smears and fine-needle aspiration (FNA) biopsies, it is an irrefutable fact that the light-microscopy appreciation of nuclear atypia is currently indirect and the tissue needs to be stained (e.g., with H&E) [10]. After staining, a microscope at ×20 or ×40 magnification is used to examine nuclear atypia [11]. Two of the most important scanners used for producing whole-slide images (WSI) are Aperio and Hamamatsu, known as the A and H scanners, respectively. These scanners are utilised to scan tissue and produce high-resolution histopathological images. The pathologist examines a region with a specific area in a digital WSI (equal to 10 high-power fields) and assigns a grade between 1 and 3 to each nucleus based on variation in the shape and size of the nucleus [8]. The following equations are based on averaging the grades assigned to cells in the ten selected high-power fields of the image:

$$S_i = \begin{cases} 1, & f(\text{size}_i)\,g(\text{shape}_i) < T_2 \\ 2, & T_2 \le f(\text{size}_i)\,g(\text{shape}_i) < T_3 \\ 3, & f(\text{size}_i)\,g(\text{shape}_i) \ge T_3 \end{cases} \tag{1}$$

$$NPS = \frac{\sum_{j=1}^{10}\sum_{i=1}^{NP_j} S_i}{\sum_{j=1}^{10} NP_j} \tag{2}$$

where $S_i$ and $NPS$ denote the grade of each cell and the grade of the texture, respectively; $f(\text{size})$ is a function of the cell's size and $g(\text{shape})$ is a function of the cell's shape; $T_2$ and $T_3$ are parameters derived from the training data; and $NP_j$ is the number of cells in the $j$th high-power field of the image. The indices $i$ and $j$ run over the cells in each high-power field and over the high-power fields in the texture, respectively. Each texture contains 10 high-power fields, each of which includes $NP_j$ cells. To obtain the overall grade of the texture, each cell in each high-power field is evaluated (Equation (1)) and the grade of the texture is then obtained from the grades of all cells (Equation (2)). A higher grade indicates further progression of cancer.
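The averaging rule above can be sketched in a few lines of Python. This is a minimal illustration that takes the per-cell grades $S_i$ as given, since the exact size/shape functions $f$, $g$ and the thresholds $T_2$, $T_3$ are learned from training data:

```python
def texture_grade(fields):
    """Average the per-cell grades over all high-power fields.

    `fields` is a list of high-power fields, each a list of per-cell
    grades S_i in {1, 2, 3}. The texture grade is the rounded mean
    grade over every cell in every field (sum of NP_j cells total).
    """
    total = sum(sum(cells) for cells in fields)
    n_cells = sum(len(cells) for cells in fields)  # sum of NP_j
    return round(total / n_cells)
```

For instance, fields whose cells are mostly grade 3 yield a texture grade of 3, even if a few grade-1 or grade-2 cells are present.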

In recent years, a wide variety of works have been performed on mitosis detection [12–16] and segmentation of tubule formation [17–21]. Despite the important role of nuclear atypia in grading different cancers, especially breast cancer, little attention has been paid to it in the current literature. In some studies on nuclear atypia, only the nuclei have been segmented and classification has not been performed [22, 23]. In [22] only critical cell nuclei were extracted: the centres of the nuclei were first detected by morphological operations and the boundaries of the nuclei were then segmented by a level-set algorithm. Khoshdeli et al. [23] segmented cell nuclei using a convolutional neural network (CNN). The studies that have classified histopathological images based on nuclear atypia can be divided into two categories. In the first category, the cell nuclei are detected, features are extracted from the segmented areas and, finally, the images are classified by machine learning methods [24, 25]. Dalle et al. [24] classified images into grades 2 and 3, and Cosatto et al. [25] classified images into two classes, benign and malignant. Lu et al. [26] segmented cell nuclei and extracted features from them; a histogram of the features was computed and the images were then classified into grades 1 to 3. In the second category, the cell nuclei are not segmented; the features are extracted directly from the texture and the images are then classified [27, 28]. In [27] features were extracted from sub-regions of the image; a region covariance (RC) was calculated for the features of each sub-region and all the RCs were combined by the geodesic mean. Eventually, the images were classified into grades 1 to 3 using a k-nearest-neighbour classifier.
In [29] textural features were extracted and an SVM classifier was then used to classify the images into two categories, benign and malignant. In another work, Huang et al. [30] extracted features from a region of interest (ROI) and used PCA to reduce the dimension of the textural feature vector; the images were finally classified into two classes (benign and malignant). In [28, 31–34], images were classified into two categories (benign and malignant) based on deep learning methods: in [31] images were classified using VGG16, and in [32] a proposed CNN was utilised to classify images. In [33] breast cancer histopathology image classification by assembling multiple compact convolutional neural networks (CNNs) was proposed. In the first method of Abdolahi et al. [34] a simple convolutional neural network (named the baseline model) was used, and in the second approach a pre-trained VGG-16 CNN model, via feature extraction and fine-tuning, was utilised for the classification of breast pathology images. Lin et al. [28] used a CNN to classify breast cancer histopathological images into two classes (benign and malignant) and utilised a uniform experimental design (UED) to optimise the CNN parameters. In [35], an active learning technique was adopted for cancer grading in the batch-mode framework, and the Riemannian distance metric was used for sample selection in the active learning framework.

In this study, we utilise deep learning methods and propose two automatic systems for classifying and grading nuclear atypia in breast cancer histopathological images. Extracting proper features plays the main role in image classification, and deep learning methods have the ability to extract suitable, high-level features from the image automatically. Given this ability and the considerable success of deep learning methods in pattern recognition, we construct the proposed systems based on deep learning. Due to the large size of histopathological images and the restriction of the training data, instead of reducing the image size, as is mainly done in the current literature, we introduce a patch-based approach in which the input images are divided into patches and the CNN is trained on the patches. Although patch-based processing solves these problems and speeds up the process, it poses a problem for the final nuclear atypia grading. The proposed system I detects the more important patches in the image and uses these patches independently for training the CNN. The proposed system II is based on a combination of a CNN for feature extraction and a two-layer long short-term memory (LSTM) network for classification. The LSTM network makes it possible to consider the relationships of all patches simultaneously for image grading.

The remainder of this paper is organised as follows: Section 2 describes the preprocessing and image preparation method. In Section 3 the proposed systems are explained. The experimental results are presented in Section 4 and, finally, Section 5 concludes the paper.

2 | PRE‐PROCESSING AND IMAGE PREPARATION

2.1 | Image normalisation

The purpose of colour normalisation is to increase colour constancy across the images. Natural tissue under the microscope has low contrast; therefore, an H&E stain is used to increase the contrast. The haematoxylin and eosin (H&E) stain is the most commonly used stain in clinical and research laboratories [36]. The main purpose of staining is to highlight the structures of the cytoplasmic and nuclear components of cells and tissues [37]. Haematoxylin and eosin are applied to the tissue to stain histopathological images: during staining, haematoxylin binds to the cell nucleus and makes it violet, while eosin acts on the cytoplasm and turns it pink [38]. The pink or purple colour intensity is not the same in different images. This difference in colour intensity creates a problem for the system, because the pink colour of the cytoplasm in one image can be close to the purple colour of the cell nuclei in another. Therefore, in this study, we use a normalisation algorithm from the literature [39] so that the same colour has a similar intensity in all images. This normalisation algorithm consists of four steps: (1) estimating the stain matrix, (2) computing the colour deconvolution, (3) non-linear mapping of channel statistics and (4) reconstruction (see Figure 1). A brief explanation of this algorithm follows.

An image with desirable staining is considered the target image. The purpose is to bring the colour intensity of the source images X closer to that of the target image ζ, so that pixels of corresponding stain types become more similar to the target image after normalisation than before. First, the stain matrix is estimated for both the source and the target images ($S_x$ and $S_\zeta$, respectively) using a global SCD. The stain matrix is used to transfer images from the RGB colour space to a new colour space with three channels [40]. Consider $I = (C, \psi)$, where $I$ is an image, $C$ is a two-dimensional set of pixels and $\psi$ assigns red, green and blue intensities to each pixel. With $\hat{\psi}$ indicating the amount of each stain in the new colour space ($s_1$, $s_2$ and $s_3$), the relationship between the two colour spaces is defined as follows:

$$\hat{\psi}(c) = D\,\varphi(c), \qquad D = S^{-1}$$

FIGURE 1 Stain normalisation algorithm: (1) estimate stain matrix, (2) compute colour deconvolution, (3) non-linear mapping of channel statistics, (4) reconstruction

where $D$ is the colour deconvolution (CD) matrix, obtained by inverting the stain matrix $S$, and $\varphi$ is the optical density (OD) space [40]. The OD intensities in each deconvolved channel are divided into three classes (stained, background, other). For each class, three statistics (mean, 5th percentile, 95th percentile) are computed, and the statistics of the source image channel $\hat{x}_n$ are then mapped to the corresponding statistics of the target image channel in order to normalise the H&E staining of the source image. White and black are unchanged by this process. A B-spline is used to map the statistics of the source image to those of the target image [39]; the spline parameters are computed from the nine input-output pairs of values (3 classes × 3 statistics) plus identity pairs that ensure white and black pixels do not change. Each stain channel of the source image is therefore normalised separately and, finally, the channels are recombined as follows:
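Steps (1)–(2) of the pipeline above (optical-density conversion followed by colour deconvolution) can be sketched in a few lines. This is a minimal illustration, not the article's implementation; the stain matrix values below are merely illustrative stand-ins for the matrix the SCD step would estimate:

```python
import numpy as np

# Hypothetical 3x3 stain matrix S: rows are OD vectors for
# haematoxylin, eosin and a residual channel (illustrative values).
S = np.array([[0.65, 0.70, 0.29],
              [0.07, 0.99, 0.11],
              [0.27, 0.57, 0.78]])
D = np.linalg.inv(S)  # colour deconvolution matrix D = S^-1

def deconvolve(rgb):
    """Map an RGB image (H, W, 3, uint8) to per-stain channels.

    `od` is the optical-density space (phi); the returned array is
    psi_hat, the amount of each stain at every pixel.
    """
    od = -np.log((rgb.astype(np.float64) + 1.0) / 256.0)  # phi (OD)
    return od @ D.T  # psi_hat = D . phi, applied per pixel
```

A pure-white pixel (255, 255, 255) has zero optical density, so it maps to zero stain in every channel, consistent with white being unchanged by the normalisation.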

2.2 | Image augmentation and patching

In this section, we address the problems of the large size of the input images (see Section 4 for details about the input data) and the limited amount of training data. The images need to be prepared before entering the network. The number of input images is restricted, while a CNN needs an enormous number of inputs for training (too little data causes over-fitting). Before increasing the amount of data, the accuracy on the test data did not exceed 60%, which was expected given the low number of input images. Therefore, after normalising the images (as explained in Section 2.1), image augmentation is performed to mitigate the data limitation: the images are rotated by 180°, and the original images are also flipped and added to the training set.

After image augmentation, the images are divided into patches because of their large size: large input images increase the network's training time and require a system with high processing power. To find a suitable patch size, different sizes were examined and 64×64 pixels was selected (see Section 4.2). In addition, to ease the limitation on the input data, the images are patched with overlap, which increases the number of input samples and preserves the information at patch borders. The patch-based approach thus solves the problem of the large input size, decreases the network's processing time and also alleviates the data limitation. The patches are used as the input of the network, whose design for grading the images is explained in the following.
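The augmentation and overlapping-patch steps can be sketched as follows. The 180° rotation and flips match the description above; the 48-pixel stride (i.e. 16 pixels of overlap between 64×64 patches) is an assumption for illustration, since the article does not state the exact overlap:

```python
import numpy as np

def augment(img):
    """Original image, 180-degree rotation, and horizontal/vertical flips."""
    return [img, np.rot90(img, 2), np.fliplr(img), np.flipud(img)]

def extract_patches(img, size=64, stride=48):
    """Overlapping patches: a stride smaller than the patch size gives
    the overlap that preserves information at patch borders."""
    patches = []
    h, w = img.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return patches
```

Applied per augmented image, this multiplies the number of training samples several hundred-fold, which is how the per-grade counts reported in Section 4 are reached.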

3 | THE PROPOSED SYSTEMS

A convolutional neural network (CNN), as shown in Figure 3, consists of two parts: a feature extraction part and a classification part. In a CNN, various filters of size n × n are used to extract features from the input image, so the dependency between the rows and columns of pixels is efficiently preserved [41]. This property makes the CNN suitable for extracting features from an image; accordingly, in this study, a CNN-based feature extraction scheme is utilised to extract proper features from the histopathological images.

FIGURE 2 Image normalisation. (a) An original RGB image (source image). (b) The target image. (c) The source image after normalising

FIGURE 3 A CNN structure that is made of two parts: feature extraction and classification

FIGURE 4 The architecture of the proposed CNN (output is a score from 1 to 3): (a) overall structure; (b) structure of a hidden layer

Figure 4 shows our proposed CNN structure. A variety of choices is available for the CNN structure, such as the convolution size, the kind of activation function and the number of hidden layers [42]; each structure was examined to find the best performance. In the proposed structure, the feature extraction part consists of three hidden layers, each containing a convolutional, a normalisation, a ReLU and a pooling layer. Three convolutional layers are used to extract adequate, appropriate and high-level features from the input image patches. Each convolutional layer is followed by a local response normalisation layer, a ReLU layer as the activation function and a 2×2 max-pooling layer. This structure was selected because a convolutional layer followed by a ReLU and a 2×2 max-pooling layer has been shown to provide better performance [23]. The normalisation layer is used to give the input and output data of each hidden layer the same distribution. In the first convolutional layer, 16 filters of size 3×3 pixels are applied to the input patches; the second convolutional layer contains 32 filters of size 3×3 pixels and the last one contains 64 filters of size 3×3 pixels. The classification part consists of two fully connected layers, a ReLU, dropout and softmax. The first fully connected layer contains 64 neurons followed by a ReLU and a dropout layer with a dropout ratio of 0.5.
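The feature-map dimensions implied by this architecture can be checked with simple arithmetic. The sketch below assumes the 3×3 convolutions are 'same'-padded (so only the 2×2 max-pooling changes the spatial size), which the article does not state explicitly:

```python
def feature_map_shape(patch=64, channels=3):
    """Trace a patch through the three hidden layers: each 3x3
    convolution is assumed to keep the spatial size ('same' padding),
    and each 2x2 max-pooling halves it."""
    size = patch
    for channels in (16, 32, 64):  # filter counts of conv layers 1-3
        size //= 2                 # effect of the 2x2 max-pooling
    return size, channels
```

Under that assumption a 64×64×3 patch becomes an 8×8×64 feature volume, i.e. 4096 values feeding the 64-neuron fully connected layer.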

The dropout technique is used to avoid over-fitting. The second fully connected layer contains three neurons. Finally, a softmax function calculates the probability that each image patch belongs to each class.

Different numbers of hidden layers were tested to find the best one. Increasing the number of hidden layers from one to two, or from two to three, increased the accuracy of the network considerably, but increasing it from three to four or more did not improve the accuracy significantly and only slowed the network down.

After normalising the input images in the preprocessing step, they are divided into two categories, training and test data: 80% of the total images are used for training and the remainder for testing. The training data are divided into patches and the grade of each image is assigned to all of its patches. These patches are the inputs of the CNN. In the evaluation part, a grade is likewise given to each patch.

To determine the grade of the whole image from the grades of its patches, two methods have been proposed. The first method uses majority voting: the number of patches with each grade is counted and the grade with the maximum count is assigned to the image. In the second method, if |G_i - G_j| is less than a threshold T and G_j < G_i, then grade j is assigned to the image, where G_i and G_j are the numbers of patches with grade i and grade j, respectively, i ∈ {1, 2}, j ∈ {2, 3} and i ≠ j. The threshold value depends on the number of patches in each image. If the higher grade does not account for the maximum number of patches but trails the other grades by less than the threshold, this method assigns the higher grade to the image. In this way, the importance of the higher grade is taken into account and our grading system becomes close to the pathologist's grading practice.
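The second decision rule can be sketched directly. This is an illustrative reading of the rule above, starting from the majority vote and promoting to a higher grade whenever that grade trails by fewer than T patches:

```python
def image_grade(counts, T):
    """counts maps grade -> number of patches (G_1, G_2, G_3 in the
    text); T is the threshold. A higher grade j wins over the current
    winner i whenever G_j < G_i but |G_i - G_j| < T."""
    grade = max(counts, key=counts.get)   # majority vote first
    for j in (2, 3):
        if j > grade and 0 < counts[grade] - counts[j] < T:
            grade = j                     # promote to the higher grade
    return grade
```

For example, with 50 grade-1 patches, 45 grade-2 patches and T = 10, the image is promoted to grade 2 even though grade 1 holds the majority.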

Assigning the grade of an image to all of its patches and training the network with all patches independently causes a problem. This is due to the fact that, in practice, when a pathologist gives a grade to an image, it does not mean that all parts of the image have the same grade. For instance, if only a part of an image shows a higher grade, the pathologist assigns this higher grade to the whole image because of the importance of the higher grade. To solve this problem, two systems are proposed, which are explained in the following subsections.

3.1 | System I (finding critical nuclei + CNN‐based system)

The structure of the proposed system I is depicted in Figure 5. The aim of this system is to find the patches with the higher grade and use only these patches for training the CNN, thereby overcoming the patch-based grading problem. To achieve this aim, the proposed system I tries to find the patches that contain critical nuclei.

The critical nuclei in the image are those with the highest grading score. The process is performed on each patch: if a patch contains a critical nucleus, it is kept as training data and the grade of the image is assigned to it. Figure 6 shows a block diagram of the proposed method for finding critical nuclei.

A bilateral filter is applied to the input patch. The bilateral filter is a non-linear filter that reduces noise while preserving edges [43]. In this filter, the intensity value of each pixel is replaced by a weighted average of nearby pixels, as follows:

$$I_{\text{filtered}}(q) = \frac{1}{W_q} \sum_{p \in \Omega} I(p)\, G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert I(p) - I(q) \rvert)$$

where $I_{\text{filtered}}$ is the filtered image, $I$ is the original image, $q$ is the coordinate of the current pixel, $\Omega$ is the window centred at $q$, $G_{\sigma_r}$ is the range kernel for smoothing intensity differences, $G_{\sigma_s}$ is the spatial kernel for smoothing coordinate differences and $W_q$ is the normalisation factor, computed from the same spatial and range weights.
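A naive, unoptimised bilateral filter can be written directly from this definition. This is a sketch for a 2-D greyscale array, with illustrative values for the window radius and the two sigmas:

```python
import numpy as np

def bilateral(I, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Naive bilateral filter: each pixel becomes a weighted average of
    its neighbours, weighted by a spatial Gaussian (sigma_s) and a
    range Gaussian on intensity differences (sigma_r)."""
    I = I.astype(np.float64)
    out = np.zeros_like(I)
    H, W = I.shape
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            win = I[y0:y1, x0:x1]                 # the window Omega
            yy, xx = np.mgrid[y0:y1, x0:x1]
            w = (np.exp(-((yy - y) ** 2 + (xx - x) ** 2)
                        / (2 * sigma_s ** 2))     # spatial kernel
                 * np.exp(-(win - I[y, x]) ** 2
                          / (2 * sigma_r ** 2)))  # range kernel
            out[y, x] = (w * win).sum() / w.sum() # W_q normalisation
    return out
```

The range kernel is what preserves edges: neighbours whose intensity differs strongly from the centre pixel receive near-zero weight, so nuclei boundaries are not blurred away.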

FIGURE 6 Block diagram of the method for finding critical nuclei

FIGURE 5 Panels (a and b) show the structure of system I (finding critical nuclei + CNN-based system). G_i and G_j are the numbers of patches with grade i and grade j, respectively; i = {1, 2}, j = {2, 3} and T is a threshold

Gamma correction is a non-linear operation that affects the brightness of an image [44]:

$$Y = C\,r^{\gamma}$$

where r is the input value, Y is the output value, and C and γ are constants.

The gamma correction function and thresholding are then applied to the blue channel of the filtered patch; the blue channel is selected because of the blue or purple colour of the nuclei. Finally, morphological operations are applied to detect the critical nuclei.
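The gamma-plus-threshold step can be sketched as follows. The constants C, γ and the threshold value are illustrative assumptions, not values taken from the article:

```python
import numpy as np

def gamma_correct(r, C=1.0, gamma=2.0):
    """Y = C * r**gamma, with intensities r normalised to [0, 1].
    gamma > 1 darkens mid-tones, deepening the dark blue-purple nuclei
    relative to the bright background (C and gamma are illustrative)."""
    return C * np.power(r, gamma)

def nuclei_candidates(rgb, thresh=0.35):
    """Threshold the gamma-corrected blue channel: stained nuclei are
    darker than the bright background, so they fall below `thresh`
    (the threshold value itself is an assumption)."""
    blue = rgb[..., 2].astype(np.float64) / 255.0
    return gamma_correct(blue) < thresh
```

The resulting binary mask is the input to the morphological operations described next.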

The size of a critical nucleus is bigger than that of a normal one and its boundary is deformed. In addition, most critical nuclei contain a hole, and abnormal nuclei appear to merge with each other. A closing operation is applied to fill the hole in an abnormal nucleus, a dilation operation is then performed to combine overlapping critical nuclei and, finally, an erosion operation is applied to obtain the seed point of each abnormal nucleus. The size of the structuring element is chosen so as to find the highest-grade nuclei in the image; for example, in an image with grade 2, we want to remove the patches with grade 1 and keep only the patches with grade 2. Figure 7 shows a visual example of detecting critical nuclei.
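The three morphological operations can be implemented on binary masks with plain NumPy. This is a self-contained sketch with a square structuring element of half-width k (the article does not specify the element's size or shape):

```python
import numpy as np

def dilate(mask, k=1):
    """Binary dilation with a (2k+1)x(2k+1) square structuring element:
    a pixel becomes True if any neighbour within the element is True."""
    p = np.pad(mask, k)  # False border, so edge shifts stay correct
    shifts = [np.roll(np.roll(p, dy, axis=0), dx, axis=1)
              for dy in range(-k, k + 1) for dx in range(-k, k + 1)]
    return np.any(shifts, axis=0)[k:-k, k:-k]

def erode(mask, k=1):
    """Erosion expressed as the dual of dilation on the complement."""
    return ~dilate(~mask, k)

def closing(mask, k=1):
    """Closing (dilation then erosion) fills the holes that appear
    inside abnormal nuclei, as described above."""
    return erode(dilate(mask, k), k)
```

Closing fills small holes; a further dilation merges overlapping critical nuclei, and an erosion then shrinks each blob towards its seed point.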

The reason for using a bilateral filter is to detect the nuclei boundaries, and the aim of the morphological operations is to eliminate the normal nuclei and retain the critical ones. As mentioned before, the stains used for tissue are generally H&E, in which H dyes the cell nuclei blue-purple and E makes the cytoplasm pink. The blue channel is therefore the most appropriate for detecting cell nuclei, and for this reason the gamma correction function is applied to the blue channel of the image to extract them.

3.2 | System II (CNN feature extraction + LSTM‐based system)

The aim of this system is to solve the patch-based grading problem by considering all patches of an image simultaneously. To achieve this goal, a network with memory is required, into which all patches enter as a time series; an LSTM is therefore a proper candidate for the classification part. Figure 8 represents the structure of the proposed system II. After image preparation, a CNN-based feature extraction stage, similar to that of system I, extracts features from the patches. A dataset is then created for each patch using the extracted features, and this dataset is used as the input of the LSTM network.

The LSTM network is a subset of recurrent neural networks (RNNs). RNNs were designed to process time-series signals. In an ordinary neural network, all inputs and outputs are independent of each other; an RNN has a kind of memory to use previous information, but this does not mean that the RNN records information over a long sequence: it records only the information of a few previous steps. The LSTM network was developed to solve this short-term-memory problem of the RNN [45]. The major change is to replace the middle layer of the RNN with a block called an LSTM block. The main feature of the LSTM is the ability to learn long-term dependencies, which was not possible with the RNN [46]. Figure 9 shows the structure of an LSTM network.

FIGURE 8 Structure of system II, a combination of CNN and Long short-term memory (LSTM) network

FIGURE 7 Finding critical nuclei (a) Original image (b) Detecting critical nuclei in the image

The inputs x enter the network as a time series (x_{t-1}, x_t, x_{t+1}, ...); the output of the first layer of the LSTM is (h_{t-1}, h_t, h_{t+1}, ...), and the specified structure within the LSTM block forms the network's memory, which contains information about previous inputs. This state is called the memory of the network and its information is updated over time. The network uses this memory to establish relationships between the inputs.
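The update inside one LSTM block can be made concrete with a minimal NumPy forward step. This is a generic textbook LSTM cell, shown for illustration; it is not the article's trained network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of an LSTM block. c is the long-term memory (cell
    state); the input (i), forget (f) and output (o) gates decide what
    is written to, kept in and read from it. W maps the current input
    x, U the previous hidden state h; both are stacked for the four
    gate blocks (i, f, g, o)."""
    n = h.size
    z = W @ x + U @ h + b          # (4n,) gate pre-activations
    i = sigmoid(z[:n])             # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    g = np.tanh(z[2 * n:3 * n])    # candidate memory content
    o = sigmoid(z[3 * n:])         # output gate
    c = f * c + i * g              # update the memory
    h = o * np.tanh(c)             # emit the new hidden state
    return h, c
```

In system II, iterating such a step over the per-patch feature vectors is what lets the state c carry information from earlier patches when later ones are classified.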

For each patch, a dataset entry is created consisting of the 64 features of the patch, the coordinates of the patch in the image, and the name and grade of the original image the patch belongs to, as shown in Figure 10. The created dataset reduces the volume of the input data and increases the speed of the network.
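One entry of this per-patch dataset can be sketched as a simple record; the field names below are illustrative, not taken from the article:

```python
def patch_record(features, x, y, image_name, grade):
    """One entry of the per-patch dataset of Figure 10: the 64 CNN
    features, the patch coordinates, and the name and grade of the
    source image (field names are illustrative)."""
    features = list(features)
    assert len(features) == 64, "system II extracts 64 features per patch"
    return {"features": features, "x": x, "y": y,
            "image": image_name, "grade": grade}
```

Grouping the records of one image (ordered by their coordinates) yields the sequence that is fed to the LSTM as a time series.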

The dataset of all patches of an input image is given to the network as a time series. The LSTM network is trained on this dataset and learns the relationships between the patches. In the LSTM-based classification part, all patches of the image are considered together and, finally, a grade is assigned to the whole image. The system II algorithm is given in Algorithms 1 and 2.

ALGORITHM 1

ALGORITHM 2

4 | EXPERIMENTAL RESULTS AND DISCUSSION

In this study, the image dataset was obtained from MITOS-ATYPIA-14 [47]. These images were captured at 20X magnification using the Aperio ScanScope XT digital scanner. Each image has a size of 1539×1376 pixels with a resolution of 0.2455 μm/pixel. Using the augmentation and patch-based approach, the number of input samples in each grade reached more than 100,000. Five-fold cross-validation was used to assess the network: one fold out of five was used for test data and the remainder for training data. The total number of training samples is 465,000. The network was trained on 80% of the training data and validated on the remaining data. For training the network, the same number of training samples was used for each grade (124,000 training samples per grade).

4.1 | Evaluation criteria

For evaluation, the results obtained by the proposed systems are compared with those given by the pathologists. The following metrics are used: average accuracy, precision, specificity and F-score.

The average accuracy (C_avg) over the whole set of test images is calculated as follows:

FIGURE 9 Structure of Long short-term memory (LSTM) networks

FIGURE 10 Schematic of creating the dataset for each patch

$$C_{avg} = \frac{1}{N_c}\sum_{k=1}^{N_c}\frac{TP_k + TN_k}{TP_k + TN_k + FP_k + FN_k}$$

where N_c = 3 is the number of grades and, for each grade k, TP = true positive, TN = true negative, FP = false positive and FN = false negative.
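Computing the listed metrics from a 3×3 confusion matrix in a one-vs-rest fashion can be sketched as follows (recall appears as an intermediate quantity needed for the F-score); this is a standard formulation, shown for illustration:

```python
def one_vs_rest_metrics(conf, k):
    """TP/TN/FP/FN for class k of a confusion matrix conf[true][pred],
    and the accuracy, precision, specificity and F-score for that class."""
    n = len(conf)
    tp = conf[k][k]
    fp = sum(conf[t][k] for t in range(n)) - tp   # predicted k, wrongly
    fn = sum(conf[k][p] for p in range(n)) - tp   # true k, missed
    tn = sum(map(sum, conf)) - tp - fp - fn       # everything else
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, specificity, f_score
```

Averaging the per-class accuracies over the three grades gives the overall C_avg reported in the tables.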

4.2 | Classification results

Because of the large size of the images and the limitation of the training data, the images are first divided into patches and patch-based processing is applied; this decreases the run-time of the system. To select a proper patch size, an evaluation was first performed in which five different patch sizes, 64×64, 128×128, 227×227, 344×344 and 512×512 pixels, were assessed and the overall accuracy of the system was calculated. Table 1 shows the obtained results.

The results in Table 1 reveal that patches of size 64×64 pixels are more appropriate than the others. Sizes smaller than 64×64 pixels were not used, since in those cases most cell nuclei cannot fit entirely within a patch.

Table 2 shows the overall classification accuracy for the proposed systems I and II. As mentioned in Section 3.1, in the proposed system I we tried to solve the patch-based problem by finding the part of the image with the higher grade (the ROI). The purpose of this system was to detect the patches that belong to the ROI, use only them as training data and assign the grade of the image to these patches. Figure 11 shows a selected ROI in an image.

TABLE 1 Overall accuracy results for different patch sizes

TABLE 2 Overall accuracy of the proposed system I and II

Examining a large number of patches to detect those with critical nuclei was very time-consuming. In addition, since the boundaries of abnormal nuclei are deformed and the colours of the nuclei and cytoplasm blend together, finding the critical nuclei is a challenging task. This affected the overall accuracy of the proposed system, because patches with close grades (e.g., grades 1 and 2, or grades 2 and 3) could not be distinguished well.

The combination of the two networks, CNN and LSTM, in the proposed system II provided better performance, as shown in Table 2. Here, the patching problem was solved by entering all patches as a time series into the LSTM network; the relationships between the patches were therefore taken into account and, consequently, the network was trained better. Furthermore, in the classification part, all patches of the image were assessed together before a grade was assigned to the image.

In Table 3 the results of each grade are computed separately.

The results of our proposed systems are compared with those of some related methods in Table 4. In [26], cell nuclei were first segmented, features were then extracted from the segmented areas and, finally, an SVM classifier was used to classify the images. In [27], cell nuclei were not segmented; features were extracted from sub-regions of the image and a KNN classifier was eventually utilised for classification. In [28, 31–34], images were classified into two categories (benign and malignant) based on deep learning methods: in [31] VGG16 was used to classify the images, and in [32] the images were classified using a proposed CNN. In [33], image classification was performed by assembling multiple compact convolutional neural networks. Abdolahi et al. [34] used a pre-trained VGG-16 CNN model, via feature extraction and fine-tuning, for the classification of breast pathology images. Lin et al. [28] used a CNN to classify breast cancer histopathological images into two classes (benign and malignant) and utilised a uniform experimental design (UED) to optimise the CNN parameters. In [35], an active learning technique was adopted for cancer grading in the batch-mode framework, and the Riemannian distance metric was used for sample selection in the active learning framework.

FIGURE 11 A selected region of interest (ROI) with a higher grade in an image in the proposed system I

TABLE 3 Performance evaluation of classifying the images (System II)

In Table 5, system II is compared with two deep learning-based methods.

Both the CNN and LSTM networks were trained on our data set, and the parameters of both networks were adjusted repeatedly to find appropriate values. Therefore, neither of the networks relies on transfer learning. The networks were tested with different parameters to obtain minimum loss and maximum accuracy; the parameter values were found through numerous tests and a heuristic technique. Training continued until the loss on the validation set no longer improved. The model was trained for 100 epochs with a batch size of 256. The Adam optimiser was used for model optimisation, and categorical cross-entropy was used as the loss function. For both networks (CNN and LSTM), the training accuracy reached 100% and the loss value dropped below 0.01, as can be seen in Figure 12.
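The stopping rule described above (halt when validation loss stops improving) can be sketched as a plain early-stopping loop. The `val_losses` list, the patience value and the function name are illustrative assumptions; the paper does not specify its exact stopping implementation.

```python
def train_with_early_stopping(val_losses, patience=1):
    """Stop training when the validation loss no longer improves.

    `val_losses` stands in for the per-epoch validation losses produced
    during training; patience and loop structure are illustrative.
    """
    best = float("inf")
    epochs_without_improvement = 0
    stopped_at = None
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stopped_at = epoch
                break
    return best, stopped_at

# Example: loss improves for three epochs, then stalls.
best, stopped = train_with_early_stopping([0.9, 0.5, 0.2, 0.25, 0.3])
print(best, stopped)  # 0.2 3
```

Frameworks such as Keras expose this behaviour through an `EarlyStopping` callback, which would achieve the same effect during training.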

TABLE 4 Comparison of the accuracy of the system II with other methods

TABLE 5 Performance comparison of the proposed system II with other deep learning-based methods

This article is compared with articles that share a similar purpose, data set and evaluation criteria.

Purpose: The purpose of this article is to propose an automated system for the classification of breast cancer histopathological images, and all the compared works have a similar purpose.

Dataset: To classify breast cancer, different kinds of images can be evaluated, but for a fairer comparison this article only considers works that used histopathological images.

Evaluation criteria: The same evaluation criteria were used in this work and in all the compared works.

5 | CONCLUSIONS

In this study, two automatic systems for grading nuclear atypia in breast cancer histopathological images based on deep learning methods are proposed. Despite the key role of nuclear atypia in grading different cancers, especially breast cancer, little attention has been paid to it in the current literature. Also, due to the limited input data, classification based on deep learning methods becomes a challenging task; in this study, we addressed this problem using a patch-based approach. Another challenge is that not all nuclei in an image have the same grade, and the grade of an image is assigned based on the whole image or its most important part. It is therefore necessary to consider all, or the most important part, of the image during evaluation. Given the large size of the images, it is not possible to process a whole image directly; to serve this purpose, two systems are proposed in this study. Moreover, a method for detecting critical nuclei (system I), the combination of two deep learning networks (CNN + LSTM) and the use of automatically extracted features in the LSTM network (system II) are further novelties of this work. The significance of this work is that the problems mentioned (data limitation, large image size and evaluating the whole image or a specific part of it) are challenges in many medical image processing tasks. More details about the work done in this study follow.

FIGURE 12 Accuracy and loss value of the training phase. (a) The top curve shows the accuracy and the bottom curve the loss value when training the CNN. (b) The top curve shows the accuracy and the bottom curve the loss value when training the Long short-term memory (LSTM) network

First of all, a normalisation process was applied to all the images to make their colour intensities similar. Then, a patch-based approach was introduced due to the large size of the histopathological images and the restriction on training data. To overcome the problems of patch-based processing, two systems were proposed. In the proposed system I, the most important patches in the image were detected and only these patches were used as input to the network. Then, a three-hidden-layer convolutional neural network (CNN) was designed and trained to extract features and classify the patches individually. Finally, a grade was assigned to the image based on the grades of its patches. The proposed system II was based on a combination of the CNN for feature extraction and a two-layer LSTM network for classification. The LSTM network was utilised to consider all patches of an image simultaneously for image grading. The simulation results showed the efficiency of the proposed systems for automatic nuclear atypia grading, and they outperformed the current related studies in the literature. Given that the nuclear atypia criterion examines the cell nuclei and assigns a grade to the tissue based on their deviation, the system would be more accurate if the nuclei were segmented first and the features were then extracted from the segmented areas. In the data set we used, only the grade of each image was available and no data set for nuclei segmentation existed; hence, nuclei segmentation was not performed in this study. Nuclei segmentation to increase the performance of the network can therefore be addressed in future work. Also, finding a better way to pre-process the images could help improve system performance. We leave the further investigation of a more accurate network to future work. However, we have shown that our proposed system can already identify the grade of an image well without segmenting the nuclei.
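The final step of system I, assigning an image-level grade from the grades of its patches, can be sketched as follows. The paper states that the image grade is derived from the patch grades; a majority vote is one plausible aggregation rule and is used here purely as an illustration, not as the paper's confirmed method.

```python
from collections import Counter

def image_grade_from_patches(patch_grades):
    """Assign an image-level grade from per-patch grades.

    A simple majority vote: the grade assigned to the most patches wins.
    This aggregation rule is an illustrative assumption.
    """
    counts = Counter(patch_grades)
    return counts.most_common(1)[0][0]

# Example: five patches, three of which are graded 2.
print(image_grade_from_patches([2, 2, 3, 2, 1]))  # 2
```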

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available in https://grand-challenge.org/challenge, MITOS-ATYPIA-14.

ORCID

Sanaz Karimi Jafarbigloo https://orcid.org/0000-0002-6632-6121